Segment fault with Ubuntu 24.04 20250120.5.0 #11471

Open
2 of 16 tasks
RadxaYuntian opened this issue Jan 26, 2025 · 10 comments

Comments

@RadxaYuntian

Description

We have a scheduled job that runs every Sunday. It failed today, with no code changes in the last 2 weeks.

According to the build log, it always fails at a dkms package installation. Once the workflow file is changed to print the dkms log, the error is always a gcc segmentation fault.

Changing the runner environment to ubuntu-22.04 fixed the segmentation fault. The action still failed, but only because of the change we made to investigate this issue.

What may be unusual for us is that we are using binfmt to run aarch64 gcc in a devcontainer, because the final output is an aarch64 system image. So this is not some normal gcc failing.
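
For anyone trying to confirm they have the same kind of setup, here is a minimal Python sketch that parses a binfmt_misc entry to show which interpreter handles aarch64 binaries. The entry name `qemu-aarch64` and the sample text are assumptions for illustration; the actual name depends on how the handler was registered on your runner.

```python
from pathlib import Path


def parse_binfmt_entry(text: str) -> dict:
    """Parse the text of a /proc/sys/fs/binfmt_misc/<name> entry into a dict."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return {}
    # The first line is the status, e.g. "enabled" or "disabled".
    info = {"status": lines[0]}
    for line in lines[1:]:
        # Remaining lines look like "key value"; the flags line uses "flags: value".
        key, _, value = line.partition(" ")
        info[key.rstrip(":")] = value.strip()
    return info


def read_binfmt_entry(name: str = "qemu-aarch64") -> dict:
    """Read an entry from the running kernel; returns {} if it cannot be read."""
    path = Path("/proc/sys/fs/binfmt_misc") / name
    try:
        return parse_binfmt_entry(path.read_text())
    except OSError:
        return {}


# Illustrative sample only, not captured from the failing runner.
SAMPLE = """enabled
interpreter /usr/bin/qemu-aarch64-static
flags: OCF
offset 0
magic 7f454c460201010000000000000000000200b700
mask ffffffffffffff00fffffffffffffffffeffffff
"""

if __name__ == "__main__":
    # Fall back to the sample when no handler is registered on this machine.
    print(read_binfmt_entry() or parse_binfmt_entry(SAMPLE))
```

If the `interpreter` field points at a QEMU user-mode binary, the compiler inside the container is being emulated rather than run natively, which is the situation described above.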

Platforms affected

  • Azure DevOps
  • GitHub Actions - Standard Runners
  • GitHub Actions - Larger Runners

Runner images affected

  • Ubuntu 20.04
  • Ubuntu 22.04
  • Ubuntu 24.04
  • macOS 12
  • macOS 13
  • macOS 13 Arm64
  • macOS 14
  • macOS 14 Arm64
  • macOS 15
  • macOS 15 Arm64
  • Windows Server 2019
  • Windows Server 2022
  • Windows Server 2025

Image version and build link

20250120.5.0

Is it regression?

20250105.1.0: https://github.com/RadxaOS-SDK/rsdk/actions/runs/12848906725

Expected behavior

DKMS installs successfully without a gcc segfault.

Actual behavior

gcc segfault:

   2025-01-26 07:47:36,252 bdebstrap ERROR: mmdebstrap failed with exit code 25. See above for details.
  
  /workspaces/rsdk
  
  DKMS make.log for radxa-overlays-0.1.20 for kernel 6.1.68-2-stable (aarch64)
  Sun Jan 26 07:47:16 UTC 2025
  make: Entering directory '/usr/src/linux-headers-6.1.68-2-stable'
  Segmentation fault (core dumped)
  warning: the compiler differs from the one used to build the kernel
    The kernel was built by: aarch64-linux-gnu-gcc (Debian 10.2.1-6) 10.2.1 20210110
    You are using:           gcc (Debian 12.2.0-14) 12.2.0
    CC [M]  /var/lib/dkms/radxa-overlays/0.1.20/build/radxa-overlays.o
    DTC     /var/lib/dkms/radxa-overlays/0.1.20/build/arch/arm64/boot/dts/amlogic/overlays/meson-g12-disable-gpu.dtbo
    DTC     /var/lib/dkms/radxa-overlays/0.1.20/build/arch/arm64/boot/dts/amlogic/overlays/meson-g12-disable-hdmi.dtbo
    DTC     /var/lib/dkms/radxa-overlays/0.1.20/build/arch/arm64/boot/dts/rockchip/overlays/radxa-s0-ext-antenna.dtbo
  gcc: internal compiler error: Segmentation fault signal terminated program cc1
  Please submit a full bug report, with preprocessed source (by using -freport-bug).
  See <file:///usr/share/doc/gcc-12/README.Bugs> for instructions.
  make[2]: *** [scripts/Makefile.lib:409: /var/lib/dkms/radxa-overlays/0.1.20/build/arch/arm64/boot/dts/rockchip/overlays/radxa-s0-ext-antenna.dtbo] Error 4
  make[1]: *** [scripts/Makefile.build:500: /var/lib/dkms/radxa-overlays/0.1.20/build/arch/arm64/boot/dts/rockchip/overlays] Error 2
  make[1]: *** Waiting for unfinished jobs....
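
To pick out the crash lines from a longer make.log, something like the following Python sketch may help. The two patterns are copied from the log above; the function name and the shortened sample log are illustrative assumptions, and other compilers or locales may word these messages differently.

```python
import re

# Crash signatures as they appear in the make.log from this report.
CRASH_PATTERNS = [
    re.compile(r"internal compiler error: .+"),
    re.compile(r"^Segmentation fault \(core dumped\)"),
]


def find_compiler_crashes(log_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines that look like compiler crashes."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        stripped = line.strip()
        if any(pattern.search(stripped) for pattern in CRASH_PATTERNS):
            hits.append((number, stripped))
    return hits


# Shortened excerpt of the log above, used only to demonstrate the function.
SAMPLE_LOG = """make: Entering directory '/usr/src/linux-headers-6.1.68-2-stable'
Segmentation fault (core dumped)
warning: the compiler differs from the one used to build the kernel
gcc: internal compiler error: Segmentation fault signal terminated program cc1
"""

if __name__ == "__main__":
    for number, line in find_compiler_crashes(SAMPLE_LOG):
        print(f"{number}: {line}")
```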

Repro steps

  1. Clone https://github.com/RadxaOS-SDK/rsdk
  2. Cherry pick RadxaYuntian/rsdk@090908a to view dkms log
  3. Trigger workflow_dispatch for build.yaml
@RadxaYuntian
Author

RadxaYuntian commented Jan 26, 2025

The gcc version Debian 12.2.0-14 was released on 2023/01/08, so the last successful run (2025/01/19) and today's failed run are both using the same version in the devcontainer.

@deviantintegral

I can confirm this as well at https://github.com/pbkhrv/rtl_433-hass-addons/actions/runs/12972957498/job/36181006667. That job is compiling aarch64 in Docker under QEMU (I know, proper cross compiling would be better, but this is what the official Home Assistant builder action does so 🤷 ).

Is there a way to specify the runner image version to a previous 24.04 release to confirm the regression?

@woblerr

woblerr commented Jan 26, 2025

Same problem with buildx for linux/arm64 via QEMU: https://github.com/woblerr/docker-pgbackrest/actions/runs/12965488407/job/36165276019#step:7:2658

Rolling back to the ubuntu-22.04 runner solved the problem.

@MyreMylar

Chiming in to say that we have been seeing segfaults on our pygame-ce test runners in the ppc64le architecture build since getting version 20250120.5.0. Perhaps related, our s390x architecture build now reports that it can no longer detect the GNU compiler type.

As @deviantintegral says it would be nice to have a way to roll back to a previous runner image to isolate the problem.
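
Until pinning is possible, a job can at least record which image it ran on. Below is a rough Python sketch, assuming the hosted runner exposes the image version in an `ImageVersion` environment variable (please verify this on your own runner). The two version numbers come from this thread and apply to the Ubuntu 24.04 image; the function names and category labels are made up for illustration.

```python
import os


def parse_image_version(version: str) -> tuple[int, ...]:
    """Turn a version such as '20250120.5.0' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


# Versions reported in this thread for the Ubuntu 24.04 image.
LAST_KNOWN_GOOD = parse_image_version("20250105.1.0")
FIRST_KNOWN_BAD = parse_image_version("20250120.5.0")


def classify_image(version: str) -> str:
    """Place a version relative to the two data points from this thread."""
    parsed = parse_image_version(version)
    if parsed <= LAST_KNOWN_GOOD:
        return "known-good-or-older"
    if parsed >= FIRST_KNOWN_BAD:
        return "affected-or-newer"
    # Anything released between the two data points has not been reported on.
    return "unknown"


if __name__ == "__main__":
    # ImageVersion is an assumption about the runner environment.
    version = os.environ.get("ImageVersion")
    if version is None:
        print("ImageVersion is not set; probably not a hosted runner")
    else:
        print(f"{version}: {classify_image(version)}")
```

Printing this at the start of a job makes it easier to line up failures with image rollouts when comparing runs.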

@RaviAkshintala
Contributor

Hi @RadxaYuntian, thank you for bringing this issue to our attention. We will investigate and update you with our findings.

stevenhorsman added a commit to stevenhorsman/cloud-api-adaptor that referenced this issue Jan 27, 2025
Due to an
[issue](actions/runner-images#11471)
with Ubuntu 24.04 20250120.5.0 runner image
we have been seeing failures in our multi-arch images for
the last few days which is blocking the release. I assume that
the issue is something related to qemu, so downgrade to 22.04
until this issue is resolved.

Signed-off-by: stevenhorsman <[email protected]>
@BrianPugh

I'm also having very similar issues in tamp when using cibuildwheel to build python wheels for ppc64le and aarch64 targets.

@rtobar

rtobar commented Jan 28, 2025

Same issue here with gcc segfault, but in my case I saw it both with ubuntu-latest and ubuntu-20.04. Updating/downgrading to ubuntu-22.04 solved it as mentioned by other people.

stevenhorsman added a commit to confidential-containers/cloud-api-adaptor that referenced this issue Jan 28, 2025
charlesomer referenced this issue in mikebrady/shairport-sync Jan 28, 2025
dlech added a commit to pybricks/python-mpy-cross that referenced this issue Feb 2, 2025
@kishorekumar-anchala
Contributor

kishorekumar-anchala commented Feb 3, 2025

I believe this is caused by a mismatch between the gcc version and the kernel version (Kernel Version: 6.8.0-1020-azure). This kernel version only supports gcc 10. If possible, please try installing gcc 10 and rerunning the test, or continue with the ubuntu-22.04 image until a new version is released. Thank you!

cc @deviantintegral @RadxaYuntian @MyreMylar

@deviantintegral

In my case, installing GCC 10 is not easy. The container uses Alpine 3.19, which ships with GCC 13. The oldest Alpine release with GCC 10 is Alpine 3.14, which went out of security support over a year ago. But given the number of issues from various projects referencing this, and your comment above, is it safe to assume this will be fixed in 24.04 in the next few weeks?

mockdeep added a commit to stringer-rss/stringer that referenced this issue Feb 3, 2025
This switches to `ubuntu-22.04` and reenables arm64 builds. It turns out
there's [an issue with the Github Actions runner image][is]. People seem
to be having luck downgrading the image.

[is]: actions/runner-images#11471
Earlopain added a commit to docker-ruby-nightly/ruby that referenced this issue Feb 4, 2025
I'm getting tired of these failures. I thought it would be addressed soonish but apparently not.
actions/runner-images#11471
drivebyer added a commit to OT-CONTAINER-KIT/redis that referenced this issue Feb 5, 2025
@RadxaYuntian
Author

We are building a Debian 12 system image that contains a DKMS package, and DKMS itself is invoked by the kernel postinst hooks. So this is not an intermediate environment where we are free to install arbitrary software: whatever we add, we also need to clean up. There are also no gcc-10 packages in the Debian 12 repositories, so we would have to build gcc-10 from source with the shipped gcc-12, remove gcc-12 without breaking package dependencies so that gcc-10 becomes the default compiler, and afterwards remove gcc-10 and reinstall gcc-12.

I think this is too much effort for a temporary workaround.

We will roll back to 22.04 for now.
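
For projects with many workflow files, the rollback can be scripted. The Python sketch below rewrites plain `runs-on: ubuntu-latest` and `runs-on: ubuntu-24.04` lines to `ubuntu-22.04`. It assumes `ubuntu-latest` currently resolves to 24.04 and that the label is written as a simple scalar on one line; matrix expressions, label lists, and lines with trailing comments are left untouched. The function name and sample workflow are illustrative.

```python
import re

# Matches a whole `runs-on:` line whose value is ubuntu-latest or ubuntu-24.04,
# optionally wrapped in matching single or double quotes.
RUNS_ON = re.compile(
    r"^(?P<prefix>[ \t]*runs-on:[ \t]*)(?P<quote>['\"]?)"
    r"ubuntu-(?:latest|24\.04)(?P=quote)[ \t]*$",
    re.MULTILINE,
)


def pin_to_ubuntu_2204(workflow_text: str) -> str:
    """Return the workflow text with matching runner labels set to ubuntu-22.04."""
    return RUNS_ON.sub(r"\g<prefix>\g<quote>ubuntu-22.04\g<quote>", workflow_text)


# Made-up workflow used only to demonstrate the rewrite.
SAMPLE_WORKFLOW = """jobs:
  build:
    runs-on: ubuntu-latest

    steps: []
  package:
    runs-on: 'ubuntu-24.04'
  mac:
    runs-on: macos-14
"""

if __name__ == "__main__":
    print(pin_to_ubuntu_2204(SAMPLE_WORKFLOW))
```

A text rewrite is used here instead of a YAML round trip so that comments and formatting in the workflow files are preserved; review the resulting diff before committing.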
