
Large Hosted arm64 Runners - Segmentation Fault since v2.322.0 Ubuntu-24.04 #11542

Closed
2 of 15 tasks
julienbonastre opened this issue Feb 5, 2025 · 3 comments

Comments

@julienbonastre

Description

For roughly the past week we have been unable to run GitHub Actions workflow pipelines on our LargeHosted arm64 runners without hitting this issue at some random point during processing.

The failure is random, but failed runs outnumber successful ones. A job that succeeds on one attempt will not necessarily succeed when rerun; the rerun will most likely fail too (80%+ of the time).

There has been no change to the underlying tooling or dependencies, and the same operations that fail on the GitHub LargeHosted arm64 runner instances run consistently and repeatedly without issue in alternate ARM64 environments.

The last successful pipelines ran on the 29th of Jan on this ubuntu-24.04 arm64 2.321.0 build. Since then, multiple pipelines using this same pattern and these runners no longer work.

Platforms affected

  • Azure DevOps
  • GitHub Actions - Standard Runners
  • GitHub Actions - Larger Runners

Runner images affected

  • Ubuntu 20.04
  • Ubuntu 22.04
  • Ubuntu 24.04
  • macOS 13
  • macOS 13 Arm64
  • macOS 14
  • macOS 14 Arm64
  • macOS 15
  • macOS 15 Arm64
  • Windows Server 2019
  • Windows Server 2022
  • Windows Server 2025

Image version and build link

2.322.0

Is it regression?

2.321.0

Expected behavior

Pipeline runs successfully with no errors.

Actual behavior

The pipeline crashes around 80%+ of the time: one of the matrix jobs produces "signal: segmentation fault (core dumped)" at some point in its workflow steps.

Image

These pipelines use a terragrunt action which parses and manages Terragrunt/Terraform (TG/TF) stacks in a monorepo. However, we are seeing the same impact on any repo which uses terragrunt on the LargeHosted arm64 runners.

Repro steps

  1. Build a pipeline which uses autero1/action-terragrunt@v3 (with terragrunt-version: "v0.50.14"), runs on a large hosted arm64-backed runner (we are running on the ubuntu-24.04 arm image), and performs some terragrunt operation (e.g. init, validate, or plan).

Image

  2. Invoke said pipeline and await its completion.
  3. If it doesn't fail the first time, re-run it a few times to confirm.
  4. In our experience, only around 1 in 5 attempts succeeds or gets slightly further before failing again.
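
The steps above could be sketched as a workflow along these lines. This is only a minimal illustration, not the reporter's actual pipeline: the workflow name, the runner label my-large-arm64-runner, the matrix stack directories, and the use of hashicorp/setup-terraform are all assumptions added for the example. Only the action reference and the terragrunt-version value come from the report.

```yaml
# Hypothetical reproduction workflow. Anything marked "assumed" is not from the report.
name: terragrunt-arm64-repro        # assumed name
on:
  workflow_dispatch:                # manual trigger, so the run can be repeated easily

jobs:
  validate:
    # Assumed label: replace with the label of your own larger arm64 runner
    # (the report uses an ubuntu-24.04 arm image).
    runs-on: my-large-arm64-runner
    strategy:
      fail-fast: false              # let every matrix job finish so each failure is visible
      matrix:
        stack: [stack-a, stack-b]   # assumed stack directories in the monorepo
    steps:
      - uses: actions/checkout@v4

      # Assumed: the report does not say how terraform itself is installed.
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_wrapper: false

      # From the report: action and pinned terragrunt version.
      - uses: autero1/action-terragrunt@v3
        with:
          terragrunt-version: "v0.50.14"

      - name: Run a terragrunt operation
        working-directory: ${{ matrix.stack }}
        run: terragrunt validate
```

Because the failure is intermittent, a single green run proves little; re-running the same workflow several times, as in steps 3 and 4, gives a better picture of the failure rate.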

julienbonastre changed the title from "Large Hosted arm64 Runners - Segmentation Fault" to "Large Hosted arm64 Runners - Segmentation Fault since v2.322.0 Ubuntu-24.04" on Feb 5, 2025

@julienbonastre
Author

#11541 and #11533 sound very similar, and the timing matches too... just saying...

@hemanthmanga
Contributor

Hi @julienbonastre, thank you for bringing this issue to our attention. We are looking into it and will update you after investigating.

@vidyasagarnimmagaddi
Contributor

Hi @julienbonastre, kindly raise the issue in the repo for Arm runners. Thanks, closing the issue.
