Large Hosted arm64 Runners - Segmentation Fault since v2.322.0 Ubuntu-24.04 #11542

julienbonastre · 2025-02-05T02:08:24Z

Description

For roughly the past week, we have not been able to run GitHub Actions workflow pipelines on our large hosted arm64 runners without hitting this issue at some random point in their processing.

The failures are random in nature, but failed runs outnumber successful ones. A job that succeeds on one attempt will not necessarily succeed when rerun (in fact, at the failure rate we are seeing, a rerun will likely fail as well, 80%+ of the time).

There has been no change to the underlying tooling or dependencies, and the same operations that fail on the GitHub large hosted runner (LHR) arm64 instances run consistently and repeatedly in alternate ARM64 environments without any issue.

The last successful pipelines ran on the 29th of Jan on this ubuntu-24.04 arm64 2.321.0 build. Since then, multiple pipelines using this same pattern and these runners no longer work.

Platforms affected

Azure DevOps
GitHub Actions - Standard Runners
GitHub Actions - Larger Runners

Runner images affected

Image version and build link

2.322.0

Is it regression?

2.321.0

Expected behavior

Pipeline runs successfully with no errors.

Actual behavior

The pipeline crashes around 80%+ of the time on one of the matrix jobs, which encounters and produces a "signal: segmentation fault (core dumped)" at some point in its workflow steps.

These pipelines use a terragrunt action which parses and manages Terragrunt/Terraform (TG/TF) stacks in a monorepo. However, we are seeing the same impact on any repo that uses terragrunt on the large hosted arm64 runners.

Repro steps

Build a pipeline which uses autero1/action-terragrunt@v3 (with terragrunt-version: "v0.50.14"), runs on a large hosted arm64-backed runner (we are running on the ubuntu-24.04 arm image), and performs some terragrunt operation (e.g. init, validate, or plan).

Invoke the pipeline and await its completion.
If it doesn't fail the first time, re-run it a few times to confirm.
In our experience, only around 1 in 5 attempts progresses, or proceeds slightly further before failing again.
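
For anyone trying to reproduce this, a workflow along the following lines may be a reasonable starting point. It is only a minimal sketch of the steps above, not the actual pipeline: the workflow name, the runner label, the working directory and the choice of "validate" are all hypothetical placeholders, and it assumes Terraform is already available on the runner. Only the action reference and the terragrunt-version value come from the report.

```yaml
# Hypothetical reproduction workflow (sketch only, not the reporter's pipeline).
name: terragrunt-arm64-repro        # assumption: any name works
on: workflow_dispatch               # manual trigger so the run can be repeated easily

jobs:
  validate:
    # assumption: replace with the label of your own large hosted
    # arm64 runner backed by the ubuntu-24.04 arm image
    runs-on: my-large-arm64-runner
    steps:
      - uses: actions/checkout@v4

      # Action and version pin as stated in the report
      - uses: autero1/action-terragrunt@v3
        with:
          terragrunt-version: "v0.50.14"

      # assumption: Terraform is already installed on the runner;
      # any terragrunt operation (init, validate, plan) should do
      - run: terragrunt validate
        working-directory: stacks/example   # hypothetical path to a stack
```

Because the failure is intermittent, a single green run of a workflow like this would not rule the problem out; it would need to be re-run several times.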


julienbonastre · 2025-02-05T02:55:50Z

#11541 and #11533 are sounding very similar, and the timing matches too... just saying...

hemanthmanga · 2025-02-05T04:16:51Z

Hi @julienbonastre, thank you for bringing this issue to our attention. We are looking into it and will update you after investigating.

vidyasagarnimmagaddi · 2025-02-05T17:00:01Z

Hi @julienbonastre, kindly raise the issue in repo, for Arm runners. Thanks, closing the issue.

julienbonastre added bug report needs triage labels Feb 5, 2025

julienbonastre changed the title ~~Large Hosted arm64 Runners - Segmentation Fault~~ Large Hosted arm64 Runners - Segmentation Fault since v2.322.0 Ubuntu-24.04 Feb 5, 2025

hemanthmanga assigned vidyasagarnimmagaddi Feb 5, 2025

hemanthmanga added OS: Ubuntu OS: Ubuntu24 and removed needs triage labels Feb 5, 2025

vidyasagarnimmagaddi closed this as completed Feb 5, 2025
