Run Linux arm64 build and tests on native CI runners #61

Open
wants to merge 5 commits into
base: main

dennisameling
Contributor

In #59 (comment), the following was mentioned:

I suspect there is a bug in a dependency we're linking against when cross-compiling for aarch64 linux on an x64 linux box (which is how we produce the aarch64 build of ringrtc), and I have a theory of what the bug is.

I do not have an aarch64 linux device at hand, so it's a bit challenging to test, but I will try to see if any colleagues have one.

GitHub now offers hosted Linux arm64 runners, so it's possible to build and test natively on this architecture. This PR updates the CI pipeline accordingly.

@mutexlox-signal
Contributor

mutexlox-signal commented Feb 4, 2025

Looks like this build is failing at the moment:

> electron-mocha --renderer --recursive dist/test --timeout 10000 --require source-map-support/register

[12113:0130/165621.394000:ERROR:bus.cc(407)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
[12113:0130/165621.394050:ERROR:bus.cc(407)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
[12113:0130/165621.394061:ERROR:bus.cc(407)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
[12113:0130/165621.394069:ERROR:bus.cc(407)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
[12113:0130/165621.417360:ERROR:bus.cc(407)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
[12113:0130/165621.434039:ERROR:bus.cc(407)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
[12113:0130/165621.434060:ERROR:bus.cc(407)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
[12142:0130/165621.654843:ERROR:viz_main_impl.cc(196)] Exiting GPU process due to errors during initialization
[12150:0130/165621.715495:ERROR:command_buffer_proxy_impl.cc(127)] ContextResult::kTransientFailure: Failed to send GpuControl.CreateCommandBuffer.

ERROR: /home/runner/work/ringrtc/ringrtc/src/node/build/linux/libringrtc-arm64.node: undefined symbol: __arm_tpidr2_save
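An error like this can be checked for before the test step by listing the symbols the built library expects the dynamic loader to supply. The following is a minimal sketch, assuming binutils `nm` is available on the runner; the helper names `undefined_symbols` and `needs_symbol` are hypothetical and not part of the ringrtc build.

```python
import subprocess


def undefined_symbols(nm_output):
    """Parse the text printed by `nm -D --undefined-only` into a set of names."""
    symbols = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined entries look like "   U name" or "   w name" (weak),
        # optionally with a version suffix such as "name@GLIBC_2.17".
        if len(parts) >= 2 and parts[-2] in ("U", "w", "v"):
            symbols.add(parts[-1].split("@")[0])
    return symbols


def needs_symbol(library_path, symbol):
    """Return True if the shared object imports `symbol` (hypothetical helper)."""
    result = subprocess.run(
        ["nm", "-D", "--undefined-only", library_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return symbol in undefined_symbols(result.stdout)
```

For example, `needs_symbol("src/node/build/linux/libringrtc-arm64.node", "__arm_tpidr2_save")` would report whether the library still imports the symbol. An import on its own is not a failure; it only fails at load time when no linked library provides the symbol.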

https://github.com/signalapp/ringrtc/actions/runs/13048532063/job/36431086293?pr=61 has all of the details, but I'm not sure if they're visible to you.

(perhaps this is related to #62 ?)

@jim-signal
Contributor

@dennisameling Hello! Could you change ubuntu-22.04-arm to our ubuntu-22.04-arm64-4-cores runner?

@jim-signal
Contributor

We will make a release soon that should fix the build failure here and remove the dependency on GCC 14.0, which we picked up because we were using a newer Ubuntu runner. This PR can then be rebased and squashed on top of that release.

@dennisameling
Contributor Author

Could you change ubuntu-22.04-arm to our ubuntu-22.04-arm64-4-cores runner?

I actually wonder what the benefit would be here. ubuntu-22.04-arm is now offered by GitHub for free (for OSS projects), and already has 4 cores. What would be the benefit of using your own runner in this case? Since I've already updated it as you asked I'll leave it as-is, but was just curious. Thanks!

@jim-signal
Contributor

ubuntu-22.04-arm is now offered by GitHub for free (for OSS projects)

GitHub doesn't yet support the ubuntu-2x.04-arm runners in private test/development repositories, so we need to use an organizational runner for now.

@jim-signal
Contributor

@dennisameling Any ideas about the issue running on arm64?
https://github.com/signalapp/ringrtc/actions/runs/13266627063/job/37120409305

We can take a look, but may not be able to get to it right away. Otherwise the PR changes look good so far. Thanks!

@dennisameling
Contributor Author

The interesting thing is that I'm not seeing this issue on the GitHub-hosted ubuntu-22.04-arm runner. I ran the pipeline on my fork yesterday, and while it succeeded in installing the dependencies, it failed further down the pipeline with:

ERROR: /home/runner/work/ringrtc/ringrtc/src/node/build/linux/libringrtc-arm64.node: undefined symbol: __arm_tpidr2_save

So that's a different error from the one coming from your org-level ubuntu-22.04-arm64-4-cores runner:

Failed to connect to bus: $DBUS_SESSION_BUS_ADDRESS and $XDG_RUNTIME_DIR not defined (consider using --machine=<user>@.host --user to connect to bus of other user)
Error: Process completed with exit code 1.
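The message names two environment variables, so one way to compare the two runner types is to log which of them is unset before the failing step. This is a minimal sketch under that assumption; `missing_session_bus_vars` is a hypothetical helper, and whether setting these variables is enough on the org-level runner is not established in this thread.

```python
import os

# Variable names taken from the error message above.
REQUIRED_VARS = ("DBUS_SESSION_BUS_ADDRESS", "XDG_RUNTIME_DIR")


def missing_session_bus_vars(env=None):
    """Return the session-bus variables that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_session_bus_vars()
    if missing:
        print("Not defined on this runner: " + ", ".join(missing))
    else:
        print("Both session-bus variables are defined.")
```

Running it as a step on both the public and the org-level runner would show whether the environments differ in this respect.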

Are you using a custom image for your org-level arm64 runner by any chance, or the default one from partner-runner-images? Just trying to understand if there would be any difference at all in the GitHub-hosted OSS arm64 runners vs the custom (but GitHub-hosted) org-level ones.

@mutexlox-signal
Contributor

The interesting thing is that I'm not seeing this issue on the GitHub-hosted ubuntu-22.04-arm runner. I ran the pipeline on my fork yesterday, and while it succeeded in installing the dependencies, it failed further down the pipeline with:

ERROR: /home/runner/work/ringrtc/ringrtc/src/node/build/linux/libringrtc-arm64.node: undefined symbol: __arm_tpidr2_save

Curious, I would have thought that would be fixed by e07256b / by using 22.04 rather than 24.04...

So that's a different error from the one coming from your org-level ubuntu-22.04-arm64-4-cores runner:

Failed to connect to bus: $DBUS_SESSION_BUS_ADDRESS and $XDG_RUNTIME_DIR not defined (consider using --machine=<user>@.host --user to connect to bus of other user)
Error: Process completed with exit code 1.

Are you using a custom image for your org-level arm64 runner by any chance, or the default one from partner-runner-images? Just trying to understand if there would be any difference at all in the GitHub-hosted OSS arm64 runners vs the custom (but GitHub-hosted) org-level ones.

I believe we're using the standard image; it's described as "Ubuntu 22.04 by Arm Limited"

@mutexlox-signal
Contributor

I also see this:

- Upstream recommends to use 'WirePlumber' instead 'pipewire-media-session'      
    as session manager, to get it add another PPA,      
    'sudo add-apt-repository ppa:pipewire-debian/wireplumber-upstream'      
    For more instruction read : https://pipewire-debian.github.io/

perhaps we need to switch where we're grabbing the pipewire packages from?

@jim-signal
Contributor

Are you using a custom image for your org-level arm64 runner by any chance, or the default one from partner-runner-images?

The runner we are using is based on the same ARM partner-runner-images, just with more memory and storage. It is a mystery then why it would work with the public ones, because they appear to be directly using the partner-runner-images too.

@dennisameling
Contributor Author

The runner we are using is based on the same ARM partner-runner-images, just with more memory and storage. It is a mystery then why it would work with the public ones, because they appear to be directly using the partner-runner-images too.

Interesting. I've just configured an org-level runner on my end as well, and am running into the same issue. So there must be some sort of difference in either the machines or the images that these org-level runners run on. Just created a Support Ticket (#3231169) to ask the GitHub team what the difference between the two could be.

@dennisameling
Contributor Author

GitHub support recommended opening an issue in the partner-runner-images repo, so here we go: actions/partner-runner-images#53

@dennisameling
Contributor Author

The interesting thing is that I'm not seeing this issue on the GitHub-hosted ubuntu-22.04-arm runner. I ran the pipeline on my fork yesterday, and while it succeeded in installing the dependencies, it failed further down the pipeline with:

ERROR: /home/runner/work/ringrtc/ringrtc/src/node/build/linux/libringrtc-arm64.node: undefined symbol: __arm_tpidr2_save

Curious, I would have thought that would be fixed by e07256b / by using 22.04 rather than 24.04...

I just installed an Ubuntu 22.04 arm64 VM and tested this by building locally. I'm seeing the following:

  • Everything up until and including v2.49.2 works fine on Ubuntu 22.04 arm64, all tests in npm test are passing.
  • Everything starting with v2.49.3 fails with undefined symbol: __arm_tpidr2_save.
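The two observations above pin the regression between v2.49.2 and v2.49.3. The search for the first failing tag can be sketched as a binary search, assuming that once a tag fails every later tag fails too. `parse_tag` and `first_failing` are hypothetical helpers, and the `fails` callback stands in for a real "build and run npm test" step.

```python
def parse_tag(tag):
    """Turn a tag such as 'v2.49.3' into a sortable tuple (2, 49, 3)."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def first_failing(tags, fails):
    """Binary-search for the earliest tag for which `fails(tag)` is true.

    Assumes failures are monotonic: after the first failing tag,
    every later tag fails as well. Returns None if nothing fails.
    """
    ordered = sorted(tags, key=parse_tag)
    lo, hi = 0, len(ordered)
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(ordered[mid]):
            hi = mid       # the first failure is at mid or earlier
        else:
            lo = mid + 1   # the first failure is after mid
    return ordered[lo] if lo < len(ordered) else None


if __name__ == "__main__":
    # Only the tags mentioned in this thread; the predicate encodes the
    # results reported above instead of running a real build.
    tags = ["v2.49.5", "v2.49.2", "v2.49.3"]
    print(first_failing(tags, lambda t: parse_tag(t) >= (2, 49, 3)))
```

Comparing the dependency changes between the last passing and first failing tag is then a much smaller search than reading through the whole history.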

I even compiled WebRTC from scratch for v2.49.5 (from an x64 host), then used that in my build, and got the same error.

Interestingly, the same error doesn't show up on Ubuntu 24.04. e07256b at least fixed the GCC_14 error I saw earlier on Ubuntu 22.04, so that's a great start. But I think some dependency got updated in v2.49.3 which caused the build to fail.

I found this Chromium bug report about __arm_tpidr2_save and it appears to have been fixed (?) as of Jan 9 2025. It seems to be related to the LLVM build toolchain that they use. But I believe I used the most recent depot_tools and toolchains in my builds, yet it's still not working... I'm at a loss as to what to look at next for this issue. Do you have any clue?

@dennisameling
Contributor Author

dennisameling commented Feb 20, 2025

Found another possibly related Chromium issue. They pushed this commit in August to only set libyuv_use_sme on arm64 Linux due to issues on other platforms.

I wonder if the team here would be open to setting libyuv_use_sme to false in the Linux arm64 build? It looks like SME is an optional extension that was added in ARMv9.2-A. AFAIK most arm64 devices out in the wild are still ARMv8 (e.g. even the most recent Raspberry Pi models are still ARMv8). Another option would be to find out whether the build somehow uses -march=armv9-a+sme and, if so, set that to -march=armv8-a instead on Linux.
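Whether a given arm64 Linux machine implements SME can be checked from its reported CPU features. This is a rough sketch, assuming the kernel lists the extension as `sme` on the `Features` line of `/proc/cpuinfo`; `cpu_features` and `has_sme` are hypothetical helper names.

```python
def cpu_features(cpuinfo_text):
    """Collect the flags from every 'Features' line of /proc/cpuinfo text."""
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            features.update(value.split())
    return features


def has_sme(cpuinfo_text):
    """True if the CPU advertises SME (assumed to appear as the flag 'sme')."""
    return "sme" in cpu_features(cpuinfo_text)


if __name__ == "__main__":
    with open("/proc/cpuinfo") as handle:
        print("SME available:", has_sme(handle.read()))
```

A machine that reports no `sme` flag would be one where a build that assumes SME support is most likely to run into trouble.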

I'd be happy to try and build things myself and test on arm64 hardware, but would need to know where to set this flag. I have a feeling that simply adding it to the WEBRTC_ARGS might work.
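The idea of adding the flag to the build arguments can be sketched as follows, assuming WEBRTC_ARGS is a space-separated list of GN `name=value` assignments. `with_gn_arg` is a hypothetical helper, not part of the ringrtc build scripts, and the starting arguments in the usage line are placeholders.

```python
def with_gn_arg(args, name, value):
    """Return `args` with `name=value` set, replacing any earlier assignment.

    Splits on whitespace, so it does not handle quoted values that
    contain spaces.
    """
    kept = [arg for arg in args.split() if arg.split("=", 1)[0] != name]
    kept.append(f"{name}={value}")
    return " ".join(kept)


if __name__ == "__main__":
    # Placeholder arguments for illustration only.
    print(with_gn_arg("is_debug=false", "libyuv_use_sme", "false"))
```

Replacing an earlier assignment instead of appending a second one keeps the resulting argument list unambiguous if the flag is ever set elsewhere.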

@mutexlox-signal
Contributor

Found another possibly related Chromium issue. They pushed this commit in August to only set libyuv_use_sme on arm64 Linux due to issues on other platforms.

Oh, great find!

I wonder if the team here would be open to setting libyuv_use_sme to false in the Linux arm64 build? It looks like SME is an optional extension that was added in ARMv9.2-A. AFAIK most arm64 devices out in the wild are still ARMv8 (e.g. even the most recent Raspberry Pi models are still ARMv8). Another option would be to find out whether the build somehow uses -march=armv9-a+sme and, if so, set that to -march=armv8-a instead on Linux.

Given that the alternative is that the arm64 linux build is broken, I'm inclined to say "yes," but I'm going to discuss with my other team members.

I'd be happy to try and build things myself and test on arm64 hardware, but would need to know where to set this flag. I have a feeling that simply adding it to the WEBRTC_ARGS might work.

I believe that's the right place, yes.

@dennisameling
Contributor Author

Here's a PR that sets libyuv_use_sme to false. I've confirmed that it fixes the undefined symbol: __arm_tpidr2_save error on Linux ARMv8 devices 🎉

@jim-signal
Contributor

Here's a PR that sets libyuv_use_sme to false. I've confirmed that it fixes the undefined symbol: __arm_tpidr2_save error on Linux ARMv8 devices 🎉

Thanks @dennisameling, we are trying it out and plan to merge it in.

@jim-signal
Contributor

we are trying it out and plan to merge it in.

I can confirm that the libyuv_use_sme fix works (there is a webrtc tag 6834e for it). I have your commits ready, but we are still blocked on the runner issue. I tried several workarounds, but couldn't get it working yet. Hopefully there will be some movement in partner-runner-images soon.
