workflows: Rewrite build-ci-container to work on larger runners #117353
Conversation
Also switch them over to the new depot runners.
Was just about to start hacking on this today. 😆
I'm not too familiar with the Depot runners. It seems like they're paid self-hosted runners? Is the Foundation paying for them?
llvm-premerge-linux-runners should also work and will point to self-hosted runners on the GCP project handling premerge. We intend to use that runner set to run the transitioned premerge jobs once we migrate to GHA.
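A minimal sketch of how a job could target that runner set, assuming llvm-premerge-linux-runners is exposed as a plain runs-on label; the job name, steps, and script path are illustrative, not taken from this PR:

jobs:
  build-ci-container:
    # Assumed label from the comment above; points at the premerge
    # cluster's self-hosted runners rather than GitHub-hosted ones.
    runs-on: llvm-premerge-linux-runners
    steps:
      - uses: actions/checkout@v4
      - name: Build container
        run: ./.github/workflows/build-ci-container.sh  # hypothetical script name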
The Foundation is providing a small budget for testing them out. We're using 'Depot Managed' which means the instances are deployed in the Foundation's AWS account rather than using the machines from Depot's pool.
Are these ready to use now? I don't have a preference for which runners to use.
Ah, okay. I didn't realize that had taken place. Thanks for the additional context.
Theoretically, yes, they are ready to use and should just work. We can currently only run one job at a time because we don't have enough quota in the region where the cluster is deployed, but that should be enough for testing. @lnihlen is working on getting additional quota. I think it would probably make more sense to use the Google-hosted runners, given that they don't require any budget from the Foundation and let us consolidate on a single system. I think our runners will also be more powerful (configured with 56 vCPUs), but that depends on how the Depot runners are configured.
OK, do you think I should update this PR to use those runners instead or wait a few weeks?
Depot runners go up to 64 CPUs. The number of CPUs is part of the runs-on label, so you can choose the machine size that fits the job.
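For illustration, a hedged sketch of picking a size through the label. The depot-&lt;os&gt;-&lt;cpu-count&gt; format is an assumption about Depot's label convention, not something confirmed in this PR; check Depot's documentation for the exact labels:

jobs:
  build-ci-container:
    # Assumed Depot label format: OS image plus a vCPU-count suffix.
    runs-on: depot-ubuntu-22.04-64  # assumed label for a 64-vCPU runner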
I would say update it to use the new runners. This job doesn't run very often and probably shouldn't take that long on a decent-sized machine.
Ah, interesting. We don't have that much flexibility currently, but could theoretically add the capability. Everything I have seen or can imagine, though, either requires little compute and should probably run on the free GitHub runners, or requires a lot and would benefit from as many cores as possible.
Yeah, the nice thing about the builds is that they are highly parallel, so if you are paying for runners, you may as well use the one with the most CPUs: the cost per job ends up roughly the same no matter how many CPUs you have, since 64 vCPUs for one hour costs about the same as 16 vCPUs for four hours.
@boomanaiden154 It looks like the …
Ah, right. I forgot about that detail. My plan was to use https://github.com/GoogleContainerTools/kaniko to build things on the cluster, but that's definitely something I can get to later.
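A rough sketch of what building with kaniko on the cluster might look like, as a Kubernetes pod running the kaniko executor with no Docker daemon. The pod name, git context URL, and destination registry are illustrative assumptions, not this PR's actual setup:

apiVersion: v1
kind: Pod
metadata:
  name: build-ci-container  # hypothetical pod name
spec:
  restartPolicy: Never
  containers:
    - name: kaniko
      image: gcr.io/kaniko-project/executor:latest
      args:
        # Build the stage1 Dockerfile straight from the repo and push
        # the result to a registry.
        - --context=git://github.com/llvm/llvm-project.git
        - --dockerfile=.github/workflows/containers/github-action-ci/stage1.Dockerfile
        - --destination=ghcr.io/llvm/ci-container:test  # illustrative destination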
.github/workflows/containers/github-action-ci/stage1.Dockerfile
    -DBOOTSTRAP_CLANG_PGO_TRAINING_DATA_SOURCE_DIR=/llvm-project-llvmorg-$LLVM_VERSION/llvm \
    -DCLANG_DEFAULT_LINKER="lld"

RUN ninja -C ./build stage2-clang-bolt stage2-install-distribution && ninja -C ./build install-distribution && rm -rf ./build
Maybe add a comment on why the rm -rf ./build is here? I believe it was originally to avoid out-of-disk-space errors. Assuming the Depot runners have enough disk, it would still be useful, as it probably reduces checkpointing time.
I'm inclined to drop it. This is a common pattern when cleaning up the final image, but since this is just a builder image, I'm not sure it makes sense here. I don't have a strong opinion though.
Seems reasonable enough to me.
LGTM after addressing the final comment.
We can save money by only supporting one Ubuntu version in Depot, and we need to use the oldest version when building the release binaries.
@tstellar What are your plans for landing this? I wanted to try porting it to the premerge cluster this week, but would prefer to build off of this patch in tree.