Multi-HPC Workflows #218

Merged 47 commits on Feb 3, 2025

@CodeGat (Member) commented Jan 23, 2025

Closes #121 super-issue
Close #213
Close #196

Important

This is a major update to build infrastructure, so Model Deployment Repositories require changes before the v4 branch can be used. These are noted in the 'Major Model Deployment Repository Changes' section.

Background

This super Pull Request contains a rework of build-cd infrastructure to support deployment to multiple HPCs at once. The major changes are noted below:

Pull Requests Incorporated

This PR incorporates the following merged pull requests from build-cd:

And the following into other related repositories:

And the changes made solely in this PR (consolidated here):

  • Small fixes done during E2E Testing in fa11ad3
  • Larger fix splitting deploy-2-start.yml deployment-environment input into deployment-target and deployment-type in beee2cf
  • Update all build-cd references to latest (future) default branch (v4) in 6299c48

Major Model Deployment Repository Changes

Since this is a major build-cd bump (due to changed entrypoint workflows, new vars and the overall size of the update), all Model Deployment Repositories must be updated before they can use the new v4 branch. Namely:

  • Add repo-level vars.RELEASE_DEPLOYMENT_TARGETS and vars.PRERELEASE_DEPLOYMENT_TARGETS, so that each Model Deployment Repository can choose its own deployment targets.
  • Update repo-level vars.SPACK_YAML_SCHEMA_VERSION to 1-0-4
  • Update the ci.yml pr-closed job input from name to root-sbd (due to 4d8a7c8), and give the cd job permissions.pull-requests:write (due to fa11ad3)
  • Update all references to v3 workflows to v4
  • Replace all instances of Release GitHub Environments of the form SUPERCOMPUTER with SUPERCOMPUTER Release
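
As a rough illustration of the new `SUPERCOMPUTER TYPE` environment naming and the per-repository target variables, the Python sketch below composes environment names and builds a job matrix from a targets variable. It is not code from build-cd: the function names, the matrix keys, the space-separated format assumed for the targets variable, the two deployment types, and the `Gadi`/`GadiTest` values are all assumptions made for illustration.

```python
import json

# Assumption: deployments are either "Release" or "Prerelease".
DEPLOYMENT_TYPES = ("Release", "Prerelease")


def environment_name(target: str, deployment_type: str) -> str:
    """Compose a GitHub Environment name of the form 'SUPERCOMPUTER TYPE'."""
    if not target.strip():
        raise ValueError("deployment target must not be empty")
    if deployment_type not in DEPLOYMENT_TYPES:
        raise ValueError(f"unknown deployment type: {deployment_type!r}")
    return f"{target.strip()} {deployment_type}"


def build_matrix(targets_var: str, deployment_type: str) -> dict:
    """Build a job matrix with one entry per deployment target.

    Assumption: the targets variable is a whitespace-separated list of
    target names. The real variable format may differ.
    """
    return {
        "include": [
            {
                # Hypothetical keys, chosen to mirror the split inputs.
                "deployment-target": target,
                "deployment-type": deployment_type,
                "environment": environment_name(target, deployment_type),
            }
            for target in targets_var.split()
        ]
    }


if __name__ == "__main__":
    # Hypothetical value for vars.RELEASE_DEPLOYMENT_TARGETS.
    print(json.dumps(build_matrix("Gadi GadiTest", "Release"), indent=2))
```

If the real workflows follow a similar shape, the JSON printed here is the kind of value a workflow could pass through `fromJSON` into `strategy.matrix`. Note that a bare `SUPERCOMPUTER` name is never produced, which matches the requirement to rename existing Release environments.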

Testing

E2E testing was done primarily on ACCESS-NRI/ACCESS-TEST (prerelease-related testing in ACCESS-NRI/ACCESS-TEST#16, release-related testing in ACCESS-NRI/ACCESS-TEST#20).
We used a branch off this one (dev-121-multi-target-workflows_TEST, which may since have been deleted) that adds config for an additional GadiTest environment, and called the workflows from that branch. See branch comparison here.

Testing Prereleases

✔️ With traditional, single-target spack.yaml

Works.

✔️ With multi-target spack.yaml

Works! See this commit, run and result.

✔️ Using !bump

Still works, see:

✔️ Using !redeploy

Works: See invocation, run and result.

Closing PRs

Works! See run.

Testing Releases

✔️ Deployment

Succeeds, see run, and (ephemeral, may be gone later) deployment accessible via:

module use /g/data/vk83/testing/modules
module load access-test/2025.01.0

✔️ Release Artifact

Works, see run and (Un)official Release.

✔️ Model DB Upload (stub)

Data for this side of the old DB API looks correct, see run.

CodeGat added 30 commits January 6, 2025 17:00

* settings-1-update.yml: Validate settings in parallel based on target

* deploy-2-start.yml: Output modules and spack location from workflow

* deploy-2-start.yml: Capitalize inputs.type values to keep in line with callers

* deploy-1-setup.yml: Update inputs/outputs to be suitable for a matrix job.
This matrix job must have the required I/O to do pre-deploy checks, as well as a deployment.

* Move check-spack-yaml and check-config jobs from ci.yml to deploy-1-setup.yml

* deploy-1-setup.yml: Use new inputs as inputs to deploy-2-start.yml

* deploy-1-start.yml: Upload 'outputs' of workflow as an artifact.
This is due to dynamic matrix job outputs not being collected appropriately.
See https://github.com/orgs/community/discussions/17245

* cd.yml: Run checks of target config in parallel

* ci.yml: Dynamically generate deployment comment, start matrix of deployment jobs, misc changes

* cd.yml: Update deployment matrix jq

* ci.yml: Add newline to deployment comment to fix formatting

* Update job names to include deploy target
Replace `env.SPACK_YAML_MODEL_YQ` to accommodate multi-target-format `spack.yaml`
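
The commit notes above mention uploading each matrix job's outputs as an artifact because dynamic matrix job outputs are not collected appropriately. One way a later step might recombine them is sketched below in Python. This is an assumption-laden illustration, not the build-cd implementation: it assumes each target wrote its outputs to a file named `<target>.json` in a single downloaded artifact directory, and the function name and output keys are hypothetical.

```python
import json
from pathlib import Path


def collect_outputs(artifact_dir: Path) -> dict[str, dict]:
    """Merge per-target JSON output files into one mapping keyed by target.

    Assumption: every file in artifact_dir is named '<target>.json' and
    holds a JSON object with that target's workflow outputs.
    """
    merged: dict[str, dict] = {}
    for path in sorted(artifact_dir.glob("*.json")):
        # path.stem is the file name without '.json', used as the target.
        merged[path.stem] = json.loads(path.read_text())
    return merged


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical outputs for two targets; keys and paths are made up.
        (root / "Gadi.json").write_text(json.dumps({"modules": "/path/a"}))
        (root / "GadiTest.json").write_text(json.dumps({"modules": "/path/b"}))
        for target, outputs in collect_outputs(root).items():
            print(target, outputs)
```

Keying the merged result by target keeps each deployment's module and spack locations separate, so a follow-up job such as the deployment comment could report them per target.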

@aidanheerdegen (Member) previously approved these changes Jan 30, 2025

The vibe seems good. I have some questions, but am happy to merge if you think it appropriate.

This is because all environments are now of the form SUPERCOMPUTER TYPE
@CodeGat CodeGat removed the request for review from aidanheerdegen February 3, 2025 05:57
Update Acceptable Environments to the form `SUPERCOMPUTER TYPE`
@CodeGat (Member, Author) commented Feb 3, 2025

The merged pull request should satisfy all your vibe-based review comments, @aidanheerdegen!

@aidanheerdegen (Member) left a comment

Happy to hit the big green button. Great work, thanks.

@CodeGat CodeGat merged commit c13c639 into v4 Feb 3, 2025
2 checks passed
@CodeGat (Member, Author) commented Feb 3, 2025

Thanks to @aidanheerdegen, @jo-basevi and @tmcadam for helping review the various PRs within this super-PR 🥳

Labels
priority:medium type:documentation type:feature New feature version:MAJOR Requires an update to Model Deployment Repositories CI
Projects
Status: Done ✅
2 participants