ACCESS-OM3 esmf OOM Parallelism Fix #61

Merged: 3 commits merged on May 1, 2024
@CodeGat (Member) commented Apr 30, 2024

The deployment of ACCESS-OM3 is failing because the esmf install step is being killed by Gadi's OOM killer. This PR explicitly reduces the build parallelism to `--jobs 4` to lower peak memory use.
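As a rough illustration of why fewer jobs helps: if each parallel compile job needs a roughly fixed amount of memory, peak memory grows with the job count, so capping the jobs caps the memory. The sketch below assumes that linear model; the `safe_job_count` and `spack_install_cmd` helpers and the figures used are hypothetical, not measurements from Gadi.

```python
import math


def safe_job_count(mem_budget_gib: float, per_job_gib: float, cpus: int) -> int:
    """Largest job count whose estimated peak memory fits the budget.

    Hypothetical model: each parallel build job needs about per_job_gib,
    so peak memory is roughly jobs * per_job_gib.
    """
    by_memory = math.floor(mem_budget_gib / per_job_gib)
    # Never exceed the CPU count, and always allow at least one job.
    return max(1, min(cpus, by_memory))


def spack_install_cmd(jobs: int) -> list[str]:
    # `spack install --jobs N` sets how many parallel build jobs Spack uses.
    return ["spack", "install", "--jobs", str(jobs)]


# Hypothetical figures only: 16 GiB budget, ~4 GiB per compile job, 48 CPUs.
print(spack_install_cmd(safe_job_count(16, 4, 48)))
```

With these made-up numbers the memory budget, not the CPU count, is the binding limit, which is the situation the OOM kills suggest.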

See the failures here: ACCESS-NRI/ACCESS-OM3#5

Closes ACCESS-NRI/ACCESS-OM3#8

@CodeGat self-assigned this Apr 30, 2024
@CodeGat force-pushed the access-om3-8-oom-fix branch from 1279866 to 9587158 April 30, 2024 09:34
@CodeGat marked this pull request as ready for review April 30, 2024 09:34
@CodeGat force-pushed the access-om3-8-oom-fix branch from 9587158 to 9059551 April 30, 2024 22:57
@CodeGat (Member, Author) commented Apr 30, 2024

Alternatively, we could add a `vars.SPACK_INSTALL_PARALLEL_JOBS` variable that can be set to `--jobs 4` (or left unset), which would at least allow the number of parallel jobs to differ on a per-model basis. I think that might be better, although it would add another entry to the `vars` context.

@CodeGat (Member, Author) commented May 1, 2024

For those with access, I've set `vars.SPACK_INSTALL_PARALLEL_JOBS` to `--jobs 4` in the ACCESS-OM3 environment settings (https://github.com/ACCESS-NRI/ACCESS-OM3/settings/environments/2707603294/edit). It is unset in ACCESS-OM2, so it shouldn't affect the `spack install` command there, since an unset `vars` entry expands to an empty string ('').
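To make the "unset expands to ''" behaviour concrete, here is a small Python sketch that mimics the substitution. The workflow line `spack install ${{ vars.SPACK_INSTALL_PARALLEL_JOBS }}` and the `render` helper are hypothetical; the actual ACCESS-NRI workflow step is not shown in this PR.

```python
import re
import shlex

# Hypothetical workflow `run:` line; the real workflow may differ.
TEMPLATE = "spack install ${{ vars.SPACK_INSTALL_PARALLEL_JOBS }}"

# Matches expressions of the form ${{ vars.NAME }}.
_VARS_EXPR = re.compile(r"\$\{\{\s*vars\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render(template: str, repo_vars: dict[str, str]) -> list[str]:
    """Substitute ${{ vars.NAME }} the way GitHub Actions does for an
    unset configuration variable (empty string), then split like a shell."""
    expanded = _VARS_EXPR.sub(lambda m: repo_vars.get(m.group(1), ""), template)
    # Unquoted substitution is word-split, so '' contributes no argument.
    return shlex.split(expanded)


# ACCESS-OM3: variable set in the deployment environment.
print(render(TEMPLATE, {"SPACK_INSTALL_PARALLEL_JOBS": "--jobs 4"}))
# ACCESS-OM2: variable unset.
print(render(TEMPLATE, {}))
```

This relies on the expression being left unquoted in the shell step: quoted, an unset variable would pass an empty argument to `spack install` instead of nothing.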

@CodeGat requested a review from aidanheerdegen May 1, 2024 00:19

@aidanheerdegen (Member) left a comment

Appreciate the additional inline comments. They do assist in scaffolding understanding and comprehension.

@CodeGat merged commit dbbc77b into main May 1, 2024
@CodeGat deleted the access-om3-8-oom-fix branch May 1, 2024 01:45
Linked issue: ACCESS-OM3 Deployment on Gadi Killed during esmf build Phase
2 participants