Cesm3.0 add marbl #269
Cesm3.0 alpha02b
Also changed existing MOM6 layouts to explicitly look for non-MARBL configuration
Still running some timing experiments to see what layout fits S and XL on derecho.
I'm having trouble getting 20 SYPD in MOM6 with MARBL enabled: 25 nodes gives me 19.9 myears/day in the ocean but 18.16 SYPD overall. Increasing to 27 or 30 nodes slows the model down in both cases, possibly due to increased communication, or possibly because the machine was busy. We might want to adjust the XL layout in a future alpha tag.
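To put the two throughput numbers above on the same footing, here is a minimal sketch that converts them to wallclock seconds per simulated year. It assumes "myears / day" for the ocean and SYPD for the full model are the same unit (simulated years per wallclock day); the function name is illustrative and not part of CIME or MOM6.

```python
SECONDS_PER_DAY = 86_400


def wallclock_seconds_per_sim_year(sypd: float) -> float:
    """Wallclock seconds needed for one simulated year at a given throughput.

    Assumes sypd is simulated years per wallclock day.
    """
    return SECONDS_PER_DAY / sypd


# Figures quoted in the comment above (25 nodes, MARBL enabled).
ocean_seconds = wallclock_seconds_per_sim_year(19.9)     # ocean component rate
coupled_seconds = wallclock_seconds_per_sim_year(18.16)  # overall model rate
gap_seconds = coupled_seconds - ocean_seconds

print(f"ocean:   {ocean_seconds:.1f} s per simulated year")
print(f"overall: {coupled_seconds:.1f} s per simulated year")
print(f"gap:     {gap_seconds:.1f} s per simulated year")
```

Under that assumption the full model spends roughly 416 more wallclock seconds per simulated year than the ocean rate alone would imply, which is the margin a layout change would have to recover, in addition to the ocean itself falling just short of 20 SYPD.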
Until we decide to turn MARBL on by default, the compsets should not change. However, I created a temporary BLT1850_MARBL compset to make it easier to run with MARBL while we do final testing (also, I added an ERI test for that compset so we aren't surprised by anything when we update the compset definitions)
I added a new Unrelated to MARBL, I also removed PE layouts if any of the following applied:
I should note that the |
If you are satisfied with the load balancing of MARBL compsets please add them to the cesm3 timing table at https://cseg.cgd.ucar.edu/timing/login/
Oh, and the XL pelayout needs to be retuned -- MOM6 can't get 20 SYPD with MARBL turned on, likely due to I/O issues when running with larger core counts. Reducing the number of CAM tasks might keep throughput the same but decrease overall cost? The existing XL layout without MARBL claims 20 SYPD / 7100 pe-hrs/simyear, though I saw 17.9 SYPD / 8000 pe-hrs/simyear when I ran (during the tutorial, so the machine was busier than usual). Turning MARBL on, I saw 18.3 SYPD and 11500 pe-hrs/simyear, but even when the machine is cooperating I never got to 20 SYPD in ocean-only testing.
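The throughput and cost pairs above can be cross-checked with a little arithmetic. This is a rough sketch that assumes "pe-hrs/simyear" means total PEs multiplied by wallclock hours per simulated year, so that cost = PEs × 24 / SYPD; the helper names are hypothetical, and the PE counts it prints are implied by that formula rather than read from any layout file.

```python
HOURS_PER_DAY = 24.0


def implied_pe_count(sypd: float, pe_hrs_per_simyear: float) -> float:
    """Invert cost = pes * 24 / sypd to get the PE count a pair implies."""
    return pe_hrs_per_simyear * sypd / HOURS_PER_DAY


def cost_at(pes: float, sypd: float) -> float:
    """PE-hours per simulated year for a given PE count and throughput."""
    return pes * HOURS_PER_DAY / sypd


# (SYPD, pe-hrs/simyear) pairs quoted in the comment above.
runs = {
    "XL, no MARBL (claimed)": (20.0, 7100.0),
    "XL, no MARBL (observed)": (17.9, 8000.0),
    "XL, MARBL on (observed)": (18.3, 11500.0),
}

for label, (sypd, cost) in runs.items():
    print(f"{label}: ~{implied_pe_count(sypd, cost):.0f} PEs implied")

# What the MARBL run would cost if the same PE count reached 20 SYPD.
marbl_pes = implied_pe_count(18.3, 11500.0)
print(f"MARBL layout at 20 SYPD: ~{cost_at(marbl_pes, 20.0):.0f} pe-hrs/simyear")
```

If the assumed formula holds, the two non-MARBL figures imply nearly the same PE count (about 5900 to 6000), so the gap between them is a throughput difference on a busy machine, while the MARBL figure implies a noticeably larger allocation.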
At @jedwards4b's suggestion, I ran a series of
The
Before I add any of this to the timing table, I'd like to wait and see how throughput compares (at least for the medium size layout) in the CESM3 development runs. Looking at these results, I wouldn't be surprised if we need to tweak the layouts a little bit.
Looking at the timing table for XL:
The ocean is waiting some 56 s per coupling interval.
@mnlevy1981 You'll need to resolve the conflict before I can merge - I can't push to your fork.
Description of changes
Updates necessary to run MOM6 with MARBL, also removes cheyenne-specific and POP-specific PE layouts
Specific notes
Requires FMS tag fi_240807, CMEPS tagged after ESCOMP/CMEPS#493, and MOM interface tag mi_240805
Fixes: #268 (support turning MARBL tracers on in MOM6)
User interface changes?: No
Testing performed (automated tests and/or manual tests): Once I'm done making changes, I'll add a comment with the testing I've done + take this out of draft mode