Refactor GEOS_GcmGridComp.F90: isolate replay logic and A-O regridding in preparation for dual-GCM replay redesign and MAPL-managed coupling #1418

@tclune

Background and Motivation

This issue captures a planned refactoring of GEOS_GcmGridComp.F90 (and constrained touches to GEOS_AgcmGridComp.F90) driven by two upcoming architectural changes:

  1. Dual-GCM replay redesign: Rather than a single GCM instance running both predictor and corrector phases (via clock rewind), two separate GCM instances will be used, orchestrated by a new parent component. All replay alarm management currently in GCM::Initialize and GCM::Run will migrate to that orchestrator.

  2. MAPL-managed A-O regridding: Proper ESMF geom objects will be introduced for AGCM and OGCM. MAPL will then automatically insert appropriate couplers, replacing the current hand-written MAPL_LocStreamTransform-based exchange in RUN_OCEAN.

The refactoring goal is not to implement either of those changes now. It is to make the current code sufficiently clear and cohesive that each can be excised cleanly when the time comes, rather than requiring grep-hunts across multiple files.


Analysis: Current Problems in GEOS_GcmGridComp.F90

1. Module-Level Mutable State (Serious)

Lines 38-67 contain bare module-level variables outside any type:

integer :: NUM_ICE_CATEGORIES, NUM_ICE_LAYERS
integer :: DO_CICE_THERMO, DO_OBIO, DO_DATASEA, DO_WAVES, DO_SEA_SPRAY
logical :: seaIceT_extData, DO_DATA_ATM4OCN, DUAL_OCEAN
integer :: AGCM, OGCM, AIAU, ADFI, WGCM, hist, gigatraj
integer :: bypass_ogcm

These are set in SetServices and read in Initialize, Run, and internal subroutines (RUN_OCEAN, OBIO_A2O). This prevents multiple GCM instances, creates invisible coupling across lifecycle methods, and is the exact problem that makes the dual-GCM transition difficult. T_GCM_STATE already exists for alarm and transform state -- the flags belong there too.

2. SetServices Performs Config Computation (Componentization Violation)

SetServices (671 lines) computes and writes defaults for ASSIMILATION_CYCLE, CORRECTOR_DURATION, MKIAU_FREQUENCY, MKIAU_REFERENCE_TIME, REPLAY_FILE_FREQUENCY, CONVPAR_OPTION, and AERO_PROVIDER back into the shared ESMF config via MAPL_ConfigSetAttribute. Children then read these in their own SetServices calls, creating an implicit initialization-order dependency between sibling SetServices invocations that is invisible from the children's perspective.

3. Initialize Mixes Grid Setup and Replay Alarm Logic

Initialize (862 lines) does two distinct jobs:

  • Grid and exchange-stream setup (~lines 920-996)
  • Replay alarm creation and timing arithmetic (~lines 997-1344, ~350 lines)

The alarm setup block contains non-trivial arithmetic (predictor duration loop, REPLAY_ENDDATE sign-encoding), a MKIAU_RingDate guard that errors at runtime if the config key is already set, and cross-validation of PREDICTOR_DURATION. This block should be independently readable.

4. The Full Predictor Loop Is Inline in Run

The REPLAY: if ... end if REPLAY block (lines 1794-2068 of Run) contains the predictor-corrector cycle: inner time-stepping loop, clock advancing, clock reversing, alarm save/restore, MKIAU calls, and checkpoint writing. Problems:

  • The checkpoint-writing block is duplicated verbatim at two call sites (lines 1872-1890 and 1956-1974)
  • Comment at line 1843: ! fix the predictor alarm (huh?) -- a documented mystery
  • This is the entire block that disappears in the dual-GCM redesign and should be a clearly labeled, self-contained subroutine

5. Alarm Names as Bare String Literals Across Files

Alarm names ('PredictorAlarm', 'startReplay', 'replayCycle', 'ReplayShutOff', 'RegularReplay09', 'ExactReplay09', 'PredictorActive', 'ReplayMKIAU') are bare string literals scattered across GCM::Initialize, GCM::Run, AGCM::Initialize, and AGCM::Run. There is no single place to update them when the alarm topology changes in the dual-GCM redesign, and no compile-time consistency guarantee.

6. RUN_OCEAN and RUN_WAVES Are Implicit-Closure Subroutines

RUN_OCEAN and RUN_WAVES are internal subroutines of Run, closing over VM, GCS, GIM, GEX, XFORM_A2O, XFORM_O2A, expSKIN, impSKIN, DO_CICE_THERMO, DO_DATASEA, seaIceT_extData, DO_OBIO, DO_WAVES, and DUAL_OCEAN from the enclosing scope. All dependencies are invisible in the call signature. When MAPL-managed coupling replaces RUN_OCEAN, there is no clean interface to point to as the thing being replaced.

7. Proliferation of Nearly-Identical Exchange Helpers

Ten internal subroutines handle A-O exchange (DO_A2O, DO_O2A, and eight SUBTILES_*/UGD/2D variants), differing only in rank and real kind. Four more handle wave coupling (DO_A2W, DO_W2A, DO_O2W, DO_W2O) -- each ~45 lines with identical structure: get two fields, lazily allocate a route handle, call ESMF_FieldRegrid. The wave helpers are 174 lines reducible to ~50 with a single generic helper.

8. Replay Logic Scattered into GEOS_AgcmGridComp.F90

AGCM::Run queries six alarms, computes TYPE = PREDICTOR / CORRECTOR / FREERUN / FORECAST, and branches on it throughout its body (~200 lines, lines 1807-2013). In the dual-GCM design, AGCM::Run should have minimal replay awareness -- the predictor instance sees zero forcings and the corrector sees nonzero ones. The BIAS correction and LAST_CORRECTOR logic are the only pieces that may need to remain, scoped to the corrector instance.

9. Substantial Dead Code

~160 lines of !#-commented code describe an abandoned in-replay history-writing capability (history_setservice, initialize_history, run_history). This creates noise and leaves open whether the capability is expected to return.


Proposed Refactoring Plan

All steps are pure structural refactoring -- no algorithmic changes, bit-for-bit identical outputs. Each step is a separate PR.

Step A: Move all module-level flags into T_GCM_STATE

Scope: GEOS_GcmGridComp.F90 only

Move lines 38-67 into T_GCM_STATE. Prerequisite for all subsequent steps. The child index integers (AGCM, OGCM, etc.) also belong here -- they are per-instance handles returned by MAPL_AddChild.
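
As a rough illustration of Step A, the sketch below shows the flags and child indices as components of the per-instance state type. It is not the actual T_GCM_STATE definition: the existing alarm and transform members are omitted, and the default values and the module name gcm_state_sketch are assumptions made so the example compiles on its own.

```fortran
module gcm_state_sketch
   implicit none
   private
   public :: T_GCM_STATE

   ! Sketch only: the real T_GCM_STATE also carries alarm and transform
   ! members, which are omitted here. Default values are assumptions.
   type :: T_GCM_STATE
      ! Configuration flags, formerly bare module-level variables
      integer :: NUM_ICE_CATEGORIES = 0
      integer :: NUM_ICE_LAYERS     = 0
      integer :: DO_CICE_THERMO     = 0
      integer :: DO_OBIO            = 0
      integer :: DO_DATASEA         = 0
      integer :: DO_WAVES           = 0
      integer :: DO_SEA_SPRAY       = 0
      integer :: bypass_ogcm        = 0
      logical :: seaIceT_extData    = .false.
      logical :: DO_DATA_ATM4OCN    = .false.
      logical :: DUAL_OCEAN         = .false.
      ! Child indices returned by MAPL_AddChild, one set per GCM instance
      integer :: AGCM = -1, OGCM = -1, AIAU = -1, ADFI = -1
      integer :: WGCM = -1, hist = -1, gigatraj = -1
   end type T_GCM_STATE

end module gcm_state_sketch
```

With the flags held per instance, a predictor GCM and a corrector GCM can carry different settings without interfering with each other, which bare module variables cannot do.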

Step B: Introduce named constants for all alarm strings

Scope: GEOS_GcmGridComp.F90; read-only effect on GEOS_AgcmGridComp.F90

Create named parameter constants for all alarm name strings used across the GCM/AGCM boundary. Replace all bare string literals in GCM::Initialize, GCM::Run, AGCM::Initialize, and AGCM::Run. The new orchestrator component will use the same constants, providing compile-time consistency.
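
A minimal sketch of what the constants in Step B could look like follows. The string values are the literals listed in the analysis above; the constant names and the module name gcm_replay_alarm_names are hypothetical. Placing them in a small standalone module is one option (an assumption, not a decision recorded in this issue) that lets GCM, AGCM, and the future orchestrator all use the same definitions.

```fortran
module gcm_replay_alarm_names
   implicit none
   private

   ! Constant names are hypothetical; the string values are the
   ! alarm-name literals currently scattered across GCM and AGCM.
   character(len=*), parameter, public :: ALARM_PREDICTOR        = 'PredictorAlarm'
   character(len=*), parameter, public :: ALARM_START_REPLAY     = 'startReplay'
   character(len=*), parameter, public :: ALARM_REPLAY_CYCLE     = 'replayCycle'
   character(len=*), parameter, public :: ALARM_REPLAY_SHUTOFF   = 'ReplayShutOff'
   character(len=*), parameter, public :: ALARM_REGULAR_REPLAY09 = 'RegularReplay09'
   character(len=*), parameter, public :: ALARM_EXACT_REPLAY09   = 'ExactReplay09'
   character(len=*), parameter, public :: ALARM_PREDICTOR_ACTIVE = 'PredictorActive'
   character(len=*), parameter, public :: ALARM_REPLAY_MKIAU     = 'ReplayMKIAU'

end module gcm_replay_alarm_names
```

Under implicit none, a misspelled constant name is a compile error, whereas a misspelled string literal only surfaces as a failed alarm lookup at run time.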

Step C: Extract setup_replay_alarms_() from Initialize

Scope: GEOS_GcmGridComp.F90 only

Extract lines 997-1344 of Initialize into a named module subroutine with explicit arguments (GC, MAPL, clock, gcm_internal_state). Initialize becomes a clean four-phase sequence: grid setup -> MAPL_GenericInitialize -> alarm setup -> XFORM/SKIN setup.

In the dual-GCM transition, the call to setup_replay_alarms_() is simply removed from each GCM's Initialize -- the new orchestrator creates alarms instead.

Step D: Extract the predictor loop from GCM::Run

Scope: GEOS_GcmGridComp.F90 only

Extract the REPLAY: if ... end if REPLAY block (lines 1794-2068) into a module subroutine run_replay_predictor_cycle_(). Inside, deduplicate the checkpoint-writing block (currently verbatim at two call sites). Add a block comment: ! This entire subroutine is removed in the dual-GCM replay redesign.

The corrector time step (lines 2080-2127) remains in Run as the normal steady-state path.
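
The shape of the extraction and deduplication in Step D might look roughly like the sketch below. Only the structure is meant to carry over: the argument list, the conditions guarding each call site, and the counter are placeholders rather than the real logic, and the clock and alarm handling is elided.

```fortran
module replay_cycle_sketch
   implicit none
   private
   public :: run_replay_predictor_cycle_

contains

   ! This entire subroutine is removed in the dual-GCM replay redesign.
   subroutine run_replay_predictor_cycle_(n_steps, checkpoint_in_loop, n_written, rc)
      integer, intent(in)  :: n_steps             ! hypothetical: predictor step count
      logical, intent(in)  :: checkpoint_in_loop  ! hypothetical: selects the call site
      integer, intent(out) :: n_written           ! checkpoints written (illustration only)
      integer, intent(out) :: rc
      integer :: step

      n_written = 0
      rc = 0
      do step = 1, n_steps
         ! ... advance clock and run children (elided) ...
         if (checkpoint_in_loop .and. step == n_steps) then
            call write_replay_checkpoint_(n_written, rc)   ! first call site
            if (rc /= 0) return
         end if
      end do
      ! ... reverse clock and restore alarms (elided) ...
      if (.not. checkpoint_in_loop) then
         call write_replay_checkpoint_(n_written, rc)      ! second call site
      end if
   end subroutine run_replay_predictor_cycle_

   ! Single home for the block that is currently duplicated verbatim.
   subroutine write_replay_checkpoint_(n_written, rc)
      integer, intent(inout) :: n_written
      integer, intent(out)   :: rc
      ! ... the real checkpoint-writing statements would go here ...
      n_written = n_written + 1
      rc = 0
   end subroutine write_replay_checkpoint_

end module replay_cycle_sketch
```

Because both call sites go through one helper, a later change to checkpoint writing is made once, and deleting the predictor cycle removes the helper along with it.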

Step E: Make RUN_OCEAN and RUN_WAVES explicit-argument module subroutines

Scope: GEOS_GcmGridComp.F90 only. Primary preparation for MAPL-managed coupling.

Convert RUN_OCEAN and RUN_WAVES from implicit-closure internal subroutines to module-level subroutines with fully explicit argument lists. This makes all dependencies visible in the call signature -- when MAPL inserts automatic couplers, removal of RUN_OCEAN is a surgical operation.
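
To make the explicit-argument idea concrete, here is a small sketch under heavy simplification. Plain real arrays stand in for the ESMF states, the exchange is a placeholder copy, and the ocean_flags type is a hypothetical bundle of the flags that RUN_OCEAN currently picks up by host association. The real signature would carry VM, GCS, GIM, GEX, and the transforms instead.

```fortran
module run_ocean_sketch
   implicit none
   private
   public :: ocean_flags, run_ocean_

   ! Hypothetical bundle of flags currently read from the enclosing scope.
   type :: ocean_flags
      integer :: DO_CICE_THERMO  = 0
      integer :: DO_DATASEA      = 0
      integer :: DO_OBIO         = 0
      integer :: DO_WAVES        = 0
      logical :: seaIceT_extData = .false.
      logical :: DUAL_OCEAN      = .false.
   end type ocean_flags

contains

   ! Plain arrays stand in for the ESMF states; only the fact that every
   ! dependency appears in the argument list is the point of the sketch.
   subroutine run_ocean_(flags, atm_export, ocn_import, rc)
      type(ocean_flags), intent(in)    :: flags
      real,              intent(in)    :: atm_export(:)
      real,              intent(inout) :: ocn_import(:)
      integer,           intent(out)   :: rc

      rc = 0
      if (size(atm_export) /= size(ocn_import)) then
         rc = 1
         return
      end if
      ocn_import = atm_export   ! placeholder for the XFORM_A2O exchange
      if (flags%DUAL_OCEAN) then
         ! ... flag-dependent branches would go here (elided) ...
      end if
   end subroutine run_ocean_

end module run_ocean_sketch
```

Nothing in the body reaches outside its arguments, so the call site in Run documents exactly what the coupling step consumes and produces.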

Additionally, consolidate DO_A2W, DO_W2A, DO_O2W, DO_W2O into a single generic regrid_field_() subroutine (~50 lines replacing ~174).
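
The pattern shared by the four wave helpers (get two fields, lazily create a route handle, regrid) can be sketched once, as below. To keep the example compilable without ESMF, route_handle_stub stands in for the ESMF route handle and an identity copy stands in for ESMF_FieldRegrid. The argument list is an assumption, as is the guess that the handle is built with ESMF_FieldRegridStore.

```fortran
module regrid_field_sketch
   implicit none
   private
   public :: route_handle_stub, regrid_field_

   ! Stand-in for the ESMF route handle so the sketch compiles without ESMF.
   type :: route_handle_stub
      logical :: created  = .false.
      integer :: n_builds = 0
   end type route_handle_stub

contains

   subroutine regrid_field_(src, dst, route, rc)
      real,                    intent(in)    :: src(:)
      real,                    intent(inout) :: dst(:)
      type(route_handle_stub), intent(inout) :: route
      integer,                 intent(out)   :: rc

      rc = 0
      if (size(src) /= size(dst)) then
         rc = 1
         return
      end if
      ! Lazily build the route handle on first use
      ! (assumed to be ESMF_FieldRegridStore in the real code).
      if (.not. route%created) then
         route%created  = .true.
         route%n_builds = route%n_builds + 1
      end if
      ! Apply the regrid (ESMF_FieldRegrid in the real code); identity here.
      dst = src
   end subroutine regrid_field_

end module regrid_field_sketch
```

Each of DO_A2W, DO_W2A, DO_O2W, and DO_W2O would then reduce to one call that passes its own field pair and its own route handle.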

Step F: Consolidate AGCM::Run replay footprint

Scope: GEOS_AgcmGridComp.F90. Constrained touch -- no algorithmic changes.

  1. Extract determine_run_type_(): a function that queries alarms, reads PREDICTOR_DURATION and REPLAY_MODE, and returns TYPE. All alarm polling moves here. In the dual-GCM design this function is replaced by reading a config flag or checking whether forcing imports are nonzero.

  2. Add explicit section comments marking predictor-only and corrector-only code paths throughout AGCM::Run as deletion/migration markers for the dual-GCM transition.
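
One possible shape for determine_run_type_() is sketched below. The decision logic is a placeholder, not the logic in AGCM::Run: the three logical inputs stand in for alarm and config queries that the real function would perform itself, and the integer codes are assumed.

```fortran
module agcm_run_type_sketch
   implicit none
   private
   public :: determine_run_type_

   ! Integer codes are assumptions made for this sketch.
   integer, parameter, public :: FREERUN = 0, PREDICTOR = 1, CORRECTOR = 2, FORECAST = 3

contains

   ! Placeholder mapping from already-polled alarm states to a run type.
   pure function determine_run_type_(replay_enabled, predictor_active, &
                                     replay_shut_off) result(run_type)
      logical, intent(in) :: replay_enabled    ! hypothetical: derived from REPLAY_MODE
      logical, intent(in) :: predictor_active  ! hypothetical: 'PredictorActive' ringing
      logical, intent(in) :: replay_shut_off   ! hypothetical: 'ReplayShutOff' ringing
      integer :: run_type

      if (.not. replay_enabled) then
         run_type = FREERUN
      else if (replay_shut_off) then
         run_type = FORECAST
      else if (predictor_active) then
         run_type = PREDICTOR
      else
         run_type = CORRECTOR
      end if
   end function determine_run_type_

end module agcm_run_type_sketch
```

With the classification behind one function, the rest of AGCM::Run branches on a single value, and the dual-GCM transition replaces one function body rather than a set of scattered alarm queries.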

Step G: Remove dead code

Scope: GEOS_GcmGridComp.F90

Remove ~160 lines of !#-commented history capability. If this represents a planned future feature, reference a tracking issue in a one-line comment.


Step Sequencing and Risk Summary

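Steps A -> B -> C -> D can proceed in order and are confined to GEOS_GcmGridComp.F90, apart from Step B's replacement of alarm-name literals in GEOS_AgcmGridComp.F90. Step E is independent of Steps B-D and can proceed in parallel with them once Step A is merged. Step F touches GEOS_AgcmGridComp.F90 and should be a separate PR after Steps A and B are merged. Step G can be done at any point.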

| Step | Files Changed | Risk | Dual-GCM Benefit | MAPL-Coupling Benefit |
|---|---|---|---|---|
| A: Module state -> T_GCM_STATE | GcmGridComp | Low | Enables multiple instances | Enables multiple instances |
| B: Alarm name constants | GcmGridComp + AGCM | Low | Single change point for alarm topology | -- |
| C: Extract alarm setup | GcmGridComp | Low | Clear deletion boundary | -- |
| D: Extract predictor loop | GcmGridComp | Medium | One-subroutine deletion | -- |
| E: Explicit RUN_OCEAN/WAVES | GcmGridComp | Low | -- | Clear deletion boundary |
| F: Consolidate AGCM replay | AgcmGridComp | Medium | Minimize AGCM footprint | -- |
| G: Remove dead code | GcmGridComp | Low | Reduces noise | Reduces noise |

Labels: mapl3-readiness (umbrella label: any issue in the MAPL3 readiness campaign)
