Commit 3a8697d

Archive the experiment directory along with git status/diff output (#3105)
# Description

This adds the capability to archive the experiment directory. Additionally, this adds options to run `git status` and `git diff` on the `HOMEgfs` global workflow (but not the submodules) and store that information within the experiment directory's archive.

These options are specified in `config.base` with the following defaults:

```bash
export ARCH_EXPDIR='YES' # Archive the EXPDIR configs, XML, and database
export ARCH_EXPDIR_FREQ=0 # How often to archive the EXPDIR in hours or 0 for first and last cycle only
export ARCH_HASHES='YES' # Archive the hashes of the GW and submodules and 'git status' for each; requires ARCH_EXPDIR
export ARCH_DIFFS='NO' # Archive the output of 'git diff' for the GW; requires ARCH_EXPDIR
```

Resolves #2994

# Type of change

- [x] New feature (adds functionality)

# Change characteristics

<!-- Choose YES or NO from each of the following and delete the other -->
- Is this a breaking change (a change in existing functionality)? NO
- Does this change require a documentation update? YES
- Does this change require an update to any of the following submodules? YES (If YES, please add a link to any PRs that are pending.)
  - [x] wxflow NOAA-EMC/wxflow#45

# How has this been tested?

- [x] Local archiving on Hercules for a C48_ATM case
- [x] Cycled testing on Hercules with `ARCH_DIFFS=YES` and `ARCH_EXPDIR_FREQ=6,12`
- [x] Testing with `ARCH_EXPDIR=NO` or `ARCH_HASHES=NO`

# Checklist

- [x] Any dependent changes have been merged and published
- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have documented my code, including function, input, and output descriptions
- [x] My changes generate no new warnings
- [x] New and existing tests pass with my changes
- [x] This change is covered by an existing CI test or a new one has been added
- [x] Any new scripts have been added to the .github/CODEOWNERS file with owners
- [x] I have made corresponding changes to the system documentation if necessary

---------

Co-authored-by: Walter Kolczynski - NOAA <[email protected]>
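The description says the workflow runs `git status` and `git diff` on `HOMEgfs` and stores the output in the EXPDIR archive, but the code that gathers this output is not among the hunks shown here. The following is only a sketch of how such a log could be produced with standard git subcommands. The helper names `git_info_commands` and `write_git_info` are hypothetical, and the exact commands the workflow runs may differ.

```python
import subprocess
from typing import List


def git_info_commands(home: str, hashes: bool, diffs: bool) -> List[List[str]]:
    """Build the git commands whose output would go into the log (assumed set)."""
    cmds: List[List[str]] = []
    if hashes:
        # Hash of the top-level repository, hashes of the submodules, and status
        cmds.append(["git", "-C", home, "rev-parse", "HEAD"])
        cmds.append(["git", "-C", home, "submodule", "status", "--recursive"])
        cmds.append(["git", "-C", home, "status"])
    if diffs:
        # --ignore-submodules keeps the diff limited to the top-level repository
        cmds.append(["git", "-C", home, "diff", "--ignore-submodules"])
    return cmds


def write_git_info(home: str, log_path: str, hashes: bool, diffs: bool) -> None:
    """Run each command and append its output to a single log file."""
    with open(log_path, "w") as log:
        for cmd in git_info_commands(home, hashes, diffs):
            result = subprocess.run(cmd, capture_output=True, text=True, check=False)
            log.write(f"$ {' '.join(cmd)}\n{result.stdout}\n")
```

Separating command construction from execution keeps the option handling easy to check without a git checkout at hand.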
1 parent bc61862 commit 3a8697d

11 files changed: +253 -29 lines changed

.flake8

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+[flake8]
+exclude = .git,.github,venv,__pycache__,old,build,dist
+max-line-length = 160

docs/source/configure.rst

Lines changed: 8 additions & 5 deletions
@@ -48,12 +48,15 @@ The global-workflow configs contain switches that change how the system runs. Ma
 |                  | (.true.) or cold (.false)?       |               |             | be set when running ``setup_expt.py`` script with |
 |                  |                                  |               |             | the ``--start`` flag (e.g. ``--start warm``)      |
 +------------------+----------------------------------+---------------+-------------+---------------------------------------------------+
-| HPSSARCH         | Archive to HPPS                  | NO            | Possibly    | Whether to save output to tarballs on HPPS        |
+| HPSSARCH         | Archive to HPPS                  | NO            | NO          | Whether to save output to tarballs on HPPS.       |
 +------------------+----------------------------------+---------------+-------------+---------------------------------------------------+
-| LOCALARCH        | Archive to a local directory     | NO            | Possibly    | Instead of archiving data to HPSS, archive to a   |
-|                  |                                  |               |             | local directory, specified by ATARDIR. If         |
-|                  |                                  |               |             | LOCALARCH=YES, then HPSSARCH must =NO. Changing   |
-|                  |                                  |               |             | HPSSARCH from YES to NO will adjust the XML.      |
+| LOCALARCH        | Archive to a local directory     | NO            | NO          | Whether to save output to tarballs locally. For   |
+|                  |                                  |               |             | HPSSARCH and LOCALARCH, ARCDIR specifies the      |
+|                  |                                  |               |             | directory. These options are mutually exclusive.  |
++------------------+----------------------------------+---------------+-------------+---------------------------------------------------+
+| ARCH_EXPDIR      | Archive the EXPDIR               | NO            | NO          | Whether to create a tarball of the EXPDIR.        |
+|                  |                                  |               |             | ARCH_HASHES and ARCH_DIFFS generate text files    |
+|                  |                                  |               |             | of git output that are archived with the EXPDIR.  |
 +------------------+----------------------------------+---------------+-------------+---------------------------------------------------+
 | QUILTING         | Use I/O quilting                 | .true.        | NO          | If .true. choose OUTPUT_GRID as cubed_sphere_grid |
 |                  |                                  |               |             | in netcdf or gaussian_grid                        |

parm/archive/expdir.yaml.j2

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,24 @@
1+
{% set cycle_YMDH = current_cycle | to_YMDH %}
2+
3+
expdir:
4+
name: "EXPDIR"
5+
# Copy the experiment files from the EXPDIR into the ROTDIR for archiving
6+
{% set copy_expdir = "expdir." ~ cycle_YMDH %}
7+
FileHandler:
8+
mkdir:
9+
- "{{ ROTDIR }}/{{ copy_expdir }}"
10+
copy:
11+
{% for config in glob(EXPDIR ~ "/config.*") %}
12+
- [ "{{ config }}", "{{ ROTDIR }}/{{ copy_expdir }}/." ]
13+
{% endfor %}
14+
- [ "{{ EXPDIR }}/{{ PSLOT }}.xml", "{{ ROTDIR }}/{{ copy_expdir }}/." ]
15+
{% if ARCH_HASHES or ARCH_DIFFS %}
16+
- [ "{{ EXPDIR }}/git_info.log", "{{ ROTDIR }}/{{ copy_expdir }}/." ]
17+
{% endif %}
18+
target: "{{ ATARDIR }}/{{ cycle_YMDH }}/expdir.tar"
19+
required:
20+
- "{{ copy_expdir }}/config.*"
21+
- "{{ copy_expdir }}/{{ PSLOT }}.xml"
22+
{% if ARCH_HASHES or ARCH_DIFFS %}
23+
- "{{ copy_expdir }}/git_info.log"
24+
{% endif %}
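To make the template's `copy` list easier to follow, here is a rough plain-Python equivalent of what it expands to: every `config.*` file, the `PSLOT` XML, and (when hashes or diffs are requested) `git_info.log`, all copied into `ROTDIR/expdir.<cycle>`. This is a sketch only. The function name `expdir_copy_list` is hypothetical, and the real rendering is done by the Jinja2 template above.

```python
import glob
import os
from typing import List


def expdir_copy_list(expdir: str, rotdir: str, pslot: str,
                     cycle_ymdh: str, with_git_info: bool) -> List[List[str]]:
    """Return [source, destination] pairs mirroring the template's copy section."""
    dest = os.path.join(rotdir, f"expdir.{cycle_ymdh}") + "/."
    # One entry per config.* file found in the experiment directory
    # (sorted here for stable output; the template does not sort)
    pairs = [[cfg, dest] for cfg in sorted(glob.glob(os.path.join(expdir, "config.*")))]
    # The workflow XML is always copied
    pairs.append([os.path.join(expdir, f"{pslot}.xml"), dest])
    # git_info.log is only present when ARCH_HASHES or ARCH_DIFFS is enabled
    if with_git_info:
        pairs.append([os.path.join(expdir, "git_info.log"), dest])
    return pairs
```

As in the template, the function only lists the files. It does not check that the XML or the log exists.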

parm/archive/master_gdas.yaml.j2

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -40,7 +40,7 @@ datasets:
4040
# Determine if we will save restart ICs or not (only valid for cycled)
4141
{% set save_warm_start_forecast, save_warm_start_cycled = ( False, False ) %}
4242

43-
{% if ARCH_CYC == cycle_HH | int%}
43+
{% if ARCH_CYC == cycle_HH | int %}
4444
# Save the forecast-only cycle ICs every ARCH_WARMICFREQ or ARCH_FCSTICFREQ days
4545
{% if (current_cycle - SDATE).days % ARCH_WARMICFREQ == 0 %}
4646
{% set save_warm_start_forecast = True %}
@@ -97,3 +97,10 @@ datasets:
9797

9898
# End of restart checking
9999
{% endif %}
100+
101+
# Archive the EXPDIR if requested
102+
{% if archive_expdir %}
103+
{% filter indent(width=4) %}
104+
{% include "expdir.yaml.j2" %}
105+
{% endfilter %}
106+
{% endif %}

parm/archive/master_gefs.yaml.j2

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -10,3 +10,10 @@ datasets:
1010
{% include "gefs_extracted_ice.yaml.j2" %}
1111
{% include "gefs_extracted_wave.yaml.j2" %}
1212
{% endfilter %}
13+
14+
# Archive the EXPDIR if requested
15+
{% if archive_expdir %}
16+
{% filter indent(width=4) %}
17+
{% include "expdir.yaml.j2" %}
18+
{% endfilter %}
19+
{% endif %}

parm/archive/master_gfs.yaml.j2

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -98,3 +98,10 @@ datasets:
9898
{% endfilter %}
9999
{% endif %}
100100
{% endif %}
101+
102+
# Archive the EXPDIR if requested
103+
{% if archive_expdir %}
104+
{% filter indent(width=4) %}
105+
{% include "expdir.yaml.j2" %}
106+
{% endfilter %}
107+
{% endif %}

parm/config/gefs/config.base

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -333,9 +333,13 @@ if [[ ${HPSSARCH} = "YES" ]] && [[ ${LOCALARCH} = "YES" ]]; then
333333
echo "Both HPSS and local archiving selected. Please choose one or the other."
334334
exit 3
335335
fi
336-
export ARCH_CYC=00 # Archive data at this cycle for warm_start capability
337-
export ARCH_WARMICFREQ=4 # Archive frequency in days for warm_start capability
336+
export ARCH_CYC=00 # Archive data at this cycle for warm start and/or forecast-only capabilities
337+
export ARCH_WARMICFREQ=4 # Archive frequency in days for warm start capability
338338
export ARCH_FCSTICFREQ=1 # Archive frequency in days for gdas and gfs forecast-only capability
339+
export ARCH_EXPDIR='YES' # Archive the EXPDIR configs, XML, and database
340+
export ARCH_EXPDIR_FREQ=0 # How often to archive the EXPDIR in hours or 0 for first and last cycle only
341+
export ARCH_HASHES='YES' # Archive the hashes of the GW and submodules and 'git status' for each; requires ARCH_EXPDIR
342+
export ARCH_DIFFS='NO' # Archive the output of 'git diff' for the GW; requires ARCH_EXPDIR
339343

340344
export DELETE_COM_IN_ARCHIVE_JOB="YES" # NO=retain ROTDIR. YES default in arch.sh and earc.sh.
341345

parm/config/gfs/config.base

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -479,9 +479,13 @@ if [[ ${HPSSARCH} = "YES" ]] && [[ ${LOCALARCH} = "YES" ]]; then
479479
echo "FATAL ERROR: Both HPSS and local archiving selected. Please choose one or the other."
480480
exit 4
481481
fi
482-
export ARCH_CYC=00 # Archive data at this cycle for warm_start capability
483-
export ARCH_WARMICFREQ=4 # Archive frequency in days for warm_start capability
482+
export ARCH_CYC=00 # Archive data at this cycle for warm start and/or forecast-only capabilities
483+
export ARCH_WARMICFREQ=4 # Archive frequency in days for warm start capability
484484
export ARCH_FCSTICFREQ=1 # Archive frequency in days for gdas and gfs forecast-only capability
485+
export ARCH_EXPDIR='YES' # Archive the EXPDIR configs, XML, and database
486+
export ARCH_EXPDIR_FREQ=0 # How often to archive the EXPDIR in hours or 0 for first and last cycle only
487+
export ARCH_HASHES='YES' # Archive the hashes of the GW and submodules and 'git status' for each; requires ARCH_EXPDIR
488+
export ARCH_DIFFS='NO' # Archive the output of 'git diff' for the GW; requires ARCH_EXPDIR
485489

486490
# The monitor jobs are not yet supported for JEDIATMVAR.
487491
if [[ ${DO_JEDIATMVAR} = "YES" ]]; then
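The `ARCH_EXPDIR_FREQ` comment describes two modes: a positive value archives the EXPDIR every that many hours, and `0` archives it at the first and last cycle only. The code that decides whether `archive_expdir` is set is not part of the hunks shown, so the following is only a sketch of one way to express that rule. It assumes the first cycle equals `SDATE`, the last equals `EDATE`, and that both are archived in either mode. The function name `should_archive_expdir` is hypothetical.

```python
from datetime import datetime


def should_archive_expdir(current: datetime, sdate: datetime,
                          edate: datetime, freq_hours: int) -> bool:
    """Decide whether the EXPDIR would be archived at this cycle (assumed rule)."""
    # Assumption: the first and last cycles are archived in every mode
    if current in (sdate, edate):
        return True
    # freq_hours == 0 means "first and last cycle only"
    if freq_hours <= 0:
        return False
    # Otherwise archive whenever the elapsed time is a multiple of the frequency
    elapsed_hours = (current - sdate).total_seconds() / 3600
    return elapsed_hours % freq_hours == 0
```

In a cycled experiment the first full cycle may not coincide with `SDATE`, so a real implementation may need to adjust the comparison.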

scripts/exglobal_archive.py

Lines changed: 13 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@
33
import os
44

55
from pygfs.task.archive import Archive
6-
from wxflow import AttrDict, Logger, cast_strdict_as_dtypedict, logit
6+
from wxflow import AttrDict, Logger, cast_strdict_as_dtypedict, logit, chdir
77

88
# initialize root logger
99
logger = Logger(level=os.environ.get("LOGGING_LEVEL", "DEBUG"), colored_log=True)
@@ -32,7 +32,8 @@ def main():
3232
'DO_AERO_ANL', 'DO_AERO_FCST', 'DO_CA', 'DOIBP_WAV', 'DO_JEDIOCNVAR',
3333
'NMEM_ENS', 'DO_JEDIATMVAR', 'DO_VRFY_OCEANDA', 'FHMAX_FITS', 'waveGRD',
3434
'IAUFHRS', 'DO_FIT2OBS', 'NET', 'FHOUT_HF_GFS', 'FHMAX_HF_GFS', 'REPLAY_ICS',
35-
'OFFSET_START_HOUR']
35+
'OFFSET_START_HOUR', 'ARCH_EXPDIR', 'EXPDIR', 'ARCH_EXPDIR_FREQ', 'ARCH_HASHES',
36+
'ARCH_DIFFS', 'SDATE', 'EDATE', 'HOMEgfs']
3637

3738
archive_dict = AttrDict()
3839
for key in keys:
@@ -47,21 +48,20 @@ def main():
4748
if archive_dict[key] is None:
4849
print(f"Warning: key ({key}) not found in task_config!")
4950

50-
cwd = os.getcwd()
51+
with chdir(config.ROTDIR):
5152

52-
os.chdir(config.ROTDIR)
53+
# Determine which archives to create
54+
arcdir_set, atardir_sets = archive.configure(archive_dict)
5355

54-
# Determine which archives to create
55-
arcdir_set, atardir_sets = archive.configure(archive_dict)
56+
# Populate the product archive (ARCDIR)
57+
archive.execute_store_products(arcdir_set)
5658

57-
# Populate the product archive (ARCDIR)
58-
archive.execute_store_products(arcdir_set)
59+
# Create the backup tarballs and store in ATARDIR
60+
for atardir_set in atardir_sets:
61+
archive.execute_backup_dataset(atardir_set)
5962

60-
# Create the backup tarballs and store in ATARDIR
61-
for atardir_set in atardir_sets:
62-
archive.execute_backup_dataset(atardir_set)
63-
64-
os.chdir(cwd)
63+
# Clean up any temporary files
64+
archive.clean()
6565

6666

6767
if __name__ == '__main__':
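The last hunk replaces the manual `os.getcwd()` / `os.chdir()` pair with a `with chdir(...)` block. The practical difference is that a context manager restores the original directory even when the body raises, which the old code did not. wxflow's implementation is not shown in this commit, so the following is only a stand-in built on the standard library. The name `working_directory` is hypothetical.

```python
import os
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily change the current working directory, then restore it."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions alike
        os.chdir(previous)
```

With the old pattern, an exception inside any of the archive calls would have skipped the final `os.chdir(cwd)`. The `finally` clause is what closes that gap.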
