Description
What is wrong?
Encountered the following DEAD jobs while running g-w CI for #4386:
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C48mx500_3DVarAOWCDA_pr4386
202103250000 gdas_anlstat 89264531 DEAD 271 2 253.0
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C48mx500_hybAOWCDA_pr4386
202103250000 gdas_anlstat 89263864 DEAD 271 2 286.0
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C96_atm3DVar_extended_pr4386
202112210000 gdas_anlstat 89263994 DEAD 271 2 254.0
202112210600 gdas_anlstat 89269096 DEAD 271 2 255.0
202112211200 gdas_anlstat 89275169 DEAD 271 2 253.0
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C96C48_hybatmDA_pr4386
202112210000 gdas_anlstat 89264018 DEAD 271 2 255.0
202112210600 gdas_anlstat 89268227 DEAD 271 2 253.0
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C96_gcafs_cycled_pr4386
202112201800 gcdas_anlstat 89262274 DEAD 271 2 69.0
202112210000 gcdas_anlstat 89266081 DEAD 271 2 70.0
What should have happened?
anlstat jobs should successfully run to completion in all g-w CI cases
What machines are impacted?
WCOSS2
What global-workflow hash are you using?
Steps to reproduce
- clone g-w jiaruidong2017:feature/snow-ghcn2 at 82a104a. This fork is 9 commits ahead of g-w develop at bc6c292
- build and link g-w
- run any of the above listed failed g-w CI cases
Additional information
All the DEAD anlstat jobs failed with the error:
File "/lfs/h2/emc/da/noscrub/russ.treadon/git/global-workflow/pr4386/ush/python/pygfs/jedi/jedi.py", line 248, in render_jcb_template
raise WorkflowException(f"An error occurred while rendering JCB template for algorithm {algorithm}:\n{e}") from e
wxflow.exceptions.WorkflowException: An error occurred while rendering JCB template for algorithm anlstat:
Resolving templates for anlstat failed with the following exception:
gsi_sfc_ps.yaml.j2
A search for gsi_sfc_ps in the installed $HOMEgfs for PR #4386 found one occurrence:
$HOMEgfs/sorc/gdas.cd/parm/anlstat/atmos_gsi/atmos_gsi_obs_list.yaml.j2
This file contains
observations:
- gsi_sfc_ps
A check of the run directories for the failed jobs finds
-rw-r--r-- 1 russ.treadon emc 5319459 Jan 5 15:38 atmos_gsi_ioda_anl/sfc_ps_anl_2021032500.nc
-rw-r--r-- 1 russ.treadon emc 5331241 Jan 5 15:37 atmos_gsi_ioda_ges/sfc_ps_ges_2021032500.nc
There is a disconnect between the name of the netcdf stat file in the run directory and the name specified in the yaml.
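The disconnect can be illustrated with a small check. This is only a sketch, not part of g-w or GDASApp: the helper names are hypothetical, and the file naming pattern `<obs>_<anl|ges>_<YYYYMMDDHH>.nc` is an assumption inferred from the two files listed above.

```python
import re

# Assumed naming pattern, inferred from the two files listed in the run
# directory: <obs_name>_<anl|ges>_<YYYYMMDDHH>.nc
STAT_FILE_RE = re.compile(r"^(?P<obs>.+)_(?:anl|ges)_\d{10}\.nc$")


def obs_names_from_files(filenames):
    """Return the set of observation names implied by the stat file names."""
    names = set()
    for name in filenames:
        base = name.rsplit("/", 1)[-1]  # drop any directory prefix
        match = STAT_FILE_RE.match(base)
        if match:
            names.add(match.group("obs"))
    return names


def find_mismatches(obs_list, filenames):
    """Return entries in obs_list that have no matching stat file (hypothetical helper)."""
    return sorted(set(obs_list) - obs_names_from_files(filenames))


files = [
    "atmos_gsi_ioda_anl/sfc_ps_anl_2021032500.nc",
    "atmos_gsi_ioda_ges/sfc_ps_ges_2021032500.nc",
]
print(find_mismatches(["gsi_sfc_ps"], files))  # name from the yaml obs list
print(find_mismatches(["sfc_ps"], files))      # name used by the files
```

With the yaml entry gsi_sfc_ps the check reports a mismatch, while sfc_ps matches both files.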
At first it seemed odd that the gcdas_anlstat job log files reference gsi_sfc_ps.yaml.j2, since the gcdas does not run the GSI. A check of the gcdas_anlstat.log files shows that 'STAT_ANALYSES': ['aero', 'atmos_gsi']. This agrees with the logic in dev/scripts/exglobal_analysis_stats.py
    # Create list based on DA components
    config.STAT_ANALYSES = []
    if config.DO_AERO_ANL:
        config.STAT_ANALYSES.append('aero')
    if config.DO_JEDISNOWDA:
        config.STAT_ANALYSES.append('snow')
    if config.DO_JEDIATMVAR:
        config.STAT_ANALYSES.append('atmos')
    else:
        config.STAT_ANALYSES.append('atmos_gsi')
and the DO_* settings in the C96_gcafs_cycled config.base
+++ config.base[268]export DO_AERO_ANL=YES
+++ config.base[367]export DO_JEDIATMVAR=NO
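To make the consequence concrete, the selection logic can be reproduced in isolation. This is a sketch under stated assumptions: `build_stat_analyses` is a hypothetical stand-alone function, the YES/NO settings are represented as booleans, and DO_JEDISNOWDA is assumed to be false for this case because 'snow' does not appear in the logged list.

```python
def build_stat_analyses(do_aero_anl, do_jedisnowda, do_jediatmvar):
    """Mirror the list-building logic quoted above (hypothetical stand-alone form)."""
    stat_analyses = []
    if do_aero_anl:
        stat_analyses.append('aero')
    if do_jedisnowda:
        stat_analyses.append('snow')
    if do_jediatmvar:
        stat_analyses.append('atmos')
    else:
        # Reached whenever DO_JEDIATMVAR is false, whether or not
        # the GSI actually ran in this experiment.
        stat_analyses.append('atmos_gsi')
    return stat_analyses


# Settings from the config.base above: DO_AERO_ANL=YES, DO_JEDIATMVAR=NO.
# DO_JEDISNOWDA=False is an assumption inferred from the logged list.
print(build_stat_analyses(do_aero_anl=True, do_jedisnowda=False, do_jediatmvar=False))
```

The result matches the logged ['aero', 'atmos_gsi']: the else branch adds 'atmos_gsi' for any configuration with DO_JEDIATMVAR=NO, including one that does not run the GSI.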
Do you have a proposed solution?
Based on the above findings, for the g-w CI cases excluding C96_gcafs_cycled, do we
- replace gsi_sfc_ps with sfc_ps in $HOMEgfs/sorc/gdas.cd/parm/anlstat/atmos_gsi/atmos_gsi_obs_list.yaml.j2
- modify the script creating atmos_gsi_ioda_* files to create gsi_sfc_ps_* instead of sfc_ps_*
- something else?
For the C96_gcafs_cycled case, is the logic that sets config.STAT_ANALYSES in exglobal_analysis_stats.py correct?