anlstat jobs fail in multiple g-w CI cases #4400

@RussTreadon-NOAA

Description

What is wrong?

Encountered the following DEAD jobs while running g-w CI for #4386:

/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C48mx500_3DVarAOWCDA_pr4386
202103250000            gdas_anlstat                    89264531                DEAD                 271         2         253.0
 
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C48mx500_hybAOWCDA_pr4386
202103250000            gdas_anlstat                    89263864                DEAD                 271         2         286.0
 
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C96_atm3DVar_extended_pr4386
202112210000            gdas_anlstat                    89263994                DEAD                 271         2         254.0
202112210600            gdas_anlstat                    89269096                DEAD                 271         2         255.0
202112211200            gdas_anlstat                    89275169                DEAD                 271         2         253.0
 
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C96C48_hybatmDA_pr4386
202112210000            gdas_anlstat                    89264018                DEAD                 271         2         255.0
202112210600            gdas_anlstat                    89268227                DEAD                 271         2         253.0

/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C96_gcafs_cycled_pr4386
202112201800           gcdas_anlstat                    89262274                DEAD                 271         2          69.0
202112210000           gcdas_anlstat                    89266081                DEAD                 271         2          70.0

What should have happened?

anlstat jobs should successfully run to completion in all g-w CI cases

What machines are impacted?

WCOSS2

What global-workflow hash are you using?

82a104a

Steps to reproduce

  1. Clone the g-w fork jiaruidong2017:feature/snow-ghcn2 at 82a104a. This fork is 9 commits ahead of g-w develop at bc6c292.
  2. Build and link g-w.
  3. Run any of the failed g-w CI cases listed above.

Additional information

All the DEAD anlstat jobs failed with the following error:

  File "/lfs/h2/emc/da/noscrub/russ.treadon/git/global-workflow/pr4386/ush/python/pygfs/jedi/jedi.py", line 248, in render_jcb_template
    raise WorkflowException(f"An error occurred while rendering JCB template for algorithm {algorithm}:\n{e}") from e
wxflow.exceptions.WorkflowException: An error occurred while rendering JCB template for algorithm anlstat:
Resolving templates for anlstat failed with the following exception:
gsi_sfc_ps.yaml.j2

A search for gsi_sfc_ps in the installed $HOMEgfs for PR #4386 found a single occurrence:

$HOMEgfs/sorc/gdas.cd/parm/anlstat/atmos_gsi/atmos_gsi_obs_list.yaml.j2

This file contains

observations:
- gsi_sfc_ps

A check of the run directories for the failed jobs finds

-rw-r--r-- 1 russ.treadon emc 5319459 Jan  5 15:38 atmos_gsi_ioda_anl/sfc_ps_anl_2021032500.nc
-rw-r--r-- 1 russ.treadon emc 5331241 Jan  5 15:37 atmos_gsi_ioda_ges/sfc_ps_ges_2021032500.nc

There is a disconnect between the name of the netCDF stat file in the run directory and the name specified in the obs list yaml.
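One way to surface this kind of disconnect is to compare the obs names in the yaml against the files the job actually produced. A minimal sketch, using the obs name and file names from the listings above; the "name plus underscore" matching rule is an assumption for illustration:

```python
# Sketch: verify every observation listed in the obs-list YAML has a matching
# IODA stat file in the run directory. The obs name and file names come from
# the listings above; the "<name>_" prefix-matching rule is an assumption.
observations = ["gsi_sfc_ps"]  # from atmos_gsi_obs_list.yaml.j2
run_dir_files = [
    "atmos_gsi_ioda_anl/sfc_ps_anl_2021032500.nc",
    "atmos_gsi_ioda_ges/sfc_ps_ges_2021032500.nc",
]

def missing_obs(observations, files):
    """Return obs names with no file whose basename starts with '<name>_'."""
    basenames = [f.rsplit("/", 1)[-1] for f in files]
    return [o for o in observations
            if not any(b.startswith(o + "_") for b in basenames)]

print(missing_obs(observations, run_dir_files))  # ['gsi_sfc_ps']
```

With the yaml entry renamed to sfc_ps, the same check would come back empty.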

At first it seemed odd that the gcdas_anlstat job log files reference gsi_sfc_ps.yaml.j2, since gcdas does not run the GSI. However, a check of the gcdas_anlstat.log files shows 'STAT_ANALYSES': ['aero', 'atmos_gsi']. This agrees with the logic in dev/scripts/exglobal_analysis_stats.py

    # Create list based on DA components
    config.STAT_ANALYSES = []
    if config.DO_AERO_ANL:
        config.STAT_ANALYSES.append('aero')
    if config.DO_JEDISNOWDA:
        config.STAT_ANALYSES.append('snow')
    if config.DO_JEDIATMVAR:
        config.STAT_ANALYSES.append('atmos')
    else:
        config.STAT_ANALYSES.append('atmos_gsi')

and the DO_* settings in the C96_gcafs_cycled config.base:

+++ config.base[268]export DO_AERO_ANL=YES
+++ config.base[367]export DO_JEDIATMVAR=NO
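Plugging these settings into the list-building logic from exglobal_analysis_stats reproduces the log output. A standalone sketch: the config object is mocked, and DO_JEDISNOWDA=False is an assumption for this case (only DO_AERO_ANL and DO_JEDIATMVAR appear in the config.base lines above):

```python
from types import SimpleNamespace

def build_stat_analyses(config):
    """Mirror of the STAT_ANALYSES logic in exglobal_analysis_stats.py."""
    analyses = []
    if config.DO_AERO_ANL:
        analyses.append('aero')
    if config.DO_JEDISNOWDA:
        analyses.append('snow')
    if config.DO_JEDIATMVAR:
        analyses.append('atmos')
    else:
        analyses.append('atmos_gsi')
    return analyses

# DO_AERO_ANL=YES and DO_JEDIATMVAR=NO come from config.base above;
# DO_JEDISNOWDA=False is an assumption for this case.
cfg = SimpleNamespace(DO_AERO_ANL=True, DO_JEDISNOWDA=False, DO_JEDIATMVAR=False)
print(build_stat_analyses(cfg))  # ['aero', 'atmos_gsi']
```

So the atmos_gsi entry is expected whenever DO_JEDIATMVAR=NO, regardless of whether the GSI actually ran, which is the crux of the question below.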

Do you have a proposed solution?

Based on the above findings, for the g-w CI cases excluding C96_gcafs_cycled, do we

  1. replace gsi_sfc_ps with sfc_ps in $HOMEgfs/sorc/gdas.cd/parm/anlstat/atmos_gsi/atmos_gsi_obs_list.yaml.j2
  2. modify the script creating atmos_gsi_ioda_* files to create gsi_sfc_ps_* instead of sfc_ps_*
  3. something else?
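Options 1 and 2 are two sides of the same rename. A small sketch of how option 1 reconciles the names, assuming the stat files follow the "&lt;obs&gt;_{anl,ges}_&lt;cycle&gt;.nc" pattern seen in the run directory listing:

```python
# Sketch: expected stat filename for a given obs entry, assuming the
# "<obs>_{anl,ges}_<cycle>.nc" pattern seen in the run directory listing.
def stat_filename(obs, stage, cycle):
    return f"{obs}_{stage}_{cycle}.nc"

# Current yaml entry -> a file that does not exist in the run directory:
print(stat_filename("gsi_sfc_ps", "anl", "2021032500"))  # gsi_sfc_ps_anl_2021032500.nc
# Option 1 (obs entry renamed to sfc_ps) -> matches the file on disk:
print(stat_filename("sfc_ps", "anl", "2021032500"))      # sfc_ps_anl_2021032500.nc
```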

For the C96_gcafs_cycled case, is the logic that sets config.STAT_ANALYSES in exglobal_analysis_stats correct?
