Several RTOFS executables failed with "-check all" and "-ftrapuv" flags (bugzilla #1592) #49

Open
DanIredell-NOAA opened this issue Nov 5, 2024 · 4 comments

Comments

@DanIredell-NOAA
Contributor

http://www2.spa.ncep.noaa.gov/bugzilla/show_bug.cgi?id=1592

During the RTOFS v2.4 IT test, the following executables failed when built with the "-check all" and "-ftrapuv" flags, and also when built with the "-check bounds" and "-ftrapuv" flags. Please investigate and update the RTOFS codes so that they pass with "-check all" and "-ftrapuv" in the next upgrade.

exec/rtofs_ncoda_qc -
/lfs/h1/ops/test/output/20240711/rtofs_global_ncoda_qc.o144015979 -
/lfs/f1/ops/para/tmp/rtofs_global_ncoda_qc.144015979.cbqs01/amsr.qc.out -
“forrtl: error (63): output conversion error, unit 44, file /lfs/f1/ops/para/tmp/rtofs_global_ncoda_qc.144015979.cbqs01/logs/amsr_qc/fort.44”
/lfs/f1/ops/para/tmp/rtofs_global_ncoda_qc.144015979.cbqs01/ssh.qc.out -
....
“forrtl: warning (406): fort: (1): In call to I/O Write routine, an array temporary was created for argument #1”
....

exec/rtofs_ncoda_acspo_sst_nc -
/lfs/f1/ops/para/tmp/rtofs_global_ncoda_qc.161613032.dbqs01/goes.qc.out -
/lfs/h1/ops/test/com/rtofs/v2.4/Stability_check.bounds/rtofs_global_ncoda_qc.161613032.dbqs01/goes.qc.out -
....
forrtl: error (65): floating invalid
....

exec/rtofs_ssmis_tol2 -
/lfs/f1/ops/para/tmp/rtofs_global_ncoda_qc.161616530.dbqs01/ice.qc.out -
/lfs/h1/ops/test/com/rtofs/v2.4/Stability_check.bounds/rtofs_global_ncoda_qc.161616530.dbqs01/ice.qc.out -
...
forrtl: error (65): floating invalid
...

exec/rtofs_ncoda_sat_sss_nc
/lfs/f1/ops/para/tmp/rtofs_global_ncoda_qc.161629409.dbqs01/surf.qc.out -
/lfs/h1/ops/test/com/rtofs/v2.4/Stability_check.bounds/rtofs_global_ncoda_qc.161629409.dbqs01/surf.qc.out -
....
forrtl: error (65): floating invalid
....

exec/rtofs_raw2hycom -
/lfs/h1/ops/test/output/20240724/rtofs_global_ncoda_inc.o162020959 -
/lfs/h1/ops/test/com/rtofs/v2.4/Stability_check.bounds/rtofs_global_ncoda_inc.o162020959 -
....
forrtl: error (65): floating invalid
....

exec/rtofs_hycom_extract -
/lfs/h1/ops/test/output/20240724/rtofs_global_incup.o162020960 -
/lfs/h1/ops/test/com/rtofs/v2.4/Stability_check.bounds/rtofs_global_incup.o162020960 -
...
forrtl: error (65): floating invalid
...

exec/rtofs_hycom_expr -
/lfs/h1/ops/test/output/20240724/rtofs_global_ncoda_inc.o162030739 -
/lfs/h1/ops/test/output/20240724/rtofs_global_incup.o162030740 -
/lfs/h1/ops/test/com/rtofs/v2.4/Stability_check.bounds/rtofs_global_ncoda_inc.o162030739 -
/lfs/h1/ops/test/com/rtofs/v2.4/Stability_check.bounds/rtofs_global_incup.o162030740 -
...
forrtl: error (65): floating invalid
...

exec/rtofs_hycom2raw8 -
/lfs/h1/ops/test/output/20240724/rtofs_global_ncoda_inc.o162037158, rtofs_global_incup.o162037159
/lfs/h1/ops/test/com/rtofs/v2.4/Stability_check.bounds/rtofs_global_ncoda_inc.o162037158, rtofs_global_incup.o162037159
...
forrtl: error (65): floating invalid
...

exec/rtofs_ncoda_archv_inc -
/lfs/h1/ops/test/output/20240724/rtofs_global_ncoda_inc.o162038852
/lfs/h1/ops/test/com/rtofs/v2.4/Stability_check.bounds/rtofs_global_ncoda_inc.o162038852
....
forrtl: error (65): floating invalid
....

exec/rtofs_hycom_diff -
/lfs/h1/ops/test/output/20240724/rtofs_global_ncoda_inc.o162040416 -
/lfs/h1/ops/test/com/rtofs/v2.4/Stability_check.bounds/rtofs_global_ncoda_inc.o162040416 -
...
forrtl: error (65): floating invalid
...

exec/rtofs_hycom -
/lfs/h1/ops/test/output/20240724/rtofs_global_incup.o162038853 -
/lfs/h1/ops/test/com/rtofs/v2.4/Stability_check.bounds/rtofs_global_incup.o162038853 -
...
forrtl: error (65): floating invalid
...

exec/rtofs_atmforcing -
/lfs/f1/ops/para/tmp/rtofs_global_analysis_pre.161611314.dbqs01/errfile -
/lfs/h1/ops/test/com/rtofs/v2.4/Stability_check.bounds/rtofs_global_analysis_pre.161611314.dbqs01/errfile -
....
zaiost - Array I/O is Fortran DA I/O
forrtl: error (78): process killed (SIGTERM)
...

Notes:

  • The sorc and exec with ‘-check all’ and ‘-ftrapuv’ flags are stored under /lfs/h1/ops/test/packages/rtofs.v2.4.3/sorc.check.all and exec.check.all on DOGWOOD
  • The sorc and exec with ‘-check bounds’ and ‘-ftrapuv’ flags are stored under /lfs/h1/ops/test/packages/rtofs.v2.4.3/sorc.check.bounds and exec.check.bounds on DOGWOOD
  • All job logfiles and working dirs are saved under /lfs/h1/ops/test/com/rtofs/v2.4/Stability_check.bounds/ on DOGWOOD.
@sanAkel

sanAkel commented Jan 23, 2025

@DanIredell-NOAA Were you able to get Jim's help with any of the ⬆ issues?

@DanIredell-NOAA
Contributor Author

All these have been resolved. I just haven't updated the issues page or the NCO bugzilla page for any of them yet.

@DanIredell-NOAA
Contributor Author

Changes fixed with this update

Along with the fixes, the makefiles and build scripts were modified to create debug executables for all binaries.

The fixes:

exec/rtofs_ncoda_qc -
Two errors identified: 1) an output conversion error, and 2) an array temporary was created. Neither error stops the run. The second occurs about 50 times. However, this task is part of a cfp job and is usually the first to complete, so improving its timing won’t change the throughput.

exec/rtofs_ncoda_acspo_sst_nc -
The error came from the GOES18 location file. Updated fix/codaclim/ABI_G18.loc with the correct location.

exec/rtofs_ssmis_tol2 -
Invalid scan number in the ssmi dataset. Changed sorc/rtofs_code.fd/rtofs_ssmis_tol2.cd/ssmisu_decode.f to set ident(3) to be no bigger than huge() so that nint won’t fail when converting it to integer.
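The idea behind this fix, capping the value before rounding it to an integer, can be sketched in Python. This is only an illustration, not the Fortran change itself: `clamped_nint` and `INT32_MAX` are hypothetical names, and the sketch assumes the limit is that of a default 4-byte integer.

```python
import math

# Assumption: the cap is huge() of a default 4-byte Fortran integer.
INT32_MAX = 2**31 - 1


def clamped_nint(x, limit=INT32_MAX):
    """Round x to the nearest integer after capping its magnitude at limit."""
    if math.isnan(x):
        raise ValueError("cannot round NaN")
    # Cap first, so the rounded result always fits in the integer range.
    capped = max(-limit, min(limit, x))
    # Fortran nint rounds halves away from zero; Python's round() does not.
    return int(math.copysign(math.floor(abs(capped) + 0.5), capped))
```

With the cap in place, an invalid value such as `1e20` rounds to the limit instead of overflowing the conversion.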

exec/rtofs_ncoda_sat_sss_nc -
Some input SMAP HDF5 files contain NaNs. Changed sorc/rtofs_ncoda.fd/ncoda_decode/libsrx/sss/rd_smap_hdf.f to set any NaNs to spval so that processing can continue. The quality flag is set so that the record is bypassed.
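A minimal Python sketch of the NaN handling described above. The function name, the `SPVAL` value, and the returned flag list are assumptions made for illustration; the actual change is in the Fortran reader and uses the existing quality flag.

```python
import math

# Hypothetical missing-value sentinel; the real spval is defined in the NCODA code.
SPVAL = -999.0


def replace_nans(values, spval=SPVAL):
    """Return (cleaned, flagged): NaNs replaced by spval, plus a per-record flag."""
    flagged = [math.isnan(v) for v in values]
    cleaned = [spval if bad else v for v, bad in zip(values, flagged)]
    return cleaned, flagged
```

Downstream code can then skip any record whose flag is set instead of trapping on the NaN.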

exec/rtofs_raw2hycom
exec/rtofs_hycom_extract
exec/rtofs_hycom_expr
exec/rtofs_hycom2raw8
These were missing the -convert big_endian flag in the debug compile. After adding this flag, these no longer fail.
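To illustrate why a missing byte-order conversion can matter, here is a small Python sketch using the standard `struct` module. It shows one plausible mechanism only: the same four bytes decode to a very different value when read with the wrong byte order. The helper name and the sample value are hypothetical.

```python
import struct


def read_float32(raw, big_endian):
    """Decode four bytes as a 32-bit float using the given byte order."""
    fmt = ">f" if big_endian else "<f"
    return struct.unpack(fmt, raw)[0]


# Four bytes written in big-endian order, as a big-endian data file would hold them.
raw = struct.pack(">f", 273.15)

right = read_float32(raw, big_endian=True)   # close to 273.15
wrong = read_float32(raw, big_endian=False)  # unrelated value from the same bytes
```

Other bit patterns can decode to NaN or to extremely large magnitudes when misread, which is the kind of input that trapping flags are likely to catch.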

exec/rtofs_ncoda_archv_inc -
Was also missing the -convert big_endian flag; however, it still failed after the flag was added. Changed sorc/rtofs_code/rtofs_ncoda_archv_inc.fd/ncoda_archv_lyrinc.f to reorganize the if statements so that indexes beyond the array bounds are never referenced.
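The general pattern, testing the index before reading the element, can be sketched as follows. This is a hypothetical Python example, not the code in ncoda_archv_lyrinc.f. The tests are nested to mirror Fortran, where a compound `.and.` condition is not guaranteed to short-circuit, so the bounds test needs its own if statement.

```python
def next_layer_is_thicker(thickness, k):
    """Compare layer k+1 with layer k without ever indexing past the array."""
    # Outer test: is k + 1 a valid index at all?
    if 0 <= k and k + 1 < len(thickness):
        # Inner test: only now is it safe to read thickness[k + 1].
        if thickness[k + 1] > thickness[k]:
            return True
    return False
```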

exec/rtofs_hycom_diff -
Was also missing the -convert big_endian flag; however, it still failed after the flag was added. Changed sorc/rtofs_code/rtofs_hycom_diff.fd/bigrid.f to use ip0, which is indexed from 0.

exec/rtofs_hycom -
Failed in geopar.F. Pulled changes from the hycom repo specifically for this issue. After this fix, hycom was rerun until the next crash, and the process was repeated. Using this method, mod_cb_arrays.F, mod_momtum.F, and mod_tsadvc.F were also changed. All these changes were pulled from the hycom repo.

exec/rtofs_atmforcing -
Not a problem. Since there is no stack trace, the job was probably killed externally.

@sanAkel

sanAkel commented Jan 30, 2025

Great! In that case, please close this issue.

This issue is resolved via 2bf78fb#diff-ac8c3d8f6fd67f02adb47f5b86a7d74b637b21a9b6c9d687d00e648b8e74998b
