Failure of ufs-weather-model on Hercules and Orion (probably general) with MAPL/2.40.3 and non-spack-stack dependencies. Likely a build issue in the dependencies or in MAPL, but an issue has been requested #3375
Comments
@bena-nasa is the real expert, so I expect we'll hear something from them by late morning. I took a quick look at the relevant lines but cannot say much more than that a pointer is involved, so the run-time error might be accurate. My expectation is that MAPL should have trapped this rather than the compiler, so while @GeorgeVandenberghe-NOAA may indeed have "misused" MAPL/History, there is likely a bug in MAPL that simply has not been detected before.
I did not see anything obvious when looking at the lines in the code. I do have access to Hercules, so I could certainly try to reproduce and debug this with enough instructions.
I am finding this goes away when I load the same version of the compiler that spack-stack was built with. You all can probably stand down; I will document it as a compiler issue on Hercules and see if it also occurs on Gaea. I can't build my stack on WCOSS2 (and the security reasons why are FULLY valid!). To document it, I have to demonstrate that switching compiler versions in the UFS build is sufficient to fix this problem.
On Fri, Jan 31, 2025 at 6:38 PM Ben Auer wrote:
I did not see anything obvious when looking at the lines in the code
--
George W Vandenberghe
*Lynker Technologies at * NOAA/NWS/NCEP/EMC
5830 University Research Ct., Rm. 2141
College Park, MD 20740
***@***.***
301-683-3769(work) 3017751547(cell)
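Since the failure went away once the compiler matched the version spack-stack was built with, it can help to check the compiler version before rebuilding. A minimal bash sketch, assuming the compiler prints a dotted version number on the first line of its `--version` output (as in `ifort (IFORT) 2021.9.0 20230302`); the function names and the expected version passed in are hypothetical and not part of the UFS, MAPL, or spack-stack:

```bash
#!/usr/bin/env bash
# Sketch: check that the Fortran compiler in the current environment is the
# same version spack-stack was built with. The expected version is supplied
# by the caller; nothing here queries spack-stack itself.

# Print the first dotted version number (e.g. 2021.9.0) found in a
# compiler's --version banner line; fail if there is none.
extract_version() {
  local banner=$1
  local re='([0-9]+(\.[0-9]+)+)'
  if [[ $banner =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}

# Succeed only if the two version strings are identical.
same_version() {
  [[ $1 == "$2" ]]
}

# Compare the compiler command in $1 (e.g. "$FC") with the expected version in $2.
# Returns 0 on a match, 1 on a mismatch, 2 if no version could be parsed.
check_compiler() {
  local compiler=$1 expected=$2 actual
  actual=$(extract_version "$("$compiler" --version 2>/dev/null | head -n 1)") || {
    echo "could not determine the version of $compiler" >&2
    return 2
  }
  if same_version "$actual" "$expected"; then
    echo "OK: $compiler is version $actual"
  else
    echo "MISMATCH: $compiler is version $actual, expected $expected" >&2
    return 1
  fi
}
```

Usage: `check_compiler "$FC" 2021.9.0`, where `2021.9.0` is only a placeholder for whatever version the spack-stack compiler module reports on the machine in question.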
Good to hear; it almost had to be a compiler issue, bug, etc.
Library build options in one of MAPL's many dependencies were my earlier and more unnerving hypothesis, but the results suggest a simpler problem: just the compiler. That compiler version is old, so pursuing it with Intel wouldn't be productive either.
On Fri, Jan 31, 2025 at 7:00 PM Ben Auer wrote:
good hear, almost had to be a compiler issue bug, etc ...
Did you want a replication case?
On Thursday, January 30, 2025, Tom Clune wrote:
@bena-nasa is the real expert so I expect we'll hear something from them
by late morning. I took a quick look at the relevant lines, but cannot say
much more than there is a pointer involved, so the run-time error might be
accurate.
My expectation is that MAPL should have trapped this rather than the
compiler, so while @GeorgeVandenberghe-NOAA may indeed have "misused"
MAPL/History, there is likely a bug in MAPL that simply has not been
detected before.
A replication case is on Hercules
/work2/noaa/noaatest/gwv/herc/maplfail.tar
Untar this and follow the README instructions.
On Sun, Feb 2, 2025 at 2:34 PM George Vandenberghe - NOAA Affiliate wrote:
Did you want a replication case?
I believe I am missing something basic and that this is user error rather than a MAPL defect, but I was asked to create an issue when I inquired about it.
When ufs-weather-model is built against an alternate (non-spack-stack) stack, it builds but fails in MAPL at run time with this traceback:
960:
756: forrtl: severe (122): invalid attempt to assign into a pointer that is not associated
756: Image PC Routine Line Source
756: gwv.x 00000000068B54CF Unknown Unknown Unknown
756: gwv.x 0000000002187806 mapl_historygridc 2355 MAPL_HistoryGridComp.F90
756: gwv.x 0000000000B6AF52 Unknown Unknown Unknown
756: gwv.x 0000000000B6EA9F Unknown Unknown Unknown
756: gwv.x 0000000000CF4F3A Unknown Unknown Unknown
756: gwv.x 0000000000CE147B Unknown Unknown Unknown
756: gwv.x 0000000000B6C38A Unknown Unknown Unknown
756: gwv.x 000000000059A2D0 Unknown Unknown Unknown
756: gwv.x 0000000000871381 Unknown Unknown Unknown
756: gwv.x 00000000022B139F mapl_genericmod_m 1896 MAPL_Generic.F90
756: gwv.x 0000000000B6AF52 Unknown Unknown Unknown
756: gwv.x 0000000000B6EA9F Unknown Unknown Unknown
756: gwv.x 0000000000CF4F3A Unknown Unknown Unknown
756: gwv.x 0000000000CE147B Unknown Unknown Unknown
756: gwv.x 0000000000B6C38A Unknown Unknown Unknown
756: gwv.x 000000000059A2D0 Unknown Unknown Unknown
756: gwv.x 0000000000871381 Unknown Unknown Unknown
756: gwv.x 000000000214CF4F mapl_capgridcompm 708 MAPL_CapGridComp.F90
756: gwv.x 0000000002151D2A mapl_capgridcompm 654 MAPL_CapGridComp.F90
756: gwv.x 0000000000B6AF52 Unknown Unknown Unknown
756: gwv.x 0000000000B6EA9F Unknown Unknown Unknown
756: gwv.x 0000000000CF4F3A Unknown Unknown Unknown
756: gwv.x 0000000000CE147B Unknown Unknown Unknown
756: gwv.x 0000000000B6C38A Unknown Unknown Unknown
756: gwv.x 000000000059A2D0 Unknown Unknown Unknown
756: gwv.x 0000000000871381 Unknown Unknown Unknown
756: gwv.x 000000000214A7C0 mapl_capgridcompm 955 MAPL_CapGridComp.F90
756: gwv.x 0000000001E909E7 aerosol_cap_mp_mo 348 Aerosol_Cap.F90
756: gwv.x 0000000000C7D5C8 Unknown Unknown Unknown
756: gwv.x 0000000000C7D533 Unknown Unknown Unknown
756: gwv.x 0000000000C7DBC4 Unknown Unknown Unknown
756: gwv.x 00000000004287FB Unknown Unknown Unknown
756: gwv.x 0000000006784830 Unknown Unknown Unknown
756: gwv.x 0000000000B6AF52 Unknown Unknown Unknown
756: gwv.x 0000000000B6EA9F Unknown Unknown Unknown
756: gwv.x 0000000000CF514A Unknown Unknown Unknown
756: gwv.x 0000000000CE147B Unknown Unknown Unknown
756: gwv.x 0000000000B6C38A Unknown Unknown Unknown
756: gwv.x 000000000059A2D0 Unknown Unknown Unknown
756: gwv.x 0000000000871381 Unknown Unknown Unknown
756: gwv.x 0000000000AF45F0 Unknown Unknown Unknown
756: gwv.x 0000000000B0A262 Unknown Unknown Unknown
756: gwv.x 0000000000B0D2F2 Unknown Unknown Unknown
756: gwv.x 0000000000B25B85 Unknown Unknown Unknown
756: gwv.x 0000000000B6AF52 Unknown Unknown Unknown
756: gwv.x 0000000000B6EA9F Unknown Unknown Unknown
756: gwv.x 0000000000CF4F3A Unknown Unknown Unknown
756: gwv.x 0000000000CE147B Unknown Unknown Unknown
756: gwv.x 0000000000B6C38A Unknown Unknown Unknown
756: gwv.x 000000000059A2D0 Unknown Unknown Unknown
756: gwv.x 0000000000871381 Unknown Unknown Unknown
756: gwv.x 0000000000420F4E MAIN__ 392 UFS.F90
756: gwv.x 000000000041FD7D Unknown Unknown Unknown
756: libc.so.6 000014D6CF6FCEB0 Unknown Unknown Unknown
756: libc.so.6 000014D6CF6FCF60 __libc_start_main Unknown Unknown
756: gwv.x 000000000041FC95 Unknown Unknown Unknown
740: forrtl: severe (122): invalid attempt to assign into a pointer that is not associated
740: Image PC Routine Line Source
740: gwv.x 00000000068B54CF Unknown Unknown Unknown
740: gwv.x 0000000002187806 mapl_historygridc 2355 MAPL_HistoryGridComp.F90
740: gwv.x 0000000000B6AF52 Unknown Unknown Unknown
740: gwv.x 0000000000B6EA9F Unknown Unknown Unknown
740: gwv.x 0000000000CF4F3A Unknown Unknown Unknown
740: gwv.x 0000000000CE147B Unknown Unknown Unknown
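Most frames in an Intel `forrtl` traceback like the one above are `Unknown`, and every failing MPI rank prints its own copy. A minimal bash/awk sketch for pulling out just the resolved frames, assuming each line carries a rank prefix such as `756: ` so that a resolved frame has exactly six whitespace-separated fields; the function name is hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: reduce an Intel forrtl traceback to the frames that have symbol
# information. Assumes each log line starts with an MPI rank prefix such as
# "756: ", so a resolved frame has six whitespace-separated fields:
#   rank: image pc routine line source
# Frames whose source is "Unknown", the column header, and frames already
# printed for another rank are skipped. Reads files given as arguments, or stdin.
known_frames() {
  awk 'NF == 6 && $2 != "Image" && $6 != "Unknown" && !seen[$4 FS $5 FS $6]++ {
         print $4, $5, $6
       }' "$@"
}
```

Applied to the traceback above (saved to a file and passed as the argument), this prints only the call chain from `UFS.F90` down to the failing line:

```
mapl_historygridc 2355 MAPL_HistoryGridComp.F90
mapl_genericmod_m 1896 MAPL_Generic.F90
mapl_capgridcompm 708 MAPL_CapGridComp.F90
mapl_capgridcompm 654 MAPL_CapGridComp.F90
mapl_capgridcompm 955 MAPL_CapGridComp.F90
aerosol_cap_mp_mo 348 Aerosol_Cap.F90
MAIN__ 392 UFS.F90
```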
When I use the spack-stack module mapl/2.40.3-esmf-8.6.0 (i.e. "their" MAPL), it works.
I build MAPL this way ($PREFIX contains all dependencies in $PREFIX/lib, $PREFIX/lib64, and $PREFIX/include, and this setup works for most cases):
```bash
cd MAPL-2.40.3
# All dependencies are installed under $PREFIX
export ZLIB_ROOT=$PREFIX
export PNG_ROOT=$PREFIX
export PIO_ROOT=$PREFIX
export YAFYAML_ROOT=$PREFIX
export jasper_ROOT=$PREFIX
export ESMFMKFILE=$PREFIX/ESMF_8_6_0/lib/esmf.mk
export ESMA_CMAKE_ROOT=$PREFIX
export ECBUILD_ROOT=$PREFIX
export CMAKEMODULES_ROOT=$PREFIX
export NETCDF=$PREFIX
export prefix=$PREFIX
export NetCDF_ROOT=$PREFIX
set -x
# FC, CC and CXX are taken from the environment (e.g. the loaded compiler modules)
export FC=$FC
export CC=$CC
export CXX=$CXX
# Start from a clean build directory
[[ -d build ]] && rm -rf build
mkdir -p build && cd build
CMAKE_OPTS=""
export PATH=$NETCDF/bin:$PATH
export NTHREADS=4
#source $HOME/envset
# Each option line ends with a backslash so it is passed to the same cmake
# command; without the continuations, cmake runs with no options.
# $NETP (install root) is assumed to be set in the environment.
cmake .. \
  -DCMAKE_INSTALL_PREFIX=$NETP/MAPL-2.40.3 -DUSE_F2PY=OFF -DCMAKE_Fortran_COMPILER=$FC \
  -DCMAKE_MODULE_PATH="${ESMA_CMAKE_ROOT};${CMAKEMODULES_ROOT}/Modules;${ECBUILD_ROOT}/share/ecbuild/cmake" \
  -DCMAKE_BUILD_TYPE=Release \
  -DBUILD_WITH_FLAP=OFF \
  -DBUILD_WITH_PFLOGGER=OFF \
  -DBUILD_WITH_FARGPARSE=ON \
  -DESMA_USE_GFE_NAMESPACE=ON \
  -DBUILD_SHARED_MAPL=OFF \
  -DUSE_EXTDATA2G=OFF \
  ${CMAKE_OPTS}
gmake -j${NTHREADS:-4} install VERBOSE=1
```
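Because the problem turned out to track the compiler version, one way to confirm that MAPL and the UFS were built with the same Fortran compiler is to compare the `CMAKE_Fortran_COMPILER` entries in each build tree's `CMakeCache.txt`. A minimal bash sketch, assuming both build directories still exist; the function names are hypothetical, and the check compares compiler paths, so it only distinguishes versions when each version is installed at its own path (as is typical with environment modules):

```bash
#!/usr/bin/env bash
# Sketch: compare the Fortran compiler recorded by CMake in two build trees.

# Print the CMAKE_Fortran_COMPILER value from <build_dir>/CMakeCache.txt.
# The cache entry type (FILEPATH, STRING, ...) is ignored.
# Returns 2 if the cache is unreadable, 1 if the entry is missing.
cached_fortran_compiler() {
  local cache=$1/CMakeCache.txt fc
  [[ -r $cache ]] || { echo "no readable $cache" >&2; return 2; }
  fc=$(sed -n 's/^CMAKE_Fortran_COMPILER:[A-Z]*=//p' "$cache" | head -n 1)
  [[ -n $fc ]] || { echo "CMAKE_Fortran_COMPILER not found in $cache" >&2; return 1; }
  printf '%s\n' "$fc"
}

# Succeed if both build trees recorded the same Fortran compiler path.
compare_build_compilers() {
  local a b
  a=$(cached_fortran_compiler "$1") || return 2
  b=$(cached_fortran_compiler "$2") || return 2
  if [[ $a == "$b" ]]; then
    echo "same Fortran compiler: $a"
  else
    printf 'DIFFERENT Fortran compilers:\n  %s: %s\n  %s: %s\n' "$1" "$a" "$2" "$b" >&2
    return 1
  fi
}
```

Usage: `compare_build_compilers MAPL-2.40.3/build "$UFS_BUILD_DIR"`, where `UFS_BUILD_DIR` is a hypothetical variable pointing at the ufs-weather-model build directory.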
The compiler is Intel.
Currently Loaded Modules: