RelVals 29634.402, 29634.403, 29634.404, 29634.406, 29661.402, 29834.402, 29834.403, 29834.404 failed in CMSSW_15_0_GPU_X_2025-02-04-2300 with StdException:
----- Begin Fatal Exception 05-Feb-2025 02:33:03 CET-----------------------
An exception of category 'StdException' occurred while
   [0] Processing Event run: 1 lumi: 8 event: 703 stream: 3
   [1] Running path 'MC_Ele5_Open_Unseeded'
   [2] Calling method for module HGCalSoARecHitsLayerClustersProducer@alpaka/'hltHgcalSoARecHitsLayerClustersProducer'
Exception Message:
A std::exception was thrown.
/data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc12/external/alpaka/1.2.0-92470b733c547768aafa79b7bf3f2362/include/alpaka/mem/buf/BufUniformCudaHipRt.hpp(302) 'TApi::mallocAsync( &memPtr, static_cast<std::size_t>(width) * sizeof(TElem), queue.getNativeHandle())' returned error : 'cudaErrorNotSupported': 'operation not supported'!
----- End Fatal Exception -------------------------------------------------
assign heterogeneous
New categories assigned: heterogeneous
@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks
cms-bot internal usage
A new Issue was created by @iarspider.
@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
Looks like the jobs are running on a 1/8th slice of an H100, with only 12 GB of GPU memory:
CUDA device 0: NVIDIA H100L-1-12C MIG 1g.12gb (sm_90)
Maybe that is not enough for the Phase-2 workflow with 4 concurrent streams?
@rovere ?
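For reference, a rough sketch of how the memory budget can be read off the device name quoted above. It assumes the usual NVIDIA MIG profile naming `<N>g.<M>gb` (N compute slices, M GB of memory); `parse_mig_profile` and `memory_per_stream_gb` are hypothetical helpers written for this illustration, not part of CMSSW or alpaka, and the even split across streams is a naive upper bound that ignores CUDA context overhead and memory shared between streams.

```python
import re

# Assumed naming convention: "MIG <N>g.<M>gb" = N compute slices, M GB of memory.
MIG_PROFILE = re.compile(r"MIG (\d+)g\.(\d+)gb")


def parse_mig_profile(device_name):
    """Return (compute_slices, memory_gb), or None if no MIG profile is found."""
    match = MIG_PROFILE.search(device_name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def memory_per_stream_gb(memory_gb, streams):
    """Naive even split of device memory across concurrent streams."""
    return memory_gb / streams


# Device name as printed in the job log.
device = "NVIDIA H100L-1-12C MIG 1g.12gb (sm_90)"
slices, memory_gb = parse_mig_profile(device)
print(slices, memory_gb, memory_per_stream_gb(memory_gb, 4))
```

With 4 concurrent streams this leaves at most about 3 GB per stream, before any overhead is accounted for.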