
feat: AdePT physics list plugin for e+/e-/gamma offload to GPU [WIP] #1606

Draft
wdconinc wants to merge 58 commits into AIDASoft:master from wdconinc:adept

@wdconinc
Contributor

@wdconinc wdconinc commented Apr 11, 2026

BEGINRELEASENOTES

  • feat: AdePT physics list plugin for e+/e-/gamma offload to GPU

ENDRELEASENOTES

This PR adds an AdePT physics list plugin for DD4hep (similar to the celeritas physics list plugin in https://github.com/celeritas-project/celeritas/).

Notes on AdePT integration approach:

  • AdePT is provided as a Geant4AdePTPhysics action, which must be registered through a helper function passed to setupUserPhysics in DDsim. The steering file does this.
  • callUserTrackingAction=true is required, because a Geant4AdePTUserParticleHandler has to 'repair' the track/particle after it returns from the GPU to the CPU. The steering file adds this handler as well.
  • The rest of the example steering file only makes it self-consistent and runnable; it contains no AdePT-specific functionality.

Notes on DD4hep core changes:

Contributor

@MarkusFrankATcernch MarkusFrankATcernch left a comment

We do not want to make plugin headers public.
If necessary, we can put the header in a public directory and define the factory instance elsewhere....
There is a good reason to separate plugins from code.

@github-actions

github-actions bot commented Apr 11, 2026

Test Results

   18 files     18 suites   6h 13m 49s ⏱️
  357 tests   354 ✅ 0 💤  3 ❌
3 143 runs  3 116 ✅ 0 💤 27 ❌

For more details on these failures, see this check.

Results for commit a70d61a.

♻️ This comment has been updated with latest results.

@wdconinc
Contributor Author

> We do not want to make plugin headers public.
> If necessary, we can put the header in a public directory and define the factory instance elsewhere....
> There is a good reason to separate plugins from code.

Yes, I noticed when filing the PR that the unrelated DDCore change snuck in. I'll remove it when I am back at a computer.

@andresailer
Member

cc @SeverinDiederichs (FYI)

When AdePT's callUserTrackingAction=false (the default for performance),
GPU-produced hits and hadronic secondaries carry trackID/parentID=0 from
the dummy HostTrackData.  This caused two classes of errors:

1. 'No Equivalent particle for track:0' (from Geant4ParticleMap::particleID)
   GPU hits have trackID=0, and when Geant4Output2ROOT tries to remap them
   to final particle IDs, it calls particleID(0) which fails.
   Fix: in Geant4ParticleHandler::endEvent(), after rebaseSimulatedTracks,
   add m_equivalentTracks[0] pointing to the primary particle (g4id=1)
   so that all GPU hits with dummy trackID=0 are correctly attributed.

2. Hadronic secondaries returned from the GPU with parentID=0 break the
   MC truth parent chain.
   Fix: Geant4AdePTUserParticleHandler::begin() remaps particle.g4Parent
   from 0 to the entering primary's G4 track ID.
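
The first fix can be pictured with a small Python model. The dictionaries and the function below are hypothetical stand-ins for m_equivalentTracks and Geant4ParticleMap::particleID, not the DD4hep API; the sketch only shows why an entry for track 0 lets the lookup succeed.

```python
# Toy model: all names are illustrative stand-ins, not the DD4hep C++ API.
PRIMARY_G4ID = 1  # per the commit message, the primary particle has g4id=1


def particle_id(equivalent_tracks, particle_ids, g4id):
    """Follow equivalence links until a track with a final particle ID is found."""
    current = g4id
    while current not in particle_ids:
        if current not in equivalent_tracks:
            raise KeyError(f"No Equivalent particle for track:{g4id}")
        current = equivalent_tracks[current]
    return particle_ids[current]


particle_ids = {1: 0}        # G4 track 1 was stored as output particle 0
equivalent_tracks = {7: 1}   # track 7 was folded into its parent, track 1

# A GPU hit with the dummy trackID=0 cannot be resolved before the fix.
try:
    particle_id(equivalent_tracks, particle_ids, 0)
except KeyError as err:
    lookup_error = str(err)

# The fix: attribute the dummy ID 0 to the primary particle.
equivalent_tracks[0] = PRIMARY_G4ID
gpu_hit_particle = particle_id(equivalent_tracks, particle_ids, 0)
```

Note that this attributes every GPU hit to the primary, which is only exact for single-primary events.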

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@wdconinc
Contributor Author

Together with apt-sim/AdePT#546 this now offloads tracks to my (very crappy) GPU.

ddsim --steeringFile DDG4/examples/AdePTSteeringFile.py --compactFile /opt/local/DDDetectors/compact/SiD.xml

wdconinc and others added 5 commits April 14, 2026 10:34
…ePTPhysics

LastNParticlesOnCPU: when the in-flight count drops below this threshold
the remaining particles are leaked back to Geant4/HepEm on CPU, terminating
the GPU transport loop early.  Setting this to a small value (e.g. 10-100)
avoids launching many near-empty kernels during the long shower tail.
Default 0 preserves the previous behaviour (always finish on GPU).
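
A toy model of the threshold, assuming a shower described only by its in-flight particle count before each GPU step. The function name and the input list are invented for illustration and do not correspond to AdePT code.

```python
def gpu_transport_loop(in_flight_per_step, last_n_particles_on_cpu=0):
    """Return (kernel_launches, particles_leaked_to_cpu) for a toy shower.

    in_flight_per_step: number of particles in flight before each GPU step.
    """
    launches = 0
    for in_flight in in_flight_per_step:
        if in_flight == 0:
            break  # shower finished on the GPU
        if in_flight < last_n_particles_on_cpu:
            # Hand the stragglers back to the CPU and stop the GPU loop early.
            return launches, in_flight
        launches += 1
    return launches, 0


# A shower with a long, sparsely populated tail.
shower = [1000, 400, 120, 40, 12, 5, 3, 2, 1, 1, 0]

finish_on_gpu = gpu_transport_loop(shower)                            # default 0
leak_tail = gpu_transport_loop(shower, last_n_particles_on_cpu=50)
```

In this toy shower the default runs ten launches, seven of them with 40 or fewer particles; a threshold of 50 stops after three launches and returns 40 particles to the CPU.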

SpeedOfLight: debug/benchmark mode that kills all e-/e+/gamma immediately
without tracking them (equivalent to setting their mean free path to zero).
Useful for measuring geometry or non-EM overhead in isolation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Document and expose the property in the example steering file with a
comment explaining its effect on GPU kernel launch efficiency.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
In ddsim-mt (PR#1240), Geant4ParticleHandler is created inside
__setupGeneratorActions, which runs as a UserInitialization callback
during G4RunManager::Initialize() -- after setupPhysics has returned.
The previous approach of looking up the handler via
  kernel.generatorAction().get('ParticleHandler')
inside setup_physics no longer works: the object either doesn't exist
yet or its adopt() method is not available on the returned wrapper.

The FIXME in ParticleHandler.py noted that setupUserParticleHandler was
not extensible: it hardcoded only Geant4TCUserParticleHandler and
Geant4TVUserParticleHandler and called exit(1) for anything else.

Add an 'else' branch that supports arbitrary DDG4 action plugin class
names: create the action and call part.adopt(user) without any special
tracker-region configuration. This allows plugins such as
Geant4AdePTUserParticleHandler to be registered simply via:

  runner.part.userParticleHandler = "Geant4AdePTUserParticleHandler"

Update AdePTSteeringFile.py to use this clean mechanism in place of the
monkey-patch workaround introduced in the previous commit.
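
The shape of the new branch, sketched with stand-in classes so that it runs without DD4hep. FakeAction, FakeParticleHandler, and the tracker_region property are hypothetical; the real setupUserParticleHandler creates DDG4 actions and sets tracker-specific properties.

```python
class FakeAction:
    """Hypothetical stand-in for a DDG4 action created by plugin name."""

    def __init__(self, type_name):
        self.type_name = type_name
        self.properties = {}


class FakeParticleHandler:
    """Hypothetical stand-in for the Geant4ParticleHandler wrapper."""

    def __init__(self):
        self.adopted = []

    def adopt(self, action):
        self.adopted.append(action)


# The two handlers that were hardcoded before this change.
TRACKER_REGION_HANDLERS = {
    "Geant4TCUserParticleHandler",
    "Geant4TVUserParticleHandler",
}


def setup_user_particle_handler(part, handler_name, tracker_region=None):
    """Create the named handler and attach it to the particle handler."""
    if not handler_name:
        return None
    user = FakeAction(handler_name)
    if handler_name in TRACKER_REGION_HANDLERS:
        # Built-in handlers receive tracker-region configuration.
        user.properties["tracker_region"] = tracker_region
    # Any other plugin name is adopted as-is, with no special configuration,
    # instead of aborting.
    part.adopt(user)
    return user


part = FakeParticleHandler()
adept = setup_user_particle_handler(part, "Geant4AdePTUserParticleHandler")
```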

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
wdconinc and others added 9 commits April 15, 2026 12:03
Co-authored-by: Juan Miguel Carceller <jmcarcell@users.noreply.github.com>
Co-authored-by: sss <sss@karma>
Halton sequences are low-discrepancy sequences that fill phase
space more uniformly than points drawn from a PRNG, so the error on
integrated quantities falls roughly as 1/N (up to logarithmic
factors) instead of 1/sqrt(N). This is not cheating statistics: the
price is that consecutive events are no longer statistically
independent, so the Poisson properties between events are lost.
The technique is often referred to as RQMC, randomized quasi-Monte
Carlo.

This adds scrambled Halton sequence support to the isotrope
generators (where inter-event statistics are not considered since
they don't represent real experimental running conditions). The
scrambling uses Cranley-Patterson rotation, which is sufficient
to remove correlations in three-dimensional phase space sampling.

The sequences are scrambled with the random seed, so different
runs with different seeds will produce different sequences. This
also then allows statistical treatment to determine the errors
on aggregate quantities (see note in ddsim help).

The various distributions are modified to take a sampler function
that can either use PRNG or Halton sequences. For FFbar this is
not possible since it uses an accept/reject algorithm that only
works for PRNG.
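
A minimal Python sketch of the idea, not the code in this PR: a Halton point is built from radical inverses in coprime bases, and the Cranley-Patterson rotation adds one seed-dependent shift per dimension modulo 1. The function names and the choice of bases (2, 3, 5) are assumptions made for the example.

```python
import random


def radical_inverse(index, base):
    """Van der Corput radical inverse of a non-negative integer in `base`."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def make_halton_sampler(seed, bases=(2, 3, 5)):
    """Return a callable yielding Cranley-Patterson-rotated Halton points."""
    rng = random.Random(seed)
    shifts = [rng.random() for _ in bases]  # one fixed random shift per dimension
    index = 0

    def sample():
        nonlocal index
        index += 1
        return tuple(
            (radical_inverse(index, base) + shift) % 1.0
            for base, shift in zip(bases, shifts)
        )

    return sample


def make_prng_sampler(seed, dimensions=3):
    """Drop-in alternative with the same call signature, backed by a PRNG."""
    rng = random.Random(seed)
    return lambda: tuple(rng.random() for _ in range(dimensions))


def mean_of_first_coordinate(sampler, n):
    """Any distribution can take either sampler; here, a trivial average."""
    return sum(sampler()[0] for _ in range(n)) / n
```

Because both samplers are plain callables returning a point in the unit cube, a distribution can accept either one. An accept/reject algorithm consumes a variable number of draws per event, which is why it does not map onto a fixed-dimension sequence like this.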
wdconinc and others added 29 commits April 15, 2026 12:14
Add basic MT functionality tests with 1, 2, and 4 threads, file-based
generator tests (HepMC3, EDM4hep), and a comparison script framework for
validating ST vs MT equivalence.

Tests verify:
- MT mode runs without crashes
- Different thread counts (1, 2, 4) work correctly
- File-based input generators work in MT mode
- Backward compatibility with -j 1 (single-threaded)

Fix double-save bug in EDM4hep/LCIO/ROOT output for ST mode

In single-threaded mode, events were saved twice because
setupEDM4hepOutput/setupLCIOOutput/setupROOTOutput hardcoded shared=True.
Fixed by making the shared flag conditional on NumberOfThreads > 1.

Fix SIGSEGV crash in MT mode: make EventSeeder shared

setupEventSeeder() was called once per worker thread, creating multiple
EventSeeder instances with shared=False. During cleanup this caused
conflicts/double-free leading to SIGSEGV. Fixed by creating EventSeeder
with shared=True (one instance shared across all workers) and guarding
against duplicate creation.

Add tests for G4Gun and GPS with macroFile

These tests document that G4Gun and GPS with macroFile work in ST mode
but not in MT mode (macros execute during global init before worker
threads exist). Generator setup is guarded with numberOfThreads == 1.

fix: additional DDTest changes
Fixes heap corruption and SIGSEGV crashes when using ROOT output in
multi-threaded mode.

Root cause: Multiple Geant4 worker threads were accessing ROOT I/O
objects concurrently. ROOT's I/O system is not thread-safe by default,
causing heap corruption during multi-threaded writes that manifested
during exit in TFile::WriteStreamerInfo / TROOT::CloseFiles.

Changes:
1. Call ROOT.EnableThreadSafety() before any ROOT objects are created
   when numberOfThreads > 1 (MT mode).
2. Add static std::mutex s_rootMutex to Geant4Output2ROOT and protect
   all ROOT I/O operations with std::lock_guard:
   - commit(): TTree::Fill() and branch operations
   - closeOutput(): file Write() and Close()
   - beginRun(): file creation and opening
   - fill(): branch Fill() operations

The mutex ensures full serialization of ROOT I/O across all worker
threads, preventing concurrent access to TFile/TTree/TBranch objects
even with ROOT::EnableThreadSafety() in place.
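
The locking pattern, shown as a Python analogue of the C++ change: a class-level lock plays the role of the static std::mutex, and the with-block plays the role of std::lock_guard. SerializedWriter is a made-up class for illustration and does not touch ROOT.

```python
import threading


class SerializedWriter:
    """Toy output sink shared by all workers."""

    # One lock for every instance, mirroring a static std::mutex.
    _lock = threading.Lock()

    def __init__(self):
        self.rows = []

    def fill(self, row):
        # Only one thread at a time may run the body. (CPython's list.append
        # is already atomic; the lock is here to illustrate the pattern.)
        with SerializedWriter._lock:
            self.rows.append(row)


def worker(writer, worker_id, n_events):
    for event in range(n_events):
        writer.fill((worker_id, event))


writer = SerializedWriter()
threads = [
    threading.Thread(target=worker, args=(writer, worker_id, 250))
    for worker_id in range(4)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Full serialization trades throughput for safety: workers still simulate in parallel but queue up whenever they write.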

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
As a defense-in-depth measure alongside the primary fix in AdePT's
HostTrackDataMapper (cpuAncestorG4id propagation), force any track whose
G4 track ID is in the GPU-assigned range (>= INT_MAX/2, counting down from
INT_MAX) to have G4PARTICLE_ABOVE_ENERGY_THRESHOLD set in particle.reason.

This ensures such tracks enter the m_particleMap if-branch in
Geant4ParticleHandler::end() rather than the else-branch that walks the
parent chain and emits 'FATAL: No real particle parent present' when the
chain is broken by an unregistered GPU-assigned parent ID.
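
The range test and the forced flag can be sketched as follows. The bit value used for G4PARTICLE_ABOVE_ENERGY_THRESHOLD is a placeholder for illustration, not the value defined in DD4hep, and the function names are hypothetical.

```python
INT_MAX = 2**31 - 1
GPU_ID_THRESHOLD = INT_MAX // 2  # GPU IDs count down from INT_MAX

# Placeholder bit for illustration; the real flag value is defined in DD4hep.
G4PARTICLE_ABOVE_ENERGY_THRESHOLD = 1 << 0


def is_gpu_assigned(track_id):
    """True if the G4 track ID lies in the GPU-assigned range."""
    return track_id >= GPU_ID_THRESHOLD


def force_registration(track_id, reason):
    """Set the threshold flag for GPU-assigned tracks; leave others untouched."""
    if is_gpu_assigned(track_id):
        reason |= G4PARTICLE_ABOVE_ENERGY_THRESHOLD
    return reason
```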

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When AdePT returns GPU-processed tracks to CPU, the returned tracks carry
GPU-assigned track IDs (counting down from INT_MAX).  When G4HepEm then
handles these tracks and produces hadronic secondaries (e.g. photo-nuclear
products W182/W183/W184/W186, neutrons, protons) inline -- before the GPU
track PostUserTrackingAction fires -- those secondaries have a GPU-range
parentID that is not yet registered in m_particleMap/m_equivalentTracks,
causing "FATAL: No real particle parent present" errors.

Fix in Geant4AdePTUserParticleHandler:
- In begin(): if track->GetParentID() is GPU-range, resolve g4Parent by
  looking up the parent in m_trackCache (populated when the GPU parent itself
  began).  This replaces the GPU parent ID with the CPU ancestor ID.
- In end() fallback: if track->GetParentID() is GPU-range, apply the same
  cache-based resolution rather than blindly using GetParentID(), which
  would undo the begin()-time fix for hadronic secondaries not in m_trackCache.
- In end(): GPU-assigned track IDs are forced into m_particleMap (if-branch)
  via G4PARTICLE_ABOVE_ENERGY_THRESHOLD so they are always registered.
- Update cache on end() instead of erasing, so a second end() call (for
  tracks that re-enter the GPU region) can still restore correct state.
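
A sketch of the cache-based parent resolution, assuming the cache maps a GPU-assigned track ID to the CPU ancestor recorded when that track began. The dictionary layout, the function names, and the behaviour for an uncached GPU parent (returned unchanged here) are assumptions, not the handler's actual implementation.

```python
INT_MAX = 2**31 - 1


def is_gpu_assigned(track_id):
    """True if the G4 track ID lies in the GPU-assigned range."""
    return track_id >= INT_MAX // 2


def resolve_parent(parent_id, track_cache):
    """Map a GPU-range parent ID to its CPU ancestor; leave CPU IDs untouched.

    track_cache: GPU track ID -> CPU ancestor ID, filled when the GPU track began.
    """
    if not is_gpu_assigned(parent_id):
        return parent_id
    # Assumption: an uncached GPU parent is returned unchanged.
    return track_cache.get(parent_id, parent_id)


# A GPU track that began with CPU ancestor 17 later produces a neutron.
track_cache = {INT_MAX - 5: 17}
neutron_parent = resolve_parent(INT_MAX - 5, track_cache)
```

Keeping the cache entry after end(), as the last bullet describes, means a repeated call such as resolve_parent(INT_MAX - 5, track_cache) still yields the CPU ancestor.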

Fix in Geant4ParticleHandler (core, minimal):
- Add cycle detection (std::set<int> visited) in the m_equivalentTracks
  walk in end() and rebaseSimulatedTracks() to prevent infinite loops if
  a self-referential entry is created by any remaining edge case.
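
The guarded walk can be sketched in Python, with a dictionary standing in for m_equivalentTracks and a set standing in for std::set<int> visited. The function name is hypothetical.

```python
def resolve_equivalent(track_id, equivalent_tracks):
    """Follow equivalence links, stopping if an ID is seen twice."""
    visited = set()
    current = track_id
    while current in equivalent_tracks:
        if current in visited:
            break  # cycle detected: stop instead of looping forever
        visited.add(current)
        current = equivalent_tracks[current]
    return current


chain = resolve_equivalent(3, {3: 2, 2: 1})      # normal chain
self_ref = resolve_equivalent(5, {5: 5})         # self-referential entry
two_cycle = resolve_equivalent(4, {4: 6, 6: 4})  # longer cycle
```

Without the visited set, the last two calls would never return.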

Also update AdePTSteeringFile.py to use adequate slot sizes (10M) for
realistic simulation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@wdconinc
Contributor Author

(Copilot got ahead of itself, merged ddsim-mt into this, and pushed it up. I'll revert this.)

6 participants