Beam Diagnostics is too Slow #1102

@ax3l

The diagnostics code in reduced_beam_characteristics(pc) is too slow. In 1-MPI-rank simulations like the HTU beamline, setting sim.particle_container().store_beam_moments = True makes the diagnostics the single largest runtime cost: about 1.5x the exclusive time of the next most expensive part of the actual simulation (the ChrQuad push).

TinyProfiler total time across processes [min...avg...max]: 0.02604 ... 0.02604 ... 0.02604

-------------------------------------------------------------------------------------------------------
Name                                                    NCalls  Excl. Min  Excl. Avg  Excl. Max   Max %
-------------------------------------------------------------------------------------------------------
impactx::diagnostics::reduced_beam_characteristics(pc)      91    0.01197    0.01197    0.01197  45.96%
impactx::Push::ChrQuad                                      34   0.007997   0.007997   0.007997  30.71%
impactx::Push::ExactDrift                                   33   0.001654   0.001654   0.001654   6.35%
impactx::Push::ExactSbend                                    5  0.0004234  0.0004234  0.0004234   1.63%
impactX::collect_lost_particles                             91  0.0003877  0.0003877  0.0003877   1.49%
ImpactX::evolve::slice_step                                 91  0.0003815  0.0003815  0.0003815   1.47%
ImpactX::add_particles                                       1  0.0003395  0.0003395  0.0003395   1.30%
impactx::Push::Kicker                                        8  0.0002024  0.0002024  0.0002024   0.78%
ImpactXParticleContainer::record_beam_moments               91  0.0001794  0.0001794  0.0001794   0.69%
DistributionMapping::LeastUsedCPUs()                         1  0.0001495  0.0001495  0.0001495   0.57%
ImpactX::track_particles                                     1   3.08e-05   3.08e-05   3.08e-05   0.12%
impactx::Push                                               91  1.807e-05  1.807e-05  1.807e-05   0.07%
AmrMesh::MakeDistributionMap()                               1  7.808e-06  7.808e-06  7.808e-06   0.03%
DistributionMapping::SFCProcessorMapDoIt()                   1  2.937e-06  2.937e-06  2.937e-06   0.01%
Other                                                      357  0.0001655  0.0001655  0.0001655   0.64%
-------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------------------------
Name                                                    NCalls  Incl. Min  Incl. Avg  Incl. Max   Max %
-------------------------------------------------------------------------------------------------------
ImpactX::track_particles                                     1    0.02335    0.02335    0.02335  89.69%
ImpactX::evolve::slice_step                                 91    0.02331    0.02331    0.02331  89.52%
ImpactXParticleContainer::record_beam_moments               91    0.01215    0.01215    0.01215  46.65%
impactx::diagnostics::reduced_beam_characteristics(pc)      91    0.01197    0.01197    0.01197  45.96%
impactx::Push                                               91     0.0103     0.0103     0.0103  39.56%
impactx::Push::ChrQuad                                      34   0.007999   0.007999   0.007999  30.72%
impactx::Push::ExactDrift                                   33   0.001656   0.001656   0.001656   6.36%
impactx::Push::ExactSbend                                    5  0.0004239  0.0004239  0.0004239   1.63%
ImpactX::add_particles                                       1  0.0003912  0.0003912  0.0003912   1.50%
impactX::collect_lost_particles                             91  0.0003877  0.0003877  0.0003877   1.49%
impactx::Push::Kicker                                        8   0.000203   0.000203   0.000203   0.78%
AmrMesh::MakeDistributionMap()                               1  0.0001608  0.0001608  0.0001608   0.62%
DistributionMapping::SFCProcessorMapDoIt()                   1   0.000153   0.000153   0.000153   0.59%
DistributionMapping::LeastUsedCPUs()                         1  0.0001495  0.0001495  0.0001495   0.57%
Other                                                      357  0.0001655  0.0001655  0.0001655   0.64%
-------------------------------------------------------------------------------------------------------
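
The ~1.5x figure can be checked directly against the exclusive times in the first table. The short script below only redoes that arithmetic with the numbers as printed by TinyProfiler; the dictionary keys are shortened labels chosen for this sketch, not profiler names.

```python
# Exclusive times in seconds, copied from the TinyProfiler table above.
excl = {
    "reduced_beam_characteristics": 0.01197,
    "ChrQuad": 0.007997,
    "ExactDrift": 0.001654,
}
total = 0.02604   # "TinyProfiler total time across processes"
ncalls = 91       # NCalls of reduced_beam_characteristics(pc)

diag = excl["reduced_beam_characteristics"]
ratio_to_next = diag / excl["ChrQuad"]   # diagnostics vs. next most costly entry
share = diag / total                     # fraction of the total runtime
per_call_us = diag / ncalls * 1e6        # average cost of one diagnostics call

print(f"diagnostics / ChrQuad: {ratio_to_next:.2f}x")
print(f"share of total runtime: {share:.1%}")
print(f"average per call: {per_call_us:.0f} us")
```

The per-call average is a derived number, not something the profiler reports, so treat it as a rough guide only.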

I think that amrex::ParticleReduce is OpenMP-parallelized over particle tiles, but maybe that is not taking effect here, or it can be optimized further?
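
To make the tile question concrete, here is a generic sketch of a tile-wise reduction: each tile is reduced as an independent task and the partial results are combined afterwards. This is not AMReX's implementation, and `reduce_tile` / `reduce_tiles` are hypothetical names. The point is only that the number of independent tasks equals the number of tiles, so a beam that lives in a single tile would offer nothing to distribute. How many tiles the run above actually has is not stated here. (Python threads will not speed this up; the sketch shows the task structure, not the timing.)

```python
from concurrent.futures import ThreadPoolExecutor

def reduce_tile(tile):
    """Partial reduction over one tile: (count, sum of x, sum of x^2)."""
    return len(tile), sum(tile), sum(x * x for x in tile)

def reduce_tiles(tiles, max_workers=4):
    """Reduce every tile as its own task, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(reduce_tile, tiles))
    n = sum(p[0] for p in partials)
    s = sum(p[1] for p in partials)
    s2 = sum(p[2] for p in partials)
    return (n, s, s2), len(partials)  # combined result, number of tasks

x = [0.001 * i for i in range(10_000)]   # toy particle coordinate
one_tile = [x]                           # everything in a single tile
four_tiles = [x[i::4] for i in range(4)] # same particles split into 4 tiles

result_1, tasks_1 = reduce_tiles(one_tile)
result_4, tasks_4 = reduce_tiles(four_tiles)
print(tasks_1, tasks_4)  # number of independent tasks in each layout
```

Both layouts give the same sums up to floating-point rounding; only the amount of available parallel work differs.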

Additionally, are there operations that are not auto-vectorized on CPU and that we could vectorize explicitly?

Or do we simply calculate/reduce far too many variables (currently: two full-Np reductions, the second one over 22 variables), so that we need a more fine-tuned approach, as we already have for optionally calculating the (costly) eigenemittances?
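
One way to picture the "two full-Np reductions" question is the sketch below. It assumes the first pass exists to compute the means that the second pass subtracts; that is an assumption for illustration and has not been checked against the ImpactX source. In that case a single pass over raw sums and sums of products yields the same second moments. The function names are hypothetical, and the one-pass form is known to lose precision when the means are large compared to the spread, so it is a trade-off and not a drop-in replacement.

```python
import random

def moments_two_pass(particles):
    """Pass 1: means. Pass 2: central second moments about those means."""
    n, dim = len(particles), len(particles[0])
    mean = [0.0] * dim
    for p in particles:                      # first full pass
        for i in range(dim):
            mean[i] += p[i]
    mean = [m / n for m in mean]
    cov = [[0.0] * dim for _ in range(dim)]
    for p in particles:                      # second full pass
        d = [p[i] - mean[i] for i in range(dim)]
        for i in range(dim):
            for j in range(i, dim):
                cov[i][j] += d[i] * d[j]
    for i in range(dim):
        for j in range(i, dim):
            cov[i][j] /= n
            cov[j][i] = cov[i][j]
    return mean, cov

def moments_one_pass(particles):
    """Single pass: accumulate raw sums and products, convert at the end."""
    n, dim = len(particles), len(particles[0])
    s = [0.0] * dim
    ss = [[0.0] * dim for _ in range(dim)]
    for p in particles:                      # only full pass
        for i in range(dim):
            s[i] += p[i]
            for j in range(i, dim):
                ss[i][j] += p[i] * p[j]
    mean = [v / n for v in s]
    cov = [[0.0] * dim for _ in range(dim)]
    for i in range(dim):
        for j in range(i, dim):
            # <x_i x_j> - <x_i><x_j>; subject to cancellation for large means
            cov[i][j] = ss[i][j] / n - mean[i] * mean[j]
            cov[j][i] = cov[i][j]
    return mean, cov

random.seed(0)
beam = [(random.gauss(0.0, 1e-3), random.gauss(0.0, 1e-4),
         random.gauss(0.0, 1e-3), random.gauss(0.0, 1e-4))
        for _ in range(5_000)]               # toy 4D beam, not HTU data
mean2, cov2 = moments_two_pass(beam)
mean1, cov1 = moments_one_pass(beam)
```

Orthogonally to fusing passes, the set of accumulated quantities could be made optional per diagnostic, in the same spirit as the existing eigenemittance switch mentioned above.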
