Hi Alby, Anders, and co.!
I've been playing around with Allegro and LAMMPS. I can get away with a relatively small simulation (~500 atoms), but I'd like to run it for long time scales, so I want to optimize the performance per timestep. I'm using a fairly small Allegro model (1 layer, SO(3) symmetry, a small number of tensor features, etc.). Looking at the CPU and GPU utilization with 1 MPI rank and 1 V100 GPU, I'm seeing 100% CPU usage and 75% GPU usage. Moving to 2 MPI ranks and 2 GPUs (1 node), I'm seeing 100% CPU usage per rank and 66% GPU usage per GPU. It seems the run is currently bottlenecked by something on the CPU.
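For reference, the launches look roughly like this (a sketch: the binary and input file names are placeholders, and the Kokkos flags follow the usual pair_allegro recommendation):

```sh
# 1 MPI rank, 1 V100
mpirun -np 1 lmp -in in.allegro \
    -sf kk -k on g 1 -pk kokkos newton on neigh full

# 2 MPI ranks, 2 GPUs on one node
mpirun -np 2 lmp -in in.allegro \
    -sf kk -k on g 2 -pk kokkos newton on neigh full

# Utilization was read from nvidia-smi (GPU) and top (CPU) while the run was active:
nvidia-smi dmon -s u
```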
I did a bit of profiling, and it looks like a significant chunk of total runtime (~30%) is spent on `LAMMPS_NS::CommKokkos::borders()` here, with about 20% of total runtime spent on `LAMMPS_NS::CommKokkos::borders_device<Kokkos::Cuda>()`. It looks like this function transfers neighbor data between procs, but the odd thing is, I'm running this with only 1 MPI rank. Maybe this is just sending the neighbor list to the GPU? I can provide the `gprof` output if you'd like a closer look. Have you encountered something similar before?
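In case it helps, this is roughly the workflow behind those numbers (a sketch, assuming LAMMPS rebuilt with `-pg` in the compile and link flags; paths are placeholders):

```sh
# run the same single-rank case with profiling enabled
mpirun -np 1 lmp -in in.allegro \
    -sf kk -k on g 1 -pk kokkos newton on neigh full

# flat profile from the resulting gmon.out; CommKokkos::borders() shows up near the top
gprof ./lmp gmon.out > profile.txt
head -n 30 profile.txt
```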