Replies: 3 comments 10 replies
-
It looks like the |
-
@HanatoK I made a quick edit about LAMMPS to your message at the top (which we could probably keep editing to keep the information organized, since you have a great starting point there?)
Regarding the point on the atom groups: the main idea behind #655 (see also your comment here) was to allow sharing the atomic coordinate buffers between different
Beyond that, the longer-term "plan" was less about improving the data structure of the atom groups, and more about trying to have the CVCs be more agnostic to the details of that data structure. This was the goal of the second point of #655, a good chunk of which you have also implemented in #788. Ideally, some of the member functions of the CVCs could become templates that are instantiated differently in each scenario (sequential, shared memory, domain decomposition).
I had originally thought that major refactoring would be required to run all CVCs efficiently, but #783 shows that this is probably not needed for every feature. At this point, it would definitely make sense to have separate implementations of |
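The idea of CVC member functions becoming templates that are instantiated per scenario could look roughly like the sketch below. It is only an illustration under stated assumptions: `sequential_policy`, `chunked_policy`, `toy_group` and `center_of_mass_x` are hypothetical names, not part of Colvars, and the chunked policy is a plain host-side stand-in for a shared-memory reduction.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical policy: plain sequential reduction.
struct sequential_policy {
  template <typename F>
  static double sum(std::size_t n, F term) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += term(i);
    return s;
  }
};

// Hypothetical policy: partial sums over fixed-size chunks, then a combine step.
// This only mimics the shape of a shared-memory reduction; it runs on one thread.
struct chunked_policy {
  template <typename F>
  static double sum(std::size_t n, F term) {
    const std::size_t chunk = 4;
    double total = 0.0;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
      const std::size_t end = (begin + chunk < n) ? begin + chunk : n;
      double partial = 0.0;
      for (std::size_t i = begin; i < end; ++i) partial += term(i);
      total += partial;
    }
    return total;
  }
};

// Hypothetical, minimal stand-in for an atom group (one coordinate only).
struct toy_group {
  std::vector<double> x;
  std::vector<double> mass;
};

// The "CVC" code is written once and does not know how the reduction is executed.
template <typename Policy>
double center_of_mass_x(const toy_group& g) {
  const std::size_t n = g.x.size();
  const double m = Policy::sum(n, [&](std::size_t i) { return g.mass[i]; });
  const double mx = Policy::sum(n, [&](std::size_t i) { return g.mass[i] * g.x[i]; });
  return mx / m;
}
```

Calling `center_of_mass_x<sequential_policy>(g)` or `center_of_mass_x<chunked_policy>(g)` selects the scenario at compile time; a domain-decomposition or GPU policy would be a further instantiation rather than a change to the CVC code.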
-
After some explorations and micro-benchmarks, I think it would be better to:
In addition, for the time being, it is difficult to make a base class for different implementations of |
-
This is a draft plan to continue #652 and #655.
colvarproxy
MD engines
Investigate how Colvars interoperates with GROMACS, LAMMPS and Tinker-HP in case of the GPU-resident mode.
LAMMPS support
LAMMPS uses GPUs primarily in two ways (see https://docs.lammps.org/Speed_packages.html):
- GPU package, which supports offload;
- KOKKOS package, which is "GPU-resident" but also uses more abstract syntax; KOKKOS should be interoperable with the underlying languages (CUDA, HIP, SYCL, ...) but probably not all their specialized features.
GPU buffers
- atoms_masses, atoms_charge, atoms_positions, atoms_total_forces and atoms_new_colvar_forces to the subclasses of colvarproxy, and allocate device memory if a subclass of colvarproxy supports the GPU-resident mode.
Stream/Queue management
- colvarproxy_gpu class to create, synchronize and delete the streams (CUDA and HIP) or queues (SYCL).
colvarmodule
- smp gpu.
cvm::atom_group
GPU buffers
- atoms_pos, atoms_charge, atoms_vel, atoms_mass, atoms_grad, atoms_total_force and atoms_weight on device memory;
- read_positions, read_velocities and read_total_forces on GPU.
GPU kernels of atom-group calculations
Basically we need to implement everything in calc_required_properties with GPU kernels:
- calc_center_of_mass on GPU;
- calc_center_of_geometry on GPU;
- calc_apply_roto_translation on GPU;
- colvarmodule::rotation on GPU;
- calc_optimal_rotation_soa on GPU.
Question: should we have a separate cvm::atom_group_base for the CPU and GPU implementations?
colvar::cvc
GPU kernels for CVCs
- calc_value_gpu and calc_gradients_gpu for all CVCs;
- If smp gpu is used, then calc_value_gpu and calc_gradients_gpu will be called.
Tests
- run_colvars_test.cpp on GPU;
- colvarproxy_stub_gpu on GPU;
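One way a test could exercise the dispatch between calc_value and calc_value_gpu is to run both paths on the same input and compare the results. The sketch below assumes a toy component, not the real colvar::cvc interface: `smp_mode`, `toy_distance` and `paths_agree` are hypothetical names, and the "GPU" path is a host-side stand-in marking where a kernel launch would go.

```cpp
#include <array>
#include <cmath>

// Hypothetical execution modes; `gpu` stands for the "smp gpu" setting.
enum class smp_mode { cpu, gpu };

// Toy component: distance between two fixed points.
class toy_distance {
public:
  toy_distance(const std::array<double, 3>& a, const std::array<double, 3>& b)
      : a_(a), b_(b) {}

  // CPU reference path.
  double calc_value() const {
    double d2 = 0.0;
    for (int k = 0; k < 3; ++k) {
      const double d = b_[k] - a_[k];
      d2 += d * d;
    }
    return std::sqrt(d2);
  }

  // Host stand-in for the GPU path; a real version would launch a kernel here.
  // It uses a different formula on purpose, so the comparison is not trivial.
  double calc_value_gpu() const {
    return std::hypot(b_[0] - a_[0], std::hypot(b_[1] - a_[1], b_[2] - a_[2]));
  }

  // Dispatch on the selected mode.
  double calc(smp_mode mode) const {
    return mode == smp_mode::gpu ? calc_value_gpu() : calc_value();
  }

private:
  std::array<double, 3> a_;
  std::array<double, 3> b_;
};

// Test helper: true if both paths agree within an absolute tolerance.
inline bool paths_agree(const toy_distance& c, double tol) {
  return std::fabs(c.calc(smp_mode::cpu) - c.calc(smp_mode::gpu)) <= tol;
}
```

A tolerance is used instead of exact equality because a GPU reduction generally sums in a different order than the CPU loop, so bitwise-identical results should not be expected.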