debug mode for alchemical trajectory analyses #231

Open
jmichel80 opened this issue Sep 13, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@jmichel80
Contributor

Is your feature request related to a problem? Please describe.
A recurring issue with alchemical free energy calculations in SOMD2 is that trajectories occasionally terminate early because a 'NaN' is generated after an integration step. We have also seen trajectories showing transient spikes in non-bonded energies that we would expect to cause a numerical integration error.
Because the issue is stochastic and rare, it is difficult to isolate the source of the error.

Describe the solution you'd like
A 'debug' mode that buffers coordinates and energies for the past N integration time-steps would be helpful. The code could be updated to write this information in molecular file formats, allowing visualisation of the trajectory in the few steps immediately before a crash occurs.

Describe alternatives you've considered
In principle, this could be implemented at the Python API level by adding extra logic to save/overwrite snapshots after every MD time-step. However, this would likely be very slow and would make it difficult to reproduce NaN crashes in a timely manner.

We could, however, buffer coordinates and forces internally and write them to disk only when a crash has been triggered. There is already low-level logic in the code that attempts to deal with NaN errors by performing energy minimisation. Some compromise on speed (a few fold) would be acceptable for troubleshooting purposes.
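As a rough illustration of the buffer-then-dump idea, here is a minimal Python sketch. It is not SOMD2 code: `step_fn`, `run_with_crash_buffer` and `write_xyz` are hypothetical names, the snapshot holds coordinates and a single per-step energy, and the multi-frame XYZ output with placeholder element names is only one possible choice of molecular file format.

```python
import math
from collections import deque


def write_xyz(path, snapshots):
    """Write snapshots as a multi-frame XYZ file (element names are placeholders)."""
    with open(path, "w") as fh:
        for step, coords, energy in snapshots:
            fh.write(f"{len(coords)}\n")
            fh.write(f"step={step} energy={energy}\n")
            for x, y, z in coords:
                fh.write(f"X {x} {y} {z}\n")


def run_with_crash_buffer(step_fn, n_steps, buffer_size, dump_path):
    """Run step_fn repeatedly, keeping only the last buffer_size snapshots.

    step_fn(i) is a hypothetical stand-in for one MD step. It must return
    (coords, energy), where coords is a list of (x, y, z) tuples.
    Returns the step at which a non-finite value was detected, or None.
    """
    recent = deque(maxlen=buffer_size)  # oldest snapshot is discarded automatically
    for i in range(n_steps):
        coords, energy = step_fn(i)
        recent.append((i, coords, energy))
        bad = not math.isfinite(energy) or any(
            not math.isfinite(v) for xyz in coords for v in xyz
        )
        if bad:
            # Only touch the disk once a failure has actually been seen.
            write_xyz(dump_path, recent)
            return i
    return None
```

Nothing is written unless a non-finite energy or coordinate appears, so the per-step overhead in this sketch is limited to the in-memory append and the finiteness check. The cost of obtaining coordinates and energies at every step is not modelled here.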

@jmichel80 jmichel80 added the enhancement New feature or request label Sep 13, 2024
@chryswoods
Contributor

This is doable, but would be extremely slow. Buffering the coordinates and energies for every integration timestep would require calculating the energy and transferring the coordinates from GPU to CPU memory at every timestep. Once the data has been calculated and transferred, the buffering itself is easy. The first step would be to test how slow this is by setting the trajectory and energy frequency to 1 timestep, which would simulate an infinite buffer. If the speed is acceptable, then the code change would be to add an option to the trajectory object telling it to act as a first in, first out cache. This is straightforward, as the trajectory object already holds each individual frame in memory, so it would just have to drop the oldest frame once the buffer size is reached.
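The cache mode described here could look something like the sketch below. `FrameCache`, `max_frames` and `add_frame` are hypothetical names and not part of the existing trajectory object. The point is only that one optional cap switches between "keep everything" (the infinite buffer used for the timing test) and "drop the oldest frame first".

```python
from collections import deque


class FrameCache:
    """Hypothetical stand-in for a trajectory object that holds frames in memory."""

    def __init__(self, max_frames=None):
        # max_frames=None keeps every frame (the "infinite buffer" case).
        # An integer caps the cache, evicting the oldest frame first.
        self._frames = deque(maxlen=max_frames)

    def add_frame(self, step, coords):
        """Store one frame. If the cache is full, the oldest frame is dropped."""
        self._frames.append((step, coords))

    def steps(self):
        """Return the step indices of the frames currently held, oldest first."""
        return [step for step, _ in self._frames]

    def __len__(self):
        return len(self._frames)
```

With `max_frames=3`, adding frames for steps 0 to 4 leaves only steps 2, 3 and 4 in the cache, and with the default `None` all five are kept.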

A similar thing could be done with the energy trajectory, but this isn't needed as much, as it won't consume that much memory.
