Skip to content

Conversation

@Leimeroth
Copy link
Member

@Leimeroth Leimeroth commented Dec 19, 2025

Properly close the lammps instance to allow multi step processes such as calphy to run on GPUs. Without properly closing the process hangs without throwing an error, so in the case of calphy it stopped silently after the averaging routine.
Complementary to ICAMS/calphy#205

Summary by CodeRabbit

  • Bug Fixes
    • Improved shutdown sequence to ensure proper resource cleanup and finalization of active jobs before terminating.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Dec 19, 2025

Walkthrough

The change adds an explicit finalization step in the shutdown method of the concurrent wrapper. When a job exists, self._job.finalize() is now called before closing and clearing the job reference to ensure proper cleanup.

Changes

Cohort / File(s) Change Summary
Job finalization in shutdown
src/pylammpsmpi/wrapper/concurrent.py
Added explicit call to self._job.finalize() in shutdown sequence prior to closing and clearing the job reference

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

  • Single method call addition in a straightforward cleanup sequence
  • Clear intent: ensure finalization occurs before resource cleanup
  • Minimal scope: one file, one method modification

Poem

A rabbit hops through shutdown's way, 🐰
With finalize now leading the day,
Before close and clear, the job must rest,
A proper goodbye—cleanup at its best!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ❓ Inconclusive The title is vague and lacks specificity about the actual fix, using the placeholder phrase 'issue with second lmp instance' without clearly explaining what problem is being addressed. Improve the title to be more specific about the fix, such as 'Properly finalize LAMMPS instance in shutdown to fix GPU workflow hanging' or similar, that clearly describes the problem and solution.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix--issue-with-second-lmp-instance

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1d2d250 and 87fc291.

📒 Files selected for processing (1)
  • src/pylammpsmpi/wrapper/concurrent.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: unittest_openmpi (3.11)
  • GitHub Check: unittest_openmpi (3.12)
  • GitHub Check: unittest_mpich_mac
  • GitHub Check: unittest_openmpi (3.13)
  • GitHub Check: unittest_openmpi_mac
  • GitHub Check: unittest_old
  • GitHub Check: coverage
🔇 Additional comments (2)
src/pylammpsmpi/wrapper/concurrent.py (2)

39-43: The finalize() + close() sequence contradicts LAMMPS API documentation.

According to LAMMPS documentation, finalize() is intended as an alternative to close(), not a prerequisite: "Instead of lmp.close() it is also possible to call lmp.finalize(); this will destruct the LAMMPS instance, but also finalized and release the MPI and/or Kokkos environment if enabled and active." The current code calls both methods in sequence, which is not the documented usage pattern.

Instead of wrapping both in try-finally, consider calling finalize() alone (since finalize() already shuts down the MPI communication and Kokkos environment) or verify if both methods are required for pylammpsmpi-specific reasons. If both are intentional, that should be documented.

Likely an incorrect or invalid review comment.


39-43: No additional wrapper updates needed—the finalize() fix propagates through the wrapper hierarchy.

All other wrapper implementations already delegate to or inherit from LammpsConcurrent:

  • LammpsBase inherits from LammpsConcurrent without overriding close/shutdown methods, so it automatically uses the corrected cleanup path
  • LammpsLibrary wraps LammpsConcurrent and delegates its close() call directly to the wrapped instance
  • LammpsASELibrary conditionally wraps either a native LAMMPS instance (single-core, which uses standard library cleanup) or LammpsBase (multi-core, which inherits the fix)

The finalize() call added to ParallelLammps.shutdown() in LammpsConcurrent will be used by all multi-core code paths throughout the codebase. No additional updates are required in other wrapper implementations.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@Leimeroth
Copy link
Member Author

Leimeroth commented Dec 19, 2025

Tests fail with a bunch of errors like this

*** The MPI_Comm_test_inter() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.

Since all the PR does is properly closing lammps when calling close. I didn't check it, but I guess that the tests rely on keeping the MPI instance alive despite lammps being closed, which breaks the calphy process. So I assume the tests test something that should not work.
Mentioning @srmnitc so you know whats going on with GPU calphy calculations

Comment on lines +41 to 42
self._job.finalize()
self._job.close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not the order be the other way around? First calling close() and then finalize(), at least this is how it is done in the LAMMPS examples: https://github.com/lammps/lammps/blob/develop/examples/mliap/mliap_pytorch_ACE.py

Copy link
Member

@jan-janssen jan-janssen Jan 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still even after this change the tests do not pass #412 . It seems to me that this option is only required for specific LAMMPS models, primarily the https://docs.lammps.org/pair_mliap.html models, still I do not find any explanation in the documentation, rather the finalize() call seems to be only used in examples which use mliap models.

@jan-janssen jan-janssen marked this pull request as draft January 4, 2026 11:31
@jan-janssen
Copy link
Member

As the unit tests are currently failing, I marked the pull request as draft.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants