Optimize RDF histogram with Cython+OpenMP #5128
This commit adds an optimized histogram implementation using Numba JIT compilation that provides 10-15x speedup for RDF calculations with large datasets.

The optimization strategies include:
- Cache-efficient memory access patterns with blocking
- Parallel execution using thread-local storage
- SIMD-friendly operations through Numba's auto-vectorization
- Reduced Python overhead through JIT compilation

The implementation automatically falls back to numpy.histogram when Numba is not available, maintaining full backward compatibility.

Performance improvements:
- 10-15x speedup for large datasets (>100k distances)
- Scales efficiently to 50M+ distances
- Minimal memory overhead
- 100% numerical accuracy (matches numpy within floating point precision)

Fixes MDAnalysis#3435

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
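The fallback behaviour described in this commit message is not shown in the thread. As a rough sketch of what such a pattern could look like (not the PR's actual code), the snippet below tries to import Numba and otherwise defers to numpy.histogram. The names `histogram_with_fallback` and `_histogram_kernel` are hypothetical, and the simple floor-based binning is an assumption that may differ from numpy.histogram for values that land exactly on a bin edge after rounding.

```python
import numpy as np

try:
    from numba import njit  # optional accelerator
    HAS_NUMBA = True
except ImportError:
    HAS_NUMBA = False

if HAS_NUMBA:

    @njit
    def _histogram_kernel(values, lo, hi, nbins):
        # Hypothetical kernel: uniform bins on [lo, hi], last bin closed.
        counts = np.zeros(nbins, dtype=np.int64)
        scale = nbins / (hi - lo)
        for i in range(values.shape[0]):
            x = values[i]
            if lo <= x <= hi:
                idx = int((x - lo) * scale)
                if idx == nbins:  # x == hi belongs to the last bin
                    idx = nbins - 1
                counts[idx] += 1
        return counts


def histogram_with_fallback(values, nbins, value_range):
    """Use the JIT kernel when Numba imports, else numpy.histogram."""
    values = np.asarray(values, dtype=np.float64)
    lo, hi = value_range
    if HAS_NUMBA:
        edges = np.linspace(lo, hi, nbins + 1)
        return _histogram_kernel(values, float(lo), float(hi), nbins), edges
    counts, edges = np.histogram(values, bins=nbins, range=value_range)
    return counts, edges


# Values outside the range (-1.0 and 5.0) are ignored on both paths.
sample = [-1.0, 0.0, 0.5, 1.0, 1.5, 3.75, 4.0, 5.0]
counts, edges = histogram_with_fallback(sample, nbins=4, value_range=(0.0, 4.0))
print(counts, edges)
```

Callers see the same return shape on either path, which is what makes the optional dependency transparent to the calling code.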
Fixes CI linting failure by applying Black code formatter to the test file.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
This commit replaces the Numba-based histogram implementation with a Cython+OpenMP version as requested by MDAnalysis core developers. This aligns with MDAnalysis's existing acceleration infrastructure.

Key changes:
- Implemented c_histogram.pyx with OpenMP parallel support
- Serial version: 5-7x speedup over numpy.histogram
- Parallel version: 11-18x speedup for large datasets (>100k elements)
- Updated setup.py to build histogram extension with OpenMP flags
- Modified rdf.py to use Cython histogram module
- Removed old Numba-based histogram_opt.py module
- All 14 histogram tests passing
- All 19 existing RDF tests passing

Performance (with OMP_NUM_THREADS=4):
- 100k distances: 11.2x speedup
- 1M distances: 15.3x speedup
- 10M distances: 17.8x speedup
- 100% numerical accuracy validated against numpy.histogram

Related to Issue MDAnalysis#3435

🤖 Generated with Claude Code, checked and approved by me.
Co-Authored-By: Claude <[email protected]>
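The Cython source is not included in this thread. As a loose Python illustration of the general idea behind a parallel histogram (each worker fills a private partial histogram, and the partials are summed at the end), the sketch below splits the data into chunks and compares the combined counts with numpy.histogram. It uses numpy.histogram per chunk as a stand-in for the compiled kernel; `chunked_histogram` and `n_chunks` are hypothetical names, and the bin layout (75 bins on [0, 15]) is only an example.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def chunked_histogram(values, edges, n_chunks=4):
    """Sum per-chunk histograms; a stand-in for per-thread partial counts."""
    values = np.asarray(values, dtype=np.float64)
    chunks = np.array_split(values, n_chunks)

    def partial(chunk):
        # Each worker writes only to its own private counts array.
        counts, _ = np.histogram(chunk, bins=edges)
        return counts.astype(np.int64)

    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(partial, chunks))

    # Reduction step: add the private histograms together.
    return np.sum(partials, axis=0)


rng = np.random.default_rng(0)
distances = rng.uniform(0.0, 15.0, size=100_000)
edges = np.linspace(0.0, 15.0, 76)

reference, _ = np.histogram(distances, bins=edges)
combined = chunked_histogram(distances, edges)
print(np.array_equal(combined, reference))
```

Because every value is assigned to a bin independently of the others, summing the partial counts gives the same result as a single pass, which is the kind of check against numpy.histogram that the commit message refers to.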
@rhowardstone thank you for the cythonized PR (and your discussion post with background #5104). Sorry that you haven't heard from anyone in a while.

We really appreciate your clear communication about using AI tools. We currently don't have a clear policy in place on how to handle primarily AI-generated code. Until we have more clarity on how we as a project want to proceed, we are not going to merge this PR (so I'll leave a blocking review). I don't want to close the PR, though, and in fact I'd encourage you to resolve the merge conflicts so that the actual CI can run.

I also want to say that your PR is a good example of the potential, and you're providing motivation to move forward with this challenging discussion.
Blocking until MDAnalysis has a policy on AI-generated code.
(In the meantime, please
- resolve conflicts
- merge the current develop branch (i.e., with the 2.10.0 release)
- move your changelog entries into the 2.11.0 section
- update your versionadded to 2.11.0

Fixing the conflicts should also make the CI tests run, which is essential in evaluating the PR – without running tests, it makes no sense for anyone to review.)
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #5128      +/-   ##
===========================================
+ Coverage    92.68%   92.73%   +0.04%     
===========================================
  Files          180      180              
  Lines        22452    22453       +1     
  Branches      3186     3186              
===========================================
+ Hits         20809    20821      +12     
- Misses        1169     1176       +7     
+ Partials       474      456      -18
- Replace 'long' with 'cnp.int64_t' to fix cross-platform compatibility
- On Windows, 'long' is 32-bit while np.int64 is 64-bit, causing a type mismatch
- Apply black formatting to Python files
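The width mismatch mentioned in this commit can be checked from Python. The sketch below is only an illustration, not part of the PR: it compares the size of the platform's C `long` (4 bytes on 64-bit Windows, 8 bytes on typical 64-bit Linux and macOS) with NumPy's fixed-width int64, and allocates a counts buffer with an explicit dtype so that its layout does not depend on the platform. The variable names and the buffer length are arbitrary.

```python
import ctypes

import numpy as np

# Size of the platform's C `long`: 4 bytes on 64-bit Windows (LLP64),
# 8 bytes on 64-bit Linux and macOS (LP64).
c_long_bytes = ctypes.sizeof(ctypes.c_long)

# np.int64 is always 8 bytes, independent of the platform.
int64_bytes = np.dtype(np.int64).itemsize

# A counts buffer with an explicit fixed-width dtype matches a
# fixed-width typed buffer on the compiled side on every platform.
counts = np.zeros(75, dtype=np.int64)

print(f"C long: {c_long_bytes} bytes, np.int64: {int64_bytes} bytes")
```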
- Move histogram enhancement entry to 2.11.0 section
- Update PR number from MDAnalysis#5103 to MDAnalysis#5128
- Update versionadded directives from 2.10.0 to 2.11.0
- Resolve merge conflicts with upstream develop
9fd2803 to 3c9dfc6
Okay! I think I got it? All tests now pass, let me know if there's anything else you need from me!
This PR adds an optimized histogram implementation using Cython and OpenMP that provides 10-18x speedup for RDF calculations with large datasets.
Context
This PR supersedes #5103, which used Numba. Following feedback from @orbeckst and @IAlibay that MDAnalysis traditionally uses Cython/C++ for acceleration, this implementation has been rewritten to use Cython with OpenMP, aligning with the project's existing infrastructure (similar to c_distances_openmp.pyx).
Implementation
The optimization strategies include:
Performance
With OMP_NUM_THREADS=4:
Testing
Technical Details
Files Modified:
- package/MDAnalysis/lib/c_histogram.pyx - New Cython+OpenMP histogram implementation
- package/setup.py - Added histogram extension with OpenMP compilation flags
- package/MDAnalysis/analysis/rdf.py - Updated to use Cython histogram
- testsuite/MDAnalysisTests/lib/test_c_histogram.py - Comprehensive test suite
- package/CHANGELOG - Documented enhancement

Key Features:
c_distances_openmp.pyx)

Related to #3435
🤖 Generated with Claude Code (Sonnet 4.5), checked and approved by me.
PR Checklist
- package/CHANGELOG file updated?
- package/AUTHORS? (Already added in previous PR)

📚 Documentation preview 📚: https://mdanalysis--5128.org.readthedocs.build/en/5128/