- 
                Notifications
    
You must be signed in to change notification settings  - Fork 734
 
          Implementing gemmi-based mmcif reader (with easy extension to PDB/PDBx and mmJSON)
          #4712
        
          New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Changes from 101 commits
aa2a88f
              218cf43
              7f78e02
              6682d6e
              817f3a0
              3cc8c80
              f1bf325
              d21c220
              2a1be15
              77645e6
              91e6942
              9a0c086
              9c731df
              8b40ec7
              bdcbd73
              401a4d3
              9c336bd
              def88e4
              cabfd37
              4c9d930
              184491a
              3de8565
              ca6ebbb
              45077ad
              9a1a59a
              27c10d6
              950cfcf
              8d1a8b5
              ef29338
              caca17e
              f0e49cc
              b7ada7c
              e80632c
              10f3124
              98353fe
              35fa187
              e68fcce
              263e9f1
              ba47d53
              9ffb6f2
              fcfc6c0
              0de720e
              236b286
              b562115
              816b23f
              92ae164
              88c64a3
              59b7e29
              f2c23c8
              776676e
              71e60f4
              36b7125
              b058941
              ef30fa7
              95572c1
              a8a9436
              6706bbe
              8cf9da4
              f13156b
              dda981c
              ebdf849
              47043f6
              9770d7b
              fd7f70d
              1493056
              3d7fbb9
              9b9286e
              b8f3c04
              0f38a2d
              b915aab
              0d61248
              34d76ca
              b242aa5
              4fc3a78
              14fa756
              e3a9a1f
              d492b4e
              1880e4a
              927d7a0
              ad0f0be
              e03c3e5
              4d79205
              32d7cf9
              0df8c3a
              05c6ea1
              88dab79
              a82fe52
              e3f1714
              db46016
              32cd103
              805089e
              55c3dbb
              cd201d0
              81f0b5b
              22d1cca
              d1ba434
              a03b56f
              bd4c255
              3d61dc5
              aed9b54
              53c51f4
              3e0324c
              205c910
              f9f7912
              9a90316
              File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| 
          
            
          
           | 
    @@ -21,8 +21,8 @@ inputs: | |||||
| default: 'codecov' | ||||||
| cython: | ||||||
| default: 'cython' | ||||||
| filelock: | ||||||
| default: 'filelock' | ||||||
| fasteners: | ||||||
| default: 'fasteners' | ||||||
| 
         There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please add optional deps down in the optional deps section below. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 
        Suggested change
       
    
 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. that's on me not merging develop-2  | 
||||||
| griddataformats: | ||||||
| default: 'griddataformats' | ||||||
| gsd: | ||||||
| 
          
            
          
           | 
    @@ -60,6 +60,8 @@ inputs: | |||||
| default: 'dask' | ||||||
| distopia: | ||||||
| default: 'distopia>=0.4.0' | ||||||
| gemmi: | ||||||
| default: 'gemmi' | ||||||
| h5py: | ||||||
| default: 'h5py>=2.10' | ||||||
| hole2: | ||||||
| 
          
            
          
           | 
    @@ -130,6 +132,7 @@ runs: | |||||
| ${{ inputs.dask }} | ||||||
| ${{ inputs.distopia }} | ||||||
| ${{ inputs.gsd }} | ||||||
| ${{ inputs.gemmi }} | ||||||
| ${{ inputs.h5py }} | ||||||
| ${{ inputs.hole2 }} | ||||||
| ${{ inputs.joblib }} | ||||||
| 
          
            
          
           | 
    ||||||
| Original file line number | Diff line number | Diff line change | 
|---|---|---|
| @@ -0,0 +1,147 @@ | ||
| # -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*- | ||
| # vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 | ||
| # | ||
| """ | ||
| MMCIF structure files in MDAnalysis --- :mod:`MDAnalysis.coordinates.MMCIF` | ||
| ========================================================================== | ||
| MDAnalysis reads coordinates from MMCIF (macromolecular Crystallographic Information File) files, also known as PDBx/mmCIF format, | ||
| using the ``gemmi`` library as a backend. MMCIF is a more modern and flexible alternative to the PDB format, | ||
| capable of storing detailed structural and experimental data about biological macromolecules. | ||
| MMCIF files use a structured, tabular format with key-value pairs to store both coordinate and atom information. | ||
| The format supports multiple models/frames, though this implementation currently only reads the first model | ||
| and provides warning messages for multi-model files. | ||
| Basic usage | ||
| ----------- | ||
| Reading an MMCIF file is straightforward: | ||
| .. code-block:: python | ||
| import MDAnalysis as mda | ||
| u = mda.Universe("structure.cif") | ||
| The reader will automatically detect if the structure contains placeholder unit cell information | ||
| (usually it's the case for cryoEM structures, and cell parameters are (1, 1, 1, 90, 90, 90)) | ||
| and set dimensions to None in that case. | ||
| Capabilities | ||
| ------------ | ||
| The MMCIF reader implementation uses the gemmi library to parse files and extract coordinates | ||
| and unit cell information. Currently only reading capability is supported, with the following | ||
| features: | ||
| - Single frame/model reading | ||
| - Unit cell dimensions detection | ||
| - Support for compressed .cif.gz files | ||
| - Automatic handling of placeholder unit cells for cryoEM structures | ||
| Examples | ||
| -------- | ||
| Basic structure loading:: | ||
| .. code-block:: python | ||
                
      
                  yuxuanzhuang marked this conversation as resolved.
               
          
            Show resolved
            Hide resolved
         | 
||
| # Load structure from MMCIF | ||
| u = mda.Universe("structure.cif") | ||
| # or from cif.gz file | ||
| u = mda.Universe("structure.cif.gz") | ||
| Classes | ||
| ------- | ||
| .. autoclass:: MMCIFReader | ||
| :members: | ||
| :inherited-members: | ||
| See Also | ||
                
      
                  yuxuanzhuang marked this conversation as resolved.
               
          
            Show resolved
            Hide resolved
         | 
||
| -------- | ||
| - wwPDB MMCIF Resources: <http://mmcif.wwpdb.org>_ | ||
| - Gemmi library documentation: <https://gemmi.readthedocs.io>_ | ||
| .. versionadded:: 2.9.0 | ||
| """ | ||
| 
     | 
||
| import logging | ||
| import warnings | ||
| 
     | 
||
| import numpy as np | ||
| 
     | 
||
| from . import base | ||
| 
     | 
||
| try: | ||
| import gemmi | ||
| 
     | 
||
| HAS_GEMMI = True | ||
| except ImportError: | ||
| HAS_GEMMI = False | ||
| 
     | 
||
| logger = logging.getLogger("MDAnalysis.coordinates.MMCIF") | ||
| 
     | 
||
| 
     | 
||
| def get_coordinates(model: "gemmi.Model") -> np.ndarray: | ||
| """Get coordinates of all atoms in the `gemmi.Model` object. | ||
| Parameters | ||
| ---------- | ||
| model | ||
| input `gemmi.Model`, e.g. `gemmi.read_structure('file.cif')[0]` | ||
| Returns | ||
| ------- | ||
| np.ndarray, shape [n, 3], where `n` is the number of atoms in the structure. | ||
| """ | ||
| return np.array( | ||
| [[*at.pos.tolist()] for chain in model for res in chain for at in res] | ||
| ) | ||
| 
     | 
||
| 
     | 
||
| class MMCIFReader(base.SingleFrameReaderBase): | ||
| """Reads from an MMCIF file using ``gemmi`` library as a backend. | ||
| Notes | ||
| ----- | ||
| If the structure represents an ensemble, only the first structure in the ensemble | ||
| is read here (and a warning is thrown). Also, if the structure has a placeholder "CRYST1" | ||
| record (1, 1, 1, 90, 90, 90), it's set to ``None`` instead. | ||
| .. versionadded:: 2.9.0 | ||
| """ | ||
| 
     | 
||
| format = ["cif", "cif.gz", "mmcif"] | ||
| units = {"time": None, "length": "Angstrom"} | ||
| 
     | 
||
| def _read_first_frame(self): | ||
| structure = gemmi.read_structure(self.filename) | ||
| cell_dims = np.array( | ||
| [ | ||
| getattr(structure.cell, name) | ||
| for name in ("a", "b", "c", "alpha", "beta", "gamma") | ||
| ] | ||
| ) | ||
| if len(structure) > 1: | ||
| warnings.warn( # FIXME: add tests for this | ||
| 
         There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. #FIXME :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Wait, you do have   | 
||
| f"File {self.filename} has {len(structure)=} models, but only the first one will be read" | ||
| ) | ||
| 
     | 
||
| model = structure[0] | ||
| coords = get_coordinates(model) | ||
| self.n_atoms = len(coords) | ||
| self.ts = self._Timestep.from_coordinates(coords, **self._ts_kwargs) | ||
| if np.allclose(cell_dims, np.array([1.0, 1.0, 1.0, 90.0, 90.0, 90.0])): | ||
| warnings.warn( | ||
| "1 A^3 CRYST1 record," | ||
| " this is usually a placeholder." | ||
| " Unit cell dimensions will be set to None." | ||
| ) | ||
| self.ts.dimensions = None | ||
| else: | ||
| self.ts.dimensions = cell_dims | ||
| self.ts.frame = 0 | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's on me not merging develop-1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is not fixed yet?