Skip to content

Commit 3c83b8f

Browse files
pstaerkPhilipp Stärkhejamuhmacdope
authored
Extend lammpsdump to accept arbitrary columns (#3608)
* Extended mdanalysis to accept other attributes as well. * Able to parse arbitrary columns now. * Tried to fix most of the pep8 problems. * First try at testing the additional column part. * Testing multi read columns as well. * Fix the no additional columns case. * Implemented requested changes to docs. * Implemented the requested changes to the tests. * Incorporated the PEP8 comments. * Third round of PEP... * PEP8 ... * Authors and changelog. * Variable renaming issue. * Hopefully fixed documentation. * Sphinx * Hopefully fixed the tests * Implement input from UGM23 * refine tests * Small typo Co-authored-by: Hugo MacDermott-Opeskin <[email protected]> * Removed comment * Addressed hmacdope's comments regarding issue link * Addressed hmacdope's comments regarding file paths. * Added warning if keys are not in lammpsdump file. * Tested the formatting error of additional_columns. * Make changed lines comply with pep8 * 80 is indeed longer than 79... * Fix test * Fix test * Don't format `datafiles.py` * Added test of warning. --------- Co-authored-by: Philipp Stärk <[email protected]> Co-authored-by: hejamu <[email protected]> Co-authored-by: Hugo MacDermott-Opeskin <[email protected]>
1 parent 2c1aa4b commit 3c83b8f

File tree

8 files changed

+229
-14
lines changed

8 files changed

+229
-14
lines changed

Diff for: package/AUTHORS

+1
Original file line numberDiff line numberDiff line change
@@ -235,6 +235,7 @@ Chronological list of authors
235235
- Jenna M. Swarthout Goddard
236236
2024
237237
- Aditya Keshari
238+
- Philipp Stärk
238239

239240
External code
240241
-------------

Diff for: package/CHANGELOG

+2-1
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@ The rules for this file:
1515

1616
-------------------------------------------------------------------------------
1717
??/??/?? IAlibay, HeetVekariya, marinegor, lilyminium, RMeli,
18-
ljwoods2, aditya292002
18+
ljwoods2, aditya292002, pstaerk
1919

2020
* 2.8.0
2121

@@ -30,6 +30,7 @@ Fixes
3030
* Fix deploy action to use the correct version of the pypi upload action.
3131

3232
Enhancements
33+
* Added parsing of arbitrary columns of the LAMMPS dump parser. (Issue #3504)
3334
* Documented the r0 attribute in the `Contacts` class and added the
3435
`n_initial_contacts` attribute, with documentation. (Issue #2604, PR #4415)
3536

Diff for: package/MDAnalysis/coordinates/LAMMPS.py

+94-8
Original file line numberDiff line numberDiff line change
@@ -135,6 +135,7 @@
135135
from ..topology.LAMMPSParser import DATAParser
136136
from ..exceptions import NoDataError
137137
from . import base
138+
import warnings
138139

139140
btype_sections = {'bond':'Bonds', 'angle':'Angles',
140141
'dihedral':'Dihedrals', 'improper':'Impropers'}
@@ -458,12 +459,50 @@ class DumpReader(base.ReaderBase):
458459
"""Reads the default `LAMMPS dump format
459460
<https://docs.lammps.org/dump.html>`__
460461
461-
Supports coordinates in various LAMMPS coordinate conventions and both
462-
orthogonal and triclinic simulation box dimensions (for more details see
463-
`documentation <https://docs.lammps.org/Howto_triclinic.html>`__). In
464-
either case, MDAnalysis will always use ``(*A*, *B*, *C*, *alpha*, *beta*,
465-
*gamma*)`` to represent the unit cell. Lengths *A*, *B*, *C* are in the
466-
MDAnalysis length unit (Å), and angles are in degrees.
462+
Supports coordinates in the LAMMPS "unscaled" (x,y,z), "scaled" (xs,ys,zs),
463+
"unwrapped" (xu,yu,zu) and "scaled_unwrapped" (xsu,ysu,zsu) coordinate
464+
conventions (see https://docs.lammps.org/dump.html for more details).
465+
If `lammps_coordinate_convention='auto'` (default),
466+
one will be guessed. Guessing checks whether the coordinates fit each
467+
convention in the order "unscaled", "scaled", "unwrapped",
468+
"scaled_unwrapped" and whichever set of coordinates is detected first will
469+
be used. If coordinates are given in the scaled coordinate convention
470+
(xs,ys,zs) or scaled unwrapped coordinate convention (xsu,ysu,zsu) they
471+
will automatically be converted from their scaled/fractional representation
472+
to their real values.
473+
474+
Supports both orthogonal and triclinic simulation box dimensions (for more
475+
details see https://docs.lammps.org/Howto_triclinic.html). In either case,
476+
MDAnalysis will always use ``(*A*, *B*, *C*, *alpha*, *beta*, *gamma*)``
477+
to represent the unit cell. Lengths *A*, *B*, *C* are in the MDAnalysis
478+
length unit (Å), and angles are in degrees.
479+
480+
By using the keyword `additional_columns`, you can specify arbitrary data
481+
to be read. The keyword expects a list of the names of the columns or
482+
`True` to read all additional columns. The results are saved to
483+
:attr:`Timestep.data`. For example, if your LAMMPS dump looks like this
484+
485+
.. code-block::
486+
487+
ITEM: ATOMS id x y z q l
488+
1 2.84 8.17 -25 0.00258855 1.1
489+
2 7.1 8.17 -25 6.91952e-05 1.2
490+
491+
Then you may parse the additional columns `q` and `l` via:
492+
493+
.. code-block:: python
494+
495+
u = mda.Universe('structure.data', 'traj.lammpsdump',
496+
additional_columns=['q', 'l'])
497+
498+
The additional data is then available for each time step via:
499+
500+
.. code-block:: python
501+
502+
for ts in u.trajectory:
503+
charges = ts.data['q'] # Access additional data, sorted by the id
504+
ls = ts.data['l']
505+
...
467506
468507
Parameters
469508
----------
@@ -497,6 +536,9 @@ class DumpReader(base.ReaderBase):
497536
**kwargs
498537
Other keyword arguments used in :class:`~MDAnalysis.coordinates.base.ReaderBase`
499538
539+
.. versionchanged:: 2.7.0
540+
Reading of arbitrary, additional columns is now supported.
541+
(Issue #3608)
500542
.. versionchanged:: 2.4.0
501543
Now imports velocities and forces, translates the box to the origin,
502544
and optionally unwraps trajectories with image flags upon loading.
@@ -510,18 +552,23 @@ class DumpReader(base.ReaderBase):
510552
format = 'LAMMPSDUMP'
511553
_conventions = ["auto", "unscaled", "scaled", "unwrapped",
512554
"scaled_unwrapped"]
555+
513556
_coordtype_column_names = {
514557
"unscaled": ["x", "y", "z"],
515558
"scaled": ["xs", "ys", "zs"],
516559
"unwrapped": ["xu", "yu", "zu"],
517560
"scaled_unwrapped": ["xsu", "ysu", "zsu"]
518561
}
519562

563+
_parsable_columns = ["id", "vx", "vy", "vz", "fx", "fy", "fz"]
564+
for key in _coordtype_column_names.keys():
565+
_parsable_columns += _coordtype_column_names[key]
566+
520567
@store_init_arguments
521-
def __init__(self, filename,
568+
def __init__(self, filename,
522569
lammps_coordinate_convention="auto",
523570
unwrap_images=False,
524-
**kwargs):
571+
additional_columns=None, **kwargs):
525572
super(DumpReader, self).__init__(filename, **kwargs)
526573

527574
root, ext = os.path.splitext(self.filename)
@@ -536,6 +583,16 @@ def __init__(self, filename,
536583

537584
self._unwrap = unwrap_images
538585

586+
if (util.iterable(additional_columns)
587+
or additional_columns is None
588+
or additional_columns is True):
589+
self._additional_columns = additional_columns
590+
else:
591+
raise ValueError(f"additional_columns={additional_columns} "
592+
"is not a valid option. Please provide an "
593+
"iterable containing the additional"
594+
"column headers.")
595+
539596
self._cache = {}
540597

541598
self._reopen()
@@ -681,6 +738,25 @@ def _read_next_timestep(self):
681738
coord_cols.extend(image_cols)
682739

683740
ids = "id" in attr_to_col_ix
741+
742+
# Create the data arrays for additional attributes which will be saved
743+
# under ts.data
744+
if self._additional_columns is True:
745+
# Parse every column that is not already parsed
746+
# elsewhere (total \ parsable)
747+
additional_keys = set(attrs).difference(self._parsable_columns)
748+
elif self._additional_columns:
749+
if not all([key in attrs for key in self._additional_columns]):
750+
warnings.warn("Some of the additional columns are not present "
751+
"in the file, they will be ignored")
752+
additional_keys = \
753+
[key for key in self._additional_columns if key in attrs]
754+
else:
755+
additional_keys = []
756+
for key in additional_keys:
757+
ts.data[key] = np.empty(self.n_atoms)
758+
759+
# Parse all the atoms
684760
for i in range(self.n_atoms):
685761
fields = f.readline().split()
686762
if ids:
@@ -701,12 +777,22 @@ def _read_next_timestep(self):
701777
if self._has_forces:
702778
ts.forces[i] = [fields[dim] for dim in force_cols]
703779

780+
# Collect additional cols
781+
for attribute_key in additional_keys:
782+
ts.data[attribute_key][i] = \
783+
fields[attr_to_col_ix[attribute_key]]
784+
704785
order = np.argsort(indices)
705786
ts.positions = ts.positions[order]
706787
if self._has_vels:
707788
ts.velocities = ts.velocities[order]
708789
if self._has_forces:
709790
ts.forces = ts.forces[order]
791+
792+
# Also need to sort the additional keys
793+
for attribute_key in additional_keys:
794+
ts.data[attribute_key] = ts.data[attribute_key][order]
795+
710796
if (self.lammps_coordinate_convention.startswith("scaled")):
711797
# if coordinates are given in scaled format, undo that
712798
ts.positions = distances.transform_StoR(ts.positions,

Diff for: testsuite/MDAnalysisTests/coordinates/reference.py

+10-3
Original file line numberDiff line numberDiff line change
@@ -25,9 +25,10 @@
2525
from MDAnalysisTests import datafiles
2626
from MDAnalysisTests.datafiles import (PDB_small, PDB, LAMMPSdata,
2727
LAMMPSdata2, LAMMPSdcd2,
28-
LAMMPSdata_mini, PSF_TRICLINIC,
29-
DCD_TRICLINIC, PSF_NAMD_TRICLINIC,
30-
DCD_NAMD_TRICLINIC)
28+
LAMMPSdata_mini,
29+
LAMMPSdata_additional_columns,
30+
PSF_TRICLINIC, DCD_TRICLINIC,
31+
PSF_NAMD_TRICLINIC, DCD_NAMD_TRICLINIC)
3132

3233

3334
class RefAdKSmall(object):
@@ -227,3 +228,9 @@ class RefLAMMPSDataMini(object):
227228
dtype=np.float32)
228229
dimensions = np.array([60., 50., 30., 90., 90., 90.], dtype=np.float32)
229230

231+
232+
class RefLAMMPSDataAdditionalColumns(object):
233+
q = np.array([2.58855e-03, 6.91952e-05, 1.05548e-02, 4.20319e-03,
234+
9.19172e-03, 4.79777e-03, 6.36864e-04, 5.87125e-03,
235+
-2.18125e-03, 6.88910e-03])
236+
p = np.array(5 * [1.1, 1.2])

Diff for: testsuite/MDAnalysisTests/coordinates/test_lammps.py

+70-2
Original file line numberDiff line numberDiff line change
@@ -29,16 +29,18 @@
2929
import MDAnalysis as mda
3030
from MDAnalysis import NoDataError
3131

32-
from numpy.testing import (assert_equal, assert_allclose, assert_allclose)
32+
from numpy.testing import (assert_equal, assert_allclose)
3333

3434
from MDAnalysisTests import make_Universe
3535
from MDAnalysisTests.coordinates.reference import (
3636
RefLAMMPSData, RefLAMMPSDataMini, RefLAMMPSDataDCD,
37+
RefLAMMPSDataAdditionalColumns
3738
)
3839
from MDAnalysisTests.datafiles import (
3940
LAMMPScnt, LAMMPShyd, LAMMPSdata, LAMMPSdata_mini, LAMMPSdata_triclinic,
4041
LAMMPSDUMP, LAMMPSDUMP_allcoords, LAMMPSDUMP_nocoords, LAMMPSDUMP_triclinic,
41-
LAMMPSDUMP_image_vf, LAMMPS_image_vf
42+
LAMMPSDUMP_image_vf, LAMMPS_image_vf, LAMMPSdata_additional_columns,
43+
LAMMPSDUMP_additional_columns
4244
)
4345

4446

@@ -511,6 +513,46 @@ def u(self, tmpdir, request):
511513
yield mda.Universe(f, format='LAMMPSDUMP',
512514
lammps_coordinate_convention="auto")
513515

516+
@pytest.fixture()
517+
def u_additional_columns_true(self):
518+
f = LAMMPSDUMP_additional_columns
519+
top = LAMMPSdata_additional_columns
520+
return mda.Universe(top, f, format='LAMMPSDUMP',
521+
lammps_coordinate_convention="auto",
522+
additional_columns=True)
523+
524+
@pytest.fixture()
525+
def u_additional_columns_single(self):
526+
f = LAMMPSDUMP_additional_columns
527+
top = LAMMPSdata_additional_columns
528+
return mda.Universe(top, f, format='LAMMPSDUMP',
529+
lammps_coordinate_convention="auto",
530+
additional_columns=['q'])
531+
532+
@pytest.fixture()
533+
def u_additional_columns_multiple(self):
534+
f = LAMMPSDUMP_additional_columns
535+
top = LAMMPSdata_additional_columns
536+
return mda.Universe(top, f, format='LAMMPSDUMP',
537+
lammps_coordinate_convention="auto",
538+
additional_columns=['q', 'p'])
539+
540+
@pytest.fixture()
541+
def u_additional_columns_wrong_format(self):
542+
f = LAMMPSDUMP_additional_columns
543+
top = LAMMPSdata_additional_columns
544+
return mda.Universe(top, f, format='LAMMPSDUMP',
545+
lammps_coordinate_convention="auto",
546+
additional_columns='q')
547+
548+
@pytest.fixture()
549+
def u_additional_columns_not_present(self):
550+
f = LAMMPSDUMP_additional_columns
551+
top = LAMMPSdata_additional_columns
552+
return mda.Universe(top, f, format='LAMMPSDUMP',
553+
lammps_coordinate_convention="auto",
554+
additional_columns=['q', 'w'])
555+
514556
@pytest.fixture()
515557
def reference_positions(self):
516558
# manually copied from traj file
@@ -592,6 +634,32 @@ def test_atom_reordering(self, u, reference_positions):
592634
assert_allclose(atom1.position, atom1_pos-bmin, atol=1e-5)
593635
assert_allclose(atom13.position, atom13_pos-bmin, atol=1e-5)
594636

637+
@pytest.mark.parametrize("system, fields", [
638+
('u_additional_columns_true', ['q', 'p']),
639+
('u_additional_columns_single', ['q']),
640+
('u_additional_columns_multiple', ['q', 'p']),
641+
])
642+
def test_additional_columns(self, system, fields, request):
643+
u = request.getfixturevalue(system)
644+
for field in fields:
645+
data = u.trajectory[0].data[field]
646+
assert_allclose(data,
647+
getattr(RefLAMMPSDataAdditionalColumns, field))
648+
649+
@pytest.mark.parametrize("system", [
650+
('u_additional_columns_wrong_format'),
651+
])
652+
def test_wrong_format_additional_colums(self, system, request):
653+
with pytest.raises(ValueError,
654+
match="Please provide an iterable containing"):
655+
request.getfixturevalue(system)
656+
657+
@pytest.mark.parametrize("system", [
658+
('u_additional_columns_not_present'),
659+
])
660+
def test_warning(self, system, request):
661+
with pytest.warns(match="Some of the additional"):
662+
request.getfixturevalue(system)
595663

596664
@pytest.mark.parametrize("convention",
597665
["unscaled", "unwrapped", "scaled_unwrapped"])
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,29 @@
1+
LAMMPS data file via write_data, version 24 Mar 2022, timestep = 500
2+
3+
10 atoms
4+
1 atom types
5+
6+
0 42.6 xlo xhi
7+
0 44.2712 ylo yhi
8+
-25.1 25.1 zlo zhi
9+
10+
Masses
11+
12+
1 12.011
13+
14+
Pair Coeffs # lj/cut/coul/long/omp
15+
16+
1 0.0663 3.5812
17+
18+
Atoms # full
19+
20+
1 2 1 -0.00706800004577013 2.84 8.17 -25 0 0 0
21+
2 2 1 0.004078816788554217 7.1 8.17 -25 0 0 0
22+
3 2 1 -0.005824512619752745 2.13 6.94 -25 0 0 0
23+
4 2 1 0.002812345167059992 6.39 6.94 -25 0 0 0
24+
5 2 1 -0.004070019151543417 2.84 5.71 -25 0 0 0
25+
6 2 1 0.004796116641855679 7.1 5.71 -25 0 0 0
26+
7 2 1 -0.003217742434809291 2.13 4.48 -25 0 0 0
27+
8 2 1 0.0008273956785370801 6.39 4.48 -25 0 0 0
28+
9 2 1 -0.0003942558636157474 2.84 3.25 -25 0 0 0
29+
10 2 1 0.001288716009147968 7.1 3.25 -25 0 0 0
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,19 @@
1+
ITEM: TIMESTEP
2+
0
3+
ITEM: NUMBER OF ATOMS
4+
10
5+
ITEM: BOX BOUNDS pp pp ff
6+
0.0000000000000000e+00 4.2600000000000001e+01
7+
0.0000000000000000e+00 4.4271200000000000e+01
8+
-2.5100000000000001e+01 2.5100000000000001e+01
9+
ITEM: ATOMS id x y z q p
10+
1 2.84 8.17 -25 0.00258855 1.1
11+
2 7.1 8.17 -25 6.91952e-05 1.2
12+
3 2.13 6.94 -25 0.0105548 1.1
13+
4 6.39 6.94 -25 0.00420319 1.2
14+
5 2.84 5.71 -25 0.00919172 1.1
15+
6 7.1 5.71 -25 0.00479777 1.2
16+
7 2.13 4.48 -25 0.000636864 1.1
17+
8 6.39 4.48 -25 0.00587125 1.2
18+
9 2.84 3.25 -25 -0.00218125 1.1
19+
10 7.1 3.25 -25 0.0068891 1.2

Diff for: testsuite/MDAnalysisTests/datafiles.py

+4
Original file line numberDiff line numberDiff line change
@@ -156,6 +156,7 @@
156156
"LAMMPSdata_deletedatoms", # with deleted atoms
157157
"LAMMPSdata_triclinic", # lammpsdata file to test triclinic dimension parsing, albite with most atoms deleted
158158
"LAMMPSdata_PairIJ", # lammps datafile with a PairIJ Coeffs section
159+
"LAMMPSdata_additional_columns", # structure for the additional column lammpstrj
159160
"LAMMPSDUMP",
160161
"LAMMPSDUMP_long", # lammpsdump file with a few zeros sprinkled in the first column first frame
161162
"LAMMPSDUMP_allcoords", # lammpsdump file with all coordinate conventions (x,xs,xu,xsu) present, from LAMMPS rdf example
@@ -166,6 +167,7 @@
166167
"LAMMPSDUMP_chain1", # Lammps dump file with chain reader
167168
"LAMMPSDUMP_chain2", # Lammps dump file with chain reader
168169
"LAMMPS_chain", # Lammps data file with chain reader
170+
"LAMMPSDUMP_additional_columns", # lammpsdump file with additional data (an additional charge column)
169171
"unordered_res", # pdb file with resids non sequential
170172
"GMS_ASYMOPT", # GAMESS C1 optimization
171173
"GMS_SYMOPT", # GAMESS D4h optimization
@@ -552,6 +554,8 @@
552554
LAMMPSDUMP_chain2 = (_data_ref / "lammps/chain_dump_2.lammpstrj").as_posix()
553555
LAMMPS_chain = (_data_ref / "lammps/chain_initial.data").as_posix()
554556
LAMMPSdata_many_bonds = (_data_ref / "lammps/a_lot_of_bond_types.data").as_posix()
557+
LAMMPSdata_additional_columns = (_data_ref / "lammps/additional_columns.data").as_posix()
558+
LAMMPSDUMP_additional_columns = (_data_ref / "lammps/additional_columns.lammpstrj").as_posix()
555559

556560
unordered_res = (_data_ref / "unordered_res.pdb").as_posix()
557561

0 commit comments

Comments
 (0)