Commit 505bc02

committed
begin updating docs to new API
1 parent 364ef6d commit 505bc02

File tree

4 files changed

+205
-71
lines changed

docs/data.md

+135
@@ -0,0 +1,135 @@
1+
## representing a single atomic structure
2+
3+
A key component of any molecular simulation engine is the ability to represent a collection of atoms in 3D space,
4+
also referred to as an atomic geometry.
5+
It refers to a list of atoms, whereby each atom is characterized by its position in space, its chemical identity, and, if available, a force which acts on the atom.
6+
In addition, atomic geometries can also contain metadata such as information on their periodicity (unit cell vectors), the potential energy, a stress tensor, or the value of a particular order parameter.
7+
8+
In psiflow, atomic geometries are represented using the `Geometry` class. It is essentially a concise equivalent to ASE's `Atoms` class. Instances can be created from an XYZ string, from an existing ASE `Atoms` instance, or directly from raw arrays of positions, atomic numbers, and optionally the cell vectors.
9+
```py
import ase
import numpy as np
from psiflow.geometry import Geometry


# a simple H2 molecule in vacuum
geometry = Geometry.from_string('''
2
H 0.0 0.0 0.0
H 0.0 0.0 0.8
''')

# the same H2 molecule using ase Atoms
atoms = ase.Atoms(
    numbers=[1, 1],  # one atomic number per position
    positions=[[0, 0, 0], [0, 0, 0.8]],
    pbc=False,
)
geometry = Geometry.from_atoms(atoms)  # creates an identical instance

print(len(geometry))  # prints the number of atoms, in this case 2
assert geometry.pbc == False  # if no cell info is given, the instance is assumed to be non-periodic
geometry.cell[:] = 10 * np.eye(3)  # set the cell vectors to a 10 A x 10 A x 10 A cube
assert geometry.pbc == True  # now the instance is periodic


print(geometry.energy)  # None; no energy has been set
print(np.all(np.isnan(geometry.forces)))  # True; no forces have been set


# the same instance, directly from numpy
geometry = Geometry.from_data(
    positions=np.array([[0, 0, 0], [0, 0, 0.8]]),
    numbers=np.array([1, 1]),
    cell=None,
)

```
48+
49+
All features in psiflow are fully compatible with either nonperiodic (molecular) or 3D periodic systems with arbitrary unit cells (i.e. general triclinic). However, for efficiency reasons, i-PI (along with OpenMM, GROMACS, and several other packages) typically requires that atomic geometries are represented in their *canonical* orientation.
50+
In the canonical orientation, the cell vectors are aligned with the X, Y, and Z axes as much as possible.
51+
In addition, box vectors are added and subtracted in order to make the cell as orthorhombic as possible.
52+
Note that the relative orientation of atoms with respect to each other as well as the volume of the unit cell remain exactly the same, so this transformation does not affect the physical behavior of the system in any way.
53+
Since psiflow relies on i-PI for sampling, we adhere to the same convention.
54+
55+
```py
import numpy as np
from psiflow.geometry import Geometry

geometry = Geometry.from_data(
    positions=np.array([[0, 0, 0], [0, 0, 0.8]]),
    numbers=np.array([1, 1]),
    cell=np.array([[4, 0, 0], [0, 4, 0], [3, 3, 6]]),
)
geometry.canonical_orientation()  # first and second vector are subtracted from the third
print(geometry.cell)  # the cell vectors are mostly aligned with the axes
```
64+
Check out the API reference for a full overview of its functionality.
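To make the idea of adding and subtracting box vectors concrete, the following is a minimal NumPy sketch, not psiflow's implementation. The helper `reduce_lower_triangular` is hypothetical, and it assumes the cell is already lower triangular (first vector along X, second in the XY plane), so it covers only the reduction step and not the rotation. It shows that the reduced cell spans the same volume as the original one.

```python
import numpy as np


def reduce_lower_triangular(cell):
    """Make a lower-triangular cell as orthorhombic as possible.

    Hypothetical helper for illustration only: `cell` holds the box vectors
    as rows and is assumed to be lower triangular already.
    """
    cell = np.array(cell, dtype=float)  # work on a floating point copy
    a, b, c = cell  # row views; in-place updates modify `cell`
    c -= np.round(c[1] / b[1]) * b  # remove whole multiples of b from c
    c -= np.round(c[0] / a[0]) * a  # remove whole multiples of a from c
    b -= np.round(b[0] / a[0]) * a  # remove whole multiples of a from b
    return cell


cell = np.array([[4, 0, 0], [0, 4, 0], [3, 3, 6]])
reduced = reduce_lower_triangular(cell)

# adding integer multiples of one box vector to another leaves the volume unchanged
volume_before = abs(np.linalg.det(cell))
volume_after = abs(np.linalg.det(reduced))
print(reduced)
print(volume_before, volume_after)
```

For the cell used above, the first and second vectors are each subtracted once from the third, which matches the comment in the psiflow example.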
65+
66+
67+
## representing multiple structures
68+
69+
In many cases, it is necessary to represent a collection of atomic configurations, for example, a trajectory of snapshots generated by molecular dynamics, or a dataset of atomic configurations used for model training.
70+
In psiflow, such collections are represented using the `Dataset` class.
71+
72+
Importantly, because psiflow supports large-scale asynchronous execution, the `Dataset` instance does not actually store the atomic configurations themselves, but rather just maintains a reference to an XYZ file which is stored on disk.
73+
As such, `Dataset` can represent data which is currently already available (i.e. written in the XYZ file) or *data that will be generated and saved in the future*.
74+
75+
!!! note "Parsl 101: Apps and Futures"
76+
To understand what is meant by 'generating data in the future', it is necessary
77+
to introduce the core concepts in Parsl: apps and futures. In their simplest
78+
form, apps are just functions, and futures are the result of an app given
79+
a set of inputs. Importantly, a Future already exists before the actual calculation
80+
is performed. In essence, a Future _promises_ that, at some time in the future, it will
81+
contain the actual result of the function evaluation. Take a look at the following
82+
example:
83+
84+
```py
import parsl
from parsl.app.app import python_app

parsl.load()  # apps need a loaded configuration; without arguments, Parsl falls back to its default local one


@python_app  # convert a regular Python function into a Parsl app
def sum_integers(a, b):
    return a + b


sum_future = sum_integers(3, 4)  # tell Parsl to generate a future that represents the sum of integers 3 and 4
print(sum_future)  # is an AppFuture, not an integer

print(sum_future.result())  # wait for the actual result; this will print 7

```
99+
The return value of Parsl apps is not the actual result (in this case, an integer), but
100+
an AppFuture that will store the result of the function evaluation after it has completed.
101+
The main reason for doing things this way is that this allows for asynchronous execution.
102+
For more information, check out the [Parsl documentation](https://parsl.readthedocs.io/en/stable/).
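The same pattern can be tried without Parsl. The sketch below uses only the standard library module `concurrent.futures`, whose `Future` class is the base class of Parsl's `AppFuture`. It is an analogy for how futures behave, not a description of how Parsl schedules work; the `time.sleep` call merely stands in for an expensive calculation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def sum_integers(a, b):
    time.sleep(0.1)  # stand-in for an expensive calculation
    return a + b


with ThreadPoolExecutor(max_workers=1) as executor:
    # submit() returns a Future right away, before the sum has been computed
    sum_future = executor.submit(sum_integers, 3, 4)
    future_type = type(sum_future).__name__
    # result() blocks until the worker thread has finished
    value = sum_future.result()

print(future_type)
print(value)
```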
103+
104+
Practically speaking, `Dataset` instances behave like a regular list of geometries except that they return Parsl futures:
105+
106+
```py
from psiflow.data import Dataset

data = Dataset.load('trajectory.xyz')  # some trajectory data generated before

data[4]  # AppFuture representing the `Geometry` instance at index 4
data.length()  # AppFuture representing the length of the dataset

```
115+
As shown in the example, you can still index the dataset and ask for its length as you would normally do when working directly with a Python list.
116+
The difference is that `Dataset` does not actually return the values (since they might not be available yet), but rather returns Parsl futures that represent the values.
117+
As a user, you can still interact with the dataset as if it were a regular list, but you will need to call the `.result()` method to get the actual values -- see the [Parsl documentation](https://parsl.readthedocs.io/en/stable/) for more information.
118+
119+
```py
print(data[4].result())  # actual `Geometry` instance at index 4
print(data.length().result())  # actual length of the dataset, i.e. the number of states in `trajectory.xyz`
```
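As a rough illustration of a list-like object that hands out futures, consider the toy class below. `LazyList` is hypothetical and is not how `Dataset` is implemented: psiflow relies on Parsl's dependency tracking, whereas this sketch simply waits for the data inside a worker thread. It only shows why indexing and `length()` can return immediately even when the underlying data does not exist yet.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class LazyList:
    """Hypothetical list-like wrapper around items that are produced later."""

    def __init__(self, executor, items_future):
        self.executor = executor
        self.items_future = items_future  # Future that resolves to a list

    def __getitem__(self, index):
        # schedule the lookup; the worker waits until the items exist
        return self.executor.submit(lambda: self.items_future.result()[index])

    def length(self):
        return self.executor.submit(lambda: len(self.items_future.result()))


def generate_items():
    # stand-in for a simulation that writes ten snapshots
    return ["frame-%d" % i for i in range(10)]


with ThreadPoolExecutor(max_workers=4) as executor:
    data = LazyList(executor, executor.submit(generate_items))
    item = data[4]  # a Future, available immediately
    n = data.length()  # also a Future
    item_value = item.result()  # the actual item
    n_value = n.result()  # the actual length

print(item_value, n_value)
```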
123+
124+
Datasets support all the operations that you would expect from a regular list, such as slicing, appending, or concatenating. In addition, they provide a number of convenience methods for common operations, such as filtering, shuffling, or splitting the dataset into training and validation sets.
125+
```py
train, valid = data.split(0.9, shuffle=True)  # do a randomized 90/10 split

energies = train.get('energy')  # get the energies of all geometries in the training set
print(energies.result().shape)  # energies is an AppFuture, so we need to call .result()
# (n,)

forces = train.get('forces')  # get the forces of all geometries in the training set
print(forces.result().shape)  # forces is an AppFuture, so we need to call .result()
# (n, natoms, 3)
```
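What a randomized 90/10 split amounts to can be sketched with plain NumPy index arrays. The function `split_indices`, its `seed` argument, and the made-up energies below are assumptions for illustration; how psiflow shuffles and rounds internally is not specified here.

```python
import numpy as np


def split_indices(n, fraction, shuffle=True, seed=0):
    """Return (train, valid) index arrays for a dataset of n items."""
    indices = np.arange(n)
    if shuffle:
        rng = np.random.default_rng(seed)
        rng.shuffle(indices)  # in-place random permutation
    n_train = int(round(fraction * n))
    return indices[:n_train], indices[n_train:]


energies = np.linspace(-1.0, 1.0, 20)  # made-up energies for 20 geometries
train_idx, valid_idx = split_indices(len(energies), 0.9)
train_energies = energies[train_idx]  # one value per training geometry

print(train_energies.shape)
print(len(valid_idx))
```

Every index ends up in exactly one of the two subsets, so no geometry is shared between training and validation data.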

docs/hamiltonian.md

+20
@@ -0,0 +1,20 @@
1+
In Born-Oppenheimer-based molecular simulation, atomic nuclei are treated as classical particles that are subject to *effective* interactions which are determined by the quantum mechanical behavior of the electrons.
2+
In addition to the atomic interactions, it is often useful to define additional biasing forces on the system, e.g. in order to drive a rare event or to prevent the system from exploring undesired regions in phase space.
3+
In addition, there exist various alchemical free energy techniques which rely on systematic changes in the hamiltonian ( = potential energy) of the system to derive free energy differences between different states.
4+
5+
To accommodate all these use cases, psiflow provides a simple abstraction for *a function which accepts an atomic geometry and returns energies and forces*: the `Hamiltonian` class.
6+
The simplest hamiltonian (which is only really useful for testing purposes) is the Einstein crystal, which binds atoms using harmonic springs to a certain reference position.
7+
```py
from psiflow.geometry import Geometry
from psiflow.hamiltonians import EinsteinCrystal  # import path assumed; check the API reference

geometry = Geometry.from_string('''
2
H 0.0 0.0 0.0
H 0.0 0.0 0.8
''')

hamiltonian = EinsteinCrystal(
    reference_geometry=geometry.positions,
    force_constant=0.1,
)

```
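To see what "a function which accepts an atomic geometry and returns energies and forces" means for harmonic springs, here is a minimal NumPy sketch. It assumes the textbook form E = ½ k Σ |r − r₀|² with forces F = −k (r − r₀); the function name `einstein_crystal` is hypothetical, and the exact functional form and units used by psiflow's `EinsteinCrystal` are not confirmed here.

```python
import numpy as np


def einstein_crystal(positions, reference, force_constant):
    """Harmonic springs that tie every atom to its reference position."""
    delta = positions - reference  # displacement of each atom, shape (natoms, 3)
    energy = 0.5 * force_constant * np.sum(delta ** 2)
    forces = -force_constant * delta  # negative gradient of the energy
    return energy, forces


reference = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.8]])
positions = reference.copy()
positions[1, 2] += 0.1  # stretch the H-H distance along Z

energy, forces = einstein_crystal(positions, reference, force_constant=0.1)
print(energy)
print(forces)
```

Only the displaced atom feels a force, and that force points back towards its reference position.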

docs/index.md

+39-65
@@ -3,79 +3,55 @@ hide:
33
- toc
44
---
55

6-
# **psiflow** - interatomic potentials using online learning
6+
# **psiflow** - scalable molecular simulation
77

8-
Psiflow is a modular and scalable library for developing interatomic potentials.
9-
It interfaces popular trainable interaction potentials with
10-
quantum chemistry software and is designed as an end-to-end framework;
11-
it can orchestrate all computational tasks between an initial atomic structure and
12-
the final accurate interatomic potential.
13-
To achieve this, psiflow implements the following high-level abstractions:
148

15-
- a trainable **interaction potential** (e.g. NequIP or MACE)
16-
- one or more **phase space sampling** algorithms (e.g. biased NPT, geometry optimization)
17-
- a reference **level of theory** (e.g. CP2K using PBE-D3(BJ) + TZVP)
9+
Psiflow is a scalable molecular simulation engine for chemistry and materials science applications.
10+
It supports:
1811

12+
- **quantum mechanical calculations** at various levels of theory (GGA and hybrid DFT, post-HF methods such as MP2 or RPA, and even coupled cluster; using CP2K|GPAW|ORCA)
1913

20-
These three components are used to implement **online learning** algorithms,
21-
which essentially interleave phase space sampling with
22-
quantum mechanical energy evaluations and model training.
23-
In this way, the entire (relevant part of the) phase space of the system(s)
24-
of interest may be explored and learned by the model without ever having to
25-
perform *ab initio* molecular dynamics.
14+
- **trainable interaction potentials** as well as easy-to-use universal potentials, e.g. [MACE-MP0](https://arxiv.org/abs/2401.00096)
15+
- a wide range of **sampling algorithms**: NVE|NVT|NPT, path-integral molecular dynamics, alchemical replica exchange, metadynamics, phonon-based sampling, ... (thanks to [i-PI](https://ipi-code.org/))
2616

27-
!!! success "**See what it looks like on [Weights & Biases](https://wandb.ai/svandenhaute/formic_acid?workspace=user-svandenhaute)!**"
28-
29-
The main channel through which you will analyze psiflow's output is Weights & Biases.
30-
Click [here](https://wandb.ai/svandenhaute/formic_acid?workspace=user-svandenhaute)
31-
to check out a few completed runs, in which we learn the energetics of the molecular
32-
proton transfer reaction in a formic acid dimer!
33-
For more information on the example as well as a full walkthrough on how to obtain
34-
the reaction free energy based on a single input structure as starting point, check out the
35-
Jupyter [notebook](https://github.com/molmod/psiflow/blob/main/examples/notebook/tutorial.ipynb).
17+
Users may define arbitrarily complex workflows and execute them **automatically** on local, HPC, and/or cloud infrastructure.
18+
To achieve this, psiflow is built using [Parsl](https://parsl-project.org/): a parallel execution library which manages job submission and workload distribution.
19+
As such, psiflow can orchestrate large molecular simulation pipelines on hundreds or even thousands of nodes.
3620

3721
---
3822

39-
<!---
40-
## Core functionality
41-
42-
The psiflow abstractions for a reference level of theory (`BaseReference`),
43-
a trainable potential (`BaseModel`), and an ensemble of phase space walkers
44-
(`Ensemble`, `BaseWalker`) are subclassed by specific implementations.
45-
They expose the main high-level functionalities that one would intuitively
46-
expect: A `BaseReference` can label a dataset with QM energy and forces according
47-
to some level of theory, after which a `BaseModel` instance can be trained to it.
48-
An `Ensemble` can use that `BaseModel` to explore the phase space of the systems
49-
of interest (e.g. using molecular dynamics) in order to generate new
50-
atomic configurations, which can again be labeled using `BaseReference` etc.
51-
--->
23+
Use the following one-liner to create a lightweight [micromamba](https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html) Python environment with all dependencies readily available:
24+
```sh
25+
curl -L molmod.github.io/psiflow/install.sh | bash
26+
```
27+
The environment can be activated by sourcing the `activate.sh` file which will be created in the current working directory.
5228

53-
__Scalable execution__
54-
55-
Psiflow workflows can be executed on large HPCs and/or cloud computing infrastructure.
56-
The individual QM calculations, model training runs,
57-
and/or phase space sampling calculations are efficiently executed on
58-
hundreds or thousands of nodes thanks to
59-
[Parsl, a parallel and asynchronous execution framework](https://parsl-project.org/).
60-
For example, you could distribute all CP2K calculations to a local SLURM cluster,
61-
perform model training on a GPU from a Google Cloud instance, and forward
62-
the remaining phase space sampling and data processing operations to a single
63-
workstation in your local network.
64-
Naturally, Parsl tracks the dependencies between all objects and manages execution of the workflow
65-
in an asynchronous manner.
66-
<!---
67-
Psiflow centralizes all execution-level configuration options using an `ExecutionContext`.
68-
It forwards infrastructure-specific options within Parsl, such as the requested number of nodes
69-
per SLURM job or the specific Google Cloud instance to be use, to training,
70-
sampling, and QM evaluation operations to ensure they proceed as requested.
71-
Effectively, the `ExecutionContext` hides all details of the execution
72-
infrastructure and exposes simple and platform-agnostic resources which may be
73-
used by training, sampling, and QM evaluation apps.
74-
As such, we ensure that execution-side details are strictly separated from
75-
the definition of the computational graph itself.
76-
For more information, check out the psiflow [Configuration](config.md) page.
77-
--->
29+
Next, create a `config.yaml` file which defines the compute resources. For SLURM-based HPC systems, psiflow can initialize your configuration automatically via the following command:
30+
```sh
31+
python -c 'import psiflow; psiflow.setup_slurm()'
32+
```
33+
Example configuration files for [LUMI](https://lumi-supercomputer.eu/), [MeluXina](https://luxembourg.public.lu/en/invest/innovation/meluxina-supercomputer.html), or [VSC](https://www.vscentrum.be/) can be found [here](https://github.com/molmod/psiflow/tree/main/configs).
34+
No additional software compilation is required since all of the heavy lifting (CP2K/ORCA/GPAW, PyTorch model training, i-PI dynamics) is executed within preconfigured [Apptainer](https://apptainer.org/)/[Singularity](https://sylabs.io/singularity/) containers which are production-ready for most HPCs.
35+
36+
For a complete overview of all execution options, see the [configuration](configuration.md) page.
37+
38+
# Examples
7839

40+
- [Replica exchange molecular dynamics](https://github.com/molmod/psiflow/tree/main/examples/alanine_replica_exchange.py) | **alanine dipeptide**: replica exchange molecular dynamics simulation of alanine dipeptide, using the MACE-MP0 universal potential.
41+
The inclusion of high-temperature replicas allows for fast conformational transitions and improves ergodicity.
42+
- [Geometry optimizations](https://github.com/molmod/psiflow/tree/main/examples/formic_acid_transition.py) | **formic acid dimer**: approximate transition state calculation for the proton exchange reaction in a formic acid dimer,
43+
using simple bias potentials and a few geometry optimizations.
44+
- [Static and dynamic frequency analysis](https://github.com/molmod/psiflow/tree/main/examples/h2_static_dynamic.py) | **dihydrogen**: Hessian-based estimate of the H-H bond strength and corresponding IR absorption frequency, and a comparison with a dynamical estimate from NVE simulation and Fourier analysis.
45+
46+
- [Bulk modulus calculation](https://github.com/molmod/psiflow/tree/main/examples/iron_bulk_modulus.py) | **iron**: estimate of the bulk modulus of fcc iron using a series of NPT simulations at different pressures
47+
48+
- [Solid-state phase stabilities](https://github.com/molmod/psiflow/tree/main/examples/iron_harmonic_fcc_bcc.py) | **iron**: estimating the relative stability of fcc and bcc iron with anharmonic corrections using thermodynamic integration (see e.g. [Phys Rev B., 2018](https://journals.aps.org/prb/abstract/10.1103/PhysRevB.97.054102))
49+
50+
- [DFT singlepoints](https://github.com/molmod/psiflow/tree/main/examples/water_cp2k_noise.py) | **water**: analysis of the numerical noise in DFT energy and force evaluations using CP2K and the RPBE(D3) functional, for a collection of water molecules.
51+
52+
- [Path-integral molecular dynamics](https://github.com/molmod/psiflow/tree/main/examples/water_path_integral_md.py) | **water**: demonstration of the impact of nuclear quantum effects on the variance in O-H distance in liquid water. Path-integral molecular dynamics simulations with an increasing number of beads (1, 2, 4, 8, 16) approximate the proton delocalization and lead to a systematically larger variance in O-H distance.
53+
54+
- [Machine learning potential training](https://github.com/molmod/psiflow/tree/main/examples/water_train_validate.py) | **water**: simple training and validation script for MACE on a small dataset of water configurations.
7955

8056
!!! note "Citing psiflow"
8157

@@ -90,8 +66,6 @@ For more information, check out the psiflow [Configuration](config.md) page.
9066
__9__, 19 __(2023)__
9167

9268

93-
---
94-
9569

9670
<!---
9771
- __atomic data__: the `Dataset` class represents a list of atomic configurations.

mkdocs.yml

+11-6
@@ -7,7 +7,8 @@ theme:
77
# text: overpass
88
palette:
99
primary: teal
10-
accent: teal
10+
accent: yellow
11+
scheme: slate
1112
logo: icon.svg
1213
features:
1314
- content.code.copy
@@ -16,16 +17,20 @@ theme:
1617
#- navigation.tabs
1718
#- navigation.tabs.sticky
1819
- navigation.indexes
19-
#- navigation.sections
20+
- navigation.sections
2021
- navigation.expand
2122
- toc.integrate
2223
- toc.follow
2324

2425
nav:
25-
- Introduction: index.md
26-
- Learning algorithms: learning.md
27-
- Installation: installation.md
28-
- Execution: execution.md
26+
- overview: index.md
27+
- atomic geometries: data.md
28+
- hamiltonians: hamiltonian.md
29+
- sampling: sampling.md
30+
- QM calculations: reference.md
31+
- ML potentials: model.md
32+
- online learning: learning.md
33+
- configuration: configuration.md
2934

3035
plugins:
3136
- mkdocstrings:
