MatPES (Materials Potential Energy Surface) is a potential energy surface dataset with near-complete coverage of the periodic table, designed to train foundation potentials (FPs), i.e., machine learning interatomic potentials (MLIPs) for materials. MatPES is an initiative by the Materialyze.AI lab and the Materials Project to address critical deficiencies in existing PES datasets.
| Version | Date | Description | Download |
|---|---|---|---|
| 2025.2 | 15 Apr 2026 | Addition of Bader and DDEC6 charges; removed a small number of duplicated structures. | PBE, r2SCAN |
| 2025.1 | 6 Mar 2025 | Initial release (~400k structures) | PBE, r2SCAN |
| - | 6 Mar 2025 | Atomic reference energies | PBE, r2SCAN |
- Accuracy. MatPES is computed using static DFT calculations with stringent convergence criteria. Please refer to the MatPESStaticSet in pymatgen for details.
- Comprehensiveness. MatPES structures are sampled using a 2-stage version of DImensionality-Reduced Encoded Clusters with sTratified (DIRECT) sampling from a greatly expanded configuration space of MD structures.
- Quality. MatPES includes computed data from the PBE functional, as well as the high-fidelity r2SCAN meta-GGA functional with improved description across diverse bonding and chemistries.
MatPES 2025.2 is the latest public release; it extends the initial 2025.1 release (~400,000 structures from 300 K MD simulations) with Bader and DDEC6 atomic charges and removes duplicated structures. The dataset remains much smaller than other PES datasets in the literature, yet FPs trained on it achieve comparable or, in some cases, improved performance and reliability.
MatPES is part of the MatML ecosystem, which also includes MatGL (Materials Graph Library), maml (MAterials Machine Learning), and MatCalc (Materials Calculator).
The MatPES dataset is available on Hugging Face. You can use the datasets package to download it:
from datasets import load_dataset
load_dataset("materialyze/matpes", "pbe")
load_dataset("materialyze/matpes", "r2scan")

Without any version specifiers, the latest version of each dataset will be returned.
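
The second argument to `load_dataset` names the functional, optionally followed by `-<version>` to pin a release. As a minimal sketch, the string could be assembled by a small helper; `matpes_config` is a hypothetical name used only for illustration and is not part of the datasets or matpes packages.

```python
from typing import Optional


def matpes_config(functional: str, version: Optional[str] = None) -> str:
    """Build the configuration name used as the second load_dataset argument.

    Hypothetical helper: only the two functionals named in this README are
    accepted, and the "-<version>" suffix follows the README's convention.
    """
    name = functional.lower()
    if name not in {"pbe", "r2scan"}:
        raise ValueError(f"unknown functional: {functional!r}")
    # No version means "latest"; otherwise pin the requested release.
    return name if version is None else f"{name}-{version}"


print(matpes_config("pbe"))               # pbe
print(matpes_config("r2SCAN", "2025.2"))  # r2scan-2025.2
```

The result would then be passed along, for example `load_dataset("materialyze/matpes", matpes_config("r2scan", "2025.2"))`.
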
To download a specific version, append a -<version> specifier. For example:
load_dataset("materialyze/matpes", "r2scan-2025.2")

The matpes Python package, which provides tools for working with the MatPES datasets, can be installed via pip:
pip install matpes

Some command-line usage examples:
# Download the PBE dataset to the current directory.
# You should see a MatPES-PBE-2025.2.json file in your directory.
matpes download pbe
# Extract all entries in the Fe-O chemical system.
matpes data -i MatPES-PBE-2025.2.json --chemsys Fe-O -o Fe-O.json.gz

The matpes.db module provides functionality to create your own MongoDB database with the downloaded MatPES data, which is extremely useful if you plan to work with the data (e.g., querying, adding entries, etc.) extensively.
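
For a quick look at the data without the command line or a database, a chemical-system selection can be approximated in plain Python. The sketch below runs on toy records, not on real MatPES entries: the `elements` key, the record IDs, and the `select_chemsys` helper are all assumptions made for illustration. Whether the `--chemsys` option also returns elemental subsystems is not stated here, so the sketch exposes both behaviors.

```python
def select_chemsys(entries, chemsys, include_subsystems=False):
    """Return entries whose element set matches a chemical system like "Fe-O".

    Assumes each entry is a dict with an "elements" list (an assumed schema).
    With include_subsystems=True, entries containing only a subset of the
    requested elements (e.g. pure Fe) are kept as well.
    """
    target = set(chemsys.split("-"))
    selected = []
    for entry in entries:
        present = set(entry["elements"])
        if present == target or (include_subsystems and present <= target):
            selected.append(entry)
    return selected


# Toy records standing in for downloaded data (not real MatPES entries).
toy_entries = [
    {"id": "toy-1", "elements": ["Fe", "O"]},
    {"id": "toy-2", "elements": ["Fe"]},
    {"id": "toy-3", "elements": ["Li", "Fe", "O"]},
]

exact = select_chemsys(toy_entries, "Fe-O")
broad = select_chemsys(toy_entries, "Fe-O", include_subsystems=True)
print([e["id"] for e in exact])  # ['toy-1']
print([e["id"] for e in broad])  # ['toy-1', 'toy-2']
```
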
We have released a set of MatPES-trained foundation potentials (FPs) in the M3GNet, CHGNet, and TensorNet architectures in the MatGL package. For example, you can load the TensorNet FP trained on MatPES PBE 2025.2 as follows:
import matgl
potential = matgl.load_model("TensorNet-PES-MatPES-PBE-2025.2")

Model names follow the format <architecture>-PES-<dataset>-<dataset-version>.
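
The naming format can be taken apart or reassembled with a few lines of string handling. This is a minimal sketch, assuming the architecture never contains `-PES-` and the version never contains a hyphen; `ModelName`, `parse_model_name`, and `format_model_name` are hypothetical helpers, not part of MatGL.

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    architecture: str
    dataset: str
    version: str


def parse_model_name(name: str) -> ModelName:
    """Split "<architecture>-PES-<dataset>-<dataset-version>" into its parts."""
    architecture, sep, rest = name.partition("-PES-")
    if not sep:
        raise ValueError(f"missing '-PES-' marker in {name!r}")
    # The dataset itself may contain hyphens (e.g. "MatPES-PBE"),
    # so the version is taken from the last hyphen onward.
    dataset, sep, version = rest.rpartition("-")
    if not sep:
        raise ValueError(f"missing dataset version in {name!r}")
    return ModelName(architecture, dataset, version)


def format_model_name(architecture: str, dataset: str, version: str) -> str:
    """Assemble a model name in the README's format."""
    return f"{architecture}-PES-{dataset}-{version}"


parsed = parse_model_name("TensorNet-PES-MatPES-PBE-2025.2")
print(parsed.architecture, parsed.dataset, parsed.version)
# TensorNet MatPES-PBE 2025.2
```

A name built this way is only a string; whether a model with that name has actually been released should be checked against the MatGL documentation.
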
These FPs can be used easily with the MatCalc package to rapidly compute properties. For example:
from matcalc.elasticity import ElasticityCalc
calculator = ElasticityCalc("TensorNet-PES-MatPES-PBE-2025.2")
calculator.calc(structure)

We have provided Jupyter notebooks demonstrating how to load the MatPES dataset, train a model, and perform fine-tuning.
If you use the MatPES dataset, please cite the following work:
Kaplan, A. D.; Liu, R.; Qi, J.; Ko, T. W.; Deng, B.; Riebesell, J.; Ceder, G.; Persson, K. A.; Ong, S. P. A Foundational Potential Energy Surface Dataset for Materials. arXiv 2025. DOI: 10.48550/arXiv.2503.04070.

In addition, if you use any of the pre-trained FPs or architectures, please cite the references for the architecture used, as well as MatGL.