@orionarcher
Contributor

@orionarcher orionarcher commented Nov 23, 2025

Summary

This PR adds support for TorchSim, namely:

  • Utilities to convert live TorchSim objects (e.g. TrajectoryReporter) into configurable schemas with equivalent features
  • Input and output schemas for the integrate, optimize and static functions
  • Makers for integrate, optimize and static jobs

NOTE: this PR uses StrEnum, which was added in Python 3.11 and is not available in Python 3.10; see #1334
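
For context, one common way to keep Python 3.10 working is a small fallback class, sketched below. This is not what the PR does (it raises the minimum Python version instead); `ModelKind` and its members are hypothetical names used only for illustration.

```python
import sys
from enum import Enum

if sys.version_info >= (3, 11):
    from enum import StrEnum
else:

    class StrEnum(str, Enum):
        """Minimal stand-in for enum.StrEnum on Python 3.10."""

        def __str__(self) -> str:
            # enum.StrEnum renders members as their plain string value.
            return str(self.value)


class ModelKind(StrEnum):  # hypothetical enum, for illustration only
    MACE = "mace"
    ORB = "orb"


print(ModelKind.MACE == "mace", str(ModelKind.ORB))
```

The fallback only covers comparison and string rendering; it does not reproduce the `auto()` behaviour of the real `StrEnum`, so explicit values are assumed.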

Additional dependencies introduced (if any)

  • torchsim==0.4.1
  • python>=3.11

Checklist

Work-in-progress pull requests are encouraged, but please put [WIP] in the pull request
title.

Before a pull request can be merged, the following items must be checked:

  • Code is in the standard Python style.
    The easiest way to handle this is to run the following in the correct sequence on
    your local machine. Start with running ruff and ruff format on your new code. This will
    automatically reformat your code to PEP8 conventions and fix many linting issues.
  • Doc strings have been added in the Numpy docstring format.
    Run ruff on your code.
  • Type annotations are highly encouraged. Run mypy to
    type check your code.
  • Tests have been added for any new functionality or bug fixes.
  • All linting and tests pass.

Note that the CI system will run all the above checks. But it will be much more
efficient if you already fix most errors prior to submitting the PR. It is highly
recommended that you use the pre-commit hook provided in the repository. Simply run
pre-commit install and a check will be run prior to allowing commits.

Comment on lines 238 to 269
if model_type == TSModelType.FAIRCHEMV1:
    from torch_sim.models.fairchem_legacy import FairChemV1Model

    return FairChemV1Model(model=model_path, **model_kwargs)
if model_type == TSModelType.FAIRCHEM:
    from torch_sim.models.fairchem import FairChemModel

    return FairChemModel(model=model_path, **model_kwargs)
if model_type == TSModelType.GRAPHPESWRAPPER:
    from torch_sim.models.graphpes import GraphPESWrapper

    return GraphPESWrapper(model=model_path, **model_kwargs)
if model_type == TSModelType.MACE:
    from torch_sim.models.mace import MaceModel

    return MaceModel(model=model_path, **model_kwargs)
if model_type == TSModelType.MATTERSIM:
    from torch_sim.models.mattersim import MatterSimModel

    return MatterSimModel(model=model_path, **model_kwargs)
if model_type == TSModelType.METATOMIC:
    from torch_sim.models.metatomic import MetatomicModel

    return MetatomicModel(model=model_path, **model_kwargs)
if model_type == TSModelType.NEQUIPFRAMEWORK:
    from torch_sim.models.nequip_framework import NequIPFrameworkModel

    return NequIPFrameworkModel(model=model_path, **model_kwargs)
if model_type == TSModelType.ORB:
    from torch_sim.models.orb import OrbModel

    return OrbModel(model=model_path, **model_kwargs)
Collaborator

For readability, can this block be refactored into a dict like:

import importlib

model_to_import_str = {
    "FAIRCHEMV1": "torch_sim.models.fairchem_legacy.FairChemV1Model",
    "FAIRCHEM": "torch_sim.models.fairchem.FairChemModel",
    ...
}

model_module, model_class = model_to_import_str[TSModelType(model_type).name].rsplit(".", 1)
return getattr(importlib.import_module(model_module), model_class)(model=model_path, **model_kwargs)
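
A fuller sketch of this suggestion might look like the following. The dotted paths are copied from the if-chain under review; `ModelType` is a hypothetical stand-in for the PR's `TSModelType`, and its string values, along with the helper names `resolve_class` and `load_model`, are assumptions made for illustration.

```python
import importlib
from enum import Enum


class ModelType(str, Enum):  # hypothetical stand-in for the PR's TSModelType
    FAIRCHEMV1 = "fairchemv1"
    FAIRCHEM = "fairchem"
    GRAPHPESWRAPPER = "graphpeswrapper"
    MACE = "mace"
    MATTERSIM = "mattersim"
    METATOMIC = "metatomic"
    NEQUIPFRAMEWORK = "nequipframework"
    ORB = "orb"


# Dotted import paths taken from the if-chain in the diff.
MODEL_IMPORT_PATHS = {
    ModelType.FAIRCHEMV1: "torch_sim.models.fairchem_legacy.FairChemV1Model",
    ModelType.FAIRCHEM: "torch_sim.models.fairchem.FairChemModel",
    ModelType.GRAPHPESWRAPPER: "torch_sim.models.graphpes.GraphPESWrapper",
    ModelType.MACE: "torch_sim.models.mace.MaceModel",
    ModelType.MATTERSIM: "torch_sim.models.mattersim.MatterSimModel",
    ModelType.METATOMIC: "torch_sim.models.metatomic.MetatomicModel",
    ModelType.NEQUIPFRAMEWORK: "torch_sim.models.nequip_framework.NequIPFrameworkModel",
    ModelType.ORB: "torch_sim.models.orb.OrbModel",
}


def resolve_class(dotted_path: str) -> type:
    """Import ``pkg.module.Class`` lazily and return the class object."""
    module_name, class_name = dotted_path.rsplit(".", 1)
    return getattr(importlib.import_module(module_name), class_name)


def load_model(model_type, model_path, **model_kwargs):
    """Instantiate the wrapper for ``model_type``; the import happens only here."""
    model_class = resolve_class(MODEL_IMPORT_PATHS[ModelType(model_type)])
    return model_class(model=model_path, **model_kwargs)
```

Keying the dict by enum member rather than by name string avoids a second lookup, and the optional backends are still imported only when a given model type is requested.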

Comment on lines 219 to 223
all_properties: list[dict[str, np.ndarray]] = Field(
    ..., description="List of calculated properties for each structure."
)

model_config = ConfigDict(arbitrary_types_allowed=True)
Collaborator

Can the properties be cast to a list or another built-in type? arbitrary_types_allowed eliminates the benefits of type checking here.
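
As a rough illustration of the cast, the helper below recursively turns array-like values into plain lists. It is a sketch, not code from the PR: `to_builtin` is a hypothetical name, and the standard-library `array.array` stands in for `np.ndarray` (both expose `tolist()`) so the example runs without NumPy.

```python
from array import array
from typing import Any


def to_builtin(value: Any) -> Any:
    """Recursively replace array-like objects with plain Python containers."""
    if isinstance(value, dict):
        return {key: to_builtin(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_builtin(item) for item in value]
    if hasattr(value, "tolist"):
        # numpy.ndarray.tolist() returns nested lists of Python scalars;
        # array.array offers the same method and stands in for it here.
        return value.tolist()
    return value


properties = [{"energy": array("d", [-1.5]), "forces": array("d", [0.0, 0.25])}]
print(to_builtin(properties))
```

A helper like this could presumably be called from a "before" validator on the field, so that the annotation only needs built-in types and `arbitrary_types_allowed` can be dropped.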

)

calcs_reversed: list[
    TSIntegrateCalculation | TSOpimizeCalculation | TSStaticCalculation
Collaborator

@esoteric-ephemera esoteric-ephemera Dec 3, 2025

This is fine to have different calculation types, but I'd prefer to avoid making many schemas for small variations in the model fields - can you use the emmet.core.vasp.task_types to merge down these three into one TorchSimCalculation schema?

Avoiding union types like this lets us better support cloud-native data formats (e.g., Parquet), and most modern compression tools (again, Arrow/Parquet) eliminate the storage penalty associated with nullable fields.
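
To make the request concrete, a merged schema could look roughly like the sketch below. It uses a plain dataclass rather than the PR's pydantic models, and every name in it (`TaskType`, the field names, `"model.pt"`) is hypothetical; in particular, `TaskType` is not the emmet enum mentioned above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TaskType(str, Enum):  # hypothetical; not the emmet enum
    STATIC = "static"
    OPTIMIZE = "optimize"
    INTEGRATE = "integrate"


@dataclass
class TorchSimCalculation:
    """One record type for all three job kinds; unused fields stay None."""

    task_type: TaskType
    model_path: str
    n_steps: Optional[int] = None          # hypothetical: integrate / optimize
    timestep: Optional[float] = None       # hypothetical: integrate only
    max_force_tol: Optional[float] = None  # hypothetical: optimize only


# A homogeneous list: every entry has the same columns, some of them null.
calcs_reversed = [
    TorchSimCalculation(TaskType.OPTIMIZE, "model.pt", n_steps=200, max_force_tol=0.05),
    TorchSimCalculation(TaskType.STATIC, "model.pt"),
]
```

Because every record shares one set of fields, the list maps onto a single table with nullable columns instead of a union of three row types.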

Contributor Author

Done!

pyproject.toml Outdated
"quippy-ase>=0.9.14; python_version < '3.12'",
"sevenn>=0.9.3",
"torchdata<=0.7.1", # TODO: remove when issue fixed
"torch_sim==0.4.1",
Collaborator

Move to its own optional import block since the module is currently distinct from the forcefields

Contributor Author

Done

Collaborator

@esoteric-ephemera esoteric-ephemera left a comment

Some general comments besides the ones on specific lines:

  • The TS prefixing in makers, enums, etc. is confusing given that TS is probably more familiar as "transition state" than "torchsim" - can you change this to TorchSim?
  • Are you envisioning this being added as a separate module (as it's currently implemented) or as an addition to the existing forcefields stuff? If the latter, the schemas would have to be merged for the job outputs

@orionarcher
Contributor Author

The TS prefixing in makers, enums, etc. is confusing given that TS is probably more familiar as "transition state" than "torchsim" - can you change this to TorchSim?

Good call.

Are you envisioning this being added as a separate module (as it's currently implemented) or as an addition to the existing forcefields stuff? If the latter, the schemas would have to be merged for the job outputs

I think this should be a separate module. It would be really messy to integrate it with the existing forcefields stuff. TorchSim generally expects many -> many calculations (list[structure] -> list[structure]) and has different output files and such.
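
To make the shape difference concrete, here is a minimal sketch of a many -> many function next to a one-at-a-time wrapper around it. `batch_relax` and `as_single` are hypothetical names, the strings stand in for structure objects, and nothing here reflects the actual TorchSim or forcefields APIs.

```python
from typing import Callable, List, TypeVar

S = TypeVar("S")  # placeholder for a structure type


def batch_relax(structures: List[S]) -> List[S]:
    """Hypothetical many -> many job: one call handles the whole batch."""
    return [s for s in structures]  # stand-in for the real batched work


def as_single(batch_fn: Callable[[List[S]], List[S]]) -> Callable[[S], S]:
    """Wrap a batched function so it can be called one structure at a time."""

    def single(structure: S) -> S:
        # Send a batch of one and unpack the single result.
        (result,) = batch_fn([structure])
        return result

    return single


relax_one = as_single(batch_relax)
print(relax_one("Si2"), batch_relax(["Si2", "NaCl"]))
```

The wrapper only reconciles call signatures; it says nothing about the differing output files and schemas mentioned above.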

@JaGeo
Member

JaGeo commented Dec 5, 2025

@orionarcher #1196 ?

@orionarcher
Contributor Author

Encouraging! In that case, let me reframe. It looks like it would be possible, but I anticipate it would be a major headache, one I am reluctant to take on.

Though they both run MLIPs, ASE and TorchSim are different software packages with pretty different APIs. I don't see a major reason to have them share schema or logic. The forcefields module hews pretty closely to the ASE schema and embraces the paradigms of that package. While I appreciate that in principle they do basically the same thing (take in structure -> repeatedly evaluate MLIP -> generate trajectory + final structure), so many of the norms and expectations of the software are different that I don't think integration would make either interface any better.

@JaGeo
Member

JaGeo commented Dec 5, 2025

The motivation should always be the following:
If the schemas are similar, people can easily replace their current forcefields-based code with torchsim. This includes larger workflows.

@orionarcher
Contributor Author

I hear you and I am sympathetic to that argument. In this case, I feel there is a tradeoff between the immediate adoptability of the TorchSim interface and its overall quality. After looking back and forth at the ASE and TorchSim schemas for the past 15 minutes, I don't think it's possible to adapt the TorchSim API to fit the ASE schemas without adding complexity, reducing readability, and making the overall API less natural and maintainable.

I would love for users currently using ASE to be able to quickly and reliably switch to TorchSim, but it's not clear to me that equating the schemas is the best way to do that. I would be happy to write a transition guide outlining the schema differences and how to move from ASE -> TorchSim, and add it to this PR.
