Merge pull request #17 from MobleyLab/reference
Add main challenge reference calculations input files
davidlmobley authored Nov 28, 2017
2 parents 9be5023 + c792e29 commit ec5d3d0
Showing 353 changed files with 4,935,864 additions and 52 deletions.
25 changes: 3 additions & 22 deletions README.md
@@ -116,26 +116,7 @@ In some cases, other effects like the presence of small amount of water in cyclo


### SAMPLing challenge
The purpose of the SAMPLing challenge component is to evaluate and compare the performance of different sampling methodologies in the context of free energy calculations of biomolecular systems; we will be running extremely long calculations with the provided input files in an attempt to obtain "gold standard" results, and then assess how well different methods/approaches converge to these results.
The challenge consists in computing the free energy of binding of three host-guest systems taken from the main SAMPL6 challenge: CB8-G3 (quinine), OA-G3 (5-hexenoic acid), and OA-G6 (4-methylpentanoic acid).
Force field parameters, and ideally long-range treatment, should be the same for all participants to allow a more objective comparison of sampling methods.
For this purpose, equilibrated system files that include topologies and initial configurations are provided in [`host_guest/SAMPLing/`](host_guest/SAMPLing) in various formats (i.e., Amber, Gromacs, OpenMM, PDB).
Five different initial configurations are provided for each host-guest system (see [`host_guest/README.md`](host_guest/README.md#sampling-challenge-files) for the setup protocol).

To participate in the challenge, you will have to submit the following information for each of the 5 replicates of the 3 host-guest systems:
- Binding free energy predictions using 1%, 2%, 3%, ..., 100% of the sequential data (i.e., _not_ bootstrapped).
- Integrated autocorrelation time of the _reduced_ potential energies of the bound thermodynamic state after the first 10 ns of the simulation. The `pymbar` Python package exposes a function `statisticalInefficiency()` in its `timeseries` module which can be used for this task.
- Description of the thermodynamic cycle, in particular the number of thermodynamic states (e.g. lambda/umbrella sampling windows).
- Total computer time, total wall clock time, total number of energy evaluations, and hardware used to perform the simulations.
The file format for these will be made available in the near future.

For relative free energy methods, only the five replicates of the transformation OA-G3 to OA-G6 are required. An atom map is provided in JSON format in [`host_guest/SAMPLing/`](host_guest/SAMPLing).

The reference absolute free energy calculations will be performed using YANK and the following methods/parameters:
- Hamiltonian replica exchange and Langevin dynamics (BAOAB splitting) with the temperature set to 298.15 K.
- A Monte Carlo barostat set at 1 atm.
- OpenMM's implementation of PME for long-range electrostatic interactions with a cutoff of 10 Å.
- VdW interactions with the same 10 Å cutoff and a switching distance of 9 Å.

This SAMPLing challenge is a bit of an experiment, as it is entirely possible that different methods/packages may *not* agree even when apparently converged, requiring participating groups to work together to track down discrepancies. However, if agreement is obtained, it should be very instructive to compare rate of convergence.
The purpose of the SAMPLing challenge component is to evaluate and compare the performance of different sampling methodologies in the context of free energy calculations of biomolecular systems. Participants are invited to compute the free energy of binding of a few host-guest systems taken from the main SAMPL6 challenge. We will be running extremely long calculations with the provided input files in an attempt to obtain "gold standard" results, and then assess how well different methods converge to these results. See [`SAMPLing_instructions.md`](SAMPLing_instructions.md) for more details.

This SAMPLing challenge is a bit of an experiment, as it is entirely possible that different methods/packages may *not* agree even when apparently converged, requiring participating groups to work together to track down discrepancies. However, if agreement is obtained, it should be very instructive to compare the rate of convergence.
We expect that analysis of this challenge component will focus even more than usual on "lessons learned" rather than on which methods performed "best" by some metric, but we hope it will also pave the way for future iterations of such challenges.
69 changes: 69 additions & 0 deletions SAMPLing_instructions.md
@@ -0,0 +1,69 @@
# SAMPLing Challenge Instructions

## Challenge overview

The purpose of the SAMPLing challenge component is to evaluate and compare the performance of different sampling methodologies in the context of free energy calculations of biomolecular systems. Participants are invited to compute the free energy of binding of a few host-guest systems taken from the main SAMPL6 challenge. We will be running extremely long calculations with the provided input files in an attempt to obtain "gold standard" results, and then assess how well different methods converge to these results.

Force field parameters, and ideally treatment of long-range interactions, should be identical for all participants to allow a more objective comparison of the sampling methods. For this purpose, equilibrated system files that include topologies and initial configurations are provided in [`host_guest/SAMPLing/`](host_guest/SAMPLing) in various formats (i.e., Amber, Gromacs, OpenMM, PDB). Five different initial configurations are given for each host-guest system. See section [Files description](#files-description) for more details about the input files and the setup protocol.

The specific instructions are slightly different for absolute and relative free energy methods.

### Absolute free energy methods

The challenge consists in computing the standard free energy of binding of three host-guest systems:
- CB8-G3 (quinine),
- OA-G3 (5-hexenoic acid), and
- OA-G6 (4-methylpentanoic acid).

A total of 15 free energy calculations have to be performed, starting from the 5 different initial configurations provided for each host-guest system.

### Relative free energy methods

The challenge consists in computing the relative binding free energy of the transformation OA-G3 (5-hexenoic acid) to OA-G6 (4-methylpentanoic acid). The specific transformation is described by the atom map provided in JSON format with the input files (see section [Files description](#files-description)). A total of 5 free energy calculations have to be performed, starting from the 5 initial configurations provided for OA-G3.

## Data submission
For each calculation (15 for absolute and 5 for relative free energy methods), you will have to submit the following information:
- Binding free energy estimates using 1%, 2%, 3%, ..., 100% of the sequential data (i.e., _not_ bootstrapped).
- Description of the thermodynamic cycle, in particular the number of thermodynamic states (e.g. lambda/umbrella sampling windows).
- Total CPU time, total wall clock time, total number of energy evaluations, and hardware used to perform the simulations.

The file format for these will be made available in the near future.
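
As an illustration of the first requirement, the sketch below shows one way to evaluate an estimator on the first 1%, 2%, ..., 100% of a time-ordered data set. The function names `prefix_lengths` and `sequential_estimates` are hypothetical, and the estimator is passed in by the caller; this is not the official analysis script, only a minimal sketch of what "sequential, not bootstrapped" could mean in practice.

```python
def prefix_lengths(n_samples, n_points=100):
    """Number of leading samples to use for 1%, 2%, ..., 100% of the data."""
    if n_samples < n_points:
        raise ValueError("need at least one sample per percentage point")
    # Integer arithmetic avoids floating-point rounding of the prefix sizes.
    return [(n_samples * k) // n_points for k in range(1, n_points + 1)]


def sequential_estimates(samples, estimator):
    """Apply `estimator` to growing prefixes of `samples`.

    The samples are kept in their original time order: nothing is
    reshuffled or resampled, so this is not a bootstrap.
    """
    return [estimator(samples[:n]) for n in prefix_lengths(len(samples))]


if __name__ == "__main__":
    # Stand-in data and estimator (a plain mean), for illustration only.
    data = [float(i) for i in range(200)]
    estimates = sequential_estimates(data, lambda x: sum(x) / len(x))
    print(len(estimates), estimates[0], estimates[-1])
```

In a real submission the estimator would be the free energy estimator of your method (for example, one applied to the energies collected up to that point), and `samples` would be whatever per-iteration data that estimator consumes.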

## Reference calculations

The reference absolute free energy calculations will be performed using YANK and the following methods/parameters:
- Hamiltonian replica exchange and Langevin dynamics (BAOAB splitting) with the temperature set to 298.15 K.
- A Monte Carlo barostat set at 1 atm.
- OpenMM's implementation of PME for long-range electrostatic interactions with a cutoff of 10 Å.
- VdW interactions with the same 10 Å cutoff and a switching distance of 9 Å.

Further details will be provided in the near future.

## Files description

Equilibrated systems are provided for OA-G3 (5-hexenoic acid), OA-G6 (4-methylpentanoic acid) and CB8-G3 (quinine), and they are located at [`host_guest/SAMPLing/`](host_guest/SAMPLing). Five different initial configurations are given for each system. The files are available in Amber (`prmtop`/`rst7`), Gromacs (`top`/`gro`), OpenMM (`xml`) and PDB formats. Each sub-folder `HOST-GUEST-X/`, where `X` is a digit labeling one of the 5 initial configurations, contains solvated system files for both the host-guest complex (e.g. `complex.prmtop`, `complex.gro`) and the guest alone (e.g. `solvent.prmtop`, `solvent.gro`).
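
The layout described above can be indexed with a few lines of Python. This is a minimal sketch: the function name `index_sampling_files` is hypothetical, and the only assumptions about the directory are the ones stated in the text (sub-folders named `HOST-GUEST-X/` containing files whose names start with `complex.` or `solvent.`).

```python
import re
from pathlib import Path

# Sub-folders are named HOST-GUEST-X, where X is a single digit.
FOLDER_PATTERN = re.compile(r"^(?P<system>CB8-G3|OA-G3|OA-G6)-(?P<replicate>\d)$")


def index_sampling_files(root):
    """Map (system, replicate) to the complex and solvent files found on disk."""
    index = {}
    for folder in sorted(Path(root).iterdir()):
        match = FOLDER_PATTERN.match(folder.name)
        if not (folder.is_dir() and match):
            continue  # skip the atom map and anything else that is not a replicate folder
        key = (match["system"], int(match["replicate"]))
        index[key] = {
            "complex": sorted(folder.glob("complex.*")),
            "solvent": sorted(folder.glob("solvent.*")),
        }
    return index
```

Calling `index_sampling_files("host_guest/SAMPLing")` from the repository root should then return one entry per replicate folder, which makes it easy to check that every system has all of its expected files before launching calculations.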

The `host_guest/SAMPLing/` folder also includes an atom map in JSON format that has to be used for relative free energy calculations. The ligand atoms of OA-G3 that match the ligand atoms in OA-G6 are given for the systems in complex and in solvent. The file has the following format:
```json
"complex":
    "unique_atoms_G3": [184, 185, 187, 192, 193, 194, 195, 196]
    "unique_atoms_G6": [185, 186, 189, 192, 193, 194, 195, 196, 197, 202]
    "atom_map_G3_to_G6":
        "197": 198
        "198": 199
        "199": 201
        ...
"solvent":
    ...
```
where `unique_atoms_G3` is a list of atom indices that do not match any G6 atom, and `atom_map_G3_to_G6` maps atoms of G3 to those of G6 by atom index. _All indices are 0-based_. This map can be used with any of the 5 replicates of `OA-G3-X` and `OA-G6-X`.
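
A minimal sketch of how such a map could be read in Python follows. It assumes that the actual file is a regular JSON object with the nesting shown above; the `EXAMPLE` string is a hypothetical excerpt that contains only the three map entries listed here, and `load_atom_map` is an illustrative helper, not part of the repository.

```python
import json

# Hypothetical excerpt with the nesting described above (assumption: the real
# file is a valid JSON object; only the entries shown in the text are included).
EXAMPLE = """
{
  "complex": {
    "unique_atoms_G3": [184, 185, 187, 192, 193, 194, 195, 196],
    "unique_atoms_G6": [185, 186, 189, 192, 193, 194, 195, 196, 197, 202],
    "atom_map_G3_to_G6": {"197": 198, "198": 199, "199": 201}
  }
}
"""


def load_atom_map(text, phase):
    """Return (g3_to_g6, unique_g3, unique_g6) for 'complex' or 'solvent'."""
    section = json.loads(text)[phase]
    # JSON object keys are always strings, so convert them to 0-based ints.
    g3_to_g6 = {int(k): v for k, v in section["atom_map_G3_to_G6"].items()}
    unique_g3 = set(section["unique_atoms_G3"])
    unique_g6 = set(section["unique_atoms_G6"])
    # Sanity checks: an atom is either mapped or unique, never both.
    if not unique_g3.isdisjoint(g3_to_g6.keys()):
        raise ValueError("a G3 atom is listed as both mapped and unique")
    if not unique_g6.isdisjoint(g3_to_g6.values()):
        raise ValueError("a G6 atom is listed as both mapped and unique")
    return g3_to_g6, unique_g3, unique_g6


if __name__ == "__main__":
    mapping, only_g3, only_g6 = load_atom_map(EXAMPLE, "complex")
    print(mapping[197], len(only_g3), len(only_g6))
```

Because the indices are 0-based, they can be used directly with tools that index atoms from zero; packages that number atoms from 1 need an offset of one.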

### Files preparation
All the host-guest system files in the `SAMPLing/` directory were prepared using the protocol below.
- We used the most likely protonation states as predicted by Epik `4.0013` from the Schrödinger toolkit at experimental pH. These are identical to those given in the `mol2` files in `host_guest/OctaAcidsAndGuests/` and `host_guest/CB8AndGuests/`.
- 5 docked complexes were generated with OpenEye `2017.6.1`.
- Hosts and guests were both parametrized with GAFF v1.8 and antechamber. AM1-BCC charges were generated using OpenEye's QUACPAC toolkit through `openmoltools 0.8.1`.
- The systems were solvated in a 12 Å buffer of TIP3P water molecules using tleap. ParmEd `2.7.3` was used to remove some of the water molecules from the OA complexes so that they all contain the same number of waters.
- The systems' net charge was neutralized with Na+ and Cl- ions. More Na+ and Cl- ions were added to reach an ionic strength of 60 mM for OA/TEMOA systems and 150 mM for CB8 to simulate the effect of the 10 mM and 25 mM sodium phosphate buffers used in their respective experiments.
- Each system was minimized with the L-BFGS optimization algorithm and equilibrated by running 1 ns of Langevin dynamics (BAOAB splitting, 1 fs time step) at 298.15 K with a Monte Carlo barostat set at 1 atm using `OpenMM 7.1.1`. PME was used for long-range electrostatic interactions with a cutoff of 10 Å. VdW interactions used the same 10 Å cutoff and a switching distance of 9 Å.
- After the equilibration, the `System` was serialized into the OpenMM `xml` format. The `rst7` file was generated during the equilibration using the `RestartReporter` in the `parmed.openmm` module. The AMBER `prmtop` and `rst7` files were then converted to GROMACS `top`/`gro` and PDB formats by ParmEd and MDTraj `1.9.1` respectively.
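
The ionic strengths quoted in the protocol above are consistent with treating the buffer as fully dissociated Na3PO4, for which I = ½ Σ cᵢ zᵢ² gives six times the buffer concentration. That reading is an assumption on our part, not something the protocol states explicitly. The sketch below works through the arithmetic and also estimates how many Na+/Cl- pairs a given box would need; the 100 nm³ box volume in the example is purely hypothetical.

```python
AVOGADRO = 6.02214076e23   # particles per mole
LITERS_PER_NM3 = 1e-24     # 1 nm^3 expressed in liters


def ionic_strength_mM(species):
    """I = 1/2 * sum(c_i * z_i**2) for (concentration in mM, charge) pairs."""
    return 0.5 * sum(conc * charge ** 2 for conc, charge in species)


def fully_dissociated_na3po4(conc_mM):
    """Assumption: Na3PO4 -> 3 Na+ + PO4(3-), with complete dissociation."""
    return [(3 * conc_mM, +1), (conc_mM, -3)]


def ion_pairs_for(conc_mM, box_volume_nm3):
    """Monovalent ion pairs needed for a 1:1 salt at `conc_mM` in the box.

    For a 1:1 salt such as NaCl, the ionic strength equals the concentration.
    """
    moles = conc_mM * 1e-3 * box_volume_nm3 * LITERS_PER_NM3
    return round(moles * AVOGADRO)


if __name__ == "__main__":
    print(ionic_strength_mM(fully_dissociated_na3po4(10.0)))  # OA/TEMOA buffer
    print(ionic_strength_mM(fully_dissociated_na3po4(25.0)))  # CB8 buffer
    print(ion_pairs_for(60.0, 100.0))  # hypothetical 100 nm^3 box
```

Small boxes need only a handful of ion pairs, so rounding to an integer number of ions means the effective concentration can differ noticeably from the target.
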
32 changes: 2 additions & 30 deletions host_guest/README.md
@@ -9,33 +9,5 @@
- `OctaAcidsAndGuests/`: Directory containing octa acid (OA and TEMOA) input structures and guest structure files
- `CB8AndGuests/`: Directory containing CB8 input structure and guest structure files
- `GenerateInputs.ipynb`: Jupyter notebook using the OpenEye toolkits to generate molecule structure files from the other inputs noted above. NOTE: This is provided for informational purposes; the output files are already available here so it is unnecessary for you to use this notebook.
- `SAMPLing/`: Equilibrated system files for the SAMPLing challenge (see below for further details).

## SAMPLing challenge files

Equilibrated systems are provided for OA-G3 (5-hexenoic acid), OA-G6 (4-methylpentanoic acid) and CB8-G3 (quinine). 5 different initial configurations are given for each complex. Files are available in Amber (`prmtop`/`rst7`), Gromacs (`top`/`gro`), OpenMM (`xml`) and PDB formats. Solvated system files are available both for the host-guest complex (e.g. `complex`) and the guest alone (e.g. `solvent`).

The `SAMPLing/` folder includes an atom map in JSON format for relative free energy calculations. The ligand atoms of OA-G3 that match the ligand atoms in OA-G6 are given for the systems in complex and in solvent. The file has this format
```json
"complex":
"unique_atoms_G3": [184, 185, 187, 192, 193, 194, 195, 196]
"unique_atoms_G6": [185, 186, 189, 192, 193, 194, 195, 196, 197, 202]
"atom_map_G3_to_G6":
"197": 198
"198": 199
"199": 201
...
"solvent":
...
```
where `unique_atoms_G3` is a list of atom indices that do not match any G6 atom, and `atom_map_G3_to_G6` matches atoms of G3 to those of G6 by atom index. All indices are 0-based. This map can be used with any of the 5 replicates `OA-G3-X` and `OA-G6-X`.

### Preparation
All the host-guest system files in the `SAMPLing/` directory were prepared using the protocol below.
- Protonation states and initial starting configurations were taken as given by the original `mol2` files in the `OctaAcidsAndGuests/` and `CB8AndGuests/` directories.
- 5 docked complexes were generated with OpenEye `2017.6.1`.
- Hosts and guests were both parametrized with GAFF v1.8 and antechamber. AM1-BCC charges were generated using OpenEye's QUACPAC toolkit through `openmoltools 0.8.1`.
- The systems were solvated in a 12A buffer of TIP3P water molecules using tleap. ParmEd `2.7.3` was used to remove some of the water molecules from the OA complexes to reduce them to have the same number of waters.
- The systems' net charge was neutralized with Na+ and Cl- ions. More ions were added to reach the ionic strength of 10mM for OA/TEMOA systems and 25mM for CB8.
- The system was minimized with the L-BFGS optimization algorithm and equilibrated by running 1ns of Langevin dynamics (BAOAB splitting, 1fs time step) at 298.15K with a Monte Carlo barostat set at 1atm using `OpenMM 7.1.1`. PME was used for long-range electrostatic interactions with a cutoff of 10A. VdW interactions used the same 10A cutoff and a switching distance of 9A.
- After the equilibration, the `System` was serialized into the OpenMM `xml` format. The `rst7` file was generated during the equilibration using the `RestartReporter` in the `parmed.openmm` module. The AMBER `prmtop` and `rst7` files were then converted to GROMACS `top`/`gro` and PDB formats by ParmEd and MDTraj `1.9.1` respectively.
- `Reference/`: System files used to run the reference calculations for the main SAMPL challenge. See [`host_guest_instructions.md`](../host_guest_instructions.md#reference-calculations) for a detailed description of the files.
- `SAMPLing/`: Equilibrated system files for the SAMPLing challenge. See [`SAMPLing_instructions.md`](../SAMPLing_instructions.md#files-description) for a detailed description of the files.