
Commit dd7a880

Merge pull request #220 from Australian-Structural-Biology-Computing/add-rosettafold-all-atom
Add RoseTTAFold-All-Atom
2 parents: 1af71b4 + ee87982

20 files changed (+626 / -47 lines)

.github/CONTRIBUTING.md

Lines changed: 3 additions & 3 deletions
````diff
@@ -29,7 +29,7 @@ If you're not used to this workflow with git, you can start with some [docs from
 You have the option to test your changes locally by running the pipeline. For receiving warnings about process selectors and other `debug` information, it is recommended to use the debug profile. Execute all the tests with the following command:
 
 ```bash
-nextflow run . --profile debug,test,docker --outdir <OUTDIR>
+nextflow run . -profile debug,test,docker --outdir <OUTDIR>
 ```
 
 When you create a pull request with changes, [GitHub Actions](https://github.com/features/actions) will run automatic tests.
@@ -78,8 +78,8 @@ If you wish to contribute a new step, please use the following coding standards:
 5. Add any new parameters to `nextflow_schema.json` with help text (via the `nf-core pipelines schema build` tool).
 6. Add sanity checks and validation for all relevant parameters.
 7. Perform local tests to validate that the new code works as expected.
-8. If applicable, add a new test command in `.github/workflow/ci.yml`.
-9. Update MultiQC config `assets/multiqc_config.yml` so relevant suffixes, file name clean up and module plots are in the appropriate order. If applicable, add a [MultiQC](https://https://multiqc.info/) module.
+8. If applicable, add a new test command in `.github/workflows/ci.yml`.
+9. Update MultiQC config `assets/multiqc_config.yml` so relevant suffixes, file name clean up and module plots are in the appropriate order. If applicable, add a [MultiQC](https://multiqc.info/) module.
 10. Add a description of the output files and if relevant any appropriate images from the MultiQC report to `docs/output.md`.
 
 ### Default values
````
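The first CONTRIBUTING hunk swaps `--profile` for `-profile`. Nextflow's own run options (`-profile`, `-resume`, `-params-file`) take a single dash, while double-dash options such as `--outdir` are handed to the pipeline as parameters, so `--profile` is read as a pipeline parameter instead of selecting profiles. The helper below is hypothetical (it is not part of nf-core/proteinfold); it is a small sketch that builds the local test command with each option on the correct side of that split:

```python
import shlex


def build_test_command(profiles, outdir, extra_params=None):
    """Return argv for a local `nextflow run .` test (hypothetical helper).

    Nextflow's own options use a single dash (-profile); pipeline
    parameters use a double dash (--outdir, --mode, ...).
    """
    argv = ["nextflow", "run", ".", "-profile", ",".join(profiles), "--outdir", outdir]
    # Every extra entry becomes a double-dash pipeline parameter.
    for name, value in (extra_params or {}).items():
        argv += [f"--{name}", str(value)]
    return argv


if __name__ == "__main__":
    cmd = build_test_command(["debug", "test", "docker"], "results")
    # Prints: nextflow run . -profile debug,test,docker --outdir results
    print(shlex.join(cmd))
```

The same helper works with the `test_rosettafold_all_atom` profile that this commit adds to the CI matrix, e.g. `build_test_command(["debug", "test_rosettafold_all_atom", "docker"], "results")`.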

.github/workflows/ci.yml

Lines changed: 1 addition & 0 deletions
```diff
@@ -44,6 +44,7 @@ jobs:
 - "test_colabfold_download"
 - "test_esmfold"
 - "test_split_fasta"
+- "test_rosettafold_all_atom"
 isMaster:
 - ${{ github.base_ref == 'master' }}
 # Exclude conda and singularity on dev
```

CHANGELOG.md

Lines changed: 4 additions & 2 deletions
```diff
@@ -13,8 +13,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - [[#180](https://github.com/nf-core/proteinfold/issues/180)] - Implement Fooldseek.
 - [[#188](https://github.com/nf-core/proteinfold/issues/188)] - Fix colabfold image to run in gpus.
 - [[PR ##205](https://github.com/nf-core/proteinfold/pull/205)] - Change input schema from `sequence,fasta` to `id,fasta`.
-- [[PR #210](https://github.com/nf-core/proteinfold/pull/210)] - Moving post-processing logic to a subworkflow, change wave images pointing to oras to point to https and refactor module to match nf-core folder structure.
-- [[#214](https://github.com/nf-core/proteinfold/issues/214)] - Fix colabfold image to run in cpus after [#188](https://github.com/nf-core/proteinfold/issues/188) fix.
+- [[PR #210](https://github.com/nf-core/proteinfold/pull/210)]- Moving post-processing logic to a subworkflow, change wave images pointing to oras to point to https and refactor module to match nf-core folder structure.
+- [[#214](https://github.com/nf-core/proteinfold/issues/214)]- Fix colabfold image to run in cpus after [#188](https://github.com/nf-core/proteinfold/issues/188) fix.
+- [[PR ##220](https://github.com/nf-core/proteinfold/pull/220)] - Add RoseTTAFold-All-Atom module.
 - [[#235](https://github.com/nf-core/proteinfold/issues/235)] - Update samplesheet to new version (switch from `sequence` column to `id`).
 - [[#240](https://github.com/nf-core/proteinfold/issues/240)] - Separate download and input of pdb `mmcif` files and `obsolete` database.
 
@@ -117,6 +118,7 @@ Thank you to everyone else that has contributed by reporting bugs, enhancements
 | | `--esm2_t36_3B_UR50D_contact_regression` |
 | | `--esmfold_params_path` |
 | | `--skip_multiqc` |
+| | `--rosettafold_all_atom_db` |
 
 > **NB:** Parameter has been **updated** if both old and new parameter information is present.
 > **NB:** Parameter has been **added** if just the new parameter information is present.
```

README.md

Lines changed: 15 additions & 1 deletion
````diff
@@ -39,6 +39,8 @@ On release, automated continuous integration tests run the pipeline on a full-si
 
 v. [ESMFold](https://github.com/facebookresearch/esm) - Regular ESM
 
+vi. [RoseTTAFold-All-Atom](https://github.com/baker-laboratory/RoseTTAFold-All-Atom/) - Regular RFAA
+
 ## Usage
 
 > [!NOTE]
@@ -53,7 +55,7 @@ nextflow run nf-core/proteinfold \
 --outdir <OUTDIR>
 ```
 
-The pipeline takes care of downloading the databases and parameters required by AlphaFold2, Colabfold or ESMFold. In case you have already downloaded the required files, you can skip this step by providing the path to the databases using the corresponding parameter [`--alphafold2_db`], [`--colabfold_db`] or [`--esmfold_db`]. Please refer to the [usage documentation](https://nf-co.re/proteinfold/usage) to check the directory structure you need to provide for each of the databases.
+The pipeline takes care of downloading the databases and parameters required by AlphaFold2, Colabfold, ESMFold or RoseTTAFold-All-Atom. In case you have already downloaded the required files, you can skip this step by providing the path to the databases using the corresponding parameter [`--alphafold2_db`], [`--colabfold_db`], [`--esmfold_db`] or ['--rosettafold_all_atom_db']. Please refer to the [usage documentation](https://nf-co.re/proteinfold/usage) to check the directory structure you must provide for each database.
 
 - The typical command to run AlphaFold2 mode is shown below:
 
@@ -136,6 +138,18 @@ The pipeline takes care of downloading the databases and parameters required by
 -profile <docker/singularity/podman/shifter/charliecloud/conda/institute>
 ```
 
+- The rosettafold_all_atom mode can be run using the command below:
+
+```console
+nextflow run nf-core/proteinfold \
+--input samplesheet.csv \
+--outdir <OUTDIR> \
+--mode rosettafold_all_atom \
+--rosettafold_all_atom_db <null (default) | PATH> \
+--use_gpu <true/false> \
+-profile <docker/singularity/podman/shifter/charliecloud/conda/institute>
+```
+
 > [!WARNING]
 > Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files).
 
````
assets/schema_input.json

Lines changed: 10 additions & 3 deletions
```diff
@@ -7,6 +7,12 @@
 "items": {
 "type": "object",
 "properties": {
+"sequence": {
+"type": "string",
+"pattern": "^\\S+$",
+"errorMessage": "Sequence name must be provided and cannot contain spaces",
+"meta": ["sequence"]
+},
 "id": {
 "type": "string",
 "pattern": "^\\S+$",
@@ -17,10 +23,11 @@
 "type": "string",
 "format": "file-path",
 "exists": true,
-"pattern": "^\\S+\\.fa(sta)?$",
-"errorMessage": "Fasta file must be provided, cannot contain spaces and must have extension '.fa' or '.fasta'"
+"pattern": "^\\S+\\.(fa(sta)?|yaml|yml|json)$",
+"errorMessage": "Fasta, yaml or json file must be provided, cannot contain spaces and must have extension '.fa', '.fasta', '.yaml', '.yml', or '.json'"
 }
 },
-"required": ["id", "fasta"]
+"required": ["fasta"],
+"anyOf": [{ "required": ["sequence"] }, { "required": ["id"] }]
 }
 }
```
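The schema change does two things per samplesheet row: the `fasta` column may now point at `.yaml`/`.yml`/`.json` inputs as well as FASTA, and `anyOf` requires at least one of `sequence` or `id` instead of always requiring `id`. The Python sketch below only mirrors those row-level rules for illustration; it is not how the pipeline validates the sheet. `validate_row` is a hypothetical name, treating empty cells as missing is an assumption, and the `"exists": true` file check is left out:

```python
import re

# Patterns from assets/schema_input.json, with the JSON string escaping removed.
NAME_PATTERN = re.compile(r"^\S+$")
FASTA_PATTERN = re.compile(r"^\S+\.(fa(sta)?|yaml|yml|json)$")


def validate_row(row):
    """Return error messages for one samplesheet row given as a dict.

    Approximation of the updated schema: 'fasta' is required and must match
    FASTA_PATTERN; anyOf requires 'sequence' or 'id'; any name column that
    is present must contain no whitespace. Empty values count as missing,
    and the "exists": true file check is not reproduced.
    """
    errors = []
    fasta = row.get("fasta")
    if not fasta:
        errors.append("missing required column: fasta")
    elif not FASTA_PATTERN.fullmatch(fasta):
        errors.append(f"bad fasta path: {fasta!r}")
    # anyOf: [{required: [sequence]}, {required: [id]}]
    if not any(row.get(key) for key in ("sequence", "id")):
        errors.append("one of 'sequence' or 'id' is required")
    for key in ("sequence", "id"):
        value = row.get(key)
        if value and not NAME_PATTERN.fullmatch(value):
            errors.append(f"{key} must not contain spaces: {value!r}")
    return errors


if __name__ == "__main__":
    print(validate_row({"id": "T1024", "fasta": "T1024.yaml"}))  # []
    print(validate_row({"fasta": "T1024.fa"}))  # ["one of 'sequence' or 'id' is required"]
```

`fullmatch` is used so that a trailing newline cannot slip past the `$` anchor, which is closer to how JSON Schema regexes behave than Python's `match`.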

bin/generate_report.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -307,6 +307,7 @@ def pdb_to_lddt(pdb_files, generate_tsv):
 "esmfold": "ESMFold",
 "alphafold2": "AlphaFold2",
 "colabfold": "ColabFold",
+"rosettafold_all_atom": "Rosettafold_All_Atom",
 }
 
 parser = argparse.ArgumentParser()
```

conf/dbs.config

Lines changed: 12 additions & 0 deletions
```diff
@@ -49,6 +49,18 @@ params {
 "alphafold2_ptm" : "alphafold_params_2021-07-14"
 ]
 
+// RoseTTAFold_All_Atom links
+uniref30_rosettafold_all_atom_link = 'http://wwwuser.gwdg.de/~compbiol/uniclust/2020_06/UniRef30_2020_06_hhsuite.tar.gz'
+pdb100_rosettafold_all_atom_link = 'https://files.ipd.uw.edu/pub/RoseTTAFold/pdb100_2021Mar03.tar.gz'
+bfd_rosettafold_all_atom_link = 'https://bfd.mmseqs.com/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz'
+rfaa_paper_weights_link = 'http://files.ipd.uw.edu/pub/RF-All-Atom/weights/RFAA_paper_weights.pt'
+
+// RoseTTAFold_All_Atom paths
+uniref30_rosettafold_all_atom_path = "${params.rosettafold_all_atom_db}/uniref30/UniRef30_2020_06/*"
+pdb100_rosettafold_all_atom_path = "${params.rosettafold_all_atom_db}/pdb100_2021Mar03/*"
+bfd_rosettafold_all_atom_path = "${params.rosettafold_all_atom_db}/bfd/*"
+rfaa_paper_weights_path = "${params.rosettafold_all_atom_db}/RFAA_paper_weights.pt"
+
 // Esmfold links
 esmfold_3B_v1 = 'https://dl.fbaipublicfiles.com/fair-esm/models/esmfold_3B_v1.pt'
 esm2_t36_3B_UR50D = 'https://dl.fbaipublicfiles.com/fair-esm/models/esm2_t36_3B_UR50D.pt'
```
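The new `*_path` entries resolve every RoseTTAFold-All-Atom input relative to `--rosettafold_all_atom_db`, which fixes the directory layout a pre-downloaded database must follow. As a sketch under that reading of `conf/dbs.config`, the hypothetical helper below checks a local directory against the layout before launching; the helper and the idea of a pre-flight check are assumptions, not pipeline features:

```python
import tempfile
from pathlib import Path

# Locations relative to --rosettafold_all_atom_db, mirroring the
# *_rosettafold_all_atom_path and rfaa_paper_weights_path entries in
# conf/dbs.config ("/*" there means "the entries inside that directory").
RFAA_DB_LAYOUT = {
    "uniref30": "uniref30/UniRef30_2020_06/*",
    "pdb100": "pdb100_2021Mar03/*",
    "bfd": "bfd/*",
    "weights": "RFAA_paper_weights.pt",
}


def missing_rfaa_components(db_root):
    """Return component names that have no matching path under db_root."""
    root = Path(db_root)
    return [name for name, pattern in RFAA_DB_LAYOUT.items() if not any(root.glob(pattern))]


if __name__ == "__main__":
    # Build a partial layout in a throwaway directory and report the gaps.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "bfd").mkdir()
        (root / "bfd" / "placeholder").write_text("")
        (root / "RFAA_paper_weights.pt").write_bytes(b"")
        print(missing_rfaa_components(root))  # ['uniref30', 'pdb100']
```

An empty directory counts as missing here, because its `/*` pattern matches nothing.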

Lines changed: 39 additions & 0 deletions

```diff
@@ -0,0 +1,39 @@
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Config file for defining DSL2 per module options and publishing paths
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Available keys to override module options:
+ext.args = Additional arguments appended to command in module.
+ext.args2 = Second set of arguments appended to command in module (multi-tool modules).
+ext.args3 = Third set of arguments appended to command in module (multi-tool modules).
+ext.prefix = File name prefix for output files.
+----------------------------------------------------------------------------------------
+*/
+
+process {
+withName: 'GUNZIP|ARIA2_PDB_SEQRES' {
+publishDir = [
+path: {"${params.outdir}/DBs/rosettafold_all_atom/"},
+mode: 'symlink',
+saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
+]
+}
+
+withName: 'RUN_ROSETTAFOLD_ALL_ATOM' {
+if(params.use_gpu) { accelerator = 1 }
+publishDir = [
+path: { "${params.outdir}/rosettafold_all_atom/" },
+mode: 'copy',
+saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
+pattern: '*.*'
+]
+}
+
+withName: 'NFCORE_PROTEINFOLD:ROSETTAFOLD_ALL_ATOM:MULTIQC' {
+publishDir = [
+path: { "${params.outdir}/multiqc" },
+mode: 'copy',
+saveAs: { filename -> filename.equals('versions.yml') ? null : "rosettafold_all_atom_$filename" }
+]
+}
+}
```
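Each `publishDir` block combines an optional `pattern` filter with a `saveAs` closure that returns `null` to skip a file (here always `versions.yml`) or a new name to publish under; the MultiQC rule also prefixes outputs with `rosettafold_all_atom_`. The Python sketch below mirrors those closures to show which names would end up in the output directory. The function names are hypothetical, and `fnmatch` only approximates Nextflow's glob matching (for example, its `*` also matches `/`):

```python
from fnmatch import fnmatch


def save_as_default(filename):
    # Groovy: filename.equals('versions.yml') ? null : filename
    return None if filename == "versions.yml" else filename


def save_as_multiqc(filename):
    # Groovy: filename.equals('versions.yml') ? null : "rosettafold_all_atom_$filename"
    return None if filename == "versions.yml" else f"rosettafold_all_atom_{filename}"


def published_names(filenames, save_as, pattern=None):
    """Approximate the names publishDir would write for a task's outputs.

    Files that do not match `pattern` (when given) are skipped, and a None
    returned by `save_as` means the file is not published.
    """
    names = []
    for name in filenames:
        if pattern is not None and not fnmatch(name, pattern):
            continue
        target = save_as(name)
        if target is not None:
            names.append(target)
    return names


if __name__ == "__main__":
    # RUN_ROSETTAFOLD_ALL_ATOM: pattern '*.*' drops names without a dot.
    print(published_names(["T1024.pdb", "versions.yml", "logs"], save_as_default, "*.*"))
    # MULTIQC: every published name gets the rosettafold_all_atom_ prefix.
    print(published_names(["multiqc_report.html", "versions.yml"], save_as_multiqc))
```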

conf/test_rosettafold_all_atom.config

Lines changed: 36 additions & 0 deletions
```diff
@@ -0,0 +1,36 @@
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Nextflow config file for running minimal tests
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Defines input files and everything required to run a fast and simple pipeline test.
+Use as follows:
+nextflow run nf-core/proteinfold -profile test_rosettafold_all_atom,<docker/singularity> --outdir <OUTDIR>
+----------------------------------------------------------------------------------------
+*/
+
+stubRun = true
+
+// Limit resources so that this can run on GitHub Actions
+process {
+resourceLimits = [
+cpus: 4,
+memory: '15.GB',
+time: '1.h'
+]
+}
+
+params {
+config_profile_name = 'Test profile'
+config_profile_description = 'Minimal test dataset to check pipeline function'
+
+// Input data to test rosettafold_all_atom
+mode = 'rosettafold_all_atom'
+rosettafold_all_atom_db = "${projectDir}/assets/dummy_db_dir"
+input = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.2/samplesheet.csv'
+}
+
+process {
+withName: 'RUN_ROSETTAFOLD_ALL_ATOM' {
+container = 'biocontainers/gawk:5.1.0'
+}
+}
```
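`process.resourceLimits` sets a ceiling rather than a request: when a task asks for more than a limit, Nextflow reduces the request to that limit instead of submitting it unchanged, which lets the test profile fit on a GitHub Actions runner. A minimal Python sketch of that capping, assuming units are already normalised by the caller (Nextflow parses values such as `'15.GB'` and `'1.h'` itself); `apply_resource_limits` and the normalised keys are hypothetical:

```python
def apply_resource_limits(request, limits):
    """Cap each requested resource at its limit, in the spirit of resourceLimits.

    Assumes the caller has normalised units (cpus as a count, memory in GB,
    time in hours). Resources without a limit pass through unchanged.
    """
    return {
        key: min(value, limits[key]) if key in limits else value
        for key, value in request.items()
    }


# Limits from conf/test_rosettafold_all_atom.config: 4 cpus, '15.GB', '1.h'.
TEST_LIMITS = {"cpus": 4, "memory_gb": 15, "time_h": 1}


if __name__ == "__main__":
    # A heavyweight request is reduced to the test ceiling.
    print(apply_resource_limits({"cpus": 16, "memory_gb": 64, "time_h": 24}, TEST_LIMITS))
```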

Lines changed: 35 additions & 0 deletions

```diff
@@ -0,0 +1,35 @@
+FROM nvidia/cuda:12.6.0-cudnn-devel-ubuntu24.04
+
+LABEL Author="[email protected]" \
+title="nfcore/proteinfold_rosettafold_all_atom" \
+Version="1.2.0dev" \
+description="Docker image containing all software requirements to run the RUN_ROSETTAFOLD_ALL_ATOM module using the nf-core/proteinfold pipeline"
+
+ENV PYTHONPATH="/app/RoseTTAFold-All-Atom" \
+PATH="/conda/bin:/app/RoseTTAFold-All-Atom:$PATH" \
+DGLBACKEND="pytorch" \
+LD_LIBRARY_PATH="/conda/lib:/usr/local/cuda-12.6/lib64:$LD_LIBRARY_PATH"
+
+RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y wget git && \
+wget -q -P /tmp "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh" && \
+bash /tmp/Miniforge3-$(uname)-$(uname -m).sh -b -p /conda && \
+rm -rf /tmp/Miniforge3-$(uname)-$(uname -m).sh /var/lib/apt/lists/* && \
+apt-get autoremove -y && apt-get clean -y
+
+RUN git clone --single-branch --depth 1 https://github.com/Australian-Structural-Biology-Computing/RoseTTAFold-All-Atom.git /app/RoseTTAFold-All-Atom && \
+cd /app/RoseTTAFold-All-Atom && \
+/conda/bin/mamba env create --file=environment.yaml && \
+/conda/bin/mamba run -n RFAA bash -c \
+"python /app/RoseTTAFold-All-Atom/rf2aa/SE3Transformer/setup.py install && \
+bash /app/RoseTTAFold-All-Atom/install_dependencies.sh" && \
+/conda/bin/mamba clean --all --force-pkgs-dirs -y
+
+RUN cd /app/RoseTTAFold-All-Atom && \
+wget https://ftp.ncbi.nlm.nih.gov/blast/executables/legacy.NOTSUPPORTED/2.2.26/blast-2.2.26-x64-linux.tar.gz && \
+mkdir -p blast-2.2.26 && \
+tar -xf blast-2.2.26-x64-linux.tar.gz -C blast-2.2.26 && \
+cp -r blast-2.2.26/blast-2.2.26/ blast-2.2.26_bk && \
+rm -r blast-2.2.26 && \
+mv blast-2.2.26_bk/ blast-2.2.26 && \
+rm -rf /root/.cache *.tar.gz && \
+apt-get autoremove -y && apt-get remove --purge -y wget git && apt-get clean -y
```
