nf-core
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
Lines changed: 1 addition & 0 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 4 additions & 2 deletions
diff --git a/README.md b/README.md
Lines changed: 14 additions & 0 deletions
diff --git a/bin/generate_report.py b/bin/generate_report.py
Lines changed: 1 addition & 0 deletions
diff --git a/conf/dbs.config b/conf/dbs.config
Lines changed: 29 additions & 0 deletions
diff --git a/conf/modules_helixfold3.config b/conf/modules_helixfold3.config
Lines changed: 39 additions & 0 deletions
diff --git a/conf/test_helixfold3.config b/conf/test_helixfold3.config
Lines changed: 37 additions & 0 deletions
diff --git a/dockerfiles/Dockerfile_nfcore-proteinfold_helixfold3 b/dockerfiles/Dockerfile_nfcore-proteinfold_helixfold3
Lines changed: 34 additions & 0 deletions
diff --git a/dockerfiles/environment_nfcore-proteinfold_helixfold3.yaml b/dockerfiles/environment_nfcore-proteinfold_helixfold3.yaml
Lines changed: 35 additions & 0 deletions
diff --git a/docs/output.md b/docs/output.md
Lines changed: 13 additions & 0 deletions
@@ -45,6 +45,7 @@ jobs:
           - "test_esmfold"
           - "test_split_fasta"
           - "test_rosettafold_all_atom"
+          - "test_helixfold3"
         isMaster:
           - ${{ github.base_ref == 'master' }}
         # Exclude conda and singularity on dev
 
@@ -13,9 +13,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - [[#180](https://github.com/nf-core/proteinfold/issues/180)] - Implement Fooldseek.
 - [[#188](https://github.com/nf-core/proteinfold/issues/188)] - Fix colabfold image to run in gpus.
 - [[PR ##205](https://github.com/nf-core/proteinfold/pull/205)] - Change input schema from `sequence,fasta` to `id,fasta`.
-- [[PR #210](https://github.com/nf-core/proteinfold/pull/210)]- Moving post-processing logic to a subworkflow, change wave images pointing to oras to point to https and refactor module to match nf-core folder structure.
-- [[#214](https://github.com/nf-core/proteinfold/issues/214)]- Fix colabfold image to run in cpus after [#188](https://github.com/nf-core/proteinfold/issues/188) fix.
+- [[PR #210](https://github.com/nf-core/proteinfold/pull/210)] - Moving post-processing logic to a subworkflow, change wave images pointing to oras to point to https and refactor module to match nf-core folder structure.
+- [[#214](https://github.com/nf-core/proteinfold/issues/214)] - Fix colabfold image to run in cpus after [#188](https://github.com/nf-core/proteinfold/issues/188) fix.
 - [[PR ##220](https://github.com/nf-core/proteinfold/pull/220)] - Add RoseTTAFold-All-Atom module.
+- [[PR ##223](https://github.com/nf-core/proteinfold/pull/223)] - Add HelixFold3 module.
 - [[#235](https://github.com/nf-core/proteinfold/issues/235)] - Update samplesheet to new version (switch from `sequence` column to `id`).
 - [[#240](https://github.com/nf-core/proteinfold/issues/240)] - Separate download and input of pdb `mmcif` files and `obsolete` database.
 
@@ -119,6 +120,7 @@ Thank you to everyone else that has contributed by reporting bugs, enhancements
 |                       | `--esmfold_params_path`                  |
 |                       | `--skip_multiqc`                         |
 |                       | `--rosettafold_all_atom_db`              |
+|                       | `--helixfold3_db`                        |
 
 > **NB:** Parameter has been **updated** if both old and new parameter information is present.
 > **NB:** Parameter has been **added** if just the new parameter information is present.
 
@@ -41,6 +41,8 @@ On release, automated continuous integration tests run the pipeline on a full-si
 
    vi. [RoseTTAFold-All-Atom](https://github.com/baker-laboratory/RoseTTAFold-All-Atom/) - Regular RFAA
 
+   vii. [HelixFold3](https://github.com/PaddlePaddle/PaddleHelix/tree/dev/apps/protein_folding/helixfold3) - Regular HF3
+
 ## Usage
 
 > [!NOTE]
@@ -150,6 +152,18 @@ The pipeline takes care of downloading the databases and parameters required by
       -profile <docker/singularity/podman/shifter/charliecloud/conda/institute>
   ```
 
+- The helixfold3 mode can be run using the command below:
+
+  ```console
+  nextflow run nf-core/proteinfold \
+      --input samplesheet.csv \
+      --outdir <OUTDIR> \
+      --mode helixfold3 \
+      --helixfold3_db <null (default) | PATH> \
+      --use_gpu <true/false> \
+      -profile <docker/singularity/podman/shifter/charliecloud/conda/institute>
+  ```
+
 > [!WARNING]
 > Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files).
 
 
@@ -308,6 +308,7 @@ def pdb_to_lddt(pdb_files, generate_tsv):
     "alphafold2": "AlphaFold2",
     "colabfold": "ColabFold",
     "rosettafold_all_atom": "Rosettafold_All_Atom",
+    "helixfold3": "HelixFold3"
 }
 
 parser = argparse.ArgumentParser()
 
@@ -61,6 +61,35 @@ params {
     bfd_rosettafold_all_atom_path       = "${params.rosettafold_all_atom_db}/bfd/*"
     rfaa_paper_weights_path             = "${params.rosettafold_all_atom_db}/RFAA_paper_weights.pt"
 
+    // Helixfold3 links
+    helixfold3_uniclust30_link          = 'https://storage.googleapis.com/alphafold-databases/casp14_versions/uniclust30_2018_08_hhsuite.tar.gz'
+    helixfold3_ccd_preprocessed_link    = 'https://paddlehelix.bd.bcebos.com/HelixFold3/CCD/ccd_preprocessed_etkdg.pkl.gz'
+    helixfold3_rfam_link                = 'https://paddlehelix.bd.bcebos.com/HelixFold3/MSA/Rfam-14.9_rep_seq.fasta'
+    helixfold3_init_models_link         = 'https://paddlehelix.bd.bcebos.com/HelixFold3/params/HelixFold3-params-240814.zip'
+    helixfold3_bfd_link                 = 'https://storage.googleapis.com/alphafold-databases/casp14_versions/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz'
+    helixfold3_small_bfd_link           = 'https://storage.googleapis.com/alphafold-databases/reduced_dbs/bfd-first_non_consensus_sequences.fasta.gz'
+    helixfold3_uniprot_sprot_link       = 'ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz'
+    helixfold3_uniprot_trembl_link      = 'ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_trembl.fasta.gz'
+    helixfold3_pdb_seqres_link          = "${params.pdb_seqres_link}"
+    helixfold3_uniref90_link            = 'ftp://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref90/uniref90.fasta.gz'
+    helixfold3_mgnify_link              = 'https://storage.googleapis.com/alphafold-databases/casp14_versions/mgy_clusters_2018_12.fa.gz'
+    helixfold3_pdb_mmcif_link           = 'rsync.rcsb.org::ftp_data/structures/divided/mmCIF/'
+    helixfold3_pdb_obsolete_link        = 'ftp://ftp.wwpdb.org/pub/pdb/data/status/obsolete.dat'
+
+    // Helixfold3 paths
+    helixfold3_uniclust30_path          = "${params.helixfold3_db}/uniclust30/*"
+    helixfold3_ccd_preprocessed_path    = "${params.helixfold3_db}/ccd_preprocessed_etkdg.pkl.gz"
+    helixfold3_rfam_path                = "${params.helixfold3_db}/Rfam-14.9_rep_seq.fasta"
+    helixfold3_init_models_path         = "${params.helixfold3_db}/HelixFold3-240814.pdparams"
+    helixfold3_bfd_path                 = "${params.helixfold3_db}/bfd/*"
+    helixfold3_small_bfd_path           = "${params.helixfold3_db}/small_bfd/*"
+    helixfold3_uniprot_path             = "${params.helixfold3_db}/uniprot/*"
+    helixfold3_pdb_seqres_path          = "${params.helixfold3_db}/pdb_seqres/*"
+    helixfold3_uniref90_path            = "${params.helixfold3_db}/uniref90/*"
+    helixfold3_mgnify_path              = "${params.helixfold3_db}/mgnify/*"
+    helixfold3_pdb_mmcif_path           = "${params.helixfold3_db}/pdb_mmcif/*"
+    helixfold3_maxit_src_path           = "${params.helixfold3_db}/maxit-v11.200-prod-src"
+
     // Esmfold links
     esmfold_3B_v1                        = 'https://dl.fbaipublicfiles.com/fair-esm/models/esmfold_3B_v1.pt'
     esm2_t36_3B_UR50D                    = 'https://dl.fbaipublicfiles.com/fair-esm/models/esm2_t36_3B_UR50D.pt'
 
@@ -0,0 +1,39 @@
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    Config file for defining DSL2 per module options and publishing paths
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    Available keys to override module options:
+        ext.args   = Additional arguments appended to command in module.
+        ext.args2  = Second set of arguments appended to command in module (multi-tool modules).
+        ext.args3  = Third set of arguments appended to command in module (multi-tool modules).
+        ext.prefix = File name prefix for output files.
+----------------------------------------------------------------------------------------
+*/
+
+process {
+    withName: 'GUNZIP|COMBINE_UNIPROT|DOWNLOAD_PDBMMCIF|ARIA2_PDB_SEQRES' {
+        publishDir = [
+            path: {"${params.outdir}/DBs/helixfold3/"},
+            mode: 'symlink',
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
+        ]
+    }
+
+    withName: 'RUN_HELIXFOLD3' {
+        if(params.use_gpu) { accelerator = 1 }
+        publishDir = [
+                path: { "${params.outdir}/helixfold3/" },
+                mode: 'copy',
+                saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
+                pattern: '*.*'
+            ]
+    }
+
+    withName: 'NFCORE_PROTEINFOLD:HELIXFOLD3:MULTIQC' {
+        publishDir = [
+            path: { "${params.outdir}/multiqc" },
+            mode: 'copy',
+            saveAs: { filename -> filename.equals('versions.yml') ? null : "helixfold3_$filename" }
+        ]
+    }
+}
@@ -0,0 +1,37 @@
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    Nextflow config file for running minimal tests
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    Defines input files and everything required to run a fast and simple pipeline test.
+    Use as follows:
+        nextflow run nf-core/proteinfold -profile test_helixfold3,<docker/singularity> --outdir <OUTDIR>
+----------------------------------------------------------------------------------------
+*/
+
+stubRun = true
+
+// Limit resources so that this can run on GitHub Actions
+process {
+    resourceLimits = [
+        cpus: 4,
+        memory: '15.GB',
+        time: '1.h'
+    ]
+}
+
+params {
+    config_profile_name        = 'Test profile'
+    config_profile_description = 'Minimal test dataset to check pipeline function'
+
+    // Input data to test helixfold3
+    mode          = 'helixfold3'
+    helixfold3_db = "${projectDir}/assets/dummy_db_dir"
+    input         = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.2/samplesheet.csv'
+}
+
+process {
+    withName: 'RUN_HELIXFOLD3' {
+        container = 'biocontainers/gawk:5.1.0'
+    }
+}
+
@@ -0,0 +1,34 @@
+FROM nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04
+
+LABEL Author="[email protected]" \
+    title="nfcore/proteinfold_helixfold3" \
+    Version="0.9.0" \
+    description="Docker image containing all software requirements to run the RUN_HELIXFOLD3 module using the nf-core/proteinfold pipeline"
+
+ENV PYTHONPATH="/app/helixfold3:$PYTHONPATH" \
+    PATH="/conda/bin:/app/helixfold3:$PATH" \
+    PYTHON_BIN="/conda/envs/helixfold/bin/python3.9" \
+    ENV_BIN="/conda/envs/helixfold/bin" \
+    OBABEL_BIN="/conda/envs/helixfold/bin"
+
+RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y wget git && \
+    wget -q -P /tmp "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh" && \
+    bash /tmp/Miniforge3-$(uname)-$(uname -m).sh -b -p /conda && \
+    rm -rf /tmp/Miniforge3-$(uname)-$(uname -m).sh /var/lib/apt/lists/* && \
+    apt-get autoremove -y && apt-get clean -y
+
+RUN git clone --single-branch --branch dev --depth 1 --no-checkout https://github.com/PaddlePaddle/PaddleHelix.git /app/helixfold3 && \
+    cd /app/helixfold3 && \
+    git sparse-checkout init --cone && \
+    git sparse-checkout set apps/protein_folding/helixfold3 && \
+    git checkout dev && \
+    mv apps/protein_folding/helixfold3/* . && \
+    rm -rf apps
+
+COPY environment_nfcore-proteinfold_helixfold3.yaml /app/helixfold3/
+RUN /conda/bin/mamba env create --file=/app/helixfold3/environment_nfcore-proteinfold_helixfold3.yaml && \
+    /conda/bin/mamba install -y -c bioconda aria2 hmmer==3.3.2 kalign2==2.04 hhsuite==3.3.0 -n helixfold && \
+    /conda/bin/mamba install -y -c conda-forge openbabel -n helixfold && \
+    /conda/bin/mamba clean --all --force-pkgs-dirs -y && \
+    rm -rf /root/.cache && \
+    apt-get autoremove -y && apt-get remove --purge -y wget git && apt-get clean -y
@@ -0,0 +1,35 @@
+name: helixfold
+channels:
+  - conda-forge
+  - bioconda
+  - nvidia
+  - biocore
+
+dependencies:
+  - python=3.9
+  - cuda-toolkit=12.0
+  - cudnn=8.4.0
+  - nccl=2.14
+  - libgcc
+  - libgomp
+  - pip
+  - aria2
+  - hmmer==3.4
+  - kalign2==2.04
+  - hhsuite==3.3.0
+  - openbabel
+  - pip:
+      - paddlepaddle-gpu==2.6.1 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
+      - absl-py==0.13.0
+      - biopython==1.79
+      - chex==0.0.7
+      - dm-haiku==0.0.4
+      - dm-tree==0.1.6
+      - docker==5.0.0
+      - immutabledict==2.0.0
+      - jax==0.2.14
+      - ml-collections==0.1.0
+      - pandas==1.3.4
+      - scipy==1.9.0
+      - rdkit-pypi==2022.9.5
+      - posebusters
@@ -14,6 +14,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and predicts pr
 - [ColabFold](https://github.com/sokrypton/ColabFold) - MMseqs2 (API server or local search) followed by ColabFold
 - [ESMFold](https://github.com/facebookresearch/esm)
 - [RoseTTAFold-All-Atom](https://github.com/baker-laboratory/RoseTTAFold-All-Atom/)
+- [HelixFold3](https://github.com/PaddlePaddle/PaddleHelix/tree/dev/apps/protein_folding/helixfold3)
 
 See main [README.md](https://github.com/nf-core/proteinfold/blob/master/README.md) for a condensed overview of the steps in the pipeline, and the bioinformatics tools used at each step.
 
@@ -190,6 +191,18 @@ Below you can find an indicative example of the TSV file with the pLDDT scores p
 
 </details>
 
+### HelixFold3
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `run/`
+  - `<SEQUENCE NAME>_helixfold3.pdb` that is the structure with the highest pLDDT score (ranked first)
+  - `<SEQUENCE NAME>_plddt_mqc.tsv` that presents the pLDDT scores per residue for the predicted model
+  - `<SEQUENCE NAME>/` that contains the computed MSAs, prediction metadata, ranked structures, raw model outputs etc.
+
+</details>
+
 ### MultiQC report
 
 <details markdown="1">
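
Note on the `conf/dbs.config` hunk: every `helixfold3_*_path` parameter is derived from the single `--helixfold3_db` root. The sketch below mirrors that layout in Python so a pre-existing database directory can be checked before a run. It is only an illustration, not part of the pipeline: the names `HELIXFOLD3_DB_LAYOUT`, `resolve_helixfold3_paths` and `missing_helixfold3_entries` are hypothetical, and the relative paths are copied from the hunk above.

```python
import glob

# Relative locations copied from the "Helixfold3 paths" block in conf/dbs.config.
# A trailing "/*" means "every file in that directory", as in the config.
# NOTE: this mapping and the helpers below are illustrative, not pipeline code.
HELIXFOLD3_DB_LAYOUT = {
    "helixfold3_uniclust30_path": "uniclust30/*",
    "helixfold3_ccd_preprocessed_path": "ccd_preprocessed_etkdg.pkl.gz",
    "helixfold3_rfam_path": "Rfam-14.9_rep_seq.fasta",
    "helixfold3_init_models_path": "HelixFold3-240814.pdparams",
    "helixfold3_bfd_path": "bfd/*",
    "helixfold3_small_bfd_path": "small_bfd/*",
    "helixfold3_uniprot_path": "uniprot/*",
    "helixfold3_pdb_seqres_path": "pdb_seqres/*",
    "helixfold3_uniref90_path": "uniref90/*",
    "helixfold3_mgnify_path": "mgnify/*",
    "helixfold3_pdb_mmcif_path": "pdb_mmcif/*",
    "helixfold3_maxit_src_path": "maxit-v11.200-prod-src",
}


def resolve_helixfold3_paths(helixfold3_db):
    """Join the database root with each relative entry, as dbs.config does."""
    root = str(helixfold3_db).rstrip("/")
    return {name: f"{root}/{rel}" for name, rel in HELIXFOLD3_DB_LAYOUT.items()}


def missing_helixfold3_entries(helixfold3_db):
    """Return the parameter names whose path or glob matches nothing on disk."""
    resolved = resolve_helixfold3_paths(helixfold3_db)
    return sorted(name for name, pattern in resolved.items() if not glob.glob(pattern))
```

Calling `missing_helixfold3_entries("/path/to/helixfold3_db")` before launching with `--helixfold3_db` would list the parameters that point at nothing, which may help distinguish an incomplete download from a pipeline error.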