Merge pull request #108 from Pathogen-Genomics-Cymru/bcg
Bcg
WhalleyT authored Sep 23, 2024
2 parents 8d77a6b + 2edb307 commit c459c8f
Showing 20 changed files with 501 additions and 90 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/build-push-quay.yml
@@ -3,7 +3,7 @@ on:
push:
branches:
- main
- ntmprofiler
- bcg
paths:
- '**/Dockerfile*'
- "bin/"
21 changes: 18 additions & 3 deletions README.md
@@ -2,7 +2,20 @@
![Build Status](https://github.com/Pathogen-Genomics-Cymru/lodestone/workflows/build-push-quay/badge.svg)
![Build Status](https://github.com/Pathogen-Genomics-Cymru/lodestone/workflows/pytest/badge.svg)
![Build Status](https://github.com/Pathogen-Genomics-Cymru/lodestone/workflows/stub-run/badge.svg)


## Table of Contents
- [What is Lodestone](#what-is-lodestone)
- [Quick Start](#quick-start)
- [Executors](#executors)
- [System Requirements](#system-requirements)
- [Parameters](#parameters)
- [Stub Runs](#stub-runs)
- [Checkpoints](#checkpoints)
- [Acknowledgments](#acknowledgements)
- [License](#license)

## What is Lodestone?

This pipeline takes as input reads presumed to be from one of 10 mycobacterial genomes: abscessus, africanum, avium, bovis, chelonae, chimaera, fortuitum, intracellulare, kansasii, tuberculosis. Input should be in the form of one directory containing pairs of fastq(.gz) or bam files.

The pipeline cleans and QCs reads with fastp and FastQC, classifies them with Kraken2 and Afanc, removes non-bacterial content, and, by alignment to any minority genomes, disambiguates mixtures of bacterial reads. Cleaned reads are aligned to one of the 10 supported genomes and variants are called. It produces as output one directory per sample, containing cleaned fastqs, a sorted and indexed BAM, a VCF, F2 and F47 statistics, an antibiogram, and summary reports.
@@ -40,7 +53,7 @@ By default, the pipeline will just run on the local machine. To run on a cluster
### System Requirements ###
Minimum recommended requirements: 32 GB RAM, 8 CPUs

## Params ##
## Parameters ##
The following parameters should be set in `nextflow.config` or specified on the command line:

* **input_dir**<br />
@@ -84,7 +97,7 @@ For more information on the parameters run `nextflow run main.nf --help`

The path to the singularity images can also be changed in the singularity profile in `nextflow.config`. Default value is `${baseDir}/singularity`

## Stub-run ##
## Stub runs ##
To test the stub run:
```
NXF_VER=20.11.0-edge nextflow run main.nf -stub -config testing.config
@@ -150,3 +163,5 @@ For a list of direct authors of this pipeline, please see the contributors list.

The preprocessing sub-workflow is based on the preprocessing nextflow DSL1 pipeline written by Stephen Bush, University of Oxford. The clockwork sub-workflow uses aspects of the variant calling workflow from https://github.com/iqbal-lab-org/clockwork, lead author Martin Hunt, Iqbal Lab at EMBL-EBI

## License
The tool is licensed under the GNU Affero General Public License v3. Please see the [LICENSE](LICENSE) file for more details.
15 changes: 14 additions & 1 deletion bin/identify_tophit_and_contaminants2.py
@@ -359,8 +359,21 @@ def process_reports(afanc_json_path, kraken_json_path, supposed_species, unmix_m

# IS THE TOP SPECIES HIT ONE OF THE 10 ACCEPTABLE POSSIBILITIES? IF SO, PROVIDE A LINK TO THE REFERENCE GENOME
re_top_species = re.findall(r"^(Mycobact|Mycolicibac)\w+ (abscessus|africanum|avium|bovis|chelonae|chimaera|fortuitum|intracellulare|kansasii|tuberculosis).*?$", top_species)
re_top_variant = re.findall(r"^(Mycobact|Mycolicibac)\w+ (abscessus|africanum|avium|bovis|chelonae|chimaera|fortuitum|intracellulare|kansasii|tuberculosis) ()\w+ (bovis|orygis|caprae).*?$", top_species)
if len(re_top_variant) != 0:
re_top_species = re_top_variant
if len(re_top_species) > 0:
identified_species = re_top_species[0][1]
if len(re_top_species[0]) == 2:
identified_species = re_top_species[0][1]
#deal with lineages
lineage_dict = {"La1.": "bovis",
"La2.": "caprae",
"La3.": "orygis"}
for lineage in lineage_dict:
if lineage in top_species:
identified_species = lineage_dict[lineage]
else:
identified_species = re_top_species[0][3] #we have bovis (or orygis/caprae) with variant in the name
if supposed_species == 'null':
out['summary_questions']['is_the_top_species_appropriate'] = 'yes'
elif ((supposed_species != 'null') & (supposed_species == identified_species)):
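Reviewer's sketch of the species-resolution logic in this hunk, written as a standalone function so the three cases (plain species, `variant` name, `La` lineage prefix) can be checked in isolation. The function name `identify_species` and the example input strings are hypothetical and not part of the commit. The variant alternation is spelled `orygis` here to match the lineage table, which is an assumption about the intended name.

```python
import re

# Species alternation copied from the patterns in this hunk.
SPECIES = (
    "abscessus|africanum|avium|bovis|chelonae|chimaera|"
    "fortuitum|intracellulare|kansasii|tuberculosis"
)
SPECIES_RE = re.compile(rf"^(Mycobact|Mycolicibac)\w+ ({SPECIES}).*?$")
# The empty group keeps the tuple length at 4, which is how the original
# code tells a variant match apart from a plain species match.
VARIANT_RE = re.compile(
    rf"^(Mycobact|Mycolicibac)\w+ ({SPECIES}) ()\w+ (bovis|orygis|caprae).*?$"
)
LINEAGES = {"La1.": "bovis", "La2.": "caprae", "La3.": "orygis"}


def identify_species(top_species):
    """Return the identified species name, or None if unsupported."""
    # A variant match takes precedence over a plain species match.
    matches = VARIANT_RE.findall(top_species) or SPECIES_RE.findall(top_species)
    if not matches:
        return None
    groups = matches[0]
    if len(groups) == 2:
        identified = groups[1]
        # Animal-adapted lineages are reported under the lineage prefix.
        for prefix, name in LINEAGES.items():
            if prefix in top_species:
                identified = name
        return identified
    # Four groups: the name carried an explicit variant.
    return groups[3]
```

Keeping the lineage lookup inside the two-group branch mirrors the diff: a name that already carries an explicit variant is never overridden by a lineage prefix.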
6 changes: 3 additions & 3 deletions bin/run-vcfmix.py
@@ -9,10 +9,10 @@

def go(vcf_file):
# create a lineagescan object
v = lineageScan()
v = lineageScan(minos=True)

# assuming postfix of ".bcftools.vcf"
sampleid = vcf_file[:-13]
# assuming postfix of "_allelic_depth.minos.vcf"
sampleid = vcf_file.replace("_allelic_depth.minos.vcf", "")
print(sampleid)

res = v.parse(vcffile=vcf_file, sample_id=sampleid)
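Reviewer's note on the sample ID derivation: `str.replace` removes the suffix wherever it occurs in the path and silently returns the input unchanged when the suffix is absent. A minimal sketch of an end-anchored alternative follows; the helper name `sample_id_from_vcf` is hypothetical and not part of the commit.

```python
SUFFIX = "_allelic_depth.minos.vcf"


def sample_id_from_vcf(vcf_file: str) -> str:
    """Strip SUFFIX only when it terminates the file name."""
    if vcf_file.endswith(SUFFIX):
        return vcf_file[: -len(SUFFIX)]
    # Leave unexpected names untouched rather than guessing.
    return vcf_file
```

Whether a missing suffix should instead raise an error is a design choice for the pipeline; the sketch keeps the permissive behaviour of the committed code.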
4 changes: 2 additions & 2 deletions config/containers.config
@@ -33,10 +33,10 @@ process {
}

withLabel:clockwork {
container = "quay.io/pathogen-genomics-cymru/clockwork:0.9.9"
container = "quay.io/pathogen-genomics-cymru/clockwork:0.9.9r1"
}

withLabel:vcfpredict {
container = "quay.io/pathogen-genomics-cymru/vcfpredict:0.9.9"
container = "quay.io/pathogen-genomics-cymru/vcfpredict:0.9.9r1"
}
}
25 changes: 19 additions & 6 deletions docker/Dockerfile.clockwork-0.9.9
@@ -1,4 +1,5 @@
FROM debian:buster
FROM ubuntu:focal


LABEL maintainer="[email protected]" \
about.summary="container for the clockwork workflow"
@@ -16,17 +17,17 @@ vcftools_version=0.1.15 \
mccortex_version=97aba198d632ee98ac1aa496db33d1a7a8cb7e51 \
stampy_version=1.0.32r3761 \
python_version=3.6.5 \
clockwork_version=2364dec4cbf25c844575e19e8fe0a319d10721b5
clockwork_version=2364dec4cbf25c844575e19e8fe0a319d10721b5 \
gatk_version=4.6.0.0

ENV PACKAGES="procps curl git build-essential wget zlib1g-dev pkg-config jq r-base-core rsync autoconf libncurses-dev libbz2-dev liblzma-dev libcurl4-openssl-dev cmake tabix libvcflib-tools libssl-dev software-properties-common perl locales locales-all" \
PYTHON="python2.7 python-dev"

COPY bin/ /opt/bin/
ENV PATH=/opt/bin:$PATH


RUN apt-get update \
&& apt-get install -y $PACKAGES $PYTHON \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y $PACKAGES $PYTHON \
&& curl -fsSL https://www.python.org/ftp/python/${python_version}/Python-${python_version}.tgz | tar -xz \
&& cd Python-${python_version} \
&& ./configure --enable-optimizations \
@@ -36,7 +37,15 @@ RUN apt-get update \
&& ln -s /usr/local/bin/pip3.6 /usr/local/bin/pip3 \
&& pip3 install --upgrade pip \
&& pip3 install 'cluster_vcf_records==0.13.1' pysam setuptools awscli \
&& apt-get update && apt-get install -y openjdk-11-jdk
&& apt-get update

#update jdk
RUN wget https://download.java.net/java/GA/jdk18/43f95e8614114aeaa8e8a5fcf20a682d/36/GPL/openjdk-18_linux-x64_bin.tar.gz
RUN tar -xvf openjdk-18_linux-x64_bin.tar.gz
RUN mv jdk-18* /opt/
ENV JAVA_HOME=/opt/jdk-18
ENV PATH=$PATH:$JAVA_HOME/bin


RUN curl -fsSL https://github.com/samtools/samtools/archive/${samtools_version}.tar.gz | tar -xz \
&& curl -fsSL https://github.com/samtools/htslib/releases/download/${htslib_version}/htslib-${htslib_version}.tar.bz2 | tar -xj \
@@ -107,8 +116,12 @@ RUN git clone --recursive https://github.com/iqbal-lab/cortex.git \
&& pip3 install . \
&& chmod +x scripts/clockwork

RUN wget https://github.com/broadinstitute/gatk/releases/download/${gatk_version}/gatk-${gatk_version}.zip -O /tmp/gatk-${gatk_version}.zip\
&& unzip /tmp/gatk-${gatk_version}.zip -d /opt/ \
&& rm /tmp/gatk-${gatk_version}.zip -f

ENV CLOCKWORK_CORTEX_DIR=/cortex \
PATH=${PATH}:/clockwork/python/scripts \
PATH=${PATH}:/clockwork/python/scripts:/opt/gatk-${gatk_version} \
PICARD_JAR=/usr/local/bin/picard.jar

ENV LC_ALL en_US.UTF-8 \
131 changes: 131 additions & 0 deletions docker/Dockerfile.clockwork-0.9.9r1
@@ -0,0 +1,131 @@
FROM ubuntu:focal


LABEL maintainer="[email protected]" \
about.summary="container for the clockwork workflow"

ENV samtools_version=1.12 \
htslib_version=1.12 \
bcftools_version=1.12 \
minimap2_version=2.17 \
picard_version=2.18.16 \
gramtools_version=8af53f6c8c0d72ef95223e89ab82119b717044f2 \
vt_version=2187ff6347086e38f71bd9f8ca622cd7dcfbb40c \
minos_version=0.11.0 \
cortex_version=3a235272e4e0121be64527f01e73f9e066d378d3 \
vcftools_version=0.1.15 \
mccortex_version=97aba198d632ee98ac1aa496db33d1a7a8cb7e51 \
stampy_version=1.0.32r3761 \
python_version=3.6.5 \
clockwork_version=2364dec4cbf25c844575e19e8fe0a319d10721b5 \
gatk_version=4.6.0.0

ENV PACKAGES="procps curl git build-essential wget zlib1g-dev pkg-config jq r-base-core rsync autoconf libncurses-dev libbz2-dev liblzma-dev libcurl4-openssl-dev cmake tabix libvcflib-tools libssl-dev software-properties-common perl locales locales-all" \
PYTHON="python2.7 python-dev"

COPY bin/ /opt/bin/
ENV PATH=/opt/bin:$PATH

RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y $PACKAGES $PYTHON \
&& curl -fsSL https://www.python.org/ftp/python/${python_version}/Python-${python_version}.tgz | tar -xz \
&& cd Python-${python_version} \
&& ./configure --enable-optimizations \
&& make altinstall \
&& cd .. \
&& ln -s /usr/local/bin/python3.6 /usr/local/bin/python3 \
&& ln -s /usr/local/bin/pip3.6 /usr/local/bin/pip3 \
&& pip3 install --upgrade pip \
&& pip3 install 'cluster_vcf_records==0.13.1' pysam setuptools awscli \
&& apt-get update

#update jdk
RUN wget https://download.java.net/java/GA/jdk18/43f95e8614114aeaa8e8a5fcf20a682d/36/GPL/openjdk-18_linux-x64_bin.tar.gz
RUN tar -xvf openjdk-18_linux-x64_bin.tar.gz
RUN mv jdk-18* /opt/
ENV JAVA_HOME=/opt/jdk-18
ENV PATH=$PATH:$JAVA_HOME/bin


RUN curl -fsSL https://github.com/samtools/samtools/archive/${samtools_version}.tar.gz | tar -xz \
&& curl -fsSL https://github.com/samtools/htslib/releases/download/${htslib_version}/htslib-${htslib_version}.tar.bz2 | tar -xj \
&& make -C samtools-${samtools_version} -j HTSDIR=../htslib-${htslib_version} \
&& make -C samtools-${samtools_version} -j HTSDIR=../htslib-${htslib_version} prefix=/usr/local install \
&& rm -r samtools-${samtools_version} \
&& curl -fsSL https://github.com/samtools/bcftools/archive/refs/tags/${bcftools_version}.tar.gz | tar -xz \
&& make -C bcftools-${bcftools_version} -j HTSDIR=../htslib-${htslib_version} \
&& make -C bcftools-${bcftools_version} -j HTSDIR=../htslib-${htslib_version} prefix=/usr/local install \
&& rm -r bcftools-${bcftools_version}


RUN curl -fsSL minimap2-${minimap2_version}.tar.gz https://github.com/lh3/minimap2/archive/v${minimap2_version}.tar.gz | tar -xz \
&& cd minimap2-${minimap2_version} \
&& make \
&& chmod +x minimap2 \
&& mv minimap2 /usr/local/bin \
&& cd .. \
&& rm -r minimap2-${minimap2_version} \
&& wget https://github.com/broadinstitute/picard/releases/download/${picard_version}/picard.jar -O /usr/local/bin/picard.jar


RUN git clone https://github.com/atks/vt.git vt-git \
&& cd vt-git \
&& git checkout ${vt_version} \
&& make \
&& cd .. \
&& mv vt-git/vt /usr/local/bin \
&& pip3 install tox "six>=1.14.0" \
&& git clone https://github.com/iqbal-lab-org/gramtools \
&& cd gramtools \
&& git checkout ${gramtools_version} \
&& pip3 install . \
&& cd .. \
&& pip3 install cython \
&& pip3 install git+https://github.com/iqbal-lab-org/minos@v${minos_version}


RUN git clone --recursive https://github.com/iqbal-lab/cortex.git \
&& cd cortex \
&& git checkout ${cortex_version} \
&& bash install.sh \
&& make NUM_COLS=1 cortex_var \
&& make NUM_COLS=2 cortex_var \
&& cd .. \
&& mkdir bioinf-tools \
&& cd bioinf-tools \
&& curl -fsSL http://www.well.ox.ac.uk/~gerton/software/Stampy/stampy-${stampy_version}.tgz | tar -xz \
&& make -C stampy-* \
&& cp -s stampy-*/stampy.py . \
&& curl -fsSL https://github.com/vcftools/vcftools/releases/download/v${vcftools_version}/vcftools-${vcftools_version}.tar.gz | tar -xz \
&& cd vcftools-${vcftools_version} \
&& ./configure --prefix $PWD/install \
&& make && make install \
&& ln -s src/perl/ . \
&& cd .. \
&& git clone --recursive https://github.com/mcveanlab/mccortex \
&& cd mccortex \
&& git checkout ${mccortex_version} \
&& make all \
&& cd .. \
&& cp -s mccortex/bin/mccortex31 . \
&& cd .. \
&& git clone https://github.com/iqbal-lab-org/clockwork \
&& cd clockwork \
&& git checkout ${clockwork_version} \
&& cd python \
&& pip3 install . \
&& chmod +x scripts/clockwork

RUN wget https://github.com/broadinstitute/gatk/releases/download/${gatk_version}/gatk-${gatk_version}.zip -O /tmp/gatk-${gatk_version}.zip\
&& unzip /tmp/gatk-${gatk_version}.zip -d /opt/ \
&& rm /tmp/gatk-${gatk_version}.zip -f

ENV CLOCKWORK_CORTEX_DIR=/cortex \
PATH=${PATH}:/clockwork/python/scripts:/opt/gatk-${gatk_version} \
PICARD_JAR=/usr/local/bin/picard.jar

ENV LC_ALL=en_US.UTF-8 \
LANG=en_US.UTF-8 \
LANGUAGE=en_US.UTF-8


3 changes: 1 addition & 2 deletions docker/Dockerfile.tbprofiler-0.9.9
@@ -42,8 +42,7 @@ RUN curl -Ls https://micro.mamba.pm/api/micromamba/linux-64/latest| tar -xvj bin
# install tb-profiler via bioconda; install into 'base' conda env
RUN micromamba install --yes --name base --channel conda-forge --channel bioconda \
tb-profiler=${TBPROFILER_VER}

RUN micromamba install --yes --name base --channel conda-forge --channel bioconda gatk4
RUN micromamba install --yes --name base --channel conda-forge --channel bioconda gatk4
RUN micromamba install --yes --name base --channel conda-forge --channel bioconda samtools
RUN micromamba install --yes --name base --channel conda-forge jq
RUN micromamba clean --all --yes
3 changes: 3 additions & 0 deletions docker/Dockerfile.tbtamr-0.9.9
@@ -2,6 +2,9 @@ FROM ubuntu:jammy

WORKDIR /

ENV freebayes_version=1.3.6 \
tbtamr_version=0.0.4

# LABEL instructions tag the image with metadata that might be important to the user
LABEL base.image="ubuntu:jammy"
LABEL dockerfile.version="0.9.9"
26 changes: 26 additions & 0 deletions docker/Dockerfile.vcfpredict-0.9.9r1
@@ -0,0 +1,26 @@
FROM ubuntu:20.04

LABEL maintainer="[email protected]" \
about.summary="container for the vcf predict workflow"

#add run-vcf to container
COPY bin/ /opt/bin/
ENV PATH=/opt/bin:$PATH

ENV PACKAGES="procps curl wget git build-essential libhdf5-dev libffi-dev r-base-core jq" \
PYTHON="python3 python3-pip python3-dev"

ENV vcfmix_version=d4693344bf612780723e39ce27c8ae3868f95417

#apt updates
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt-get -y install tzdata \
&& apt-get install -y $PACKAGES $PYTHON \
&& apt-get install -y python3-packaging \
&& git clone https://github.com/whalleyt/VCFMIX.git \
&& cd VCFMIX \
&& pip3 install recursive_diff \
&& pip3 install awscli \
&& pip3 install . \
&& cp -r data /usr/local/lib/python3.8/dist-packages \
&& cd ..
14 changes: 5 additions & 9 deletions main.nf
@@ -85,11 +85,10 @@ nextflow run main.nf -profile docker --filetype bam --input_dir bam_dir --unmix_
}


resistance_profilers = ["tb-profiler", "tbtamr", "none"]
resistance_profilers = ["tb-profiler", "tbtamr"]

if(!resistance_profilers.contains(params.resistance_profiler)){
exit 1, 'Invalid resistance profiler. Must be one of "tb-profiler", "tbtamr" \
or "none" to skip.'
exit 1, 'Invalid resistance profiler. Must be one of "tb-profiler" or "tbtamr"'
}
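Reviewer's illustration of the behavioural change in this hunk: once `"none"` is dropped from the allowed list, a run configured with `--resistance_profiler none` is rejected at start-up. The pipeline performs this check in Groovy; the Python below only mirrors the membership test, and the function name `validate_resistance_profiler` is hypothetical.

```python
RESISTANCE_PROFILERS = ("tb-profiler", "tbtamr")


def validate_resistance_profiler(value):
    """Return value if it names a supported profiler, else raise ValueError."""
    if value not in RESISTANCE_PROFILERS:
        raise ValueError(
            'Invalid resistance profiler. Must be one of "tb-profiler" or "tbtamr"'
        )
    return value
```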


@@ -199,13 +198,10 @@ workflow {
clockwork(preprocessing_output)

// VCFPREDICT SUB-WORKFLOW
sample_and_fastqs = clockwork.out.sample_and_fastqs
mpileup_vcf = clockwork.out.mpileup_vcf
minos_vcf = clockwork.out.minos_vcf
reference = clockwork.out.reference
bam = clockwork.out.bam
profiler_input_vcf = clockwork.out.profiler_input_vcf
profiler_input_fq = clockwork.out.profiler_input_fq

vcfpredict(sample_and_fastqs, bam, mpileup_vcf, minos_vcf, reference)
vcfpredict(profiler_input_fq, profiler_input_vcf)

}
