Commit 82a78da

Edited parameters and updated readme to reflect numerous recent changes
1 parent eb0ebce commit 82a78da

2 files changed: +24 -41 lines changed

README.md

+22 -39
@@ -1,5 +1,5 @@
11
# Quaisar_singularity
2-
Quality, Assembly, Identification, Sequence type, Annotation, Resistance mechanisms for Hospital acquired infections (QuAISAR-H) is a mash-up of many publicly available tools with a splash of custom scripts with the purpose of producing a multi-layered quality checked report that identifies the taxonomy of and the Anti-microbial Resistence (AMR) elements from a paired end sequenced bacterial isolate.
2+
Quality, Assembly, Identification, Sequence type, Annotation, Resistance mechanisms for Hospital acquired infections (QuAISAR-H) is a mash-up of many publicly available tools with a splash of custom scripts with the purpose of producing a multi-layered quality checked report that identifies the taxonomy of and the Anti-microbial Resistance (AMR) elements from a paired end sequenced bacterial isolate.
33
This version uses containers to ease the necessity of having many preinstalled tools.
44

55
## Installation
@@ -28,14 +28,13 @@ The script will install miniconda, if there is no version of conda already insta
2828

2929
To run the pipeline use the following command with these parameters:
3030
A. ./quaisar_singularity.sh
31-
1. -i
32-
2. full path to the folder of paired-end reads
33-
4. -o
34-
5. name to describe the set of reads (e.g. project_name, run_id)
35-
Example: ./quaisar_singularity.sh -i /path/to/reads/folder -o project_name
31+
1. -i full path to the folder of paired-end reads
32+
2. -p run/set/project name for the set of reads being analyzed
33+
3. (optional) -o output_path_where_to_put_the_run/set/project_folder (if different than what was set during installation with the -w flag)
34+
Example: ./quaisar_singularity.sh -i /path/to/reads/folder -p run/set/project_name
3635

3736
## Output
38-
### Each run of the pipeline will produce the following files in the main (set/project name) folder
37+
### Each run of the pipeline will produce the following files in the main (run/set/project name) folder
3938
1. A folder for each isolate's output files
4039
2. .log - standard out and err of all tools are directed to this file as well as being shown on the terminal
4140
3. _command.log - shows all singularity commands that were called during the run (and what the parameters were)
@@ -66,38 +65,30 @@ The script will install miniconda, if there is no version of conda already insta
6665
19. .tax - determined taxonomy of isolate
6766
20. _time_summary.txt - estimate of length to complete each task
6867

69-
Isolates within the Enterobacteriaceae family (currently the only taxa that plasFlow is being run on) -
70-
1. Assembly_Stats_plasFlow
71-
2. c-sstar_plasFlow
72-
3. GAMA_plasFlow
73-
4. plasFlow
74-
5. plasmidFinder_on_plasFlow
75-
7668

7769

7870
## Table with all external tools and versions used, along with example commands for each
7971

8072
| Tool | Function | Version | command | command 2 | Notes |
8173
| --- | --- | --- | --- | --- | --- |
82-
| BBDuk | Remove PhiX reads | BBMap(37.87) | bbduk.sh - Xmx20g threads=12 in=raw_R1.fastq in2=raw_R2.fastq out=noPhiX_R1.fsq out2=noPhiX_R2.fsq ref=phiX_adapter.fasta k=31 hdist=1 | | |
83-
| Trimmomatic | Remove illumina adapters and filter by quality | 0.36 | trimmomatic PE -phred33 -threads 12 noPhiX_R1.fsq noPhiX_R2.fsq trimmed_R1_001.paired.fq trimmed_R1_001.unpaired.fq trimmed_R2_001.paired.fq trimmed_R2_001.unpaired.fq ILLUMINACLIP:adapters.fasat:2:30:10:8:TRUE SLIDINGWINDOW:20:30 LEADING:20 TRAILING:20 MINLEN:50 | | |
84-
| Kraken | Taxonomic Identification/Contamination Detection | 1.0 | Reads: kraken --paired --db kraken_mini_db_location --preload --fastq-input --threads 12 --output sample_name.kraken --classified-out sample_name.classified trimmed_R1_001.paired.fq trimmed_R2_001.paired.fq | Assembly: kraken --db kraken_mini_db_location --preload --threads 14 --output sample_name.kraken --classified-out sample_name.classified trimmed_assembly.fasta | |
85-
| Gottcha | Taxonomic Identification (Species database) | 1.0b | gottcha.pl --mode all --outdir output_directory --input paired.fq --database location_of_gottcha_database | | · Paired.fq is the concatenated file of trimmed R1 and R2 read files |
86-
| SPAdes | Assembly | 3.13.0 | spades.py --careful --memory 32 --only-assembler --pe1-1 trimmed_R1_001.paired.fq --pe1-2 trimmed_R2_001.paired.fq --pe1-s trimmed.single.fq" -o output_directory --phred-offset 33 -t 12 | | |
87-
| QUAST | Assembly Quality | 5.0.0 | Quast.py -o output_directory trimmed_assembly.fasta | | |
88-
| Prokka | Annotation | 1.14.5 | prokka --outdir output_directory trimmed_assembly.fasta | | |
74+
| BBDuk | Remove PhiX reads | BBMap(38.94) | bbduk.sh - Xmx20g threads=4 in=raw_R1.fastq in2=raw_R2.fastq out=noPhiX_R1.fsq out2=noPhiX_R2.fsq ref=phiX_adapter.fasta k=31 hdist=1 | | |
75+
| FastP | Remove illumina adapters and filter by quality | 0.23.1 | fastp -w 4 -i trimmed-noPhiX-R1.fsq -I trimmed-noPhiX-R2.fsq -o trimmed_R1_001.paired.fq --unpaired1 trimmed.single1.fq -O trimmed_R2_001.paired.fq --unpaired2 trimmed.single2.fq --adapter_fasta adapters.fasta -r --cut_right_window_size 20 --cut_right_mean_quality 30 -l 50 -g -5 20 -3 20 SLIDINGWINDOW:20:30 LEADING:20 TRAILING:20 MINLEN:50 | | |
76+
| Kraken | Taxonomic Identification/Contamination Detection | 1.1.1 | Reads: kraken --paired --db kraken_mini_db_location --preload --fastq-input --threads 4 --output sample_name.kraken --classified-out sample_name.classified trimmed_R1_001.paired.fq trimmed_R2_001.paired.fq | Assembly: kraken --db kraken_mini_db_location --preload --threads 4 --output sample_name.kraken --classified-out sample_name.classified trimmed_assembly.fasta | |
77+
| SPAdes | Assembly | 3.15.3 | spades.py --careful --memory 32 --only-assembler --pe1-1 trimmed_R1_001.paired.fq --pe1-2 trimmed_R2_001.paired.fq --pe1-s trimmed.single.fq" -o output_directory --phred-offset 33 -t 12 | | |
78+
| QUAST | Assembly Quality | 5.0.2 | Quast.py -o output_directory trimmed_assembly.fasta | | |
79+
| Prokka | Annotation | 1.14.6 | prokka --outdir output_directory trimmed_assembly.fasta | | |
8980
| BUSCO | Determine quality of assembly and identification | 3.0.2 | run_BUSCO.py -i prokka_output_directory/sample_name.faa -o sample_name -l location_of_database -m prot | | · Proper database is determined by matching lowest matching taxonomy to available databases |
90-
| pyANI | Taxonomic Identification | 0.2.7 | average_nucleotide_identity.py -i directory_of_fastas -o output_directory --write_excel | | · Directory of fastas contain the 20 closest genera matches based on mashtree distances |
91-
| c-SSTAR | Anti-microbial Resistance Mechanism identification on Assembly | 1.1.01 | Normal: python3 c-SSTAR_gapped.py -g trimmed_assembly.fasta -s 98-d AR_database_location > sample_name.gapped_98.sstar | Plasmid: python3 c-SSTAR_gapped.py -g plasmid_assembly.fasta -s 40-d AR_database_location > sample_name.gapped_40.sstar | |
92-
| SRST2 | Anti-microbial Resistance Mechanism Identification on reads, Sequence Typing | 0.2.0 | AR: SRST2--input_pe trimmed_R1.fastq.gz trimmed_R2_001.fastq.gz --output output_directory –threads 12 --gene_db AR_datbase_location | MLST: SRST2--input_pe trimmed_R1.fastq.gz trimmed_R2_001.fastq.gz --output output_directory –threads 12 --mlst_db location_of_mlst_database --mlst_definitions location_of_MLST_definitions --mlst_delimiter MLST_definitions_file_delimiter | · Newest MLST database and definitions are downloaded as part of the script. The mlst delimiter is determined using an included script within the SRST2, getmlst, that must be run prior to SRST2 |
93-
| MLST | Sequence Typing | 2.16 | mlst trimmed_assembly.fasta > sample_name.mlst | mlst --scheme database_name trimmed_assembly.fasta > sample_name_database_name.mlst | |
94-
| Barrnap | Taxonomic Identification | 0.8 | barrnap --kingdom bac --threads 12 trimmed_assembly.fasta > rRNA_seqs.fasta | | |
95-
| plasmidFinder | Anti-microbial Resistance Mechanism Identification on plasmid replicons | 2.1 | plasmidfinder -i trimmed_assembly.fasta -o output_directory -k 95.00 -p enterobacteriaceae|gram_positive | | |
96-
| plasFlow | plasmid contig identifier | 1.1.0 | PlasFlow.py --input scaffolds_trimmed_2000.fasta --output plasFlow_results.tsv --threshold 0.7 | | |
81+
| pyANI | Taxonomic Identification | 0.2.11 | average_nucleotide_identity.py -i directory_of_fastas -o output_directory --write_excel | | · Directory of fastas contain the 20 closest genera matches based on mashtree distances |
82+
| c-SSTAR | Anti-microbial Resistance Mechanism identification on Assembly | 1.1.01 | python3 c-SSTAR_gapped.py -g trimmed_assembly.fasta -s 98 -d AR_database_location > sample_name.gapped_98.sstar | | |
83+
| GAMMA | Anti-microbial Resistance Mechanism identification on Assembly | 1.4 | python3 GAMMA.py trimmed_assembly.fasta AR_database_location output_gama | | |
84+
| SRST2 | Anti-microbial Resistance Mechanism Identification on reads, Sequence Typing | 0.2.0 | AR: SRST2--input_pe trimmed_R1.fastq.gz trimmed_R2_001.fastq.gz --output output_directory –threads 4 --gene_db AR_datbase_location | MLST: SRST2--input_pe trimmed_R1.fastq.gz trimmed_R2_001.fastq.gz --output output_directory –threads 4 --mlst_db location_of_mlst_database --mlst_definitions location_of_MLST_definitions --mlst_delimiter MLST_definitions_file_delimiter | · Newest MLST database and definitions are downloaded as part of the script. The mlst delimiter is determined using an included script within the SRST2, getmlst, that must be run prior to SRST2 |
85+
| MLST | Sequence Typing | 2.19.0 | mlst trimmed_assembly.fasta > sample_name.mlst | mlst --scheme database_name trimmed_assembly.fasta > sample_name_database_name.mlst | |
86+
| Barrnap | Taxonomic Identification | 0.9 | barrnap --kingdom bac --threads 4 trimmed_assembly.fasta > rRNA_seqs.fasta | | |
87+
| plasmidFinder | Anti-microbial Resistance Mechanism Identification on plasmid replicons | 2.1.1 | plasmidfinder -i trimmed_assembly.fasta -o output_directory -k 95.00 -p enterobacteriaceae|gram_positive | | |
9788
| bowtie2 | read aligner | 2.2.9 | bowtie2-build -f plasFlow_results.tsv_chromosomes.fasta bowtie2_sample_name_chr | bowtie2 -x sample_name_chr -1 R1_001.paired.fq -2 R2_001.paired.fq -S sample_name.sam -p 12 --local | | |
98-
| samtools | sam converter | 1.10 | samtools view -bS sample_name.sam > sample_name.bam | sort -n sample_name.bam -o sample_name.bam.sorted
99-
| bedtools | bam converter | 2.29.2 | bamToFastq -i sample_name.bam.sorted -fq sample_name_R1_bacterial.fastq -fq2 sample_name__R2_bacterial.fastq
100-
| Unicycler | Assembly | 0.4.4 | unicycler -1 sample_name_R1_bacterial.fastq -2 sample__name_R2_bacterial.fastq -o sample_name_uni_assembly
89+
| samtools | sam converter | 1.14 | samtools view -bS sample_name.sam > sample_name.bam | sort -n sample_name.bam -o sample_name.bam.sorted
90+
| bedtools | bam converter | 2.30.0 | bamToFastq -i sample_name.bam.sorted -fq sample_name_R1_bacterial.fastq -fq2 sample_name__R2_bacterial.fastq
91+
10192

10293

10394
##Flag table of output summaries
@@ -116,8 +107,6 @@ kraken preassembly|||-.kraken(.gz) missing
116107
krona-kraken-preasmb|||-.krona or .html missing
117108
Pre classfify||-unclassified reads >30%|-no classified reads or kraken_summary_paired.txt missing
118109
pre Class Contam.|-More than one species found above 25% threshold|-No species found above 25%
119-
GOTTCHA_S||-.tsv OR .html missing|-Both .tsv and .html missing
120-
Gottcha Classifier||-unclassified reads >30%|-no classified reads or gottcha_species_summary.txt missing
121110
Assembly|||-scaffolds.fasta is missing
122111
Contig Trim||->200 contigs remain|-scaffolds_trimmed.fasta missing
123112
kraken postassembly|||-.kraken(.gz) missing
@@ -144,9 +133,3 @@ MLST-srst2|-No scheme found for taxa, more than 2 srst2 files found, more than 1
144133
16s_best_hit||-species not found|-Genus not found, 16s_blast_id.txt missing,No reads found,Unclassifiable reads found
145134
16s_largest_hit||-species not found|-Genus not found, 16s_blast_id.txt missing,No reads found,Unclassifiable reads found
146135
plasmidFinder|||-results_table_summary.txt missing,plasmidFinder folder missing
147-
plasFlow Assembly||-No plasmid scaffold found when expected|-plasFlow folder missing
148-
QUAST_plasFlow|||report.tsv missing
149-
plasFlow contig Trim|||-plasmid_scaffolds_trimmed.fasta missing
150-
c-SSTAR_plasFlow|-NO known AMR genes present,database is not current||-summary.txt or c-sstar folder missing
151-
GAMA_plasFlow|-NO known AMR genes present,database is not current||.GAMA or GAMA folder missing
152-
plasmidFndr-plasFlow|||-results_table_summary.txt missing,plasmidFinder_on_plasFlow folder missing
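
The revised usage in this README hunk makes `-i` and `-p` required and `-o` optional, falling back to the location set at installation with `-w`. A minimal sketch of that resolution follows, assuming a hypothetical `resolve_run_folder` helper and a hypothetical `DEFAULT_OUTPUT_DIR` variable standing in for the installed default. Neither name is taken from the pipeline's code.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; this is not the pipeline's actual argument parsing.

# Assumed stand-in for the output location chosen at installation with -w.
DEFAULT_OUTPUT_DIR="${DEFAULT_OUTPUT_DIR:-/default/output}"

# Prints the resolved run folder (output_dir/project_name) for the given flags.
resolve_run_folder() {
  local OPTIND=1 opt reads="" project="" output="${DEFAULT_OUTPUT_DIR}"
  while getopts ":i:p:o:" opt; do
    case "${opt}" in
      i) reads="${OPTARG}" ;;    # folder of paired-end reads (required)
      p) project="${OPTARG}" ;;  # run/set/project name (required)
      o) output="${OPTARG}" ;;   # optional override of the default location
      *) echo "Unknown or incomplete option: -${OPTARG}" >&2; return 1 ;;
    esac
  done
  if [[ -z "${reads}" || -z "${project}" ]]; then
    echo "Both -i and -p are required" >&2
    return 1
  fi
  # Drop one trailing slash from the output path before joining.
  echo "${output%/}/${project}"
}
```

For example, `resolve_run_folder -i /path/to/reads/folder -p project_name -o /data/runs` would print `/data/runs/project_name`, matching the `/output/project_name` layout described in the script header.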

scripts/quaisar_singularity.sh

+2 -2
@@ -14,7 +14,7 @@
1414
# Reads can be gzipped or raw, but if your files are not named in any one of these formats, they will need to be renamed before running them through the pipeline
1515
# If you are submitting assemblies, use 1 as the value
1616
#
17-
# Output location: A folder with the name given for the -p flag will be created under the folder given with the -o flag (/-o/-p)
17+
# Output location: A folder with the name given for the -p flag will be created under the folder given with the -o flag (/output/project_name)
1818
#
1919
# v1.1 (11/17/2021)
2020
#
@@ -48,7 +48,7 @@ function write_Progress() {
4848
# Checking for proper number of arguments from command line
4949
if [[ $# -lt 1 || $# -gt 13 ]]; then
5050
echo -e "\\n\\n\\n"
51-
echo -e "Usage: ./quaisar_singularity.sh -i location_of_reads -o name_of_output_folder -p project_name [-s full_path_to_script_folder] [-r] [-a] [-d full_path_to_database_folder] [-c config.sh full_path_to_config_file]"
51+
echo -e "Usage: ./quaisar_singularity.sh -i location_of_reads -p project_name [-o name_of_output_folder] [-s full_path_to_script_folder] [-r] [-a] [-d full_path_to_database_folder] [-c config.sh full_path_to_config_file]"
5252
echo -e "Reads filenames need to have a postfix in one of the following _S*_L001_R*_00*.fastq[.gz], _S*_R*_0*X.fastq[.gz], _RX_00*.fastq[.gz], _[R]*.fastq[.gz]."
5353
echo -e "Assembly filenames need to have a postfix of .fasta or .fna"
5454
echo -e "If your reads are not named in any one of these formats, they will need to be renamed before running them through the pipeline"
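
The usage text above lists the filename postfixes the script accepts. One loose way to pre-check a folder before a run is sketched below. The `classify_input` helper is hypothetical, and its glob patterns are a simplified reading of the listed postfixes (an `_R1`/`_R2` or `_1`/`_2` marker before `.fastq[.gz]`), not the script's own matching logic.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check; patterns are an approximation of the accepted postfixes.

# Prints "assembly", "reads", or "rename" for a single filename.
classify_input() {
  local name="$1"
  case "${name}" in
    # Assemblies: postfix of .fasta or .fna
    *.fasta|*.fna) echo "assembly" ;;
    # Reads: an R1/R2 (or bare 1/2) marker, raw or gzipped
    *_R[12]*.fastq|*_R[12]*.fastq.gz|*_[12].fastq|*_[12].fastq.gz) echo "reads" ;;
    # Anything else would need renaming before the run
    *) echo "rename" ;;
  esac
}
```

Looping over a reads folder and printing every name that yields `rename` would give a quick list of files to fix before invoking the pipeline.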
