README.md
Lines changed: 40 additions & 42 deletions
@@ -53,48 +53,46 @@ By default, the pipeline will just run on the local machine. To run on a cluster
53
53
Minimum recommended requirements: 32GB RAM, 8CPU
54
54
55
55
## Parameters ##
56
-
The following parameters should be set in `nextflow.config` or specified on the command line:
57
-
58
-
***input_dir**<br />
59
-
Directory containing fastq OR bam files
60
-
***filetype**<br />
61
-
File type in input_dir. Either "fastq" or "bam"
62
-
***pattern**<br />
63
-
Regex to match fastq files in input_dir, e.g. "*_R{1,2}.fq.gz". Only mandatory if --filetype is "fastq"
64
-
***output_dir**<br />
65
-
Output directory for results
66
-
***unmix_myco**<br />
67
-
Do you want to disambiguate mixed-mycobacterial samples by read alignment? Either "yes" or "no":
68
-
* If "yes" workflow will remove reads mapping to any minority mycobacterial genomes but in doing so WILL ALMOST CERTAINLY ALSO reduce coverage of the principal species
69
-
* If "no" then mixed-mycobacterial samples will be left alone. Mixtures of mycobacteria + non-mycobacteria will still be disambiguated
70
-
***species**<br />
71
-
Principal species in each sample, assuming genus Mycobacterium. Default 'null'. If parameter used, takes 1 of 10 values: abscessus, africanum, avium, bovis, chelonae, chimaera, fortuitum, intracellulare, kansasii, tuberculosis. Using this parameter will apply an additional sanity test to your sample
72
-
* If you DO NOT use this parameter (default option), pipeline will determine principal species from the reads and consider any other species a contaminant
73
-
* If you DO use this parameter, pipeline will expect this to be the principal species. It will fail the sample if reads from this species are not actually the majority
Directory containing Bowtie2 index (obtain from ftp://ftp.ccb.jhu.edu/pub/data/bowtie2_indexes/hg19_1kgmaj_bt2.zip). The specified path should NOT include the index name
78
-
***bowtie_index_name**<br />
79
-
Name of the bowtie index, e.g. hg19_1kgmaj<br />
80
-
***vcfmix**<br />
81
-
Run [vcfmix](https://github.com/AlexOrlek/VCFMIX), yes or no. Set to no for synthetic samples<br />
82
-
***resistance_profiler**<br />
83
-
Run resistance profiling for Mycobacterium tuberculosis. Either ["tb-profiler"](https://tbdr.lshtm.ac.uk/), ["tbtamr"](https://github.com/MDU-PHL/tbtamr) or "none".
84
-
***afanc_myco_db**<br />
85
-
Path to the [afanc](https://github.com/ArthurVM/Afanc) database used for speciation. Obtain from https://s3.climb.ac.uk/microbial-bioin-sp3/Mycobacteriaciae_DB_7.0.tar.gz
86
-
***update_tbprofiler**<br />
87
-
Update tb-profiler. Either "yes" or "no". "yes" may be useful when running outside of a container for the first time, as no tb-profiler database matching our reference will have been constructed yet. This is not needed with the climb, docker and singularity profiles, as the reference has already been added. Alternatively you can run ```tb-profiler update_tbdb --match_ref <lodestone_dir>/resources/tuberculosis.fasta```.
88
-
***refseq**<br />
89
-
Path to assembly summary refseq file (taken from [here](https://ftp.ncbi.nlm.nih.gov/genomes/refseq/assembly_summary_refseq.txt)). A local version is stored for reproducibility purposes in ```resources/``` but for best results download the latest version. Instead of downloading, the link can be supplied directly in the refseq argument e.g. `--refseq "https://ftp.ncbi.nlm.nih.gov/genomes/refseq/assembly_summary_refseq.txt"`
90
-
***permissive**<br />
91
-
One of "yes" or "no". If "yes", the continue-to-Clockwork flags are ignored and alignment is performed anyway. If there are too few reads and/or no reference is found, the programme still exits.
92
-
***collate**<br />
93
-
One of "yes" or "no". If "yes", the collate function will be run to collect all resistance profiling reports. The result is written to the base level of the output directory (e.g. ```output/tbprofiler.variants.csv```)
94
-
95
-
For more information on the parameters run `nextflow run main.nf --help`
96
-
97
-
The path to the singularity images can also be changed in the singularity profile in `nextflow.config`. Default value is `${baseDir}/singularity`
56
+
The following parameters should be set in `nextflow.config`. They can be listed by running `nextflow run main.nf --help`:
57
+
58
+
```
59
+
--input_dir [string] Input directory containing FASTQs or BAMs
60
+
--pattern [string] Glob pattern for FASTQs or BAM
61
+
--output_dir [string] Output directory
62
+
--permissive [boolean] Flag. If True, errors in decontamination will be demoted to warnings
63
+
--filetype [string] Either "fastq" or "bam". Assumes FASTQs are PE Illumina reads and BAMs are mapped against one of the references in resources/ (accepted: bam, fastq) [default: fastq]
64
+
--unmix_myco [boolean] Flag. If True then minority Mycobacteriaceae reads will be removed. If False, they will be recorded but not acted upon
65
+
--species [string] Species which will be mapped against, corresponding to references in resources/: can be one of abscessus, africanum, avium, bovis, chelonae, chimaera, fortuitum, intracellulare, kansasii, tuberculosis or null. If 'null' the top hit as determined by Afanc will be used (accepted: null, abscessus, africanum,
--sing_dir [string] Directory to singularity definition files. Used to parse versions for reporting [default: ${baseDir}/resources]
68
+
--config_file [string] Path to Nextflow config file. Used for parsing arguments to write to results if needed [default: ${baseDir}/nextflow.config]
69
+
--help [boolean, string] Show the help message for all top level parameters. When a parameter is given to `--help`, the full help message of that parameter will be printed.
70
+
--helpFull [boolean] Show the help message for all non-hidden parameters.
71
+
--showHidden [boolean] Show all hidden parameters in the help message. This needs to be used in combination with `--help` or `--helpFull`.
72
+
73
+
resources
74
+
--resource_dir [string] Path to resources directory where utility files are stored [default: ${baseDir}/resources]
75
+
--refseq [string] Path to NCBI refseq summary file [default: ${baseDir}/resources/assembly_summary_refseq.txt]
76
+
77
+
resistance
78
+
--resistance_profiler [string] Tool used for resistance profiling. Either tb-profiler or tbtamr (accepted: tb-profiler, tbtamr) [default: tb-profiler]
79
+
--collate [boolean] Flag. If True resistance reports will be summarised
80
+
81
+
bowtie
82
+
--bowtie_index [string] Bowtie index directory [default: ${baseDir}/bowtie2/]
83
+
--bowtie_index_name [string] Prefix for the Bowtie2 index (minus the file extensions). [default: hg19_1kgmaj]
84
+
85
+
afanc
86
+
--afanc_percent_threshold [number] Minimum percentage threshold for reads in order for a taxon to be considered in Afanc [default: 5]
87
+
--afanc_n_reads_threshold [integer] Minimum reads threshold for reads in order for a taxon to be considered in Afanc [default: 500]
88
+
--afanc_fail_percent_threshold [number] Minimum percentage threshold for reads in order for a taxon to be considered in Afanc if the pipeline has failed earlier on (for reporting) [default: 2]
89
+
--afanc_fail_n_reads_threshold [integer] Minimum reads threshold for reads in order for a taxon to be considered in Afanc if the pipeline has failed earlier on (for reporting) [default: 200]
90
+
91
+
kraken
92
+
--kraken_percent_threshold [number] Percentage threshold of reads required for taxa to be included in Kraken reports [default: 10]
93
+
--kraken_n_reads_threshold [integer] Raw reads threshold required for taxa to be included in Kraken reports [default: 10000]
if ((supposed_species!='null') & (supposed_species not in species)):
61
61
sys.exit('ERROR: if you provide a species ID, it must be one of either: abscessus|africanum|avium|bovis|chelonae|chimaera|fortuitum|intracellulare|kansasii|tuberculosis')
62
62
63
-
if ((unmix_myco!='yes') & (unmix_myco!='no')):
64
-
sys.exit('ERROR: \'unmix myco\' should be either \'yes\' or \'no\'')
63
+
if ((unmix_myco!='true') & (unmix_myco!='false')):
64
+
sys.exit('ERROR: \'unmix myco\' should be either \'true\' or \'false\'')
65
65
66
-
if ((permissive!='yes') & (permissive!='no')):
67
-
sys.exit('ERROR: \'permissive\' should be either \'yes\' or \'no\'')
66
+
if ((permissive!='true') & (permissive!='false')):
67
+
sys.exit('ERROR: \'permissive\' should be either \'true\' or \'false\'')
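Review note: the `unmix_myco` and `permissive` checks above repeat the same string comparison. As a sketch only, the validation could be factored into a single function that also returns a real boolean. The helper name `parse_bool_flag` is hypothetical and not part of this commit, and raising `ValueError` instead of calling `sys.exit` is an assumption made so the function is easy to test.

```python
def parse_bool_flag(name, value):
    """Validate a 'true'/'false' string and return the matching bool.

    Hypothetical helper for illustration only; the script in this
    commit compares the raw strings inline and calls sys.exit.
    """
    if value not in ('true', 'false'):
        raise ValueError(f"'{name}' should be either 'true' or 'false'")
    return value == 'true'


# Example: convert the two flags touched by this commit.
unmix_myco = parse_bool_flag('unmix myco', 'true')
permissive = parse_bool_flag('permissive', 'false')
```

A caller that wants the original behaviour could catch the `ValueError` and pass its message to `sys.exit`.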
# WHAT IS LIKELY TO HAVE HAPPENED IS THAT THE ALIGNMENT-BASED DECONTAMINATION PROCESS HAS TRIED TO DISAMBIGUATE A MIXTURE OF VERY SIMILAR MYCOBACTERIA AND INADVERTENTLY REMOVED TOO MANY READS. THERE WILL BE NOTHING SUBSTANTIVE LEFT FOR AFANC TO CLASSIFY.
447
447
if ((num_afanc_species==afanc_finds_nothing) & (num_afanc_species==1)):
warnings.append("warning: regardless of what Kraken reports, afanc did not make a species-level mycobacterial classification. If this is a mixed-mycobacterial sample, then an alignment-based contaminant-removal process may not be appropriate. Suggestion: re-run with --unmix_myco 'no'")
449
+
warnings.append("warning: regardless of what Kraken reports, afanc did not make a species-level mycobacterial classification. If this is a mixed-mycobacterial sample, then an alignment-based contaminant-removal process may not be appropriate. Suggestion: re-run with --unmix_myco 'false'")
description+="A 'reference genome' is a manually-selected community standard for that species. Note that some prokaryotes can have more than one reference genome\n"
484
484
description+="[species] refers to what you believe this sample to be. You will be warned if this differs from the Kraken/afanc predictions\n"
485
485
description+="By defining [species] you will automatically select this to be the genome against which reads will be aligned using Clockwork\n"
486
-
description+="[unmix myco] is either 'yes' or 'no', given in response to the question: do you want to disambiguate mixed-mycobacterial samples by read alignment?\n"
487
-
description+="If 'no', any contaminating mycobacteria will be recorded but NOT acted upon\n"
486
+
description+="[unmix myco] is either 'true' or 'false', given in response to the question: do you want to disambiguate mixed-mycobacterial samples by read alignment?\n"
487
+
description+="If 'false', any contaminating mycobacteria will be recorded but NOT acted upon\n"
488
488
usage="python identify_tophit_and_contaminants2.py [path to afanc JSON] [path to Kraken JSON] [path to RefSeq assembly summary file] [species] [unmix myco] [directory containing mycobacterial reference genomes] [aws_config]\n"
parser.add_argument('afanc_json', metavar='afanc_json', type=str, help='Path to afanc json report')
493
493
parser.add_argument('kraken_json', metavar='kraken_json', type=str, help='Path to Kraken json report')
494
494
parser.add_argument('assembly_file', metavar='assembly_file', type=str, help='Path to RefSeq assembly summary file')
495
495
parser.add_argument('species', metavar='species', type=str, help='Refers to what you believe this sample to be. You will be warned if this differs from the Kraken/afanc predictions')
496
-
parser.add_argument('unmix_myco', metavar='unmix_myco', type=str, help='Is either \'yes\' or \'no\', given in response to the question: do you want to disambiguate mixed-mycobacterial samples by read alignment?\nIf \'no\', any contaminating mycobacteria will be recorded but NOT acted upon')
496
+
parser.add_argument('unmix_myco', metavar='unmix_myco', type=str, help='Is either \'true\' or \'false\', given in response to the question: do you want to disambiguate mixed-mycobacterial samples by read alignment?\nIf \'false\', any contaminating mycobacteria will be recorded but NOT acted upon')
497
497
parser.add_argument('myco_dir', metavar='myco_dir', type=str, help='Path to myco directory')
498
498
parser.add_argument('prev_species_json', metavar='prev_species_json', type=str, help='Path to previous species json file. Can be set to \'null\'')
499
-
parser.add_argument('permissive', metavar='permissive', type=str, help="Is either \'yes\' or \'no\', given in response to the question: do you want to carry on to Clockwork regardless of errors?")
499
+
parser.add_argument('permissive', metavar='permissive', type=str, help="Is either \'true\' or \'false\', given in response to the question: do you want to carry on to Clockwork regardless of errors?")
500
500
parser.add_argument('pass_number', metavar='pass_number', type=int, help="Pass number. Refers to what pass of decontamination the pipeline is on")
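Review note: since `unmix_myco` and `permissive` now accept only 'true' or 'false', `argparse` could enforce this itself through its standard `choices` argument. The sketch below is an assumption about how that might look, not what the commit does; it covers only three of the script's positional arguments, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build an illustrative parser for a subset of the script's arguments."""
    parser = argparse.ArgumentParser(
        description="Illustrative subset of the arguments (not the real script)")
    # choices makes argparse reject anything other than 'true' or 'false'.
    parser.add_argument('unmix_myco', choices=['true', 'false'],
                        help="Disambiguate mixed-mycobacterial samples by read alignment?")
    parser.add_argument('permissive', choices=['true', 'false'],
                        help="Carry on to Clockwork regardless of errors?")
    parser.add_argument('pass_number', type=int,
                        help="Which pass of decontamination the pipeline is on")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(['true', 'false', '1'])
```

With `choices` in place, an invalid value such as 'yes' makes `argparse` print a usage error and exit, so the manual string checks would become redundant.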