Implement alignment subworkflow #6

Merged on Apr 24, 2024 (87 commits)
Commits
86220ec
Basic isolated alignment subworkflow outline
scwatts Dec 1, 2023
4e92135
Initial implementation of alignment workflow.
mkcmkc Feb 11, 2024
f69caf9
Simplified condition on whether fastp is run in alignment subworkflow.
mkcmkc Feb 11, 2024
1ef8abd
Get rid of blocking when merging individual sample records back into …
mkcmkc Feb 12, 2024
b2c2ad7
Simple improvement to the alignment subworkflow.
mkcmkc Feb 12, 2024
f377861
Merge alignment and markdups logic into Stephen's stubs.
mkcmkc Feb 13, 2024
0151004
Upgrading from bwa mem to bwa mem2.
mkcmkc Feb 13, 2024
8456695
Fixing read group flag for bwa mem2.
mkcmkc Feb 13, 2024
c3f9050
Reassigning TODO.
mkcmkc Feb 13, 2024
fe49cf0
Emitting versions.
mkcmkc Feb 13, 2024
1d13b56
Updating TODOs.
mkcmkc Feb 13, 2024
8ce78ff
Fixing tags for new processes.
mkcmkc Feb 13, 2024
ee059ad
Setting up targeted and wgts workflows for testing.
mkcmkc Feb 13, 2024
e3f8d45
Minor fixes and style improvements.
mkcmkc Feb 14, 2024
58ed71c
Adding a TODO.
mkcmkc Feb 15, 2024
de77de8
Add has_umis switch to markdups.
mkcmkc Feb 18, 2024
01ad68d
Force symlink overwrite so process does not fail on resume.
mkcmkc Feb 18, 2024
df6e65a
Change name of output bam from markdups.
mkcmkc Feb 18, 2024
a4a95b1
Fix read group arg to bwa mem2.
mkcmkc Feb 19, 2024
61aa1b1
Add TODO.
mkcmkc Feb 19, 2024
a1dde8e
Fix markdups umi flags for TSO500 panel samples.
mkcmkc Feb 19, 2024
8f06a9f
Add TODO.
mkcmkc Feb 19, 2024
a2330b2
Fix read group extraction from fastq filenames and a bug in the markd…
mkcmkc Feb 19, 2024
36c96fc
Running with umis for targeted and without for wgts.
mkcmkc Feb 19, 2024
c8cec75
Fix includes in targeted.nf.
mkcmkc Feb 19, 2024
b45b43a
Only run markdups with UMIs when tso500 panel is selected.
mkcmkc Feb 19, 2024
2479229
Create switch between bwa mem and bwa mem2 for debugging purposes.
mkcmkc Feb 19, 2024
00bf6ee
Add TODO.
mkcmkc Feb 20, 2024
3b83265
Move new params into nextflow.config.
mkcmkc Feb 20, 2024
00cfe50
Add label to markdups process.
mkcmkc Feb 20, 2024
ba241d1
Disable all filtering and poly-g trimming in fastp.
mkcmkc Feb 21, 2024
3c18154
Use fastp by default.
mkcmkc Feb 21, 2024
6b4f805
Remove bwa mem process.
mkcmkc Feb 21, 2024
c132e2b
Remove obsolete TODOs.
mkcmkc Feb 21, 2024
ec992f1
Pass ref genome version to markdups.
mkcmkc Feb 21, 2024
f7d3e5c
Add TODO.
mkcmkc Feb 21, 2024
a01df62
Get versions from CLIs.
mkcmkc Feb 22, 2024
2c298c4
Delete Dockerfiles and update containers to biocontainer equivalents.
mkcmkc Feb 22, 2024
33244d9
Removing temp config.
mkcmkc Feb 22, 2024
8d82bac
Fix process tags.
mkcmkc Feb 22, 2024
be271bb
Reassign TODO.
mkcmkc Feb 22, 2024
1364c8b
Add TODO.
mkcmkc Feb 22, 2024
e35e4f3
Align read group construction with HMF pipeline5.
mkcmkc Feb 23, 2024
aa77b22
Bump MarkDups to 1.1.1
scwatts Feb 23, 2024
1301932
Patch
scwatts Feb 28, 2024
0b27c4a
Fix join for alignment BAM and corresponding BAIs
scwatts Feb 28, 2024
0ca87ec
Improve comments, syntax, etc
scwatts Feb 28, 2024
51e21b5
Implement RNA alignment
scwatts Feb 29, 2024
3220f7b
Move, rename bwa-mem2 module
scwatts Feb 29, 2024
b9e88fe
Fix bwa-mem2 process name
scwatts Feb 29, 2024
781d883
Set STAR process label to 'process_high'
scwatts Feb 29, 2024
063aa3a
Fix incomplete container URL for SAMtools sort
scwatts Feb 29, 2024
914edab
Handle when read count less than max read split
scwatts Feb 29, 2024
2d8c76c
Correctly format RG arg for STAR
scwatts Feb 29, 2024
8749d45
Fix bwa-mem2 bi-index variable name typo
scwatts Feb 29, 2024
de1aad0
Adjust, fix bwa-mem2 index handling
scwatts Feb 29, 2024
7e0fb7f
Remove deprecated MarkDups `-multi_bam` argument
scwatts Feb 29, 2024
f47f33e
Fix AMBER subworkflow TN mode
scwatts Feb 29, 2024
08a58c8
Fix COBALT subworkflow TN mode
scwatts Feb 29, 2024
c53ac44
Adjust indentation
scwatts Feb 29, 2024
df1f5b7
Fix SAGE calling subworkflow TN mode
scwatts Feb 29, 2024
752f57a
Improving handling of 'no merge' RNA BAM scenarios
scwatts Feb 29, 2024
c648028
Set outputs for alignment workflows
scwatts Feb 29, 2024
db6bbf2
Add missing channel docs
scwatts Feb 29, 2024
c1f3142
Further work on RNA BAM handling
scwatts Feb 29, 2024
bd45719
Use explicit returns in .branch ops
scwatts Feb 29, 2024
8d06484
Do not index RNA BAMs prior to merge
scwatts Feb 29, 2024
c7e87c2
Remove obsolete TODOs
mkcmkc Mar 5, 2024
1653daf
Fix Isofox singularity container URL
scwatts Mar 7, 2024
f618283
Bump TSO500 data bundle version
scwatts Mar 7, 2024
2bd379c
Remove -force_pathogenic_pass in PAVE somatic
scwatts Mar 7, 2024
f76f79b
Correct prepare reference panel data path lookup
scwatts Mar 16, 2024
07117c2
Merge branch 'dev' into alignment-subworkflow
scwatts Mar 16, 2024
9b012d0
Fix optional channel placeholders
scwatts Mar 16, 2024
b90c572
Update modules.json
scwatts Mar 16, 2024
77b8bce
Adjust indenting
scwatts Mar 16, 2024
bd6b304
Use standard container directive format for STAR
scwatts Mar 18, 2024
c7f1774
Add missing imports and subworkflow descriptions
scwatts Mar 18, 2024
177bf69
Use Bioconda/BioContainers for bwa-mem2 module
scwatts Mar 19, 2024
9d61690
Improve naming for bwa-mem2 output BAMs
scwatts Mar 27, 2024
066c8d2
Use BAM index created during alignment
scwatts Apr 15, 2024
0abc35f
Improve BAM index selection
scwatts Apr 22, 2024
eb95b4f
Include BAI in bwa-mem2/align stub
scwatts Apr 22, 2024
4039507
Adjust input selection logic
scwatts Apr 22, 2024
e8f6c99
Bump MarkDups to 1.1.5
scwatts Apr 23, 2024
ed8e1d1
Remove Sambamba index module file
scwatts Apr 23, 2024
e0da20a
Add new meta.yaml
scwatts Apr 23, 2024
2 changes: 2 additions & 0 deletions conf/hmf_data.config
@@ -50,6 +50,7 @@ params {
known_fusions = 'dna_pipeline/sv/known_fusions.37.bedpe'
purple_germline_del = 'dna_pipeline/copy_number/cohort_germline_del_freq.37.csv'
segment_mappability = 'dna_pipeline/variants/mappability_150.37.bed.gz'
+ unmap_regions = 'dna_pipeline/common/unmap_regions.37.tsv'
}
'38' {
// AMBER
@@ -101,6 +102,7 @@ params {
known_fusions = 'dna_pipeline/sv/known_fusions.38.bedpe'
purple_germline_del = 'dna_pipeline/copy_number/cohort_germline_del_freq.38.csv'
segment_mappability = 'dna_pipeline/variants/mappability_150.38.bed.gz'
+ unmap_regions = 'dna_pipeline/common/unmap_regions.38.tsv'
}
}
}
6 changes: 6 additions & 0 deletions conf/hmf_genomes.config
@@ -14,16 +14,22 @@ params {
fai = "https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev/genomes/GRCh37_hmf/samtools_index/1.16/Homo_sapiens.GRCh37.GATK.illumina.fasta.fai"
dict = "https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev/genomes/GRCh37_hmf/samtools_index/1.16/Homo_sapiens.GRCh37.GATK.illumina.fasta.dict"
bwa_index = "https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev/genomes/GRCh37_hmf/bwa_index/0.7.17-r1188.tar.gz"
+ bwa_index_bseq = "https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev/genomes/GRCh37_hmf/bwa_index/2.2.1/Homo_sapiens.GRCh37.GATK.illumina.fasta.0123"
+ bwa_index_biidx = "https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev/genomes/GRCh37_hmf/bwa_index/2.2.1/Homo_sapiens.GRCh37.GATK.illumina.fasta.bwt.2bit.64"
bwa_index_image = "https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev/genomes/GRCh37_hmf/bwa_index_image/0.7.17-r1188/Homo_sapiens.GRCh37.GATK.illumina.fasta.img"
gridss_index = "https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev/genomes/GRCh37_hmf/gridss_index/2.13.2/Homo_sapiens.GRCh37.GATK.illumina.fasta.gridsscache"
+ star_index = "https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev/genomes/GRCh37_hmf/star_index/gencode_19/2.7.3a.tar.gz"
}
'GRCh38_hmf' {
fasta = "https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev/genomes/GRCh38_hmf/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna"
fai = "https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev/genomes/GRCh38_hmf/samtools_index/1.16/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.fai"
dict = "https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev/genomes/GRCh38_hmf/samtools_index/1.16/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.dict"
bwa_index = "https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev/genomes/GRCh38_hmf/bwa_index/0.7.17-r1188.tar.gz"
+ bwa_index_bseq = "https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev/genomes/GRCh38_hmf/bwa_index/2.2.1/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.0123"
+ bwa_index_biidx = "https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev/genomes/GRCh38_hmf/bwa_index/2.2.1/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.bwt.2bit.64"
bwa_index_image = "https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev/genomes/GRCh38_hmf/bwa_index_image/0.7.17-r1188/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.img"
gridss_index = "https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev/genomes/GRCh38_hmf/gridss_index/2.13.2/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gridsscache"
+ star_index = "https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev/genomes/GRCh38_hmf/star_index/gencode_38/2.7.3a.tar.gz"
}
}
}
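
The two new keys per genome point at bwa-mem2 index files stored under a `2.2.1` directory, separate from the existing bwa 0.7.17 index tarball. As a rough illustration, and assuming the index files are named by appending a suffix to the FASTA filename (which matches the URLs above), the expected names could be derived as in this Groovy sketch. The helper `bwaMem2IndexNames` is hypothetical and is not part of the pipeline.

```groovy
// Hypothetical helper (not from the PR): derive the bwa-mem2 index filenames
// that the new `bwa_index_bseq` and `bwa_index_biidx` keys refer to.
Map<String, String> bwaMem2IndexNames(String fastaName) {
    return [
        bwa_index_bseq : "${fastaName}.0123".toString(),
        bwa_index_biidx: "${fastaName}.bwt.2bit.64".toString(),
    ]
}

def names = bwaMem2IndexNames('Homo_sapiens.GRCh37.GATK.illumina.fasta')
assert names.bwa_index_bseq == 'Homo_sapiens.GRCh37.GATK.illumina.fasta.0123'
assert names.bwa_index_biidx == 'Homo_sapiens.GRCh37.GATK.illumina.fasta.bwt.2bit.64'
```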
16 changes: 16 additions & 0 deletions conf/modules.config
@@ -12,6 +12,22 @@

process {

+ withName: 'GATK4_MARKDUPLICATES' {
+ publishDir = [
+ path: { "${params.outdir}" },
+ mode: params.publish_dir_mode,
+ saveAs: { filename -> filename.equals('versions.yml') ? null : "${meta.key}/alignments/rna/${filename}" },
+ ]
+ }
+
+ withName: 'MARKDUPS' {
+ publishDir = [
+ path: { "${params.outdir}" },
+ mode: params.publish_dir_mode,
+ saveAs: { filename -> filename.equals('versions.yml') ? null : "${meta.key}/alignments/dna/${filename}" },
+ ]
+ }
+
withName: 'AMBER' {
publishDir = [
path: { "${params.outdir}" },
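
Both new `publishDir` blocks use the same `saveAs` pattern: returning `null` stops `versions.yml` from being published, and any other file is placed under a per-subject `alignments/dna` or `alignments/rna` directory. A minimal Groovy sketch of that closure logic outside Nextflow follows. The `meta` value and filenames are made-up examples, and `makeSaveAs` is a hypothetical wrapper used only so the closure can run standalone.

```groovy
// Hypothetical standalone wrapper; in the pipeline `meta` is supplied by the process.
Closure makeSaveAs(Map meta, String seqType) {
    return { String filename ->
        filename.equals('versions.yml')
            ? null
            : "${meta.key}/alignments/${seqType}/${filename}".toString()
    }
}

def meta = [key: 'subject_a']  // example value, not from the PR
def saveAsDna = makeSaveAs(meta, 'dna')

// versions.yml is skipped; everything else gets a per-subject path
assert saveAsDna('versions.yml') == null
assert saveAsDna('tumor.markdups.bam') == 'subject_a/alignments/dna/tumor.markdups.bam'
```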
66 changes: 62 additions & 4 deletions lib/Constants.groovy
@@ -11,12 +11,12 @@ class Constants {
static List PANELS_DEFINED = ['tso500']


- static String HMF_DATA_37_PATH = 'https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev/hmf_reference_data/hmftools/5.34_37--0.tar.gz'
- static String HMF_DATA_38_PATH = 'https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev/hmf_reference_data/hmftools/5.34_38--0.tar.gz'
+ static String HMF_DATA_37_PATH = 'https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev/hmf_reference_data/hmftools/5.34_37--2.tar.gz'
+ static String HMF_DATA_38_PATH = 'https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev/hmf_reference_data/hmftools/5.34_38--2.tar.gz'


- static String TSO500_PANEL_37_PATH = 'https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev/hmf_reference_data/panels/tso500_5.34_37--0.tar.gz'
- static String TSO500_PANEL_38_PATH = 'https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev/hmf_reference_data/panels/tso500_5.34_38--0.tar.gz'
+ static String TSO500_PANEL_37_PATH = 'https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev/hmf_reference_data/panels/tso500_5.34_37--1.tar.gz'
+ static String TSO500_PANEL_38_PATH = 'https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev/hmf_reference_data/panels/tso500_5.34_38--1.tar.gz'


static String VIRUSBREAKENDDB_PATH = 'https://pub-29f2e5b2b7384811bdbbcba44f8b5083.r2.dev/virusbreakend/virusbreakenddb_20210401.tar.gz'
@@ -34,6 +34,7 @@ class Constants {
}

static enum Process {
+ ALIGNMENT,
AMBER,
BAMTOOLS,
CHORD,
@@ -45,6 +46,7 @@ class Constants {
ISOFOX,
LILAC,
LINX,
+ MARKDUPS,
ORANGE,
PAVE,
PURPLE,
@@ -56,7 +58,9 @@ class Constants {
static enum FileType {
// Generic
BAM,
+ BAM_MARKDUPS,
BAI,
+ FASTQ,
// Process
AMBER_DIR,
BAMTOOLS,
@@ -97,11 +101,65 @@ class Constants {
DNA_RNA,
}

+ static enum InfoField {
+ CANCER_TYPE,
+ LANE,
+ LIBRARY_ID,
+ }
+
static Map PLACEHOLDER_META = [meta_placeholder: null]
static List PLACEHOLDER_OPTIONAL_CHANNEL = []

static Map INPUT = [

+ BAM_DNA_TUMOR: [
+ FileType.BAM,
+ SampleType.TUMOR,
+ SequenceType.DNA,
+ ],
+
+ BAM_MARKDUPS_DNA_TUMOR: [
+ FileType.BAM_MARKDUPS,
+ SampleType.TUMOR,
+ SequenceType.DNA,
+ ],
+
+ BAM_DNA_NORMAL: [
+ FileType.BAM,
+ SampleType.NORMAL,
+ SequenceType.DNA,
+ ],
+
+ BAM_MARKDUPS_DNA_NORMAL: [
+ FileType.BAM_MARKDUPS,
+ SampleType.NORMAL,
+ SequenceType.DNA,
+ ],
+
+ BAM_RNA_TUMOR: [
+ FileType.BAM,
+ SampleType.TUMOR,
+ SequenceType.RNA,
+ ],
+
+ BAI_DNA_TUMOR: [
+ FileType.BAI,
+ SampleType.TUMOR,
+ SequenceType.DNA,
+ ],
+
+ BAI_DNA_NORMAL: [
+ FileType.BAI,
+ SampleType.NORMAL,
+ SequenceType.DNA,
+ ],
+
+ BAI_RNA_TUMOR: [
+ FileType.BAI,
+ SampleType.TUMOR,
+ SequenceType.RNA,
+ ],
+
ISOFOX_DIR: [
FileType.ISOFOX_DIR,
SampleType.TUMOR,
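
The new `INPUT` entries each pair a file type with a sample type and a sequence type. One way such triples could be used is as lookup keys into a per-sample file map, for example to prefer a MarkDups BAM over a plain BAM when both are present. The Groovy sketch below illustrates that idea only. The `selectBam` helper, the layout of the file map, and the filenames are assumptions, not code from this PR.

```groovy
// Trimmed-down copies of the enums in Constants.groovy
enum FileType { BAM, BAM_MARKDUPS, BAI }
enum SampleType { TUMOR, NORMAL }
enum SequenceType { DNA, RNA }

// Subset of the INPUT map from the diff above
def INPUT = [
    BAM_DNA_TUMOR         : [FileType.BAM, SampleType.TUMOR, SequenceType.DNA],
    BAM_MARKDUPS_DNA_TUMOR: [FileType.BAM_MARKDUPS, SampleType.TUMOR, SequenceType.DNA],
]

// Hypothetical helper: return the MarkDups BAM if present, else the plain BAM, else null.
def selectBam = { Map files ->
    files.get(INPUT.BAM_MARKDUPS_DNA_TUMOR) ?: files.get(INPUT.BAM_DNA_TUMOR)
}

// Example file maps keyed by the INPUT triples (filenames are made up)
def onlyPlain = [(INPUT.BAM_DNA_TUMOR): 'tumor.bam']
def both = onlyPlain + [(INPUT.BAM_MARKDUPS_DNA_TUMOR): 'tumor.markdups.bam']

assert selectBam(onlyPlain) == 'tumor.bam'
assert selectBam(both) == 'tumor.markdups.bam'
assert selectBam([:]) == null
```

Lists compare by value in Groovy, so two triples with the same enum members address the same map entry.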