I am running wf-somatic-variation in tumor-only mode and have encountered an error that seems related to the quality of my BAM files. Specifically, the workflow does not appear to process the input BAM files properly, and they may be missing data on somatic variants (SNVs) and modified sites, which prevents the workflow from completing successfully.
My Current Workflow:
Run my samples on ONT's PromethION and basecall with Dorado in MinKNOW.
Take the generated FASTQ file and run it through wf-single-cell, aligning with minimap2.
Take the BAM files generated by wf-single-cell and process them through wf-somatic-variation.
What I've Done So Far:
To address the mapping quality issue, I tried filtering the BAM files to exclude low-quality reads (MAPQ < 30) using samtools view -q 30 -b, and generated a new filtered BAM file (filtered_reads.bam). However, I'm also concerned that the BAM files may be lacking sufficient data on SNVs or other genomic modifications, which could be contributing to the issue. Would this require re-basecalling?
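For reference, a minimal sketch of how the effect of the MAPQ 30 cutoff could be checked before filtering. It assumes SAM text (for example, the output of samtools view on the input BAM) and relies only on the SAM column layout, where column 2 is the FLAG and column 5 is the MAPQ. The function name mapq_summary and the demo records are hypothetical and are not part of samtools or wf-somatic-variation.

```python
from collections import Counter


def mapq_summary(sam_lines, threshold=30):
    """Count alignments that a MAPQ cutoff would keep or drop.

    Mirrors the semantics of `samtools view -q <threshold>`, which skips
    alignments with MAPQ smaller than the threshold. Unmapped reads are
    counted separately here for clarity.
    """
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines (they start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x4:  # 0x4 = read unmapped
            counts["unmapped"] += 1
            continue
        mapq = int(fields[4])
        counts["kept" if mapq >= threshold else "dropped"] += 1
    return counts


# Hypothetical demo records, for illustration only.
DEMO_SAM = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t120\t12\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r4\t0\tchr1\t150\t30\t4M\t*\t0\t0\tACGT\tIIII",
]

summary = mapq_summary(DEMO_SAM)
print(dict(summary))
```

On real data the same function could be given sys.stdin with the SAM text piped in, which shows what fraction of reads the cutoff removes and whether enough coverage is likely to remain for variant calling.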
Questions:
Could low mapping quality (MAPQ < 30) be the primary cause of the error, and would filtering these reads resolve the issue for wf-somatic-variation?
Are there specific steps I can take to ensure that my BAM files contain sufficient data on somatic SNVs and genomic modifications to be processed effectively by the workflow? How should
What would you recommend as the best approach for ensuring that my BAM files are properly processed by the workflow? Are there additional filtering or pre-processing steps needed ahead of wf-somatic-variation to improve data quality for SNVs and modifications that my BAM files may be missing?
Is there a recommended threshold for MAPQ values when working with long-read sequencing data for workflows like wf-somatic-variation?
Relevant log output
Error executing process > 'snv_to:clairs_to_extract_candidates (26)'
Caused by:
Process `snv_to:clairs_to_extract_candidates (26)` terminated with an error exit status (1)
Command executed:
# Create output folder structure
for dir_name in candidates indels hybrid; do
    if [ -e $dir_name ]; then
        rm -r dir_name
    fi
    mkdir -p $dir_name
done
# Create candidates
pypy3 ${CLAIRS_PATH}/clairs_to.py extract_candidates_calling \
--tumor_bam_fn reads.bam \
--ref_fn genome.fa \
--samtools samtools \
--snv_min_af 0.05 \
--indel_min_af 0.05 \
--chunk_id 26 \
--chunk_num 50 \
--ctg_name chr1 \
--platform ont \
--min_coverage 4 \
--min_bq 20 \
--bed_fn split_beds/chr1 \
--call_indels_only_in_these_regions split_indel_beds/chr1 \
--candidates_folder candidates/ \
--output_depth True \
--select_indel_candidates True \
Command exit status:
1
Command output:
(empty)
Command error:
INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
Traceback (most recent call last):
File "/home/epi2melabs/ClairS-TO/clairs_to.py", line 109, in <module>
main()
File "/home/epi2melabs/ClairS-TO/clairs_to.py", line 103, in main
submodule.main()
File "/home/epi2melabs/ClairS-TO/src/extract_candidates_calling.py", line 615, in main
extract_pair_candidates(args)
File "/home/epi2melabs/ClairS-TO/src/extract_candidates_calling.py", line 344, in extract_pair_candidates
select_indel_candidates=select_indel_candidates
File "/home/epi2melabs/ClairS-TO/src/extract_candidates_calling.py", line 91, in decode_pileup_bases
base_list[-1][1] = base + pileup_bases[base_idx: base_idx + advance] # add indel seq
IndexError: list index out of range
Application activity log entry
Were you able to successfully run the latest version of the workflow with the demo data?
yes
Other demo data information
JakeOsc changed the title from "Troubleshooting wf-somatic-variation: Addressing BAM File Quality and Missing SNV Data in Tumor-Only Mode" to "BAM File Quality and Missing SNV Data in Tumor-Only Mode" on Feb 5, 2025
Operating System
Ubuntu 22.04
Other Linux
No response
Workflow Version
v1.4.0
Workflow Execution
Command line (Cluster)
Other workflow execution
No response
EPI2ME Version
No response
CLI command run
# Run the Nextflow pipeline with the specified parameters
nextflow run wf-somatic-variation \
    -profile singularity \
    --sample_name "$SAMPLE_NAME" \
    --snv \
    --ref "$REFERENCE_GENOME" \
    --bam_tumor "$BAM_TUMOR" \
    --override_basecaller_cfg "$BASECALLER_CFG" \
    --tumor_min_coverage 0 \
    --outdir "$OUTDIR" \
    --expected_cells 2235 \
    -w "$WORK_DIR"
Workflow Execution - CLI Execution Profile
None