BAM File Quality and Missing SNV Data in Tumor-Only Mode #42

Open
JakeOsc opened this issue Feb 5, 2025 · 0 comments


Operating System

Ubuntu 22.04

Other Linux

No response

Workflow Version

v1.4.0

Workflow Execution

Command line (Cluster)

Other workflow execution

No response

EPI2ME Version

No response

CLI command run

Run the Nextflow pipeline with the specified parameters

nextflow run wf-somatic-variation \
    -profile singularity \
    --sample_name "$SAMPLE_NAME" \
    --snv \
    --ref "$REFERENCE_GENOME" \
    --bam_tumor "$BAM_TUMOR" \
    --override_basecaller_cfg "$BASECALLER_CFG" \
    --tumor_min_coverage 0 \
    --outdir "$OUTDIR" \
    --expected_cells 2235 \
    -w "$WORK_DIR"

Workflow Execution - CLI Execution Profile

None

What happened?

I am running wf-somatic-variation in tumor-only mode and hit an error that seems related to the quality of my BAM files. The input BAM files do not appear to be processed properly, and data on somatic variants (SNVs) and modified sites may be missing, which prevents the workflow from completing.

My Current Workflow:

  1. Sequence my samples on an ONT PromethION and basecall with Dorado in MinKNOW.
  2. Run the generated FASTQ file through wf-single-cell, which aligns the reads with minimap2.
  3. Take the BAM files produced by wf-single-cell and process them with wf-somatic-variation.

What I've Done So Far:
Suspecting that mapping quality was the problem, I filtered the BAM file to exclude low-quality reads (MAPQ < 30) with samtools view -q 30 -b and wrote the result to a new file (filtered_reads.bam). I am also concerned that the BAM files may lack the information needed to call SNVs or base modifications, which could be contributing to the failure. Would this require re-basecalling?
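
For reference, the filtering step described above can be written as a small shell function. This is only a sketch, assuming samtools is on the PATH; the function name filter_bam_by_mapq and the file names are placeholders of mine, not part of wf-somatic-variation.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of wf-somatic-variation): keep reads with
# MAPQ >= threshold, index the result, and check the file is not truncated.
filter_bam_by_mapq() {
    local in_bam="$1" out_bam="$2" min_mapq="${3:-30}"
    # -q drops alignments with MAPQ below the threshold; -b writes BAM.
    samtools view -q "$min_mapq" -b -o "$out_bam" "$in_bam" || return 1
    # Tools that fetch reads by region need an index next to the BAM.
    samtools index "$out_bam" || return 1
    # quickcheck exits non-zero if the header or the EOF block is missing.
    samtools quickcheck "$out_bam"
}
```

Example call: filter_bam_by_mapq reads.bam filtered_reads.bam 30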

Questions:

  1. Could low mapping quality (MAPQ < 30) be the primary cause of the error, and would filtering these reads resolve the issue for wf-somatic-variation?
  2. Are there specific steps I can take to ensure that my BAM files contain sufficient data on somatic SNVs and genomic modifications to be processed effectively by the workflow? How should
  3. What is the best approach for ensuring that the workflow processes my BAM files properly? Are additional filtering or pre-processing steps needed before running wf-somatic-variation, to improve data quality for SNVs and for modifications my BAM files may be missing?
  4. Is there a recommended threshold for MAPQ values when working with long-read sequencing data for workflows like wf-somatic-variation?
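
To give the questions above some numbers, the MAPQ distribution and the presence of base-modification tags can be inspected before choosing any threshold. The sketch below assumes the BAM is streamed as SAM text (for example via samtools view) and that modifications are stored in the standard MM:Z: tag; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: both read SAM text on stdin, e.g.
#   samtools view reads.bam | mapq_histogram

# Print "count MAPQ" pairs, one line per distinct MAPQ value (SAM column 5).
mapq_histogram() {
    awk -F'\t' '!/^@/ { print $5 }' | sort -n | uniq -c
}

# Count alignment records that carry a base-modification tag (MM:Z:).
# Optional tags start at column 12 of a SAM record.
count_mm_tagged() {
    awk -F'\t' '!/^@/ {
        for (i = 12; i <= NF; i++) if ($i ~ /^MM:Z:/) { n++; break }
    } END { print n + 0 }'
}
```

A count of zero from count_mm_tagged would suggest the modification tags were never written or were lost somewhere between basecalling and alignment, although that is an inference rather than something the workflow reports.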

Relevant log output

Error executing process > 'snv_to:clairs_to_extract_candidates (26)'

Caused by:
  Process `snv_to:clairs_to_extract_candidates (26)` terminated with an error exit status (1)

Command executed:

  # Create output folder structure
  for dir_name in candidates indels hybrid; do
      if [ -e $dir_name ]; then
          rm -r dir_name
      fi
      mkdir -p $dir_name
  done
  # Create candidates
  pypy3 ${CLAIRS_PATH}/clairs_to.py extract_candidates_calling \
      --tumor_bam_fn reads.bam \
      --ref_fn genome.fa \
      --samtools samtools \
      --snv_min_af 0.05 \
      --indel_min_af 0.05 \
      --chunk_id 26 \
      --chunk_num 50 \
      --ctg_name chr1 \
      --platform ont \
      --min_coverage 4 \
      --min_bq 20 \
      --bed_fn split_beds/chr1 \
      --call_indels_only_in_these_regions split_indel_beds/chr1 \
      --candidates_folder candidates/ \
      --output_depth True  \
      --select_indel_candidates True \

Command exit status:
  1

Command output:
  (empty)

Command error:
  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  Traceback (most recent call last):
    File "/home/epi2melabs/ClairS-TO/clairs_to.py", line 109, in <module>
      main()
    File "/home/epi2melabs/ClairS-TO/clairs_to.py", line 103, in main
      submodule.main()
    File "/home/epi2melabs/ClairS-TO/src/extract_candidates_calling.py", line 615, in main
      extract_pair_candidates(args)
    File "/home/epi2melabs/ClairS-TO/src/extract_candidates_calling.py", line 344, in extract_pair_candidates
      select_indel_candidates=select_indel_candidates
    File "/home/epi2melabs/ClairS-TO/src/extract_candidates_calling.py", line 91, in decode_pileup_bases
      base_list[-1][1] = base + pileup_bases[base_idx: base_idx + advance]  # add indel seq
  IndexError: list index out of range

Application activity log entry

Were you able to successfully run the latest version of the workflow with the demo data?

yes

Other demo data information

@JakeOsc JakeOsc changed the title Troubleshooting wf-somatic-variation: Addressing BAM File Quality and Missing SNV Data in Tumor-Only Mode BAM File Quality and Missing SNV Data in Tumor-Only Mode Feb 5, 2025