Error executing process > 'ingress_normal:checkBamHeaders (1)' #26

Closed
SilviaMariaMacri opened this issue Jun 14, 2024 · 10 comments
@SilviaMariaMacri

Operating System

Other Linux (please specify below)

Other Linux

Red Hat Enterprise Linux release 8.6

Workflow Version

v.1.2.1

Workflow Execution

Command line (Cluster)

Other workflow execution

No response

EPI2ME Version

No response

CLI command run

/hpcshare/genomics/ASL_ONC/NextFlow_RunningDir/nextflow-23.10.0-all run epi2me-labs/wf-somatic-variation -profile singularity -resume -process.executor 'pbspro' -process.memory 256.GB -work-dir '/archive/s2/genomics/onco_nanopore/test_som_var/work' -with-timeline --snv --sv --mod --sample_name 'OHU0002HI' --bam_normal '/archive/s2/genomics/onco_nanopore/HUM_OHU_OHU0002HTNDN/OHU0002HTNDN.bam' --bam_tumor '/archive/s2/genomics/onco_nanopore/HUM_OHU_OHU0002ITTDN/OHU0002ITTDN.bam' --ref '/archive/s1/sconsRequirements/databases/reference/resources_broad_hg38_v0_Homo_sapiens_assembly38.fasta' --out_dir '/archive/s2/genomics/onco_nanopore/test_som_var' --basecaller_cfg '[email protected]' --phase_normal --classify_insert --force_strand --normal_min_coverage 0 --tumor_min_coverage 0 --haplotype_filter_threads 32 --severus_threads 32 --dss_threads 4 --modkit_threads 32 -process.cpus 32 -process.queue 'fatnodes'

Workflow Execution - CLI Execution Profile

singularity

What happened?

Contrary to what the error message states, the input data were all basecalled with the same dorado model, with two types of methylation called (every read carries the same two types).

Relevant log output

ERROR ~ Error executing process > 'ingress_normal:checkBamHeaders (1)'

Caused by:
  Process `ingress_normal:checkBamHeaders (1)` terminated with an error exit status (65)

Command executed:

  workflow-glue check_bam_headers_in_dir input_dir > env.vars
  source env.vars
  DS_RUNIDS=$(workflow-glue get_ds_records --xam input_dir --key runid --cardinality zero-or-more --sep ',')
  DS_BASECALL_MODELS=$(workflow-glue get_ds_records --xam input_dir --key basecall_model --cardinality zero-or-one --sep ',' --explode_obviously)

Command exit status:
  65

Command output:
  (empty)

Command error:
  [12:29:26 - workflow_glue] Bootstrapping CLI.
  [12:29:26 - workflow_glue] Starting entrypoint.
  [12:29:26 - workflow_glue.checkBamHd] Checked (u)BAM headers in 'input_dir'.
  [12:29:26 - workflow_glue] Bootstrapping CLI.
  [12:29:26 - workflow_glue] Starting entrypoint.
  [12:29:26 - workflow_glue] Bootstrapping CLI.
  [12:29:26 - workflow_glue] Starting entrypoint.

  ################################################################################
  # INPUT DATA PROBLEM
  Your input data contains reads basecalled with more than one basecaller model.

  Our workflows automatically select appropriate configuration and models for
  downstream tools for a given basecaller model. This cannot be done reliably when
  reads with different basecaller models are mixed in the same data set.

  ## Next steps
  To use this workflow you must separate your input files, making sure all reads
  are have been basecalled with the same basecaller model.
  ################################################################################

Work dir:
  /archive/s2/genomics/onco_nanopore/test_som_var/work/98/8067ea1af0d55a5ff0954e000d6501

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details

Application activity log entry

No response

Were you able to successfully run the latest version of the workflow with the demo data?

no

Other demo data information

no attempt
RenzoTale88 self-assigned this Jun 14, 2024
RenzoTale88 added the bug (Something isn't working) label Jun 14, 2024
@cjw85

cjw85 commented Jun 14, 2024

@SilviaMariaMacri

Can you provide details of the dorado command that was used for basecalling, and the BAM header containing the read group information?

@SilviaMariaMacri
Author

Thank you for the answer.

Here is the dorado command; the ${data}, ${file_name} and ${reference} variables are defined beforehand.

model0="dna_r10.4.1_e8.2_400bps_sup@v4.3.0"
model1="dna_r10.4.1_e8.2_400bps_sup@v4.3.0_6mA@v2"
model2="dna_r10.4.1_e8.2_400bps_sup@v4.3.0_5mC_5hmC@v1"

dorado download --model ${model0}
dorado download --model ${model1}
dorado download --model ${model2}

dorado duplex -x cuda:all --reference ${reference} ${model0} ${data} --modified-bases-models ${model1},${model2} > ${file_name}.bam

@SilviaMariaMacri
Author

Read group for bam_normal:

@RG ID:ae7f1cdd128c3a26767403f064f672a2c616437b_dna_r10.4.1_e8.2_400bps_sup@[email protected] PU:PAS29949 PM:PC24B186 DT:2024-01-29T17:05:56.229+00:00 PL:ONT DS:[email protected][email protected] runid=ae7f1cdd128c3a26767403f064f672a2c616437b LB:OHU0002HBLDN1R SM:OHU0002HBLDN1R
@RG ID:ae7f1cdd128c3a26767403f064f672a2c616437b_dna_r10.4.1_e8.2_400bps_sup@v4.3.0 PU:PAS29949 PM:PC24B186 DT:2024-01-29T17:05:56.229+00:00 PL:ONT DS:[email protected] runid=ae7f1cdd128c3a26767403f064f672a2c616437b LB:OHU0002HBLDN1R SM:OHU0002HBLDN1R
@RG ID:ae7f1cdd128c3a26767403f064f672a2c616437b_dna_r10.4.1_e8.2_400bps_sup@[email protected] PU:PAS29949 PM:PC24B186 DT:2024-01-29T17:05:56.229+00:00 PL:ONT DS:[email protected][email protected] runid=ae7f1cdd128c3a26767403f064f672a2c616437b LB:OHU0002HBLDN1R SM:OHU0002HBLDN1R
@RG ID:ae7f1cdd128c3a26767403f064f672a2c616437b_dna_r10.4.1_e8.2_400bps_sup@v4.3.0-4A2522A PU:PAS29949 PM:PC24B186 DT:2024-01-29T17:05:56.229+00:00 PL:ONT DS:[email protected] runid=ae7f1cdd128c3a26767403f064f672a2c616437b LB:OHU0002HBLDN1R SM:OHU0002HBLDN1R
@RG ID:608d7c95efaaf685893c7bc68e874e367b00a9b0_dna_r10.4.1_e8.2_400bps_sup@[email protected] PU:PAS29949 PM:PC24B186 DT:2024-01-31T14:13:13.498+00:00 PL:ONT DS:[email protected][email protected] runid=608d7c95efaaf685893c7bc68e874e367b00a9b0 LB:OHU0002HBLDN2R SM:OHU0002HBLDN2R
@RG ID:608d7c95efaaf685893c7bc68e874e367b00a9b0_dna_r10.4.1_e8.2_400bps_sup@v4.3.0 PU:PAS29949 PM:PC24B186 DT:2024-01-31T14:13:13.498+00:00 PL:ONT DS:[email protected] runid=608d7c95efaaf685893c7bc68e874e367b00a9b0 LB:OHU0002HBLDN2R SM:OHU0002HBLDN2R
@

@SilviaMariaMacri
Author

Read group for bam_tumor:

@RG ID:4dbda00436146c1afa29be6498bddcccb1c878da_dna_r10.4.1_e8.2_400bps_sup@[email protected] PU:PAS37394 PM:PC24B186 DT:2024-01-29T17:07:06.439+00:00 PL:ONT DS:[email protected][email protected] runid=4dbda00436146c1afa29be6498bddcccb1c878da LB:OHU0002IBLDN1R SM:OHU0002IBLDN1R
@RG ID:4dbda00436146c1afa29be6498bddcccb1c878da_dna_r10.4.1_e8.2_400bps_sup@v4.3.0 PU:PAS37394 PM:PC24B186 DT:2024-01-29T17:07:06.439+00:00 PL:ONT DS:[email protected] runid=4dbda00436146c1afa29be6498bddcccb1c878da LB:OHU0002IBLDN1R SM:OHU0002IBLDN1R
@RG ID:4dbda00436146c1afa29be6498bddcccb1c878da_dna_r10.4.1_e8.2_400bps_sup@[email protected] PU:PAS37394 PM:PC24B186 DT:2024-01-29T17:07:06.439+00:00 PL:ONT DS:[email protected][email protected] runid=4dbda00436146c1afa29be6498bddcccb1c878da LB:OHU0002IBLDN1R SM:OHU0002IBLDN1R
@RG ID:4dbda00436146c1afa29be6498bddcccb1c878da_dna_r10.4.1_e8.2_400bps_sup@v4.3.0-1FF72A60 PU:PAS37394 PM:PC24B186 DT:2024-01-29T17:07:06.439+00:00 PL:ONT DS:[email protected] runid=4dbda00436146c1afa29be6498bddcccb1c878da LB:OHU0002IBLDN1R SM:OHU0002IBLDN1R
@RG ID:46cb5cab1c0b652e181ee762d0beca2766ac2ab7_dna_r10.4.1_e8.2_400bps_sup@[email protected] PU:PAS37394 PM:PC24B186 DT:2024-01-31T14:17:44.277+00:00 PL:ONT DS:[email protected][email protected] runid=46cb5cab1c0b652e181ee762d0beca2766ac2ab7 LB:OHU0002IBLDN2R SM:OHU0002IBLDN2R
@RG ID:46cb5cab1c0b652e181ee762d0beca2766ac2ab7_dna_r10.4.1_e8.2_400bps_sup@v4.3.0 PU:PAS37394 PM:PC24B186 DT:2024-01-31T14:17:44.277+00:00 PL:ONT DS:[email protected] runid=46cb5cab1c0b652e181ee762d0beca2766ac2ab7 LB:OHU0002IBLDN2R SM:OHU0002IBLDN2R
@RG ID:46cb5cab1c0b652e181ee762d0beca2766ac2ab7_dna_r10.4.1_e8.2_400bps_sup@[email protected] PU:PAS37394 PM:PC24B186 DT:2024-01-31T14:17:44.277+00:00 PL:ONT DS:[email protected][email protected] runid=46cb5cab1c0b652e181ee762d0beca2766ac2ab7 LB:OHU0002IBLDN2R SM:OHU0002IBLDN2R
@RG ID:46cb5cab1c0b652e181ee762d0beca2766ac2ab7_dna_r10.4.1_e8.2_400bps_sup@v4.3.0-990DC92 PU:PAS37394 PM:PC24B186 DT:2024-01-31T14:17:44.277+00:00 PL:ONT DS:[email protected] runid=46cb5cab1c0b652e181ee762d0beca2766ac2ab7 LB:OHU0002IBLDN2R SM:OHU0002IBLDN2R
@RG ID:8c5a967f77c502748a2687195acd11b04af9bace_dna_r10.4.1_e8.2_400bps_sup@[email protected] PU:PAS37394 PM:PC24B186 DT:2024-02-02T14:13:57.402+00:00 PL:ONT DS:[email protected][email protected] runid=8c5a967f77c502748a2687195acd11b04af9bace LB:OHU0002IBLDN3R SM:OHU0002IBLDN3R
@RG ID:8c5a967f77c502748a2687195acd11b04af9bace_dna_r10.4.1_e8.2_400bps_sup@v4.3.0 PU:PAS37394 PM:PC24B186 DT:2024-02-02T14:13:57.402+00:00 PL:ONT DS:[email protected] runid=8c5a967f77c502748a2687195acd11b04af9bace LB:OHU0002IBLDN3R SM:OHU0002IBLDN3R

epi2me-labs deleted a comment from RenzoTale88 Jun 14, 2024
@cjw85

cjw85 commented Jun 14, 2024

Thanks for that.

So I can see that you've used duplex calling, which results in:

  1. some reads (the duplex-called pairs) having their basecalling model listed as the ...@[email protected] model and not the [email protected] model
  2. simplex and duplex basecalls being intermixed in your BAM files.

No downstream inference models support a mix of simplex and duplex data. I'm not even sure without checking that any currently support duplex full stop. We can make the warning in the workflow more explicit, and potentially work toward automatically filtering the input to simplex reads.
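For anyone who wants to check which basecaller models a BAM header declares before running the workflow, here is a minimal Python sketch. It assumes the header has been dumped as text (for example with `samtools view -H`), that fields are tab-separated as in the SAM specification, and that the `DS` field of each `@RG` line holds space-separated `key=value` pairs with the `basecall_model` and `runid` keys that the workflow's `get_ds_records` call asks for. The function name and the second model name in the example are made up for illustration.

```python
def basecall_models(header_lines):
    """Collect the distinct basecall_model values found in @RG DS fields."""
    models = set()
    for line in header_lines:
        if not line.startswith("@RG"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if not field.startswith("DS:"):
                continue
            # Assumption: DS is a list of space-separated key=value pairs.
            for item in field[3:].split():
                key, sep, value = item.partition("=")
                if sep and key == "basecall_model":
                    models.add(value)
    return models


# Synthetic header: the second model name is a placeholder, not a real model.
header = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@RG\tID:run1_simplex\tDS:basecall_model=dna_r10.4.1_e8.2_400bps_sup@v4.3.0 runid=run1",
    "@RG\tID:run1_duplex\tDS:basecall_model=hypothetical_duplex_model runid=run1",
]
models = basecall_models(header)
print(sorted(models))
```

If more than one model comes back, the header looks like the mixed case the workflow rejects.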

In the meantime you will have to filter your BAMs to remove the duplex basecalls. This can be done with samtools by filtering on the dx:i: auxiliary tag of each read; see the dorado docs.
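As a rough illustration of that filter, the Python sketch below works on SAM text lines rather than calling samtools, so it is not the command from the dorado docs. It drops records tagged `dx:i:1` (duplex basecalls) and keeps everything else; keeping records that have no `dx` tag at all is an assumption of this sketch. The function names and example reads are hypothetical.

```python
def dx_value(sam_record):
    """Return the integer value of the dx tag on a SAM record, or None if absent."""
    # Optional tags start after the 11 mandatory SAM columns.
    for field in sam_record.rstrip("\n").split("\t")[11:]:
        if field.startswith("dx:i:"):
            return int(field[5:])
    return None


def drop_duplex(sam_lines):
    """Yield header lines plus every record that is not a duplex basecall."""
    for line in sam_lines:
        # Header lines start with '@'; read names cannot contain '@' in SAM.
        if line.startswith("@") or dx_value(line) != 1:
            yield line


def make_record(name, *tags):
    """Build a minimal unaligned SAM record for the example."""
    return "\t".join([name, "4", "*", "0", "0", "*", "*", "0", "0", "ACGT", "!!!!", *tags])


lines = [
    "@HD\tVN:1.6",
    make_record("simplex_a", "dx:i:0"),
    make_record("parent_b", "dx:i:-1"),
    make_record("duplex_c", "dx:i:1"),
    make_record("untagged_d"),
]
kept = list(drop_duplex(lines))
print([line.split("\t")[0] for line in kept])
```

In practice the lines would come from `samtools view -h` and be written back to BAM afterwards. The header still lists the duplex read groups after filtering, so it may need editing as well.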

@SilviaMariaMacri
Author

Thank you. I'll try with the filtered BAM and I'll let you know!

@SilviaMariaMacri
Author

Hi @cjw85,

I filtered the BAM file with samtools and modified the header. The pipeline now runs without errors until the last process (mod:dss; I opened another issue, #27, since the new error is not related to this one).

Then I ran the pipeline again on the original BAM file (with simplex and duplex reads) after manually modifying its header. More precisely, I put the same model ("DS:[email protected]") in the DS field of every @RG line. Now the header doesn't reflect the content of the file, but the pipeline works exactly as in the previous case, where the BAM file contained only simplex reads.

So my question is: how does the presence of both duplex and simplex reads in the BAM file negatively affect the analysis, given that the pipeline seems to run without problems and to produce correct output files?

@RenzoTale88
Contributor

@SilviaMariaMacri generally speaking, we do not have a model that has been trained on duplex reads. Also, did you generate your duplex+simplex BAM by simply concatenating the output BAMs from wf-basecalling? If so, I'd check whether it contains any reads with the tag dx:i:-1. These should be dropped: they are the simplex reads that were combined into duplex reads, and are therefore redundant.
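One way to run that check is to tally reads by `dx` value. The Python sketch below counts records per `dx` value from SAM text lines (for example the output of `samtools view`); the function name and example reads are hypothetical, and records without a `dx` tag are counted under `None`.

```python
from collections import Counter


def count_dx(sam_lines):
    """Count SAM records per dx tag value (1 duplex, 0 simplex, -1 simplex parent of a duplex read)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        dx = None
        # Optional tags start after the 11 mandatory SAM columns.
        for field in line.rstrip("\n").split("\t")[11:]:
            if field.startswith("dx:i:"):
                dx = int(field[5:])
                break
        counts[dx] += 1
    return counts


mandatory = ["4", "*", "0", "0", "*", "*", "0", "0", "ACGT", "!!!!"]
example = [
    "@HD\tVN:1.6",
    "\t".join(["read_a", *mandatory, "dx:i:0"]),
    "\t".join(["read_b", *mandatory, "dx:i:-1"]),
    "\t".join(["read_c", *mandatory, "dx:i:-1"]),
    "\t".join(["read_d", *mandatory, "dx:i:1"]),
]
counts = count_dx(example)
print(dict(counts))
```

A non-zero count for `-1` next to a non-zero count for `1` means the file holds both the duplex reads and the simplex reads they were built from.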

@SilviaMariaMacri
Author

@RenzoTale88 thanks for your answer.
I didn't use wf-basecalling but Dorado directly, so it gave me a single output file containing all read types (dx:i:1, dx:i:0, dx:i:-1).
For the second pipeline run, I filtered the BAM file to keep only reads tagged dx:i:1 and dx:i:0; for the earlier run I had kept only dx:i:0 and dx:i:-1, as suggested by @cjw85.
In conclusion, at the moment the pipeline output is more reliable when applied to simplex-only data, is that right?

@RenzoTale88
Contributor

@SilviaMariaMacri given that the models are trained on simplex data, I'd suggest sticking to simplex reads.
