Error executing process > 'ingress_normal:checkBamHeaders (1)' #26
Comments
Can you provide details of the dorado command that was used for basecalling, and the BAM header containing the read group information?
Thank you for the answer. Here is the dorado command, run after defining the ${data}, ${file_name} and ${reference} variables:
model0="[email protected]"
dorado download --model ${model0}
dorado duplex -x cuda:all --reference ${reference} ${model0} ${data} --modified-bases-models ${model1},${model2} > ${file_name}.bam
Read group for bam_normal: @RG ID:ae7f1cdd128c3a26767403f064f672a2c616437b_dna_r10.4.1_e8.2_400bps_sup@[email protected] PU:PAS29949 PM:PC24B186 DT:2024-01-29T17:05:56.229+00:00 PL:ONT DS:[email protected][email protected] runid=ae7f1cdd128c3a26767403f064f672a2c616437b LB:OHU0002HBLDN1R SM:OHU0002HBLDN1R
Read group for bam_tumor: @RG ID:4dbda00436146c1afa29be6498bddcccb1c878da_dna_r10.4.1_e8.2_400bps_sup@[email protected] PU:PAS37394 PM:PC24B186 DT:2024-01-29T17:07:06.439+00:00 PL:ONT DS:[email protected][email protected] runid=4dbda00436146c1afa29be6498bddcccb1c878da LB:OHU0002IBLDN1R SM:OHU0002IBLDN1R
Thanks for that. So I can see that you've used duplex calling, which results in:
No downstream inference models support a mix of simplex and duplex data. Without checking, I'm not even sure that any of them currently support duplex at all. We can make the warning in the workflow more explicit, and potentially work toward automatically filtering the input to simplex reads. In the meantime you will have to filter your BAMs to remove the duplex basecalls. This can be done by filtering reads based on the
Thank you. I'll try with the filtered BAM and I'll let you know!
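For readers following along, here is a minimal sketch of that kind of filter, not the maintainers' method. It assumes the reads carry dorado's integer `dx` tag (1 for duplex, 0 or -1 for simplex), which is my reading of the dorado documentation rather than something stated in this thread. It works on SAM text lines, and the names `dx_value` and `drop_duplex` are hypothetical.

```python
from typing import Iterable, Iterator, Optional


def dx_value(sam_line: str) -> Optional[int]:
    """Return the integer value of the dx tag on a SAM record, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start at the 12th column of a SAM alignment record.
    for field in fields[11:]:
        if field.startswith("dx:i:"):
            return int(field[len("dx:i:"):])
    return None


def drop_duplex(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines and every record whose dx tag is not 1.

    Assumption: dx:i:1 marks a duplex read. Records without a dx tag are
    kept, on the assumption that they are simplex.
    """
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
        elif dx_value(line) != 1:
            yield line
```

The functions take any iterable of SAM lines, for example the text produced by `samtools view -h`. Note that this leaves the header untouched, so the @RG lines would still need to be checked separately.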
Hi @cjw85, I filtered the BAM file with samtools and modified the header. The pipeline now runs without errors until the last process (mod:dss; I opened another issue, #27, since the new error is not related to this one). Then I tried running the pipeline again with the original BAM file (containing simplex and duplex reads) after manually modifying its header. More precisely, I put the same model ("DS:[email protected]") in the DS field of every @RG row. Now the header doesn't reflect the content of the file, but the pipeline behaves exactly as it did with the BAM file containing only simplex reads. So my question is: how does the presence of both duplex and simplex reads in the BAM file negatively affect the analysis, given that the pipeline seems to run without problems and to produce correct output files?
@SilviaMariaMacri generally speaking, we do not have a model that has been trained on duplex reads. Also, did you generate your duplex+simplex BAM by simply concatenating the output BAMs from wf-basecalling? If so, I'd check whether you have any reads with the tag
@RenzoTale88 thanks for your answer.
@SilviaMariaMacri given that the models are trained on simplex data, I'd suggest sticking to these.
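To see which models a header declares before or after editing it, something like the sketch below may help. It is not the workflow's own check. It assumes tab-separated @RG fields, as in the SAM format, and a DS field made of space-separated `key=value` tokens that include a `basecall_model` key, which is how I understand dorado writes it. The function names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


def read_group_models(header_lines: Iterable[str]) -> Dict[Optional[str], Optional[str]]:
    """Map each @RG ID to the basecall_model named in its DS field (None if missing)."""
    models: Dict[Optional[str], Optional[str]] = {}
    for line in header_lines:
        if not line.startswith("@RG"):
            continue
        # Split "TAG:value" fields; the value may itself contain colons.
        fields = dict(
            f.split(":", 1)
            for f in line.rstrip("\n").split("\t")[1:]
            if ":" in f
        )
        model = None
        # Assumption: DS looks like "basecall_model=<name> ... runid=<id>".
        for token in fields.get("DS", "").split():
            key, _, value = token.partition("=")
            if key == "basecall_model":
                model = value
        models[fields.get("ID")] = model
    return models


def distinct_models(header_lines: Iterable[str]) -> Set[Optional[str]]:
    """Return the set of basecall models declared across all @RG lines."""
    return set(read_group_models(header_lines).values())
```

More than one element in the result of `distinct_models` means the header declares several models. As the question above already notes, a consistent header says nothing about whether the reads themselves are a mix of simplex and duplex.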
Operating System
Other Linux (please specify below)
Other Linux
Red Hat Enterprise Linux release 8.6
Workflow Version
v.1.2.1
Workflow Execution
Command line (Cluster)
Other workflow execution
No response
EPI2ME Version
No response
CLI command run
/hpcshare/genomics/ASL_ONC/NextFlow_RunningDir/nextflow-23.10.0-all run epi2me-labs/wf-somatic-variation -profile singularity -resume -process.executor 'pbspro' -process.memory 256.GB -work-dir '/archive/s2/genomics/onco_nanopore/test_som_var/work' -with-timeline --snv --sv --mod --sample_name 'OHU0002HI' --bam_normal '/archive/s2/genomics/onco_nanopore/HUM_OHU_OHU0002HTNDN/OHU0002HTNDN.bam' --bam_tumor '/archive/s2/genomics/onco_nanopore/HUM_OHU_OHU0002ITTDN/OHU0002ITTDN.bam' --ref '/archive/s1/sconsRequirements/databases/reference/resources_broad_hg38_v0_Homo_sapiens_assembly38.fasta' --out_dir '/archive/s2/genomics/onco_nanopore/test_som_var' --basecaller_cfg '[email protected]' --phase_normal --classify_insert --force_strand --normal_min_coverage 0 --tumor_min_coverage 0 --haplotype_filter_threads 32 --severus_threads 32 --dss_threads 4 --modkit_threads 32 -process.cpus 32 -process.queue 'fatnodes'
Workflow Execution - CLI Execution Profile
singularity
What happened?
Contrary to what the error output states, the input data were basecalled with a single dorado model, but with two types of methylation calls (every read carries the same two types).
Relevant log output
Application activity log entry
No response
Were you able to successfully run the latest version of the workflow with the demo data?
no
Other demo data information