Description
Hi again,
I am running MiCall version v7.17.1 through Docker, using this command:
micall folder --project_code HIVB --skip trim.censor -s
The sequencing experiment was a hybridisation using the TWIST HIV library, run on a P1 NextSeq with 150 bp paired-end reads. Most samples produced plenty of data (around 5 million paired-end reads on average, unprocessed). Samples with read counts in that range finished within hours.
But two samples with nearly triple the number of reads (15 million) are still being processed ~4 days later.
The machine I am running MiCall on has plenty of RAM (2 TB) and CPUs (256).
The output (just one sample):
2025-04-24 07:13:42,554[INFO]micall.link_samples(): Pairing files /data/20_XXXXXX_S20_R1_001.fastq.gz and /data/20_XXXXXX_S20_R2_001.fastq.gz.
2025-04-24 07:13:43,460[INFO]micall.drivers.sample.process(): Processing Sample 20_XXXXXX (12 of 24) ('/data/20_XXXXXX_S20_R1_001.fastq.gz').
2025-04-24 08:35:44,843[INFO]micall.drivers.sample.process(): Running fastq_g2p on Sample 20_XXXXXX (12 of 24).
2025-04-24 11:41:52,503[INFO]micall.drivers.sample.run_mapping(): Running prelim_map on Sample 20_XXXXXX (12 of 24).
2025-04-24 19:49:56,932[INFO]micall.drivers.sample.run_mapping(): Running remap on Sample 20_XXXXXX (12 of 24).
2025-04-25 15:48:24,382[INFO]micall.drivers.sample.process_post_assembly(): Running sam2aln on Sample 20_XXXXXX (12 of 24).
2025-04-25 16:47:30,930[INFO]micall.drivers.sample.process_post_assembly(): Running aln2counts on Sample 20_XXXXXX (12 of 24).
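For reference, the time spent in each stage can be read off the timestamps above. Here is a minimal Python sketch that does this, assuming the log format shown (timestamp in the first 23 characters, stage named after "Running"). It is not part of MiCall, and `stage_durations` is just a hypothetical helper name.

```python
import re
from datetime import datetime

# Matches the stage name in lines such as "... Running remap on Sample ...".
STAGE = re.compile(r"Running (\w+) on Sample")


def stage_durations(lines):
    """Return (stage, elapsed) for every stage that is followed by another stage."""
    starts = []
    for line in lines:
        match = STAGE.search(line)
        if match:
            # The timestamp is the first 23 characters, e.g. "2025-04-24 08:35:44,843".
            started = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
            starts.append((match.group(1), started))
    # Elapsed time of a stage = start of the next stage minus its own start.
    return [(name, nxt - start)
            for (name, start), (_, nxt) in zip(starts, starts[1:])]


log = [
    "2025-04-24 08:35:44,843[INFO]micall.drivers.sample.process(): Running fastq_g2p on Sample 20_XXXXXX (12 of 24).",
    "2025-04-24 11:41:52,503[INFO]micall.drivers.sample.run_mapping(): Running prelim_map on Sample 20_XXXXXX (12 of 24).",
    "2025-04-24 19:49:56,932[INFO]micall.drivers.sample.run_mapping(): Running remap on Sample 20_XXXXXX (12 of 24).",
    "2025-04-25 15:48:24,382[INFO]micall.drivers.sample.process_post_assembly(): Running sam2aln on Sample 20_XXXXXX (12 of 24).",
    "2025-04-25 16:47:30,930[INFO]micall.drivers.sample.process_post_assembly(): Running aln2counts on Sample 20_XXXXXX (12 of 24).",
]

for name, elapsed in stage_durations(log):
    print(name, elapsed)
```

On this log, prelim_map comes out at about 8 hours and remap at about 20 hours, while aln2counts has no end time yet because it is the stage still running.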
To be clear, the pipeline hasn't crashed; it is still processing the data. In the output directory, there is a scratch folder containing temporary processing folders. For sample 20 above, that folder contains 2674 (and still growing) nuc_read_counts*.csv files.
The obvious cause is the large number of input reads, but what's interesting is that the input is 'only' 3 times larger than files that took ~4 hours to complete, whereas ~4 days is already more than 20 times longer.
I could subsample the reads to a reasonable number, but I think the issue remains that something is causing MiCall to go super slow. Possibly related to #214?
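As a workaround, the subsampling could look something like the sketch below. It assumes standard 4-line FASTQ records with R1 and R2 in the same order; `subsample_pairs` and the output file names are hypothetical and not part of MiCall. Each read pair is kept with a fixed probability, so the two output files stay in sync.

```python
import gzip
import random


def subsample_pairs(r1_in, r2_in, r1_out, r2_out, fraction, seed=0):
    """Keep each read pair with probability `fraction`; return the number kept."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    kept = 0
    with gzip.open(r1_in, "rt") as f1, gzip.open(r2_in, "rt") as f2, \
            gzip.open(r1_out, "wt") as o1, gzip.open(r2_out, "wt") as o2:
        while True:
            # Assumes 4 lines per record: header, sequence, '+', qualities.
            rec1 = [f1.readline() for _ in range(4)]
            rec2 = [f2.readline() for _ in range(4)]
            if not rec1[0] or not rec2[0]:
                break  # end of either file
            if rng.random() < fraction:
                # One random draw decides for both mates, keeping pairs together.
                o1.writelines(rec1)
                o2.writelines(rec2)
                kept += 1
    return kept


# Example call (output names are placeholders): keep roughly a third of the pairs.
# subsample_pairs("/data/20_XXXXXX_S20_R1_001.fastq.gz",
#                 "/data/20_XXXXXX_S20_R2_001.fastq.gz",
#                 "sub_R1.fastq.gz", "sub_R2.fastq.gz", fraction=1 / 3)
```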