Memory issues with flair collapse #432

Open
linley-sherin opened this issue Feb 27, 2025 · 1 comment

linley-sherin commented Feb 27, 2025

(flair) bash-4.2$ flair collapse -q ./06_FLAIR/NW_007617753.1.bed \
    -g ./guppy_genomes/ncbi_female.fa \
    -r ./03_updatebam/*.fastq \
    -o ./06_FLAIR/NW_007617753.1 \
    --gtf ./guppy_genomes/ncbi_female_sorted.gtf \
    --stringent \
    --check_splice \
    --generate_map \
    --annotation_reliant generate \
    --no_gtf_end_adjustment \
    --isoformtss \
    --trust_ends \
    -t 40

Feel free to leave any original paths, we don't have access to your system

How did you install Flair?
(We'd prefer it if you used one of the top two because they are the least likely to have package compatibility problems.)

  1. bioconda (e.g. conda create -n flair -c conda-forge -c bioconda flair)

What happened?

Writing temporary files to /tmp/tmp6jc2djt6/
Making transcript fasta using annotated gtf and genome sequence
Aligning reads to reference transcripts
Counting supporting reads for annotated transcripts
Traceback (most recent call last):
  File "/SciBorg/array0/linley/conda_env/envs/flair/lib/python3.9/site-packages/flair/count_sam_transcripts.py", line 368, in <module>
    p = Pool(args.t)
  File "/SciBorg/array0/linley/conda_env/envs/flair/lib/python3.9/multiprocessing/context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "/SciBorg/array0/linley/conda_env/envs/flair/lib/python3.9/multiprocessing/pool.py", line 212, in __init__
    self._repopulate_pool()
  File "/SciBorg/array0/linley/conda_env/envs/flair/lib/python3.9/multiprocessing/pool.py", line 303, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "/SciBorg/array0/linley/conda_env/envs/flair/lib/python3.9/multiprocessing/pool.py", line 326, in _repopulate_pool_static
    w.start()
  File "/SciBorg/array0/linley/conda_env/envs/flair/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/SciBorg/array0/linley/conda_env/envs/flair/lib/python3.9/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/SciBorg/array0/linley/conda_env/envs/flair/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/SciBorg/array0/linley/conda_env/envs/flair/lib/python3.9/multiprocessing/popen_fork.py", line 66, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
Failed at counting step for annotated isoform read support

We know it's ugly but we promise it helps us solve problems faster.

What else do we need to know?

Hi! I am running into memory limits when trying to run flair collapse. I've split my corrected .bed file into <1GB chunks, but I still can't get past the counting step. I suspect the cause is the quantity of reads I have (840GB). Would it solve the problem if I split my fastq files into chunks that contain only the reads present in each individual bed chunk?
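For illustration, the read-splitting idea above could be sketched as follows. This is a minimal sketch, not part of FLAIR: the helper names `read_names_from_bed` and `filter_fastq` are hypothetical, and it assumes that the fourth BED column holds the unmodified read ID and that the FASTQ uses plain four-line records. If the BED names carry a suffix, they would need to be normalised first.

```python
import gzip


def read_names_from_bed(bed_path):
    """Collect the name column (4th field) of every BED record."""
    names = set()
    with open(bed_path) as bed:
        for line in bed:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 4:
                names.add(fields[3])
    return names


def filter_fastq(fastq_path, names, out_path):
    """Write only FASTQ records whose read ID is in `names`; return how many were kept."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    kept = 0
    with opener(fastq_path, "rt") as src, open(out_path, "w") as dst:
        while True:
            # Assumption: every record is exactly four lines (no wrapped sequences).
            record = [src.readline() for _ in range(4)]
            if not record[0]:
                break
            # Header looks like "@read_id optional description".
            parts = record[0][1:].split()
            read_id = parts[0] if parts else ""
            if read_id in names:
                dst.writelines(record)
                kept += 1
    return kept
```

The FASTQ is streamed one record at a time, so only the set of read names from a single BED chunk has to fit in memory.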

More generally, I am confused about why flair collapse realigns everything as its first step, given that my data is already aligned by flair align + correct. Is there a way to skip that step and instead input a corrected bam file and have flair collapse build the isoforms from that? I didn't see this option in the docs, but given the size of my data I am wondering whether there is a workaround.

linley-sherin (Author) commented

Hi! I was able to work out a solution: I split the 'corrected.bed' output of 'flair correct' by chromosome, and then split my reads so that only the reads aligning to each chromosome are passed to 'flair collapse' for that chromosome.
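A minimal sketch of the per-chromosome BED split described above, for readers who want to reproduce it. The function name `split_bed_by_chromosome` and the `<chromosome>.bed` output naming are assumptions made for illustration; this is not the script used in this issue. It keeps only one output file open at a time, so assemblies with thousands of scaffolds do not exhaust the file descriptor limit.

```python
import os


def split_bed_by_chromosome(bed_path, out_dir):
    """Write one BED file per chromosome; return {chromosome: output path}."""
    os.makedirs(out_dir, exist_ok=True)
    paths = {}
    current_chrom = None
    current_handle = None
    try:
        with open(bed_path) as bed:
            for line in bed:
                if not line.strip():
                    continue
                chrom = line.split("\t", 1)[0]
                if chrom != current_chrom:
                    if current_handle is not None:
                        current_handle.close()
                    # First visit truncates the file; later visits append to it.
                    mode = "a" if chrom in paths else "w"
                    paths.setdefault(chrom, os.path.join(out_dir, f"{chrom}.bed"))
                    current_handle = open(paths[chrom], mode)
                    current_chrom = chrom
                current_handle.write(line)
    finally:
        if current_handle is not None:
            current_handle.close()
    return paths
```

The split is fastest on a BED file sorted by chromosome, because the output file then changes only once per chromosome. Unsorted input also works, at the cost of more reopen operations.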

However this is producing 2 issues:

(1) there are reads that are mapping to other chromosomes during 'flair collapse', which shouldn't be possible given I am only inputting reads that aligned to the focal chromosome during 'flair align'. I thought maybe this is because of the difference in parameters between the two minimap commands?

(2) the reads that map off-chromosome are also reported repeatedly. For example, when I run 'flair collapse' for NC_024238.1, the output bed file contains the following lines.

Here you can see that these isoform models are mapped to another chromosome (NC_024352.1) and are reported multiple times. There are two versions of the same transcript (LOC103458740_XM_008399755.2 and LOC103458740_XM_008399753.2), and both are repeated multiple times.

NC_024352.1	FLAIR	transcript	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399755.2";
NC_024352.1	FLAIR	exon	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399755.2"; exon_number "0";
NC_024352.1	FLAIR	transcript	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399753.2";
NC_024352.1	FLAIR	exon	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399753.2"; exon_number "0";
NC_024352.1	FLAIR	transcript	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399753.2";
NC_024352.1	FLAIR	exon	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399753.2"; exon_number "0";
NC_024352.1	FLAIR	transcript	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399755.2";
NC_024352.1	FLAIR	exon	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399755.2"; exon_number "0";
NC_024352.1	FLAIR	transcript	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399753.2";
NC_024352.1	FLAIR	exon	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399753.2"; exon_number "0";
NC_024352.1	FLAIR	transcript	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399755.2";
NC_024352.1	FLAIR	exon	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399755.2"; exon_number "0";
NC_024352.1	FLAIR	transcript	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399753.2";
NC_024352.1	FLAIR	exon	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399753.2"; exon_number "0";
NC_024352.1	FLAIR	transcript	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399755.2";
NC_024352.1	FLAIR	exon	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399755.2"; exon_number "0";
NC_024352.1	FLAIR	transcript	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399753.2";
NC_024352.1	FLAIR	exon	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399753.2"; exon_number "0";
NC_024352.1	FLAIR	transcript	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399755.2";
NC_024352.1	FLAIR	exon	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399755.2"; exon_number "0";
NC_024352.1	FLAIR	transcript	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399753.2";
NC_024352.1	FLAIR	exon	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399753.2"; exon_number "0";
NC_024352.1	FLAIR	transcript	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399755.2";
NC_024352.1	FLAIR	exon	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399755.2"; exon_number "0";
NC_024352.1	FLAIR	transcript	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399753.2";
NC_024352.1	FLAIR	exon	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399753.2"; exon_number "0";
NC_024352.1	FLAIR	transcript	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399755.2";
NC_024352.1	FLAIR	exon	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399755.2"; exon_number "0";
NC_024352.1	FLAIR	transcript	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399753.2";
NC_024352.1	FLAIR	exon	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399753.2"; exon_number "0";
NC_024352.1	FLAIR	transcript	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399755.2";
NC_024352.1	FLAIR	exon	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399755.2"; exon_number "0";
NC_024352.1	FLAIR	transcript	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399753.2";
NC_024352.1	FLAIR	exon	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399753.2"; exon_number "0";
NC_024352.1	FLAIR	transcript	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399755.2";
NC_024352.1	FLAIR	exon	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399755.2"; exon_number "0";
NC_024352.1	FLAIR	transcript	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399753.2";
NC_024352.1	FLAIR	exon	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399753.2"; exon_number "0";
NC_024352.1	FLAIR	transcript	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399755.2";
NC_024352.1	FLAIR	exon	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399755.2"; exon_number "0";
NC_024352.1	FLAIR	transcript	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399753.2";
NC_024352.1	FLAIR	exon	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399753.2"; exon_number "0";
NC_024352.1	FLAIR	transcript	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399755.2";
NC_024352.1	FLAIR	exon	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399755.2"; exon_number "0";
NC_024352.1	FLAIR	transcript	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399753.2";
NC_024352.1	FLAIR	exon	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399753.2"; exon_number "0";
NC_024352.1	FLAIR	transcript	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399755.2";
NC_024352.1	FLAIR	exon	16867899	16868159	.	-	.	gene_id "LOC103458740"; transcript_id "XM_008399755.2"; exon_number "0";

This appears to be some sort of bug but I am not actually sure what the origin is.
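To quantify the two symptoms above, a small check along these lines could be run on the output. It is only a diagnostic sketch: `summarize_gtf_lines` is a hypothetical helper, it compares records as whole lines, and it does nothing to explain where the duplicates come from.

```python
from collections import Counter


def summarize_gtf_lines(lines, focal_chrom):
    """Return (records seen more than once, record counts per non-focal sequence)."""
    counts = Counter()
    off_target = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        counts[line] += 1
        # The first tab-separated field is the sequence name.
        chrom = line.split("\t", 1)[0]
        if chrom != focal_chrom:
            off_target[chrom] += 1
    duplicates = {record: n for record, n in counts.items() if n > 1}
    return duplicates, dict(off_target)
```

Passing an open file handle as `lines` and `"NC_024238.1"` as `focal_chrom` would list each repeated record with its count, plus the number of records on every other sequence.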
