Hi @jguhlin,
Thanks for all of the recent changes to minimap2-rs. The new builder is much harder to misuse, and with the new index handling, I'm able to get rid of the hacks I had before when using the index from many threads.
However, one issue I've noticed is that, while the new version produces identical output to the previous one (once modified to account for the read name in tie-breaking), its memory usage is considerably higher, and the gap widens as more threads are used. For example, in oarfish, when mapping about 5M nanopore reads to the human transcriptome with 32 threads, the previous version used just over 5 GB of RAM (including all other data structures in the program), while the new minimap2-rs version uses over 10 GB. Given the temporary allocation of multiple `Arc`s wrapping the read name, I'd expect a small increase, but nowhere near this much. This seems much more likely to be caused by (1) a much larger per-thread buffer, (2) a subtle memory leak, since usage seems to increase slowly but steadily over time, or (3) a combination of both.
I've tracked the issue down to minimap2-rs: when I remove the actual calls to `map`, memory usage is identical between versions, so the culprit must be something happening in, or triggered by, the `map` function. I'm happy to help investigate, but wanted to see if you had any immediate ideas.
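One way to start separating (1) from (2) on the Rust side is a counting global allocator. Below is a minimal sketch; `CountingAlloc`, `live_bytes`, and `peak_bytes` are hypothetical names, and the threaded loop in `main` is only a stand-in for the real mapping loop, not minimap2-rs code. It only sees allocations made through Rust's global allocator (for example the `Arc`-wrapped read names). minimap2 itself is C code that allocates with `malloc`, so its buffers bypass this counter. If these numbers stay flat while process RSS keeps climbing, that would point at the C side or the FFI boundary.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Bytes currently live and the highest watermark, as seen by Rust's global allocator.
static LIVE_BYTES: AtomicUsize = AtomicUsize::new(0);
static PEAK_BYTES: AtomicUsize = AtomicUsize::new(0);

/// Delegates to the system allocator and counts Rust-side allocations only.
/// Memory that C code obtains directly via `malloc` bypasses this wrapper.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            let live = LIVE_BYTES.fetch_add(layout.size(), Ordering::Relaxed) + layout.size();
            PEAK_BYTES.fetch_max(live, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE_BYTES.fetch_sub(layout.size(), Ordering::Relaxed);
    }
    // The default `realloc` and `alloc_zeroed` route through `alloc`/`dealloc`
    // above, so they are counted as well.
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn live_bytes() -> usize {
    LIVE_BYTES.load(Ordering::Relaxed)
}

fn peak_bytes() -> usize {
    PEAK_BYTES.load(Ordering::Relaxed)
}

fn main() {
    // Hypothetical stand-in for the mapping loop: each worker allocates and
    // drops a 1 MiB scratch buffer, the way a per-thread buffer might.
    let workers: Vec<_> = (0..4)
        .map(|_| {
            thread::spawn(|| {
                let buf = std::hint::black_box(vec![0u8; 1 << 20]);
                buf.len()
            })
        })
        .collect();
    for w in workers {
        w.join().unwrap();
    }
    // Sampled periodically in a real run: a plateau suggests bounded
    // per-thread buffers, while steady growth suggests a leak.
    println!("live: {} bytes, peak: {} bytes", live_bytes(), peak_bytes());
}
```

In practice the two counters would be logged every N reads alongside the process RSS, so the Rust-side and total curves can be compared over the run.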
Note: The big difference appears in the newest version but is not present in 1.21. It is also present in 1.22, so whatever is causing it was introduced between 1.21 and 1.22.
Thanks!
Rob
Cc @zzare-umd