Large increase in memory usage in latest version #120
Just a small update here, in addition to localizing this increase to the …
Hi @jguhlin,

Just another update. I seem to have found the relevant change that causes the large increase in memory usage, but I don't yet understand why. In v1.21 and earlier, you have this definition for the thread-local buffer:

```rust
/// ThreadLocalBuffer for minimap2 memory management
#[derive(Debug)]
struct ThreadLocalBuffer {
    buf: *mut mm_tbuf_t,
    max_uses: usize,
    uses: usize,
}

impl ThreadLocalBuffer {
    pub fn new() -> Self {
        let buf = unsafe { mm_tbuf_init() };
        Self {
            buf,
            max_uses: 15,
            uses: 0,
        }
    }

    /// Return the buffer, checking how many times it has been borrowed.
    /// Free the memory of the old buffer and reinitialise a new one If
    /// num_uses exceeds max_uses.
    pub fn get_buf(&mut self) -> *mut mm_tbuf_t {
        if self.uses > self.max_uses {
            // println!("renewing threadbuffer");
            self.free_buffer();
            let buf = unsafe { mm_tbuf_init() };
            self.buf = buf;
            self.uses = 0;
        }
        self.uses += 1;
        self.buf
    }

    fn free_buffer(&mut self) {
        unsafe { mm_tbuf_destroy(self.buf) };
    }
}

/// Handle destruction of thread local buffer properly.
impl Drop for ThreadLocalBuffer {
    fn drop(&mut self) {
        unsafe { mm_tbuf_destroy(self.buf) };
        //self.free_buffer();
    }
}

impl Default for ThreadLocalBuffer {
    fn default() -> Self {
        Self::new()
    }
}
```

But in the newer version, you stopped this (hack?) of refreshing the buffer every 15 uses, and the definition looks like:

```rust
/// ThreadLocalBuffer for minimap2 memory management
#[derive(Debug)]
struct ThreadLocalBuffer {
    buf: *mut mm_tbuf_t,
    // max_uses: usize,
    // uses: usize,
}

impl ThreadLocalBuffer {
    pub fn new() -> Self {
        let buf = unsafe { mm_tbuf_init() };
        Self {
            buf,
            // max_uses: 15,
            // uses: 0,
        }
    }

    /// Return the buffer, checking how many times it has been borrowed.
    /// Free the memory of the old buffer and reinitialise a new one If
    /// num_uses exceeds max_uses.
    pub fn get_buf(&mut self) -> *mut mm_tbuf_t {
        /* if self.uses > self.max_uses {
            // println!("renewing threadbuffer");
            self.free_buffer();
            let buf = unsafe { mm_tbuf_init() };
            self.buf = buf;
            self.uses = 0;
        }
        self.uses += 1; */
        self.buf
    }

    fn free_buffer(&mut self) {
        unsafe { mm_tbuf_destroy(self.buf) };
    }
}

/// Handle destruction of thread local buffer properly.
impl Drop for ThreadLocalBuffer {
    fn drop(&mut self) {
        // unsafe { mm_tbuf_destroy(self.buf) };
        self.free_buffer();
    }
}

impl Default for ThreadLocalBuffer {
    fn default() -> Self {
        Self::new()
    }
}
```

This new version leads to substantially higher memory usage than the previous version. In fact, if I just revert this implementation of `ThreadLocalBuffer` …
Hey! First, sorry for the delay. My move has gone terribly and will likely result in a court case, so I'm still recovering from that and just put my desk together an hour ago.

Second, thanks again for digging into it. I have no clue where that hack came from, so I removed it recently. I can't imagine I thought of it, as it came in during my very early FFI days, but I can't find where I would have copied it from either. I've even looked at the mappy code in minimap2 from a few years ago, and it isn't there; that was my original inspiration for this. So, my memory has failed me. We can add the hack back in, but since minimap2 doesn't need it, I wonder what the difference is that's causing problems here....
Maybe it's related to this? https://github.com/lh3/minimap2/blob/618d33515e5853c4576d5a3d126fdcda28f0e8a4/map.c#L367 That logic isn't represented in the Rust code, and the 15 must have been the hack. I'll look into porting that functionality over, as it looks more mature (too much memory? reinitialize, rather than just counting to 15).
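For reference, a rough sketch of what a size-based reset (the idea behind that map.c check) could look like on the Rust side, in place of the fixed 15-use counter. This is only illustrative: the capacity estimate (`estimated_bytes`) is a stand-in because the current bindings don't expose kalloc statistics, the struct name is hypothetical, and the `minimap2_sys` import path is assumed.

```rust
use minimap2_sys::{mm_tbuf_destroy, mm_tbuf_init, mm_tbuf_t};

/// Hypothetical size-aware variant of ThreadLocalBuffer: renew the buffer once
/// its estimated footprint exceeds a cap, rather than after a fixed number of uses.
#[derive(Debug)]
struct CappedThreadLocalBuffer {
    buf: *mut mm_tbuf_t,
    cap_bytes: usize,
}

impl CappedThreadLocalBuffer {
    pub fn new(cap_bytes: usize) -> Self {
        let buf = unsafe { mm_tbuf_init() };
        Self { buf, cap_bytes }
    }

    /// Return the buffer, destroying and reinitialising it if the caller's
    /// estimate of its footprint has grown past the cap.
    pub fn get_buf(&mut self, estimated_bytes: usize) -> *mut mm_tbuf_t {
        if estimated_bytes > self.cap_bytes {
            unsafe { mm_tbuf_destroy(self.buf) };
            self.buf = unsafe { mm_tbuf_init() };
        }
        self.buf
    }
}

impl Drop for CappedThreadLocalBuffer {
    fn drop(&mut self) {
        unsafe { mm_tbuf_destroy(self.buf) };
    }
}
```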
Hi @jguhlin,

Sorry to hear about your move! Things have been going absolutely horribly here in the US, so digging into oarfish and minimap2-rs is a beautiful diversion.

This is a great find, and logically it seems closely related. My only immediate question: since we're passing the buffer into mm_map, which calls mm_map_frag, presumably minimap2 itself would be triggering this reset anyway? This seems like a promising place to look, though. Let me know what you find and I'll take a peek as well. Thanks!
I've noticed. I may not visit for a few years, though family probably won't let that happen.

That's true, and makes sense. It should be triggered from within minimap2. I'll plan to revert the hack early next week, and then work on a real fix. It could be something else clearing up frees in the result processing; something is still missing.... There's no way I can pass a bumpalo arena to it, is there? Haha.
Hi @jguhlin,

I don't think there's a way to pass a bumpalo arena, given that mm2 uses its own `kalloc` allocator internally.

On the plus side, I gained some insight. Up through v1.21, you had "the hack" in place. This effectively managed the total memory usage as the thread count grew: even though a buffer refresh wasn't triggered based on size, it was triggered every 15 calls to `get_buf`, so the per-thread buffers were periodically freed and never grew without bound.

On the other hand, the default value for `cap_kalloc` is permissive enough that, with many threads, the per-thread buffers can collectively grow to many gigabytes before any reset is triggered.

To address this, I set the `cap_kalloc` value based on the number of threads:

```rust
let per_thread_cap_kalloc = (1_000_000_000_f64 / (args.threads as f64)).ceil() as i64;
aligner.mapopt.cap_kalloc = per_thread_cap_kalloc;
```

This ensures that the total buffer size is such that if all buffers are full simultaneously, we only use ~1G of extra RAM. Obviously, this is adjustable, and the optimal policy may be different (e.g. you may want to allow >1G if you have a huge number of threads like 128, etc.). Nonetheless, this effectively addresses the memory usage I am seeing. So, I think it's not necessary to revert to the hack, but rather to note this behavior in the documentation somewhere so that users know they may have to change the default if they are using many threads.

I wanted to raise one other question, but if you'd prefer me to move this to another issue so you can close this, please let me know. I noticed that in the CS and CIGAR string generation, for each alignment there is the allocation of a new … That …

Thanks!
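To make the budgeting above concrete, here is a minimal standalone sketch of that policy. The 1 GB budget and the helper name are illustrative only; the one thing taken from the snippet above is that the resulting value is what gets assigned to `mapopt.cap_kalloc`.

```rust
/// Split a total extra-memory budget evenly across worker threads and return
/// the per-thread cap_kalloc value, as in the snippet above.
fn per_thread_cap_kalloc(total_budget_bytes: u64, threads: usize) -> i64 {
    ((total_budget_bytes as f64) / (threads as f64)).ceil() as i64
}

fn main() {
    // With a 1 GB total budget and 32 threads, each thread's kalloc buffer is
    // capped at ~31.25 MB before minimap2 reinitialises it.
    let cap = per_thread_cap_kalloc(1_000_000_000, 32);
    assert_eq!(cap, 31_250_000);
    // In application code this value would then be assigned to
    // aligner.mapopt.cap_kalloc, as shown above.
}
```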
```rust
let per_thread_cap_kalloc = (1_000_000_000_f64 / (args.threads as f64)).ceil() as i64;
aligner.mapopt.cap_kalloc = per_thread_cap_kalloc;
```

Awesome. I'll make a note of it, and likely provide a function to set it. Great find!

For the cigar strings, there was a weird segfault and this was the only way I found to fix it. But it's definitely a hack. I'll try to dig into it again later this week and see if I can fix the actual issue.
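A sketch of what such a setter could look like. This is a hypothetical helper, not part of the current minimap2-rs API; it only touches the `cap_kalloc` field that the snippet above sets manually, and it assumes the `mm_mapopt_t` type exposed by minimap2-sys (the type of the `mapopt` field).

```rust
use minimap2_sys::mm_mapopt_t;

/// Hypothetical convenience helper: cap each thread's kalloc buffer so that,
/// if every thread's buffer fills up at once, the combined extra memory stays
/// within `total_budget_bytes`.
fn set_per_thread_cap_kalloc(mapopt: &mut mm_mapopt_t, total_budget_bytes: u64, threads: usize) {
    mapopt.cap_kalloc = ((total_budget_bytes as f64) / (threads as f64)).ceil() as i64;
}

// Usage would look roughly like the snippet quoted above:
// set_per_thread_cap_kalloc(&mut aligner.mapopt, 1_000_000_000, args.threads);
```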
Hi @jguhlin,

Yes; I did see that issue. Luckily, we're explicitly setting …
Hi @jguhlin,
Thanks for all of the recent changes to minimap2-rs. The new builder is much harder to misuse, and with the new index handling, I'm able to get rid of the hacks I had before when using the index from many threads.
However, one issue I've noticed is that, while producing identical output to the previous version (when modified to account for the read name for tie breaking), the memory usage is considerably higher, specifically as more threads are used. For example, in oarfish, when mapping about 5M nanopore reads to the human transcriptome, the previous version used just north of 5G of RAM with 32 threads (accounting for all other data structures in the program as well), while with the new minimap2-rs version it uses over 10G of RAM. Given the temporary allocation of multiple `Arc`s wrapping the read name, I'd expect a small increase, but nowhere near this much. This seems much more related to either (1) a much larger per-thread buffer, (2) a subtle memory leak, as the usage seems to increase slowly but steadily over time, or (3) a combination of both of these.
I've tracked the issue down to minimap2-rs, as when I remove the actual calls to map, the memory usage is identical between versions, so the culprit must be something happening in, or triggered by, the map function. I'm happy to help investigate, but wanted to see if you had any immediate ideas.
Note: The big difference is present in the newest version and in 1.22, but not in 1.21. So, whatever is causing the difference happened between 1.21 and 1.22.
Thanks!
Rob
Cc @zzare-umd