Non-adherence to resource parameters #3873
I think part of this problem is that we pass off some of the code to sklearn, scipy, and numpy, and we can't control their resource utilization. In the past we've had issues where parts of our library that rely on those other libraries switched to much higher utilization. Also: which functions are you using, and do any of them respect the limits? I know SC2 uses a mix of other libraries, but some of our preprocessing only uses numpy, which I think doesn't override our limits the way sklearn and scipy can.
Hi @fruce-ki, I also had some problems with this. You could try switching the matching method to wobble:

```python
sorting = si.run_sorter(
    recording=my_recording,
    sorter_name="spykingcircus2",
    matching={"method": "wobble"},
)
```

I'd be interested to know if this improves the memory use. Let me know :)
I don't know how memory_limit works in SC2, but we might have an understandable misunderstanding about job_kwargs. The memory parameters in job_kwargs (chunk_memory, total_memory, etc.) refer to the size of the chunks: the unit of processing the pipeline will work with. They are not hard limits or memory caps; not even the post-processing within our own library strictly respects them. And, as others have mentioned, we don't control the algorithms used by other libraries or their memory costs. Maybe we should, but we don't have the resources at the moment to promise such guarantees.
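To make the distinction concrete, here is a sketch of what chunk_memory actually controls: how many frames of raw data are loaded per chunk. Everything the algorithm then computes on that chunk (templates, SVD projections, intermediate arrays) comes on top. The channel count and dtype below are illustrative assumptions, not taken from any specific setup.

```python
def frames_per_chunk(chunk_memory_bytes, num_channels, bytes_per_sample=4):
    """Number of frames of raw data that fit in one chunk.

    This is the quantity chunk_memory determines. It says nothing about
    the total footprint of a run: whatever each processing step
    allocates on top of the chunk is extra.
    """
    return chunk_memory_bytes // (num_channels * bytes_per_sample)

# A '10G' chunk with 1020 channels of float32 samples:
frames = frames_per_chunk(10 * 1024**3, num_channels=1020)
print(frames)  # → 2631720 frames (~2.6 million) per chunk
```

Shrinking chunk_memory therefore shrinks only the raw-data slice per iteration; a step whose working set scales with the whole recording or the unit count will not be bounded by it.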
Thank you all for the responses. I have no hands-on experience with sklearn/scipy/numpy, but I find it surprising that such important and popular Python libraries offer no way to control their resource usage. In any case, it is understandable that you don't control everything, but then these parameters are misleading and need to be documented much more explicitly. I've spent an absurd amount of time trying to work through/around their behaviour to achieve high throughput, and I am still unsure how to do it reliably.

I know chunk size isn't adhered to super-accurately, and there is the overhead of everything being computed for that chunk, but it is an extremely long way from 10G to 350G. Would it make any difference if chunk size was 1G or 500MB? Does SC2 even work with chunks? I know it doesn't use n_jobs from the job_kwargs, so maybe its memory parameters are also separate.

With regards to SC2 being experimental: while I am aware it is a work in progress, I was told off here a few months ago for using SC1, so this is giving mixed signals about which tool is the recommended one to use. Should I switch back to SC1? I will give wobble a try.

I don't have a function-by-function breakdown of what behaves and what doesn't, as I am aiming to process a large number of recordings, so I am not working interactively. I'd love to break it down more precisely for you if it meant you were going to address it, but you already say that you can't.

Pre-processing usually has high resource peaks at the beginning of a recording (maybe while loading the data?) and then comes down significantly. I think CPU comes down all the way; RAM stays higher than I'd have expected, but that's probably because I was expecting the parameter to control the whole processing volume, not only the input data size.
Let me jump into the discussion. It is true that SC2 is still evolving (we are starting the paper, so I swear it will settle for good soon), but the reason I was encouraging the switch is that SC1 is not maintained anymore and is likely to be outdated and hard to install with recent versions of libraries. Regarding memory usage, lots of progress has been made in SC2 recently for high-density MEAs (such as 4096 channels) in order to make the sorter work with reduced memory usage. This should now be the case with the main branch of spikeinterface. The memory_usage parameters I've introduced are internal to SC2 and indeed were made to tackle some of these issues (I can detail more, but this might be a bit technical). So, to answer your questions:
Please upgrade to the latest main branch of spikeinterface, and let's solve this properly.
Hi Pierre! My probes are MaxTwo wells, with up to 1020 active electrodes in each recording.
Yes, such recordings were problematic with former versions of SC2 because, when used with lots of cores, the estimate_templates() function called internally during the clustering preallocated a massive amount of RAM. This has been fixed; you should give it a try. Either reduce the number of cores used ONLY during this estimation (which is what memory_usage was for in #3721), or, even better, skip this estimation from raw data entirely and infer the templates from the SVD values (already in RAM). This should now be the default in main with the option templates_from_svd.
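Pulling the options mentioned in this thread together, a hedged sketch of the parameter dict one might pass: the names (`matching`, `templates_from_svd`) are taken from the comments here, and their exact placement in SC2's params should be checked against the installed version's defaults before relying on this.

```python
# Hypothetical parameter dict combining the options discussed in this
# thread; names come from the comments above, not verified against a
# specific spikeinterface release.
sorter_params = {
    "matching": {"method": "wobble"},  # wobble matching, as suggested above
    "templates_from_svd": True,        # infer templates from SVD values
                                       # (name/placement assumed)
}

# These would then be forwarded through run_sorter, e.g.:
# sorting = si.run_sorter(sorter_name="spykingcircus2",
#                         recording=my_recording, **sorter_params)
```

Checking the actual defaults (for instance via the sorter's documented default-params accessor) is the safest way to confirm which of these knobs exist in your installed version.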
Sounds promising!
I also just want to make clear that this isn't an easy problem to solve (see here about kernel scheduling in Linux). We can tell Python to request something, but the kernel of your OS makes the ultimate decision. On Windows that often means bouncing a process between CPUs even when that doesn't make sense. And we could tell Python to request xx GB of RAM, but the OS kernel may still grant some degree of extra memory. What we can fix are memory leaks in our own code, and we can try to make the best possible requests, but as OSs change we often see even the biggest libraries (like scipy or sklearn) having to deal with spiking CPU and RAM usage. I think at best we could add a note in our documentation making it clearer that we try to set limits but that the OS/Python often overrides them.
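For what it's worth, on Linux (and mostly macOS) a user who wants a genuine hard cap can ask the kernel for one directly via Python's stdlib resource module. This is a workaround sketch for the user's side, not something spikeinterface does, and the semantics are blunt: once the cap is hit, allocations fail with MemoryError rather than being throttled.

```python
import resource

def cap_address_space(max_bytes):
    """Ask the OS kernel for a cap on this process's virtual address
    space. Enforcement is done by the kernel, not Python: allocations
    beyond the cap raise MemoryError instead of silently growing.
    Linux/macOS only; has no effect on Windows.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    # Only lower the soft limit; an unprivileged process cannot raise
    # the hard limit.
    if hard == resource.RLIM_INFINITY:
        new_soft = max_bytes
    else:
        new_soft = min(max_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_AS)

# Example: cap this process at 1 TiB of virtual memory (deliberately
# generous, just to show the mechanism without breaking anything):
soft, hard = cap_address_space(1 << 40)
```

Note that RLIMIT_AS counts virtual address space, not resident RAM, so memory-mapped recordings can hit it well before physical memory is exhausted; it is a safety net against runaway jobs, not a scheduler.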
I switched to wobble and updated my script to the new parameters for the current main branch. The memory footprint is looking much better! The CPU spikes still exist during the steps shown (screenshots omitted). No major CPU peaks in basic postprocessing. In pre-processing, after a very short-lived initial peak, presumably during data loading, it stabilized to where it should be. No memory peaks during pre- or post-processing.
Great to hear it's working better @fruce-ki! Just to clarify: I'm not against SC2 in any way. Actually, I use it and think the results are great! I meant more that, because it's evolving, we users might need to spend a bit more time poking at and adjusting it for our setup compared to something more mature and stable. Also, to give an idea of how complex memory/multiprocessing is in Python, have a look at the sklearn page about parallelism: https://scikit-learn.org/stable/computing/parallelism.html. It's very complex!
It seems that the parameters for multithreading and RAM are not obeyed throughout.
The screenshot of my resource monitor is from a sorting run with

```python
global_job_kwargs = dict(n_jobs=1, mp_context='spawn', chunk_memory='10G', total_memory=None)
```

as overall settings, and `'n_jobs': 1` and `'memory_limit': 0.5` in the SC2 settings.

On a workstation with 20 physical Intel cores and >300G RAM, this run should have barely been visible on the resource monitor if the resource restrictions were respected. But apparently some stages of the execution ask for, and receive, 100% of the resources. The shot is during SC2, but I get similar situations with the preprocessing and postprocessing modules.
I've tried `'memory_limit': 0.1` with not much difference: right now I'm sitting at 80% RAM during `find spikes (circus-omp-svd)` (no parallelization), when the allowed RAM should be just 10% if I understand correctly.

This is quite a problem for me. On one hand, this often oversubscribes resources and I come back to a killed terminal session, making it impossible to reliably queue processing of many recordings overnight or over the weekend. On the other hand, it also prevents me from running any other resource-intensive tasks while processing a recording, as I know those resources will at some point become unavailable and cause things to crash.
Am I doing something wrong with my parameters? Is there a way to keep SI from taking up all the resources, so that it uses only a predictable amount and lets me run other things in parallel with it?
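One mitigation for the overnight-queue part of this, independent of spikeinterface: run each recording's sort in its own child process, so that an out-of-memory kill takes down only that one job and the queue script survives to log the failure and move on. A stdlib-only sketch; the per-recording script name in the commented loop is hypothetical.

```python
import subprocess
import sys

def run_isolated(py_args):
    """Run `python <py_args...>` as a child process and report the result.

    If the kernel OOM-kills the child, this function just sees a
    nonzero return code; the parent queue keeps running.
    """
    result = subprocess.run([sys.executable, *py_args],
                            capture_output=True, text=True)
    return result.returncode, result.stdout

# Hypothetical queue: one one-shot sorting script per recording.
# for rec in recordings:
#     code, _ = run_isolated(["sort_one_recording.py", rec])
#     if code != 0:
#         print(f"sorting failed for {rec}, continuing with the next one")
```

This doesn't stop any single job from spiking, but it turns "come back to a killed terminal session" into "come back to a log of which recordings failed".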
I'm on version 0.102.2