Adjust MrVI running hyperparameters for 1000s of samples #3145

PierreBoyeau · 2025-01-14T10:27:40Z

By default, MrVI's core functions rely on vmap, which speeds up execution but increases memory usage.
This memory cost is not sustainable in scenarios with 1000s of samples.
A first step in this direction would be to disable vmap by default. @justjhong @canergen what do you think?
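To illustrate the trade-off, here is a minimal JAX sketch, not MrVI's actual code. `sample_distances` is a hypothetical stand-in for a per-cell computation over sample representations, and the array sizes are arbitrary. `jax.vmap` processes every cell at once, so all intermediates live in memory together, whereas `jax.lax.map` walks over the cells one at a time.

```python
import jax
import jax.numpy as jnp


def sample_distances(z):
    """Pairwise Euclidean distances between sample representations of one cell.

    z has shape (n_samples, n_latent). Hypothetical stand-in, not MrVI code.
    """
    diff = z[:, None, :] - z[None, :, :]  # (n_samples, n_samples, n_latent)
    return jnp.sqrt(jnp.sum(diff**2, axis=-1) + 1e-12)


# Toy input: 8 cells, 5 samples, 3 latent dimensions.
key = jax.random.PRNGKey(0)
zs = jax.random.normal(key, (8, 5, 3))

# Vectorized: the (n_cells, n_samples, n_samples, n_latent) intermediate
# is materialized for all cells at once.
dists_vmap = jax.vmap(sample_distances)(zs)

# Sequential: one cell at a time, so peak memory does not grow with n_cells.
dists_seq = jax.lax.map(sample_distances, zs)
```

Both calls return an array of shape `(8, 5, 5)`. The intermediate grows quadratically with the number of samples, which is why the vectorized path becomes costly with 1000s of samples.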

justjhong · 2025-01-14T16:05:19Z

I think a majority of users will use the defaults without paying attention to arguments like use_vmap. How about we automatically switch off vmap when the number of samples exceeds 1000? We could then display a warning saying so if the user didn't explicitly pass use_vmap=True (e.g., "vmap parallelized execution has been disabled automatically since the number of samples exceeds 1000. If you would still like to use vmap, explicitly pass in use_vmap=True"). To get this behavior, we could set the default to use_vmap=None and treat that case separately.
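A rough sketch of what this proposal could look like, under stated assumptions: `resolve_use_vmap` and `VMAP_SAMPLE_THRESHOLD` are hypothetical names for illustration and are not part of scvi-tools.

```python
import warnings

# Threshold suggested in this comment; hypothetical constant name.
VMAP_SAMPLE_THRESHOLD = 1000


def resolve_use_vmap(use_vmap, n_samples, threshold=VMAP_SAMPLE_THRESHOLD):
    """Turn a user-facing `use_vmap` argument into a concrete boolean.

    None means "not set by the user": vmap stays on unless the number of
    samples exceeds the threshold, in which case it is disabled with a warning.
    An explicit True or False is always respected.
    """
    if use_vmap is not None:
        return bool(use_vmap)
    if n_samples > threshold:
        warnings.warn(
            "vmap parallelized execution has been disabled automatically since "
            f"the number of samples exceeds {threshold}. If you would still "
            "like to use vmap, explicitly pass in use_vmap=True",
            UserWarning,
            stacklevel=2,
        )
        return False
    return True
```

With this helper, `resolve_use_vmap(None, 200)` keeps vmap on, `resolve_use_vmap(None, 5000)` turns it off and warns, and `resolve_use_vmap(True, 5000)` respects the explicit choice.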

canergen · 2025-01-14T16:13:58Z

Ideally, you would loop over the array along the vmap dimension in chunks of 100 each (I'm not sure whether there is a built-in way to do this). I assume vmap still gives a speed-up for large sample sizes, but the speed-up likely plateaus beyond some vmap batch size.
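One way to sketch this chunked approach in JAX is shown below. `chunked_vmap` and `per_sample_stat` are hypothetical names, and the chunk size of 100 is just the figure mentioned above; this is an illustration rather than a proposed implementation.

```python
import jax
import jax.numpy as jnp


def chunked_vmap(fn, xs, chunk_size=100):
    """Apply jax.vmap(fn) to slices of xs along axis 0 and concatenate.

    Peak memory scales with chunk_size rather than with xs.shape[0].
    """
    batched_fn = jax.vmap(fn)
    outputs = []
    for start in range(0, xs.shape[0], chunk_size):
        outputs.append(batched_fn(xs[start:start + chunk_size]))
    return jnp.concatenate(outputs, axis=0)


def per_sample_stat(x):
    # Hypothetical stand-in for an expensive per-sample computation.
    return jnp.sum(x**2)


xs = jnp.arange(250 * 3, dtype=jnp.float32).reshape(250, 3)
chunked = chunked_vmap(per_sample_stat, xs, chunk_size=100)
full = jax.vmap(per_sample_stat)(xs)
```

Here the 250 rows are processed as chunks of 100, 100, and 50, and `chunked` matches `full`. If `fn` were jitted, the smaller final chunk would trigger one extra compilation because its shape differs.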

PierreBoyeau · 2025-01-14T18:17:57Z

Thanks for the feedback. I added two things:

- More informative tracebacks to let users know that use_vmap=False could fix OOM errors.
- A change in the default to use_vmap='auto' to automatically determine whether vmap makes sense.

@canergen I have tried these batched vmaps in the VIVS code. One problem is that this significantly affects the code readability. I prefer to avoid implementing these strategies, given how packed _model.py is. Let me know what you think!

canergen · 2025-01-14T20:05:28Z

Sounds reasonable. How long does it take now for 1000 samples and 10k cells?

VladimirShitov · 2025-02-10T14:09:58Z

Hey! I hope you don't mind me jumping into this discussion. I am running MrVI on large-scale datasets (700k to 1.5M cells, hundreds of samples). The method seems to be working nicely, but the scaling makes me very sad. Running get_local_sample_distances() often fails with OOM errors even on a powerful compute node, and the estimated running time ranges from hours to tens of hours.

Do you have any recommendations on how to run the method efficiently?

Also related to #3166

justjhong · 2025-02-10T17:29:36Z

Hi @VladimirShitov, thanks for your comment. We think JAX updates have caused both issues (scaling with respect to memory and time). I opened a separate issue for the time scaling problem: #3179. We will have to figure out how to debug the problem with JAX's updates or, as a temporary fix, pin to an older version of JAX.

VladimirShitov · 2025-02-10T19:26:23Z

Thanks, @justjhong! Looking forward to the updates :) Also, let me know if I can help with testing scalability with the datasets I have at hand.

PierreBoyeau added the enhancement label Jan 14, 2025

PierreBoyeau self-assigned this Jan 14, 2025

PierreBoyeau linked a pull request Jan 14, 2025 that will close this issue

vmap False by default #3146

Open
