tree-reduce the combine for open_mfdataset(..., parallel=True, combine="nested")
#8523
Comments
Oh this is an interesting idea... How much faster is this? What does the graph look like? (The notebook in the gist doesn't seem to show either)
What is this proposal doing instead? Don't the coordinates still ultimately get shipped to the same node in order to do the alignment?
Haven't tested, happy to say I don't use
No, it'll execute the combine 8 datasets at a time, then combine the results of that step 8 at a time, and so on, all remotely, and ship only the final combined dataset back to the head node.
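A minimal sketch of what that tree reduction could look like with `dask.bag` (a hypothetical illustration, not the proposed implementation; the file glob, the single `time` concat dimension, and the fan-out of 8 are all assumptions):

```python
import glob

import dask.bag as db
import xarray as xr

paths = sorted(glob.glob("data/*.nc"))  # assumed file layout

def combine(datasets):
    # Runs on a worker: collapse a batch of Datasets into one.
    # Assumes concatenation along "time", so the combine is associative.
    return xr.concat(list(datasets), dim="time")

# Open files in parallel, 8 per partition.
bag = db.from_sequence(paths, partition_size=8).map(xr.open_dataset)

# split_every=8 builds a reduction tree: each level combines 8
# intermediate results at a time on the workers, and only the final
# combined Dataset is shipped back to the head node.
combined = bag.reduction(combine, combine, split_every=8).compute()
```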
I used it today for the first time in a while 😅 Mostly because of fsspec/kerchunk#386
I'm definitely missing something, but won't the same amount of data still need to get moved around in the end? Is this potentially faster just because the communication doesn't all clobber the lone head node at once?
In the Coiled pattern, where you orchestrate remote workers but download results to the user's machine, that's a lot of copies moving to the user's machine. I agree this is less of a concern in remote JupyterHub deployments or in HPC environments, but I bet you'd still see an improvement when opening O(10,000) files.
@phofl pointed out today that we can create one more delayed layer that runs the final combine on a worker rather than on the client. This would be a relatively easy improvement, even though it doesn't get rid of the core problem that a single machine receives everything at once.
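As I read it, that quick fix would amount to wrapping the combine in one more `dask.delayed` call. A hedged sketch under the same single-`time`-dimension assumption (the exact mechanism is my interpretation of the comment above, not confirmed code):

```python
import glob

import dask
import xarray as xr

paths = sorted(glob.glob("data/*.nc"))  # assumed file layout

datasets = [dask.delayed(xr.open_dataset)(p) for p in paths]

# One extra delayed layer: the combine itself becomes a task, so it
# runs on a worker and only the final Dataset moves to the client...
combined = dask.delayed(xr.combine_nested)(datasets, concat_dim="time")
result = combined.compute()

# ...but that one worker still receives every per-file Dataset at once,
# which is the core problem noted above.
```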
Is your feature request related to a problem?

When `parallel=True` and a distributed client is active, Xarray reads every file in parallel, constructs a Dataset per file with indexed coordinates loaded, and then sends all of that back to the "head node" for the combine. Instead, we can tree-reduce the combine (example) by switching to `dask.bag` instead of `dask.delayed`, and skip the overhead of shipping 1000s of copies of an indexed coordinate back to the head node. This applies to `combine="nested"`.
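For context, a simplified sketch of roughly what the `parallel=True` path does today (not the actual xarray source; assumes the same hypothetical `paths` list and a single `time` concat dimension):

```python
import glob

import dask
import xarray as xr

paths = sorted(glob.glob("data/*.nc"))  # assumed file layout

datasets = [dask.delayed(xr.open_dataset)(p) for p in paths]

# dask.compute ships every per-file Dataset (indexed coordinates and
# all) back to the client at once...
datasets = dask.compute(*datasets)

# ...and the combine then runs locally on the "head node".
combined = xr.combine_nested(list(datasets), concat_dim="time")
```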
cc @TomNicholas