Automatic parallelization for dask arrays in apply_ufunc #1517
Conversation
Wow, this is great stuff! When this makes it into the public-facing API it would be nice to include some guidance on how the chunking scheme affects the run time. Imagine a plot with run time plotted as a function of chunk size or number of chunks; of course it also depends on the data size and the number of cores available. More ambitiously, I could imagine an API such as |
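As a very rough sketch of the kind of guidance suggested here (the array size, chunk sizes, and the stand-in computation below are all assumptions for illustration, not from the thread):

import time
import numpy as np
import xarray as xr

data = xr.DataArray(np.random.randn(4000, 4000), dims=('x', 'y'))

for size in (250, 500, 1000, 2000):
    chunked = data.chunk({'x': size, 'y': size})
    start = time.perf_counter()
    chunked.mean().compute()  # any chunk-wise computation would do as a stand-in
    print(size, round(time.perf_counter() - start, 3))

Plotting the printed timings against chunk size would give the kind of figure described above.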
Oops, fixed.
We already have some tips here:
Yes, this would be great.
I agree with both! |
I'm curious, how long does this line take:
Have you considered setting |
@mrocklin Yes, that took a few seconds (due to hashing the array contents). Would you suggest setting |
Yes if you don't care strongly about deduplication. The following will be slower:
In current operation this will be optimized to
So you'll lose that, but I suspect that in your case chunking the same dataset many times is somewhat rare. |
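For readers without the stripped snippets, a hedged illustration of the trade-off being described, using dask.array directly; the name=False keyword here is an assumption about the suggestion, not quoted from the thread:

import numpy as np
import dask.array as da

x = np.random.randn(5000, 5000)

# default: dask hashes the array contents to build a deterministic key,
# which costs time up front but lets identical inputs share one graph node
a = da.from_array(x, chunks=(1000, 1000))

# name=False skips the hashing (fast) at the price of deduplication:
# chunking the same data twice now yields unrelated tasks
b = da.from_array(x, chunks=(1000, 1000), name=False)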
This looks great! I am not sure if this is the right place to bring this up, but is there any way to add ghost cell functionality to |
@nbren12 Probably the best way to do ghosting with the current interface is to write a function that acts on dask array objects to apply the ghosting, and then apply it using |
Ok thanks. Just so I understand you correctly, are you recommending something like this:
Wouldn't xarray complain because the ghosted axes data would have different size than the corresponding coordinates? |
@nbren12 for similar use cases I've had success writing a single function that does the ghosting, applies the numpy function with map_blocks, and trims the result, e.g.:

def centered_diff_numpy(arr, axis=-1, spacing=1.):
    return (np.roll(arr, -1, axis=axis) - np.roll(arr, 1, axis=axis)) / (2. * spacing)

def centered_diff(da, dim, spacing=1.):
    def apply_centered_diff(arr, spacing=1.):
        if isinstance(arr, np.ndarray):
            return centered_diff_numpy(arr, spacing=spacing)
        else:
            axis = len(arr.shape) - 1
            g = darray.ghost.ghost(arr, depth={axis: 1}, boundary={axis: 'periodic'})
            result = darray.map_blocks(centered_diff_numpy, g, spacing=spacing)
            return darray.ghost.trim_internal(result, {axis: 1})
    return computation.apply_ufunc(
        apply_centered_diff, da, input_core_dims=[[dim]],
        output_core_dims=[[dim]], dask_array='allowed', kwargs={'spacing': spacing})

Depending on your use case, you might also consider map_overlap:

def apply_centered_diff(arr, spacing=1.):
    if isinstance(arr, np.ndarray):
        return centered_diff_numpy(arr, spacing=spacing)
    else:
        axis = len(arr.shape) - 1
        return darray.ghost.map_overlap(
            arr, centered_diff_numpy, depth={axis: 1}, boundary={axis: 'periodic'},
            spacing=spacing)

(Not sure if this is what @shoyer had in mind, but just offering an example) |
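To round this out, a hedged sketch of how centered_diff above might be called; the grid size, chunking, and spacing are made up, and it assumes the same aliases used in the snippet (np, darray, and xarray's internal computation module) are already imported:

import numpy as np
import xarray as xr

# a periodic 1-degree longitude grid, chunked along the differencing dimension
da_in = xr.DataArray(np.random.randn(10, 360), dims=('lat', 'lon')).chunk({'lon': 90})
ddx = centered_diff(da_in, dim='lon', spacing=1.0)
print(ddx.shape)  # (10, 360); values are computed lazily via dask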
Hey Spencer! Thanks. That makes much more sense. I have written nearly identical code for centered differencing, but did not know about |
I guess the key issue here is that some computations (e.g. finite differences) cannot be boiled down to passing one numpy function to map_blocks. @shoyer Would it be unreasonably complicated to add some sort of |
This seems like a reasonable option to me. Once we get this merged, want to make a PR? @jhamman could you give this a review? I have not included extensive documentation yet, but I am also reluctant to squeeze that into this PR before we make it public API. (Which I'd like to save for another one.) |
Sure. I'd be happy to make a PR once this gets merged.
…On Sat, Sep 16, 2017 at 10:39 PM Stephan Hoyer ***@***.***> wrote:
Alternatively apply_ufunc could see if the func object has a pre_dask_atop method, and apply it if it does.
This seems like a reasonable option to me. Once we get this merged, want to make a PR?
@jhamman <https://github.com/jhamman> could you give this a review? I have not included extensive documentation yet, but I am also reluctant to squeeze that into this PR before we make it public API. (Which I'd like to save for another one.)
|
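To make the pre_dask_atop idea quoted above concrete, a purely hypothetical sketch (no such hook exists in xarray; the names and the ghosting step are illustrative only):

import numpy as np
import dask.array as darray

def centered_diff_kernel(arr, spacing=1.):
    # plain numpy kernel operating along the last axis
    return (np.roll(arr, -1, axis=-1) - np.roll(arr, 1, axis=-1)) / (2. * spacing)

def _ghost_last_axis(arr):
    # hypothetical pre-processing step: add ghost cells along the last axis
    # so each block can see its neighbours before the kernel is applied per block
    axis = arr.ndim - 1
    return darray.ghost.ghost(arr, depth={axis: 1}, boundary={axis: 'periodic'})

# the proposal: apply_ufunc would check hasattr(func, 'pre_dask_atop') and,
# if present, call it on each dask input before building the atop graph
centered_diff_kernel.pre_dask_atop = _ghost_last_axis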
@shoyer - this is looking quite nice/clean. I had one comment that really just comes down to making sure the inputs are well defined and that the errors are clear to the user.
I haven't used dask's atop function so maybe @spencerkclark or @nbren12 could review that section to add one more set of eyes to this before we merge.
xarray/core/computation.py
if output_dtypes is None:
    raise ValueError('output dtypes (output_dtypes) must be supplied to '
                     "apply_func when using dask='parallelized'")
if len(output_dtypes) != signature.num_outputs:
I think we need to make sure output_dtypes is an iterable before calling len.
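A small sketch of the kind of guard being suggested, written as a hypothetical standalone helper (the name and error wording are made up):

def _validate_output_dtypes(output_dtypes, num_outputs):
    # hypothetical helper: fail early with a clear message instead of a
    # confusing TypeError from calling len() on a non-iterable
    if output_dtypes is None:
        raise ValueError("output_dtypes must be supplied when using dask='parallelized'")
    if not isinstance(output_dtypes, (list, tuple)):
        raise TypeError('output_dtypes must be a list or tuple of dtypes, '
                        'got {!r}'.format(output_dtypes))
    if len(output_dtypes) != num_outputs:
        raise ValueError('expected {} output dtypes, got {}'
                         .format(num_outputs, len(output_dtypes)))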
FWIW, I took a look and the atop call looks pretty similar to what I am used to. I was having a little trouble parsing the docstring on apply_ufunc though. I might submit a separate issue about that.
xarray/tests/test_computation.py
array1 = da.from_array(rs.randn(4, 4), chunks=(2, 2))
array2 = da.from_array(rs.randn(4, 4), chunks=(2, 2))
data_array_1 = xr.DataArray(array1, dims=('x', 'y'))
data_array_2 = xr.DataArray(array2, dims=('x', 'y'))
It might be worthwhile to add a test which tries apply_ufunc on DataArrays which only share one dimension.
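A hedged sketch of such a test, reusing the style of the test lines quoted above; the test name, the import from the internal computation module, and the dask='parallelized' keyword are assumptions rather than code from the PR:

import numpy as np
import dask.array as da
import xarray as xr
from xarray.core.computation import apply_ufunc  # not yet public at this point

def test_apply_dask_mixed_dims():
    rs = np.random.RandomState(42)
    array1 = da.from_array(rs.randn(4, 4), chunks=(2, 2))
    array2 = da.from_array(rs.randn(4), chunks=(2,))
    data_array_1 = xr.DataArray(array1, dims=('x', 'y'))
    data_array_2 = xr.DataArray(array2, dims=('y',))  # shares only 'y'

    actual = apply_ufunc(lambda a, b: a + b, data_array_1, data_array_2,
                         dask='parallelized', output_dtypes=[float])
    expected = data_array_1 + data_array_2
    np.testing.assert_allclose(actual.values, expected.values)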
I was not aware of dask's atop function before reading this PR (it looks pretty cool), so I defer to @nbren12 there. |
I have a design question here: how should we handle cases where a core dimension exists in multiple chunks? For example, suppose you are applying a function that needs access to every point along the "time" axis at once (e.g., an auto-correlation function). Should we:
Currently we do behavior 1, but behavior 2 might be more user friendly. Otherwise it could be pretty easy to inadvertently pass in a dask array (e.g., in small chunks along the core dimension). dask.array has some heuristics to protect against this in |
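Whatever the default ends up being, the user-side fix is to make sure each core dimension ends up in a single chunk before applying such a function. A hedged sketch (sizes and names are made up):

import numpy as np
import xarray as xr

arr = xr.DataArray(np.random.randn(1000, 100), dims=('time', 'space'))
chunked = arr.chunk({'time': 100, 'space': 10})  # 'time' split across 10 chunks

# rechunk so the core dimension ('time') is one chunk per block, which is
# what a whole-axis function like an auto-correlation needs to see at once
ready = chunked.chunk({'time': 1000})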
The heuristics we have are, I think, just of the form "did you make way more chunks than you had previously". I can imagine other heuristics of the form "some of your new chunks are several times larger than your previous chunks". In general these heuristics might be useful in several places. It might make sense to build them in a |
@shoyer - anything left to do here? |
I think this is ready. |
Great. Go ahead and merge it then. I'm very excited about this feature. |
I'll start on my PR to expose this as public API -- hopefully will make some progress on my flight from NY to SF tonight. |
This lets you parallelize a function designed for numpy inputs just by adding a few keyword arguments to apply_ufunc. Example usage, to calculate rank correlation between two variables (this will probably turn into an example for the docs):
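The example code itself did not survive in this copy of the thread. Below is a hedged reconstruction of what a Spearman (rank) correlation might look like with the new keywords; the gufunc implementations, dimension names, and array sizes are illustrative assumptions, not the PR's verbatim example:

import numpy as np
import xarray as xr
from xarray.core.computation import apply_ufunc  # not yet public at this point

def covariance_gufunc(x, y):
    return ((x - x.mean(axis=-1, keepdims=True))
            * (y - y.mean(axis=-1, keepdims=True))).mean(axis=-1)

def pearson_correlation_gufunc(x, y):
    return covariance_gufunc(x, y) / (x.std(axis=-1) * y.std(axis=-1))

def spearman_correlation_gufunc(x, y):
    # double argsort gives ranks along the last axis without extra dependencies
    x_ranks = x.argsort(axis=-1).argsort(axis=-1)
    y_ranks = y.argsort(axis=-1).argsort(axis=-1)
    return pearson_correlation_gufunc(x_ranks, y_ranks)

def spearman_correlation(x, y, dim):
    return apply_ufunc(
        spearman_correlation_gufunc, x, y,
        input_core_dims=[[dim], [dim]],
        dask='parallelized', output_dtypes=[float])

# chunk along a non-core dimension; the core dimension stays in one chunk
x = xr.DataArray(np.random.randn(100, 50000), dims=('place', 'time')).chunk({'place': 10})
y = xr.DataArray(np.random.randn(100, 50000), dims=('place', 'time')).chunk({'place': 10})
corr = spearman_correlation(x, y, dim='time')  # lazy; corr.compute() runs in parallel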
We get a nice 5x speedup for this compute bound task:
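The benchmark output was also stripped here; one rough way to measure the speedup, assuming the variables from the sketch above, might be:

import time

start = time.perf_counter()
corr.compute()  # parallel: dask's threaded scheduler spreads blocks across cores
print('chunked:', time.perf_counter() - start)

x_np, y_np = x.load(), y.load()  # pull everything into memory
start = time.perf_counter()
spearman_correlation(x_np, y_np, dim='time')  # eager, single-threaded numpy path
print('eager:', time.perf_counter() - start)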
Still needs examples in the documentation. The next step is to finally expose apply_ufunc as public API :)

git diff upstream/master | flake8 --diff
whats-new.rst for all changes and api.rst for new API

cc @mrocklin