Xarray GPU optimization #771
@negin513 is attempting to deploy a commit to the xarray Team on Vercel. A member of the Team first needs to authorize it.
for more information, see https://pre-commit.ci
Thanks for writing this up!
Co-authored-by: Tom Augspurger <[email protected]>
- name: Katelyn Fitzgerald
  github: kafitzgerald

summary: 'How to accelerate AI/ML workflows in Earth Sciences with GPU-native Xarray and Zarr.'
Can we make this more direct? "X% speedup" or "XMBps throughput"?
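If it helps, both numbers suggested above can be derived from measured wall times. A minimal sketch, assuming "speedup" means the percent increase in processing rate over the baseline; the function names and the inputs in the usage note are hypothetical placeholders, not measurements from this PR:

```python
def throughput_mb_per_s(n_bytes: int, seconds: float) -> float:
    """Return data throughput in MB/s (1 MB = 1e6 bytes)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return n_bytes / 1e6 / seconds


def speedup_percent(baseline_seconds: float, optimized_seconds: float) -> float:
    """Return the percent increase in rate of the optimized run over the baseline."""
    if baseline_seconds <= 0 or optimized_seconds <= 0:
        raise ValueError("timings must be positive")
    return (baseline_seconds / optimized_seconds - 1.0) * 100.0
```

For example, `throughput_mb_per_s(2_000_000_000, 4.0)` reports the rate for 2 GB read in 4 s, and `speedup_percent(10.0, 8.0)` compares a 10 s baseline against an 8 s optimized run.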
src/posts/gpu-pipeline/index.md
Outdated
(TODO ongoing work) Eventually with this [cupy-xarray Pull Request merged](https://github.com/xarray-contrib/cupy-xarray/pull/70) (based on earlier work at https://xarray.dev/blog/xarray-kvikio), this can be simplified to:

```python
import cupy_xarray

ds = xr.open_dataset(filename_or_obj="/tmp/air-temp.zarr", engine="kvikio")
assert isinstance(ds.air.data, cp.ndarray)
```
This could go in a future work section at the end
Yeah, I'm not sure if this API is feasible or even desirable (have tried to implement this in xarray-contrib/cupy-xarray#70, but no luck yet patching the buffer protocol). So ok to move this towards the end.
![image](https://github.com/user-attachments/assets/...)

(TODO insert better nsight profiling figure than above showing overlapping CPU and GPU compute)
that would be really nice!
src/posts/gpu-pipeline/index.md
Outdated
- Consider using GPU Direct Storage (GDS) for optimal performance, but be aware of the setup and configuration required.
- GPU Direct Storage (GDS) can be an improvement for data-intensive workflows, but requires some setup and configuration.
- NVIDIA DALI is a powerful tool for optimizing data loading, but requires some effort to integrate into existing workflows.
- GPU-based decompression is a promising area for future work, but requires further development and testing.
Icechunk!
@@ -0,0 +1,223 @@
---
title: 'Accelerating AI/ML Workflows in Earth Sciences with GPU-Native Xarray and Zarr (and more!)'
title: 'Accelerating AI/ML Workflows in Earth Sciences with GPU-Native Xarray and Zarr (and more!)'
title: 'GPU-Native Earth Science AI/ML Workflows Xarray, Zarr, DALI, and nvcomp'
better SEO this way?
src/posts/gpu-pipeline/index.md
Outdated
- GPU Direct Storage (GDS) for optimal performance
- NVIDIA DALI
- Work out how to use GDS when reading from cloud object store instead of on-prem disk.
- etc
Want to shout out that reading/writing Zarr shards with GPU buffers (thanks @maxrjones and @TomAugspurger!) at zarr-developers/zarr-python#2978 was just merged, and could go in here or somewhere above, depending on when this blog post gets published.
Co-authored-by: Deepak Cherian <[email protected]>
Contributors: @negin513, @weiji14, @TomAugspurger, @maxrjones, @akshaysubr, @kafitzgerald