Update docs on virtual references to match VirtualiZarr v2.0's updated API #1099
| We are going to create a virtual dataset pointing to all of the [OISST](https://www.ncei.noaa.gov/products/optimum-interpolation-sst) data for August 2024. This data is distributed publicly as netCDF files on AWS S3, with one netCDF file containing the Sea Surface Temperature (SST) data for each day of the month. We are going to use `VirtualiZarr` to combine all of these files into a single virtual dataset spanning the entire month, then write that dataset to Icechunk for use in analysis. | ||
| Before we get started, we need to install `virtualizarr`, and `icechunk`. We also need to install `fsspec` and `s3fs` for working with data on s3. | ||
| Before we get started, we need to install `virtualizarr` (this notebook uses VirtualiZarr v2.0.0), and `icechunk`. We also need to install `fsspec`, `s3fs`, and `obstore` for working with data on s3. |
The only reason we are using fsspec and s3fs here is literally just to glob the files in the bucket. If we had a globbing function in obspec then we wouldn't need fsspec at all.
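If the object keys follow a fixed daily pattern, the glob could in principle be replaced by generating the URLs directly, which would avoid fsspec and s3fs for this step. A minimal sketch, assuming the key layout `data/v2.1/avhrr/YYYYMM/oisst-avhrr-v02r01.YYYYMMDD.nc` (an assumption that is not stated in this thread and should be checked against the bucket); `daily_urls` is a hypothetical helper, not part of any library:

```python
from datetime import date, timedelta

BUCKET = "noaa-cdr-sea-surface-temp-optimum-interpolation-pds"

# Assumed key layout (not confirmed in this thread); verify against a real listing.
KEY_TEMPLATE = "data/v2.1/avhrr/{day:%Y%m}/oisst-avhrr-v02r01.{day:%Y%m%d}.nc"


def daily_urls(start: date, end: date) -> list[str]:
    """Return one s3:// URL per day in [start, end], both ends inclusive."""
    n_days = (end - start).days + 1
    days = (start + timedelta(days=i) for i in range(n_days))
    return [f"s3://{BUCKET}/" + KEY_TEMPLATE.format(day=d) for d in days]


# One URL for each day of August 2024.
urls = daily_urls(date(2024, 8, 1), date(2024, 8, 31))
```

Unlike a glob, this does not notice missing days or files whose names deviate from the pattern, so a real listing remains the safer choice when the bucket contents are not known in advance.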
docs/docs/virtual.md
Outdated
| bucket = "noaa-cdr-sea-surface-temp-optimum-interpolation-pds/" | ||
| store = S3Store( | ||
| bucket=bucket, | ||
| region="us-west-2", |
I'm not sure this is the right region - @mpiannucci presumably you know?
I believe it is, you can double check on the AWS registry tho
I think it's actually us-east-1 https://registry.opendata.aws/noaa-cdr-oceanic/
Can these docs be autoexecuted? Seems as though they could
Yes should be. But do we have any examples of making docs executable in this repo yet? Looks like this #754 is still open?
yes, most of them are executed now. For example https://github.com/earth-mover/icechunk/blob/main/docs/docs/xarray.md essentially it's a bunch of these: `` ```python exec="on" session="xarray" source="material-block" ``
| We also need to give the parser a way to access our files. We do this by creating an `ObjectStoreRegistry` containing an obstore `S3Store` for that bucket. | ||
| ```python |
| ```python | |
| ```python exec="on" session="virtual" source="material-block" |
Like this, and adding the same on the other code blocks. They share variables and state so long as they have the same "session".
We should re-run the example to double-check everything still works as expected.