@TomNicholas
Contributor

We should re-run the example to double-check everything still works as expected.

@TomNicholas added the labels documentation 📝 (Improvements or additions to documentation) and virtual references 👻 (Involves virtual kerchunk/virtualizarr chunk references) on Jul 21, 2025
We are going to create a virtual dataset pointing to all of the [OISST](https://www.ncei.noaa.gov/products/optimum-interpolation-sst) data for August 2024. This data is distributed publicly as netCDF files on AWS S3, with one netCDF file containing the Sea Surface Temperature (SST) data for each day of the month. We are going to use `VirtualiZarr` to combine all of these files into a single virtual dataset spanning the entire month, then write that dataset to Icechunk for use in analysis.

Before we get started, we need to install `virtualizarr`, and `icechunk`. We also need to install `fsspec` and `s3fs` for working with data on s3.
Before we get started, we need to install `virtualizarr` (this notebook uses VirtualiZarr v2.0.0), and `icechunk`. We also need to install `fsspec`, `s3fs`, and `obstore` for working with data on s3.
Contributor Author

The only reason we are using fsspec and s3fs here is literally just to glob the files in the bucket. If we had a globbing function in obspec then we wouldn't need fsspec at all.
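To make the point about globbing concrete, here is a minimal sketch of how a glob could be layered on top of a plain key listing using only the standard library. The `glob_keys` helper and the key names are hypothetical, invented for illustration; they are not part of obspec, obstore, or fsspec, and a real version would take its keys from the store's own list operation.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Iterator


def glob_keys(keys: Iterable[str], pattern: str) -> Iterator[str]:
    """Yield the keys that match a shell-style pattern.

    Note: fnmatch's `*` also matches `/`, unlike a path-aware glob.
    """
    return (key for key in keys if fnmatchcase(key, pattern))


# Hypothetical listing, standing in for the result of listing a bucket prefix.
listing = [
    "data/202407/sst.20240731.nc",
    "data/202408/sst.20240801.nc",
    "data/202408/sst.20240802.nc",
    "data/202408/README.txt",
]

# Keep only the netCDF files for August 2024.
august_files = sorted(glob_keys(listing, "data/202408/sst.202408*.nc"))
```

Filtering a listing this way needs nothing beyond a list call, which is why a small helper like this could in principle replace the fsspec dependency for this one step.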

bucket = "noaa-cdr-sea-surface-temp-optimum-interpolation-pds/"
store = S3Store(
    bucket=bucket,
    region="us-west-2",
Contributor Author

I'm not sure this is the right region - @mpiannucci presumably you know?

Contributor

I believe it is, you can double check on the AWS registry tho

Contributor Author

@TomNicholas Jul 23, 2025

I think it's actually us-east-1 https://registry.opendata.aws/noaa-cdr-oceanic/

@ianhi
Collaborator

ianhi commented Jul 22, 2025

Can these docs be autoexecuted? Seems as though they could

@TomNicholas
Contributor Author

> Can these docs be autoexecuted? Seems as though they could

Yes should be. But do we have any examples of making docs executable in this repo yet? Looks like this #754 is still open?

@ianhi
Collaborator

ianhi commented Jul 23, 2025

> > Can these docs be autoexecuted? Seems as though they could
>
> Yes should be. But do we have any examples of making docs executable in this repo yet? Looks like this #754 is still open?

yes, most of them are executed now. For example https://github.com/earth-mover/icechunk/blob/main/docs/docs/xarray.md

essentially it's a bunch of these:

`` ```python exec="on" session="xarray" source="material-block" ``
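As a rough analogy for what the `session` option on such a fence does, the sketch below runs code blocks in one namespace per session name, so that later blocks in the same session can see variables defined by earlier ones. This is not markdown-exec's actual implementation; `sessions` and `run_block` are hypothetical names used only to illustrate the idea.

```python
from collections import defaultdict

# One namespace per session name; blocks in the same session reuse it.
sessions: dict[str, dict] = defaultdict(dict)


def run_block(source: str, session: str) -> None:
    """Execute a code block inside the namespace of its session."""
    exec(source, sessions[session])


# Two blocks in the same session: the second sees `bucket` from the first.
run_block("bucket = 'example-bucket'", session="virtual")
run_block("url = f's3://{bucket}/data'", session="virtual")

# A block in a different session gets a separate namespace.
run_block("x = 1", session="other")
```

The practical consequence for the docs is that every block of a tutorial needs the same session name, otherwise a later block fails on names defined earlier.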


We also need to give the parser a way to access our files. We do this by creating an `ObjectStoreRegistry` containing an obstore `S3Store` for that bucket.

```python
Collaborator

Suggested change: replace `` ```python `` with `` ```python exec="on" session="virtual" source="material-block" ``

like this, and adding the same on the other code blocks. They share variables and state so long as they have the same "session".


4 participants