Closed
Labels
virtual references 👻 (Involves virtual kerchunk/virtualizarr chunk references)
Description
To create and use virtual datasets with Python, users will want to use kerchunk and VirtualiZarr. Both are just starting down the path to zarr 3 and icechunk compatibility. This issue tracks progress and relevant PRs:
- Support writing to icechunk from VirtualiZarr: Add Icechunk Support (zarr-developers/VirtualiZarr#256), Writing virtual references into Icechunk from VirtualiZarr (#103)
- Support zarr 3 codecs in VirtualiZarr: Fix v3 codec pipeline VirtualiZarr#4
- Zarr 3 support for kerchunk: zarr-pythonv3 compatibility fsspec/kerchunk#516
- Numcodecs zarr 3 wrapper: Add wrappers for zarr v3 zarr-developers/numcodecs#524 + Sync with zarr 3 beta zarr-developers/numcodecs#597
- Xarray zarr 3 support: Compatibility for zarr-python 3.x pydata/xarray#9552
All of this can be installed with pip. However, for now it needs to be installed in separate steps to avoid version conflicts:
pip install icechunk xarray VirtualiZarr kerchunk
This also assumes that fsspec, s3fs, and the HDF5 readers (h5py, h5netcdf) are installed:
pip install fsspec s3fs h5py h5netcdf
With all of this installed, HDF5 virtual datasets currently work like this:
from datetime import datetime, timezone
import icechunk
import xarray as xr
import virtualizarr
url = 's3://met-office-atmospheric-model-data/global-deterministic-10km/20250204T0000Z/20250204T0000Z-PT0000H00M-pressure_at_mean_sea_level.nc'
so = dict(anon=True, default_fill_cache=False, default_cache_type="none")
# create virtualizarr dataset
vds = virtualizarr.open_virtual_dataset(url, reader_options={'storage_options': so}, indexes={})
# create an icechunk repo that can read virtual chunks from the eu-west-2 region with anonymous access
storage = icechunk.local_filesystem_storage("./ukmet")
config = icechunk.RepositoryConfig.default()
config.set_virtual_chunk_container(icechunk.VirtualChunkContainer("s3", "s3://", icechunk.s3_store(region="eu-west-2")))
credentials = icechunk.containers_credentials(s3=icechunk.s3_credentials(anonymous=True))
repo = icechunk.Repository.create(storage, config, credentials)
# create a session, and write to a group inside it using virtualizarr
session = repo.writable_session("main")
vds.virtualize.to_icechunk(session.store, group="msl", last_updated_at=datetime.now(timezone.utc))
# commit to save progress
session.commit("Add msl pressure")
# open it back up
ds = xr.open_zarr(session.store, group="msl", zarr_format=3, consolidated=False, decode_times=False)
ds
# plot!
ds.air_pressure_at_sea_level.plot()
Updated 2/4/2025
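Because the install is split across several pip commands and the packages involved are changing quickly, it may help to confirm what actually ended up in the environment before running the workflow. The following is a minimal sketch using only the standard library; the `installed_versions` helper and the `PACKAGES` list are illustrative names, not part of icechunk or VirtualiZarr, and the distribution names are assumed to match the pip commands above.

```python
from importlib import metadata

# Distribution names assumed from the pip commands in this issue.
PACKAGES = [
    "icechunk", "xarray", "virtualizarr", "kerchunk",
    "fsspec", "s3fs", "h5py", "h5netcdf",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Print one line per package so version conflicts are easy to spot.
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:14s} {version or 'NOT INSTALLED'}")
```

Including this output when reporting a problem makes it easier to tell which combination of zarr 3, kerchunk, and VirtualiZarr versions was in use.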
