Demo dataset with UFS-Replay #378

@jhamman

NOAA's UFS Replay could be an interesting public dataset to demo Icechunk with. It's big: >1 PB!

It is available in two formats, both of which could be interesting to explore via virtual datasets:

  • A Zarr v2 dataset on Google Cloud Storage (gs://noaa-ufs-gefsv13replay/ufs-hr1)
  • A collection of NetCDF files on AWS S3 (s3://noaa-ufs-gefsv13replay-pds/)
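As a first exploration step, something like the following could list the top level of each location. This is only a rough sketch, not a tested recipe: it assumes both buckets allow anonymous listing and that `fsspec`, `gcsfs`, and `s3fs` are installed. `SOURCES`, `anonymous_options`, and `list_top_level` are hypothetical names made up for illustration, and nothing is assumed about the layout below the two URLs quoted above.

```python
from urllib.parse import urlsplit

# The two locations quoted in this issue; no sub-paths are assumed.
SOURCES = {
    "zarr-v2": "gs://noaa-ufs-gefsv13replay/ufs-hr1",
    "netcdf": "s3://noaa-ufs-gefsv13replay-pds/",
}


def anonymous_options(url: str) -> dict:
    """Return fsspec keyword arguments for anonymous access to `url`."""
    scheme = urlsplit(url).scheme
    if scheme in ("gs", "gcs"):
        return {"token": "anon"}  # gcsfs: skip credential lookup
    if scheme == "s3":
        return {"anon": True}  # s3fs: unsigned requests
    raise ValueError(f"unsupported scheme: {scheme!r}")


def list_top_level(url: str) -> list[str]:
    """List the entries directly under `url` (needs network access)."""
    # Imported lazily so the helper above works without fsspec installed.
    from fsspec.core import url_to_fs

    fs, path = url_to_fs(url, **anonymous_options(url))
    return fs.ls(path, detail=False)


if __name__ == "__main__":
    # Assumption: both buckets permit anonymous listing.
    for name, url in SOURCES.items():
        print(name, list_top_level(url)[:10])
```

Knowing the actual layout under each prefix would be the prerequisite for deciding which Zarr stores or NetCDF files to point virtual references at.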

My thinking is that this dataset could be a good stress test for PB-scale Icechunk datasets and for virtual datasets at scale.

cc @TomNicholas and @timothyas


Known blockers:

    Labels

      • use case 🌎: Real-world use case
      • virtual references 👻: Involves virtual kerchunk/virtualizarr chunk references
