---
layout: default
title: 2025-03-13 CF Governance Panel meeting
---
# 2025-03-13 CF Governance Panel meeting

## Attendees
Attending: Jonathan, Daniel, Bryan, Ethan, Karl

## Agenda/Notes

* Schedule our next meeting
  * 12 June 2025 at 14:00 UTC (8am MDT, 3pm BST, 4pm CEST)
* Update on writing an article on CF history/future
  * Prioritize between Roadmap and History
  * Single paper with recent history and future plans?
    * Introduction
    *
    * History
      * Start - CMIP3
      * CMIP5-7 (now)
    * Process
      * Principles; the advantages of a process that requires time for careful thought and consideration, with the aim of reaching consensus
      * Examples (e.g. enumerations and swathes)
    * Next steps: The roadmap
    * History helps explain why things are the way they are
* Could we convene a meeting with Zarr/GeoZarr developers to discuss data model and interoperability?
  * To some extent their data model is not really different from the (basic) netCDF model, is it?
    Clearly the format is different …
  * (We now have a *working* pure Python HDF5 reader - the pull request is public now - see [https://github.com/jjhelmus/pyfive](https://github.com/jjhelmus/pyfive) - that does nearly (**+**) everything that Zarr can do!
    So all that stuff can work on top of netCDF without needing Zarr per se; see the pyfive sketch after this list.)

    (**+**) need support for arbitrary filters.
  * (We do need to do better at providing good chunking defaults; see the chunking sketch after this list.)
  * Zarr exists (I think) to handle three things:
    * Threading performance (the HDF5 C library is not thread safe)
    * People always rechunk into Zarr; the important requirement is the rechunking, not the Zarr (see the rechunk-and-consolidate sketch after this list)
    * The metadata is consolidated into one place, so it can be read more efficiently than reading metadata scattered throughout a file - but this too can be done by repacking with h5repack, just as is done by reading netCDF and writing Zarr.

    The biggest downside of Zarr is the number of files.
    If your average netCDF file is 2 GB, then you will be facing ~500 times more Zarr files (every 4 MB chunk becomes its own file, and 2 GB / 4 MB ≈ 500).
    This is a problem for HPC file systems and data managers.
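
As a purely illustrative aside (not discussed in this level of detail at the meeting): a minimal sketch of what reading data through the pure-Python HDF5 reader could look like, assuming pyfive keeps its h5py-style interface (`File` objects with dict-like dataset access and lazy slicing). The file name and variable name are invented.

```python
# Hedged sketch only: assumes pyfive's h5py-like API (File objects, dict-style
# dataset access, array-style slicing). File and variable names are hypothetical.
import pyfive

f = pyfive.File("tas_example.nc")      # netCDF-4 files are HDF5 underneath
tas = f["tas"]                         # dataset object, addressed like an h5py dataset
print(tas.shape, tas.dtype)            # inspect metadata without the HDF5 C library
subset = tas[0:10, ...]                # partial read of the first ten time steps
```

Because this is pure Python, it sidesteps the thread-safety constraints of the HDF5 C library mentioned below; the remaining gap noted in the minutes is support for arbitrary filters.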
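On the chunking-defaults point, a hedged sketch of choosing chunk sizes explicitly when writing netCDF-4 with netCDF4-python, rather than relying on library defaults. The dimension sizes and the ~4 MB target are invented for illustration, not a recommendation.

```python
# Sketch only: explicit chunking when writing netCDF-4. Sizes are illustrative.
import numpy as np
from netCDF4 import Dataset

nc = Dataset("example_chunked.nc", "w", format="NETCDF4")
nc.createDimension("time", None)          # unlimited
nc.createDimension("lat", 720)
nc.createDimension("lon", 1440)

# One time step of float32 per chunk: 1 x 720 x 1440 x 4 bytes ~= 4.1 MB
tas = nc.createVariable(
    "tas", "f4", ("time", "lat", "lon"),
    chunksizes=(1, 720, 1440), zlib=True, complevel=1,
)
tas[0, :, :] = np.zeros((720, 1440), dtype="f4")
nc.close()
```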
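And for the rechunk-and-consolidate workflow described above, a sketch using xarray and dask; the path and chunk sizes are invented. Writing with `consolidated=True` is what puts all the Zarr metadata in one place, with `h5repack` as the HDF5-side analogue mentioned in the notes.

```python
# Sketch only: rechunk a netCDF file and write a Zarr store with consolidated
# metadata. Paths and chunk sizes are hypothetical; requires xarray, dask, zarr.
import xarray as xr

ds = xr.open_dataset("example_chunked.nc", chunks={"time": 1})   # dask-backed, lazy
ds = ds.chunk({"time": 100, "lat": 360, "lon": 360})             # the rechunking step
ds.to_zarr("example.zarr", mode="w", consolidated=True)          # one consolidated metadata object
```

Each chunk in the resulting store is still a separate file, which is the file-count downside noted above.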