Add CLI for converting v2 metadata to v3 #3257
base: main
…sting a zarr version greater than 3
I'd suggest putting the metadata converter Python API in a new …
I've made most of the requested changes to the implementation now, and I've responded to the remaining ones in the review thread above. @dstansby, let me know if you have any suggestions for these, or for the refactored implementation / tests. While changing the migration functions to accept … Also, I noticed that I don't have any handling for consolidated metadata (…
Nice! Unfortunately, in pursuit of fixing #3295 I also did a refactor in #3308. I think we should probably merge my PR first (sorry!) and then rebase this one, since this is a feature and my refactor is a pathway to fixing a bug.
I think it's fine to error gracefully on consolidated metadata for now, and add support as a follow-up feature in a future PR.
All conflicts are now fixed. I also added a check that stops the conversion if consolidated metadata is detected.
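One way such an early exit could look, sketched for a local directory store only: Zarr v2 keeps consolidated metadata under the .zmetadata key, so a converter can refuse to run when that file exists at the root. The function and exception names below are hypothetical and this is not the PR's actual code; it also only checks the root of the hierarchy.

```python
from pathlib import Path


class ConsolidatedMetadataError(ValueError):
    """Raised when a v2 hierarchy uses consolidated metadata (hypothetical name)."""


def check_no_consolidated_metadata(root: Path) -> None:
    """Refuse to convert a local v2 hierarchy that has consolidated metadata.

    Zarr v2 stores consolidated metadata under the ".zmetadata" key, which on
    a local directory store is a plain file. Only the root is checked here.
    """
    if (root / ".zmetadata").exists():
        raise ConsolidatedMetadataError(
            f"{root} contains consolidated metadata (.zmetadata); "
            "converting it is not supported yet."
        )
```

Failing before any zarr.json is written means there is nothing to clean up afterwards.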
Looking great! I think the outstanding TODOs here are:
- Add a release note entry
- Add a new section to the user guide docs to advertise the new CLI
- Work out how to do logging (see inline thread discussion)
As a general comment about converting v2 -> v3, converting the v2 …
Thanks @d-v-b. At the moment this PR: …
Happy to wait on merging this PR until the other issues / PRs you mentioned are resolved.
@K-Meech that approach seems good, and I don't think this effort should be blocked by codec changes in the background.
@dstansby I think I've addressed all of your new comments now, and I added release notes and a user guide docs page. There's still a comment about filters from a while ago; any thoughts on that one? Also, let me know if you have any comments on the new changes. I had to make some small modifications to handle conflicts with the latest changes to the …
🎉 I think this is good now. I have one question about the use of the logger, but it's not a blocker. I'll let this sit for a week or so because it's complicated and would benefit from a second reviewer. If no one reviews it by then, I'll merge.
import logging

import typer

app = typer.Typer()

logger = logging.getLogger(__name__)
Is it deliberate that this is a new logger, instead of importing the logger object from zarr? I don't think it matters too much, but re-using zarr._logger might save some code duplication, because you could remove the functions in this file that configure the logger.
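The reviewer's point can be illustrated with the standard library's logger hierarchy. A logger created with logging.getLogger(__name__) inside the CLI module gets a dotted name under "zarr", so its records propagate to whatever handler and level are configured on the package logger. This is a minimal sketch assuming the package logger is named "zarr" and the module path matches the one in the PR description. ListHandler is a hypothetical in-memory handler used only to capture output, not part of zarr.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted log messages in memory (illustration only)."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(self.format(record))


# Package-level logger, standing in for the one the reviewer refers to.
package_logger = logging.getLogger("zarr")
# What getLogger(__name__) would return inside the CLI module: the dotted
# name makes it a descendant of the "zarr" logger.
cli_logger = logging.getLogger("zarr.core.metadata.converter.cli")

# Configure only the package logger.
handler = ListHandler()
handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
package_logger.addHandler(handler)
package_logger.setLevel(logging.INFO)

# The CLI logger has no handler or level of its own: its effective level is
# inherited from "zarr" and the record propagates up to the handler there.
cli_logger.info("Converting group metadata")
```

Because configuration on the parent applies to every descendant, a module-level logger does not need its own configuration helpers; reusing the package logger object directly, as suggested, achieves the same while avoiding a second name.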
For #1798
Adds a CLI using typer to convert v2 metadata (.zarray / .zattrs ...) to v3 metadata (zarr.json).
To test, you will need to install the new optional cli dependency, e.g. pip install -e ".[remote,cli]"
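As a rough, stdlib-only illustration of what converting v2 metadata to v3 involves on a local directory store: every .zgroup or .zarray (plus .zattrs) gets a zarr.json written alongside it and the v2 files are left in place, while a clear-style helper deletes one format's metadata files. This is a sketch, not the PR's converter_v2_v3.py or cli.py. The function names are hypothetical, and it deliberately handles only uncompressed, C-ordered, little-endian numeric arrays, raising for anything else (the PR instead looks up the numcodecs.zarr3 equivalent of each numcodecs codec).

```python
import json
import os
from pathlib import Path

# Deliberately small table: little-endian v2 dtype strings -> v3 data type names.
V2_TO_V3_DTYPE = {
    "<i2": "int16", "<i4": "int32", "<i8": "int64",
    "<u2": "uint16", "<u4": "uint32", "<u8": "uint64",
    "<f4": "float32", "<f8": "float64",
}
V2_METADATA_FILES = {".zgroup", ".zarray", ".zattrs"}


def _read_json(path: Path) -> dict:
    """Return the parsed JSON document at path, or {} if it does not exist."""
    return json.loads(path.read_text()) if path.exists() else {}


def v2_array_to_v3(zarray: dict, attrs: dict) -> dict:
    """Build a v3 array zarr.json document from .zarray and .zattrs contents."""
    if zarray.get("compressor") is not None or zarray.get("filters"):
        raise NotImplementedError("codec mapping is out of scope for this sketch")
    if zarray.get("order", "C") != "C":
        raise NotImplementedError("order 'F' would need a transpose codec")
    dtype = zarray["dtype"]
    if dtype not in V2_TO_V3_DTYPE:
        raise NotImplementedError(f"unsupported dtype {dtype!r}")
    fill_value = zarray.get("fill_value")
    return {
        "zarr_format": 3,
        "node_type": "array",
        "shape": zarray["shape"],
        "data_type": V2_TO_V3_DTYPE[dtype],
        "chunk_grid": {"name": "regular",
                       "configuration": {"chunk_shape": zarray["chunks"]}},
        # The "v2" chunk key encoding keeps the existing chunk keys valid,
        # so no chunk data has to be rewritten.
        "chunk_key_encoding": {"name": "v2", "configuration": {
            "separator": zarray.get("dimension_separator", ".")}},
        # v3 needs a concrete fill value; this sketch substitutes 0 for null.
        "fill_value": 0 if fill_value is None else fill_value,
        "codecs": [{"name": "bytes", "configuration": {"endian": "little"}}],
        "attributes": attrs,
    }


def convert_tree(root: Path) -> list[Path]:
    """Write zarr.json next to every .zgroup/.zarray; v2 files are kept."""
    written = []
    for dirpath, _dirnames, filenames in os.walk(root):
        node = Path(dirpath)
        attrs = _read_json(node / ".zattrs")
        if ".zarray" in filenames:
            doc = v2_array_to_v3(_read_json(node / ".zarray"), attrs)
        elif ".zgroup" in filenames:
            doc = {"zarr_format": 3, "node_type": "group", "attributes": attrs}
        else:
            continue
        (node / "zarr.json").write_text(json.dumps(doc, indent=2))
        written.append(node / "zarr.json")
    return written


def clear_metadata(root: Path, zarr_format: int) -> list[Path]:
    """Delete either the v2 or the v3 metadata files below root."""
    names = V2_METADATA_FILES if zarr_format == 2 else {"zarr.json"}
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in names.intersection(filenames):
            path = Path(dirpath) / name
            path.unlink()
            removed.append(path)
    return removed
```

Since convert_tree raises part way through on an unsupported array, calling clear_metadata(root, 3) afterwards removes the partially written zarr.json files, which mirrors the clean-up use of the clear command.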
This should make the zarr-converter command available, e.g. try:
…
convert adds zarr.json files to every group / array, leaving the v2 metadata as-is. A zarr with both sets of metadata can still be opened with zarr.open, but will give a UserWarning: "Both zarr.json (Zarr format 3) and .zarray (Zarr format 2) metadata objects exist... Zarr v3 will be used." This can be avoided by passing zarr_format=3 to zarr.open, or by using the clear command to remove the v2 metadata.
clear can also remove v3 metadata. This is useful if the conversion fails part way through, e.g. if one of the arrays uses a codec with no v3 equivalent.
All code for the CLI is in src/zarr/core/metadata/converter/cli.py, with the actual conversion functions in src/zarr/core/metadata/converter/converter_v2_v3.py. These functions can be called directly by those who don't want to use the CLI (although they currently live under /core, which is considered private API, so it may be best to move them elsewhere in the package).
Some points to consider:
- … set_path from test_dtype_registry.py and test_codec_entrypoints.py, as they were causing the CLI tests to fail if they were run afterwards. This seems to be due to the lazy_load_list of the numcodecs codec registries being cleared, meaning the codecs were no longer available to my code that finds the numcodecs.zarr3 equivalent of a numcodecs codec.
TODO:
- … docs/user-guide/*.rst
- … changes/
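The failure described in the set_path point is a shared-state problem between tests: once a lazily populated registry has consumed its queue, anything that later wipes the registry loses those entries for every test that runs afterwards. A common general remedy, sketched here on a toy class, is to snapshot and restore the shared state around each test (in pytest this would usually live in a fixture). LazyRegistry and isolated_registry are hypothetical names; this is not how the zarr or numcodecs registries are implemented, and it is not the fix the PR applied.

```python
from collections.abc import Callable, Iterator
from contextlib import contextmanager


class LazyRegistry(dict):
    """Toy registry that loads queued entries once, then empties its queue."""

    def __init__(self) -> None:
        super().__init__()
        self.lazy_load_list: list[tuple[str, Callable[[], object]]] = []

    def lazy_load(self) -> None:
        for name, loader in self.lazy_load_list:
            self[name] = loader()
        # Once consumed, the queue is cleared, so wiping the dict later
        # cannot recover these entries from the queue.
        self.lazy_load_list.clear()


@contextmanager
def isolated_registry(registry: LazyRegistry) -> Iterator[LazyRegistry]:
    """Snapshot a registry (entries and queue) and restore it on exit."""
    saved_entries = dict(registry)
    saved_queue = list(registry.lazy_load_list)
    try:
        yield registry
    finally:
        registry.clear()
        registry.update(saved_entries)
        registry.lazy_load_list[:] = saved_queue
```

Inside the with block a test may load and wipe the registry freely; on exit both the entries and the pending queue are put back, so later tests see the original state.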