Add CLI for converting v2 metadata to v3 #3257
Open
K-Meech wants to merge 66 commits into zarr-developers:main from K-Meech:km/v2-v3-conversion
+1,595
−89
Changes from 58 commits
45bb4e5
add rough cli converter structure
K-Meech 456c9e7
allow zstd, gzip and numcodecs zarr 3 compression
K-Meech 242a338
convert filters to v3
K-Meech 1045c33
create BytesCodec with correct endian
K-Meech 4e2442f
handle C vs F order in v2 metadata
K-Meech c63f0b8
save group and array metadata to file
K-Meech 2947ce4
create overall conversion functions for store, array or group
K-Meech ba81755
add minimal typer cli
K-Meech 67f9580
add initial tests for converter
K-Meech 0d7c2c8
add tests for conversion of groups and nested groups and arrays
K-Meech cf39580
add tests for conversion of compressors and filters
K-Meech 11499e7
test conversion of order and endianness
K-Meech 90b0996
add tests for edge cases of incorrect codecs
K-Meech 85159bb
add tests for / separator
K-Meech 53ba166
draft of metadata remover and add test for internal paths
K-Meech d4cdc04
add clear command to cli with tests
K-Meech dfdc729
add test for metadata removal with path#
K-Meech ad60991
add verbose logging option
K-Meech 66bae0d
add dry run option to cli
K-Meech 97df9bf
add test for dry-run
K-Meech 42e0435
add zarr-converter script and enable cli dep in tests
K-Meech 9e20b39
use v2 chunk key encoding type
K-Meech 6586e66
Merge branch 'main' of github.com:K-Meech/zarr-python into km/v2-v3-c…
K-Meech ce409a3
update endianness of test data type
K-Meech fb7136b
Merge branch 'main' of github.com:K-Meech/zarr-python into km/v2-v3-c…
K-Meech 6585f24
check converted arrays can be accessed
K-Meech 46e958d
Merge branch 'main' of github.com:K-Meech/zarr-python into km/v2-v3-c…
K-Meech 08fc138
remove uses of pathlib walk, as it didn't exist in python 3.11
K-Meech 3540434
include tags in checkout for gpu test, to avoid numcodecs.zarr3 reque…
K-Meech 0889979
rename cli commands from review comments
K-Meech d906dba
remove path option
K-Meech 5e03e3c
allow metadata to be written to a separate store location
K-Meech 89aa095
add overwrite and remove-v2-metadata options
K-Meech ade9c3b
add force option
K-Meech 218e8a8
use v2, v3 format for CLI
K-Meech 49787f6
split into convert_group and convert_array functions
K-Meech 488485c
update command names in converter tests
K-Meech 18487c9
update test filename to reflect command name change
K-Meech a5cd760
fix tests for sub-groups
K-Meech bde452f
add tests for --force
K-Meech 671c5e3
add test for migrating to separate output location
K-Meech 0281cc1
add test for remove-v2-metadata option
K-Meech 2ffe854
update test names to match command name
K-Meech 432eae6
add test for --remove-v2-metadata with separate output location
K-Meech 7cb42c5
merge upstream changes
K-Meech 6e6788d
separate cli fixtures from the tests
K-Meech 4abc84a
add test for overwrite option in separate location
K-Meech 0bdd6f8
fix failing test
K-Meech f2fa389
small fixes to tests
K-Meech 4d98121
Merge pull request #1 from K-Meech/km/v2-v2-conversion-review
K-Meech 649bb20
fix pre-commit errors
K-Meech dba4073
update docstrings with review comments
K-Meech b702060
pass filters and compressors to processing functions, rather than ful…
K-Meech b900a0e
use Store as input rather than StoreLike
K-Meech 42aa7db
move conversion functions into public api
K-Meech d3fc21e
Merge branch 'main' of github.com:K-Meech/zarr-python into km/v2-v3-c…
K-Meech 5c05c0c
merge upstream changes
K-Meech f62fe31
fail on discovery of consolidated metadata
K-Meech 71067ba
minor changes from review
K-Meech 34e97f0
use same logger throughout zarr-python
K-Meech 9f6b875
add release notes and docs for the cli
K-Meech 1362cc6
tidy up formatting of zarr.metadata api docs
K-Meech 4ae3491
Merge branch 'main' of github.com:K-Meech/zarr-python into km/v2-v3-c…
K-Meech f301172
fix failing tests
K-Meech 0449ef7
add a section about --verbose to the docs
K-Meech 14b9cfd
Merge branch 'main' into km/v2-v3-conversion
d-v-b
@@ -30,6 +30,8 @@ jobs:

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # grab all branches and tags
      # - name: cuda-toolkit
      #   uses: Jimver/[email protected]
      #   id: cuda-toolkit
Empty file.
@@ -0,0 +1,189 @@
import logging
from enum import Enum
from typing import Annotated, Literal, cast

import typer

import zarr.metadata.migrate_v3 as migrate_metadata
from zarr.core.sync import sync
from zarr.storage._common import make_store

app = typer.Typer()

logger = logging.getLogger(__name__)


def _set_logging_config(*, verbose: bool) -> None:
    if verbose:
        lvl = logging.INFO
    else:
        lvl = logging.WARNING
    fmt = "%(message)s"
    logging.basicConfig(level=lvl, format=fmt)


def _set_verbose_level() -> None:
    logging.getLogger().setLevel(logging.INFO)


class ZarrFormat(str, Enum):
    v2 = "v2"
    v3 = "v3"


class ZarrFormatV3(str, Enum):
    """Limit CLI choice to only v3"""

    v3 = "v3"


@app.command()  # type: ignore[misc]
def migrate(
    zarr_format: Annotated[
        ZarrFormatV3,
        typer.Argument(
            help="Zarr format to migrate to. Currently only 'v3' is supported.",
        ),
    ],
    input_store: Annotated[
        str,
        typer.Argument(
            help=(
                "Input Zarr to migrate - should be a store, path to directory in file system or name of zip file "
                "e.g. 'data/example-1.zarr', 's3://example-bucket/example'..."
            )
        ),
    ],
    output_store: Annotated[
        str | None,
        typer.Argument(
            help=(
                "Output location to write generated metadata (no array data will be copied). If not provided, "
                "metadata will be written to input_store. Should be a store, path to directory in file system "
                "or name of zip file e.g. 'data/example-1.zarr', 's3://example-bucket/example'..."
            )
        ),
    ] = None,
    dry_run: Annotated[
        bool,
        typer.Option(
            help="Enable a dry-run: files that would be converted are logged, but no new files are created or changed."
        ),
    ] = False,
    overwrite: Annotated[
        bool,
        typer.Option(
            help="Remove any existing v3 metadata at the output location, before migration starts."
        ),
    ] = False,
    force: Annotated[
        bool,
        typer.Option(
            help=(
                "Only used when --overwrite is given. Allows v3 metadata to be removed when no valid "
                "v2 metadata exists at the output location."
            )
        ),
    ] = False,
    remove_v2_metadata: Annotated[
        bool,
        typer.Option(
            help="Remove v2 metadata (if any) from the output location, after migration is complete."
        ),
    ] = False,
) -> None:
    """Migrate all v2 metadata in a zarr hierarchy to v3. This will create a zarr.json file for each level
    (every group / array). v2 files (.zarray, .zattrs etc.) will be left as-is.
    """
    if dry_run:
        _set_verbose_level()
        logger.info(
            "Dry run enabled - no new files will be created or changed. Log of files that would be created on a real run:"
        )

    input_zarr_store = sync(make_store(input_store, mode="r+"))

    if output_store is not None:
        output_zarr_store = sync(make_store(output_store, mode="w-"))
        write_store = output_zarr_store
    else:
        output_zarr_store = None
        write_store = input_zarr_store

    if overwrite:
        sync(migrate_metadata.remove_metadata(write_store, 3, force=force, dry_run=dry_run))

    migrate_metadata.migrate_v2_to_v3(
        input_store=input_zarr_store, output_store=output_zarr_store, dry_run=dry_run
    )

    if remove_v2_metadata:
        # There should always be valid v3 metadata at the output location after migration, so force=False
        sync(migrate_metadata.remove_metadata(write_store, 2, force=False, dry_run=dry_run))


@app.command()  # type: ignore[misc]
def remove_metadata(
    zarr_format: Annotated[
        ZarrFormat,
        typer.Argument(help="Which format's metadata to remove - v2 or v3."),
    ],
    store: Annotated[
        str,
        typer.Argument(
            help="Store or path to directory in file system or name of zip file e.g. 'data/example-1.zarr', 's3://example-bucket/example'..."
        ),
    ],
    force: Annotated[
        bool,
        typer.Option(
            help=(
                "Allow metadata to be deleted when no valid alternative exists e.g. allow deletion of v2 metadata, "
                "when no v3 metadata is present."
            )
        ),
    ] = False,
    dry_run: Annotated[
        bool,
        typer.Option(
            help="Enable a dry-run: files that would be deleted are logged, but no files are removed or changed."
        ),
    ] = False,
) -> None:
    """Remove all v2 (.zarray, .zattrs, .zgroup, .zmetadata) or v3 (zarr.json) metadata files from the given Zarr.
    Note - this will remove metadata files at all levels of the hierarchy (every group and array).
    """
    if dry_run:
        _set_verbose_level()
        logger.info(
            "Dry run enabled - no files will be deleted or changed. Log of files that would be deleted on a real run:"
        )
    input_zarr_store = sync(make_store(store, mode="r+"))

    sync(
        migrate_metadata.remove_metadata(
            store=input_zarr_store,
            zarr_format=cast(Literal[2, 3], int(zarr_format[1:])),
            force=force,
            dry_run=dry_run,
        )
    )


@app.callback()  # type: ignore[misc]
def main(
    verbose: Annotated[
        bool,
        typer.Option(
            help="enable verbose logging - will print info about metadata files being deleted / saved."
        ),
    ] = False,
) -> None:
    """
    See available commands below - access help for individual commands with zarr COMMAND --help.
    """
    _set_logging_config(verbose=verbose)


if __name__ == "__main__":
    app()
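
The `remove_metadata` command above turns the CLI choice (`v2` or `v3`) into the integer passed to the library function with `int(zarr_format[1:])`. A minimal standalone sketch of why that works, assuming only the `ZarrFormat` enum shown in the diff. The helper name `format_to_int` is hypothetical and not part of the PR:

```python
from enum import Enum
from typing import Literal, cast


class ZarrFormat(str, Enum):
    """Copy of the enum from the diff: members are also plain strings."""

    v2 = "v2"
    v3 = "v3"


def format_to_int(zarr_format: ZarrFormat) -> Literal[2, 3]:
    """Hypothetical helper: strip the leading 'v' and parse the digit."""
    # Slicing goes through str.__getitem__ on the member's string value,
    # so ZarrFormat.v2[1:] evaluates to the plain string "2".
    return cast(Literal[2, 3], int(zarr_format[1:]))


print(format_to_int(ZarrFormat.v2), format_to_int(ZarrFormat("v3")))
```

Note that `cast` only informs the type checker. It performs no runtime validation, so the conversion is safe here only because the enum restricts the possible inputs to `v2` and `v3`.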
@@ -0,0 +1,3 @@
from zarr.metadata.migrate_v3 import migrate_to_v3, migrate_v2_to_v3, remove_metadata

__all__ = ["migrate_to_v3", "migrate_v2_to_v3", "remove_metadata"]
Is it deliberate that this is a new logger, instead of importing the logger object from `zarr`? I don't think it matters too much, but re-using `zarr._logger` might save some code duplication because you could remove functions from this file for configuring the logger.