Commit c714d2b

jstriebel, grlee77 and joshmoore authored
Sharding storage transformer for v3 (#1111)
* add storage_transformers and get/set_partial_values
* formatting
* add docs and release notes
* add test_core testcase
* Update zarr/creation.py
  Co-authored-by: Gregory Lee <[email protected]>
* apply PR feedback
* add comment that storage_transformers=None is the same as storage_transformers=[]
* use empty tuple as default for storage_transformers
* make mypy happy
* better coverage, minor fix, adding rmdir
* add missing rmdir to test
* increase coverage
* improve test coverage
* fix TestArrayWithStorageTransformersV3
* Update zarr/creation.py
  Co-authored-by: Gregory Lee <[email protected]>
* add sharding storage transformer
* add actual transformer
* fix, and allow partial reads for uncompressed v3 arrays
* pick generic storage transformer changes from #1111
* increase coverage
* make lgtm happy
* add release note
* better coverage
* fix hexdigest
* improve tests
* fix order of storage transformers
* fix order of storage transformers
* retrigger CI
* minor test improvement
* minor test update
* apply PR feedback
* minor fixes
* make flake8 happy
* call ensure_bytes in sharding transformer
* minor fixes
* apply PR feedback
* adapt to supports_efficient_get_partial_values property
* add ZARR_V3_SHARDING flag for sharding usage
* fix release notes
* fix release notes

---------

Co-authored-by: Gregory Lee <[email protected]>
Co-authored-by: Josh Moore <[email protected]>
1 parent 6f11ae7 commit c714d2b

11 files changed: +623 -24 lines


.github/workflows/minimal.yml (+2)

@@ -24,6 +24,7 @@ jobs:
       shell: "bash -l {0}"
       env:
         ZARR_V3_EXPERIMENTAL_API: 1
+        ZARR_V3_SHARDING: 1
       run: |
         conda activate minimal
         python -m pip install .
@@ -32,6 +33,7 @@ jobs:
       shell: "bash -l {0}"
       env:
         ZARR_V3_EXPERIMENTAL_API: 1
+        ZARR_V3_SHARDING: 1
       run: |
         conda activate minimal
         rm -rf fixture/

.github/workflows/python-package.yml (+1)

@@ -70,6 +70,7 @@ jobs:
         ZARR_TEST_MONGO: 1
         ZARR_TEST_REDIS: 1
         ZARR_V3_EXPERIMENTAL_API: 1
+        ZARR_V3_SHARDING: 1
       run: |
         conda activate zarr-env
         mkdir ~/blob_emulator

.github/workflows/windows-testing.yml (+1)

@@ -52,6 +52,7 @@ jobs:
         env:
           ZARR_TEST_ABS: 1
           ZARR_V3_EXPERIMENTAL_API: 1
+          ZARR_V3_SHARDING: 1
       - name: Conda info
         shell: bash -l {0}
         run: conda info

docs/release.rst (+7 -4)

@@ -14,13 +14,16 @@ Unreleased
 # .. warning::
 # Pre-release! Use :command:`pip install --pre zarr` to evaluate this release.
 
-
 Major changes
 ~~~~~~~~~~~~~
 
-* Improve `Zarr V3 support <https://zarr-specs.readthedocs.io/en/latest/core/v3.0.html>`_
-  adding partial store read/write and storage transformers.
-  By :user:`Jonathan Striebel <jstriebel>`; :issue:`1096`.
+* Improve Zarr V3 support, adding partial store read/write and storage transformers.
+  Add two features of the [v3 spec](https://zarr-specs.readthedocs.io/en/latest/core/v3.0.html):
+    * storage transformers
+    * `get_partial_values` and `set_partial_values`
+    * efficient `get_partial_values` implementation for `FSStoreV3`
+    * sharding storage transformer
+  By :user:`Jonathan Striebel <jstriebel>`; :issue:`1096`, :issue:`1111`.
 
 
 Bug fixes
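The release note lists a sharding storage transformer: several chunks are stored together in one store object (a shard), so reading a single chunk becomes a partial read of that object. The sketch below models the idea in isolation. It assumes chunks are linearized in C order within a shard and that the shard ends with an index of little-endian uint64 `(offset, nbytes)` pairs, with `2**64 - 1` marking an empty slot; the function names are hypothetical and the byte layout is an assumption, not a description of zarr's on-disk format.

```python
import struct
from typing import Dict, Optional, Sequence, Tuple

EMPTY = 2**64 - 1  # assumed sentinel for "chunk not present in this shard"
ENTRY = struct.Struct("<QQ")  # assumed index entry: (offset, nbytes) as little-endian uint64


def chunk_to_shard(
    chunk: Sequence[int], chunks_per_shard: Sequence[int]
) -> Tuple[Tuple[int, ...], int]:
    """Hypothetical: map chunk coordinates to (shard coordinates, slot within shard)."""
    shard = tuple(c // n for c, n in zip(chunk, chunks_per_shard))
    slot = 0
    for c, n in zip(chunk, chunks_per_shard):
        slot = slot * n + c % n  # row-major (C-order) position inside the shard
    return shard, slot


def encode_shard(chunks: Dict[int, bytes], n_slots: int) -> bytes:
    """Hypothetical: concatenate chunk payloads, then append the index."""
    body = bytearray()
    index = [(EMPTY, EMPTY)] * n_slots
    for slot in sorted(chunks):
        index[slot] = (len(body), len(chunks[slot]))
        body += chunks[slot]
    for offset, nbytes in index:
        body += ENTRY.pack(offset, nbytes)
    return bytes(body)


def read_chunk(shard: bytes, slot: int, n_slots: int) -> Optional[bytes]:
    """Hypothetical: read one chunk using two byte ranges, the index then the payload."""
    index_nbytes = n_slots * ENTRY.size
    index = shard[-index_nbytes:]  # against a store: range (-index_nbytes, None)
    offset, nbytes = ENTRY.unpack_from(index, slot * ENTRY.size)
    if offset == EMPTY:
        return None  # slot was never written
    return shard[offset:offset + nbytes]  # against a store: range (offset, nbytes)
```

For instance, `chunk_to_shard((3, 5), (2, 2))` gives shard `(1, 2)` and slot `3`. Placing the index at the end means a reader can fetch it with a negative `range_start` without knowing the shard's size first, which is one reason a negative start is a useful feature for partial reads.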

zarr/_storage/v3.py (+29)

@@ -182,6 +182,35 @@ def rmdir(self, path=None):
             if self.fs.isdir(store_path):
                 self.fs.rm(store_path, recursive=True)
 
+    @property
+    def supports_efficient_get_partial_values(self):
+        return True
+
+    def get_partial_values(self, key_ranges):
+        """Get multiple partial values.
+        key_ranges can be an iterable of key, range pairs,
+        where a range specifies two integers range_start and range_length
+        as a tuple, (range_start, range_length).
+        range_length may be None to indicate to read until the end.
+        range_start may be negative to start reading range_start bytes
+        from the end of the file.
+        A key may occur multiple times with different ranges.
+        Inserts None for missing keys into the returned list."""
+        results = []
+        for key, (range_start, range_length) in key_ranges:
+            key = self._normalize_key(key)
+            path = self.dir_path(key)
+            try:
+                if range_start is None or range_length is None:
+                    end = None
+                else:
+                    end = range_start + range_length
+                result = self.fs.cat_file(path, start=range_start, end=end)
+            except self.map.missing_exceptions:
+                result = None
+            results.append(result)
+        return results
+
 
 class MemoryStoreV3(MemoryStore, StoreV3):
 