add numcodec protocol #3318

d-v-b · 2025-07-31T13:46:35Z

This PR adds a protocol to model numcodecs.abc.Codec. The motivation for this protocol is to ensure that we can process external codecs that adhere to the numcodecs API without needing numcodecs as a dependency.
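As a rough illustration of the idea (a sketch, not the code added in this PR), a structural `typing.Protocol` can describe the members of a numcodecs-style codec so that any matching class is accepted without importing numcodecs. The member names below follow the numcodecs.abc.Codec API; the names `NumcodecLike`, `ReverseCodec`, and `roundtrip`, and the use of `Any` in the annotations, are assumptions made for this example.

```python
from __future__ import annotations

from typing import Any, Protocol


class NumcodecLike(Protocol):
    """Structural model of a numcodecs-style codec (illustrative only)."""

    codec_id: str

    def encode(self, buf: Any) -> Any: ...

    def decode(self, buf: Any, out: Any | None = None) -> Any: ...

    def get_config(self) -> dict[str, Any]: ...

    @classmethod
    def from_config(cls, config: dict[str, Any]) -> NumcodecLike: ...


class ReverseCodec:
    """Hypothetical codec that never imports numcodecs but matches its API."""

    codec_id = "reverse"

    def encode(self, buf: Any) -> Any:
        # Reverse the byte order of the input.
        return bytes(buf)[::-1]

    def decode(self, buf: Any, out: Any | None = None) -> Any:
        # Reversing again restores the original bytes.
        return bytes(buf)[::-1]

    def get_config(self) -> dict[str, Any]:
        return {"id": self.codec_id}

    @classmethod
    def from_config(cls, config: dict[str, Any]) -> ReverseCodec:
        return cls()


def roundtrip(codec: NumcodecLike, data: bytes) -> bytes:
    """Encode then decode with any object that satisfies the protocol."""
    return bytes(codec.decode(codec.encode(data)))
```

`ReverseCodec` does not inherit from anything, yet a type checker accepts `roundtrip(ReverseCodec(), b"abc")` because the class has the members the protocol names.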

TODO:

Add unit tests and/or doctests in docstrings
Add docstrings and API docs for any new/modified user-facing classes and functions
New/modified features documented in docs/user-guide/*.rst
Changes documented as a new file in changes/
GitHub Actions have all passed
Test coverage is 100% (Codecov passes)

codecov · 2025-07-31T13:58:54Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.55%. Comparing base (1264a4d) to head (fcc010b).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3318      +/-   ##
==========================================
+ Coverage   94.54%   94.55%   +0.01%     
==========================================
  Files          78       79       +1     
  Lines        9423     9448      +25     
==========================================
+ Hits         8909     8934      +25     
  Misses        514      514

Files with missing lines	Coverage Δ
src/zarr/abc/codec.py	`95.06% <100.00%> (+0.77%)`	⬆️
src/zarr/abc/numcodec.py	`100.00% <100.00%> (ø)`
src/zarr/api/asynchronous.py	`87.62% <ø> (ø)`
src/zarr/api/synchronous.py	`92.95% <ø> (ø)`
src/zarr/codecs/_v2.py	`93.61% <100.00%> (-0.14%)`	⬇️
src/zarr/core/_info.py	`95.18% <100.00%> (ø)`
src/zarr/core/array.py	`97.10% <100.00%> (-0.01%)`	⬇️
src/zarr/core/metadata/v2.py	`91.27% <100.00%> (ø)`
src/zarr/registry.py	`88.81% <100.00%> (+0.23%)`	⬆️

d-v-b · 2025-08-01T16:41:17Z

The way our code coverage works, class and function definitions are not counted as tested if they are imported by pytest prior to running actual tests. Pytest imports from a lot of our library due to fixtures defined in conftest.py, which is why our coverage reports in CI report function definitions as uncovered.

This means adding new functions always reduces coverage, because the function definition is uncovered. ~~Until we fix how our coverage is measured, I excluded instances of NotImplementedError with #pragma: no cover to get coverage green for this PR.~~

dstansby · 2025-08-05T11:59:38Z

tests/test_api.py

@@ -1282,7 +1282,7 @@ def test_gpu_basic(store: Store, zarr_format: ZarrFormat | None) -> None:
            dtype=src.dtype,
            overwrite=True,
            zarr_format=zarr_format,
-            compressors=compressors,
+            compressors=compressors,  # type: ignore[arg-type]


There's a lot of type: ignore comments currently dotted throughout - what's the reason for that? Is it perhaps because numcodecs.abc.Codec doesn't currently conform to the new protocol?

This particular example is a test. If you look a few lines above, you will see that compressors is either None or the string "auto"; mypy models this as str | None, which leads to this error:

tests/test_api.py:1285: error: Argument "compressors" to "create_array" has incompatible type "str | None"; expected "CompressorsLike" [arg-type]

I suspect the other type: ignore statements were added for different reasons
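The mismatch described above can be reproduced in miniature. This is a sketch under assumptions: `create` stands in for create_array, and `CompressorsArg` is a much narrower hypothetical alias than the real CompressorsLike. It only shows why a value annotated as str | None is rejected where a literal "auto" or None is expected.

```python
from __future__ import annotations

from typing import Literal, Optional

# Hypothetical stand-in for the real CompressorsLike alias, which accepts more forms.
CompressorsArg = Optional[Literal["auto"]]


def create(compressors: CompressorsArg) -> str:
    """Hypothetical stand-in for create_array; reports which branch was taken."""
    return "default" if compressors == "auto" else "none"


def pick_loose(use_default: bool) -> str | None:
    # Annotated as str | None: a type checker rejects passing this to create().
    return "auto" if use_default else None


def pick_precise(use_default: bool) -> CompressorsArg:
    # Annotated with the literal type: accepted without a type: ignore comment.
    return "auto" if use_default else None


create(pick_precise(True))
create(pick_loose(True))  # type: ignore[arg-type]
```

Both calls behave identically at runtime; only the static type of the argument differs.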

d-v-b · 2025-08-05T13:00:39Z

Is it perhaps because numcodecs.abc.Codec doesn't currently conform to the new protocol?

Spinning this question out into the main thread: numcodecs.abc.Codec does not have type annotations for its methods, and even if it did, decode would need to be annotated as returning object thanks to Pickle.decode, which can return literally any Python object.

Without type annotations on numcodecs.abc.Codec, I don't think we can use a type checker to check whether a numcodecs codec is assignable to Numcodec. This PR does add runtime checks, though, and those are tested.

dstansby · 2025-08-05T13:12:17Z

To make sure I understand, does passing an existing numcodecs codec to compressors or filters fail type checking in user code with this PR (because they don't conform to the new protocol)?

martindurant · 2025-08-05T13:26:12Z

even if it did, decode would need to be annotated as returning object thanks to Pickle.decode

Shouldn't it annotate our expectations, but allow for the (one?) exception to mark itself as non compliant? We probably don't want any other codecs making Object, and this is a way to say that. In fact, it raises the other question, for a different discussion: is having the pickle protocol reasonable?

d-v-b · 2025-08-05T13:30:50Z

Shouldn't it annotate our expectations, but allow for the (one?) exception to mark itself as non compliant? We probably don't want any other codecs making Object, and this is a way to say that.

Yes, which is why in this PR I am annotating the encode and decode methods as returning Buffer | NDBuffer.

In fact, it raises the other question, for a different discussion: is having the pickle protocol reasonable?

It might be reasonable for some applications (it was evidently reasonable enough to get implemented in the first place). It's also incredibly dangerous, because it performs arbitrary code execution.
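For readers unfamiliar with why unpickling counts as code execution, here is a deliberately harmless sketch using only the standard library. The `Payload` class is hypothetical; it asks pickle to call `operator.add` on load, but the same mechanism lets a crafted payload name any importable callable.

```python
import operator
import pickle


class Payload:
    """On unpickling, pickle calls operator.add(40, 2) instead of rebuilding Payload."""

    def __reduce__(self):
        # (callable, args): pickle.loads invokes callable(*args) and returns the result.
        return (operator.add, (40, 2))


blob = pickle.dumps(Payload())

# Loading the bytes runs the callable chosen by whoever produced the payload.
result = pickle.loads(blob)
```

Here `result` is the integer returned by the call, not a `Payload` instance, which is also why a decode method backed by pickle cannot promise any particular return type.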

d-v-b · 2025-08-05T14:43:01Z

To make sure I understand, does passing an existing numcodecs codec to compressors or filters fail type checking in user code with this PR (because they don't conform to the new protocol)?

this is a great question, and the answer depends on which type checker you ask.

if we take this script:

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "zarr"
# ]
# ///

import zarr

from numcodecs import GZip
import numcodecs
from zarr.abc.numcodec import Numcodec

a: Numcodec = GZip()
b: numcodecs.abc.Codec = GZip()

mypy reports no problems:

bennettd@dvb-desktop-0 ➜  zarr-python git:(feat/numcodecs-protocol) ✗ uvx mypy test_3.py
Success: no issues found in 1 source file

but the more thorough based-pyright type checker complains about both a and b, for different reasons:

Installed 2 packages in 126ms
/home/bennettd/dev/zarr-python/test_3.py
  /home/bennettd/dev/zarr-python/test_3.py:8:8 - warning: Import "zarr" is not accessed (reportUnusedImport)
  /home/bennettd/dev/zarr-python/test_3.py:10:6 - warning: Stub file not found for "numcodecs" (reportMissingTypeStubs)
  /home/bennettd/dev/zarr-python/test_3.py:11:8 - warning: Stub file not found for "numcodecs" (reportMissingTypeStubs)
  /home/bennettd/dev/zarr-python/test_3.py:14:15 - error: Type "GZip" is not assignable to declared type "Numcodec"
    "GZip" is incompatible with protocol "Numcodec"
      "encode" is an incompatible type
        Type "(buf: Unknown) -> bytes" is not assignable to type "(buf: Buffer | NDBuffer) -> (Buffer | NDBuffer)"
          Function return type "bytes" is incompatible with type "Buffer | NDBuffer"
            Type "bytes" is not assignable to type "Buffer | NDBuffer"
      "decode" is an incompatible type
        Type "(buf: Unknown, out: Unknown | None = None) -> (Unknown | bytes)" is not assignable to type "(buf: Buffer | NDBuffer, out: Buffer | NDBuffer | None = None) -> (Buffer | NDBuffer)"
          Function return type "Unknown | bytes" is incompatible with type "Buffer | NDBuffer"
    ... (reportAssignmentType)
  /home/bennettd/dev/zarr-python/test_3.py:15:4 - warning: Type of "abc" is unknown (reportUnknownMemberType)
  /home/bennettd/dev/zarr-python/test_3.py:15:4 - warning: Type of "Codec" is unknown (reportUnknownMemberType)
  /home/bennettd/dev/zarr-python/test_3.py:15:14 - error: "abc" is not a known attribute of module "numcodecs" (reportAttributeAccessIssue)
2 errors, 5 warnings, 0 notes

so, tldr is that the status quo is broken because numcodecs doesn't support type checking, AND the new protocol is using a too-narrow type. i'll look into fixing this, but it's important to keep in mind that the status quo is broken.

edit: changing the annotation from numcodecs.abc.Codec to Codec, after importing Codec from numcodecs.abc, does pass pyright. But we never use that import in our codebase.

d-v-b · 2025-08-05T17:22:47Z

as of 82992c5 I'm using Any for the input / output types of Numcodec. We can narrow this type as needed later on.

TomAugspurger · 2025-08-08T14:17:50Z

Something discussed briefly on the dev call today: I'd love if we could avoid the checks (isinstance on main, the custom is_* checks on this PR).

As @d-v-b mentioned, we'd ideally have some validation at the outer layer. Once we're past that, we know we have valid metadata, so we can get from there to (what should be) a valid Python codec class. At some point we should be able to avoid these checks.

But we do this already on main so fixing that here is out of scope for this PR.

maxrjones

I made some very minor suggestions around providing docstrings since the typing is now so broad.

Mostly, I'm confused about the protocol not aligning with the Zarr V3 specification. https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#id22 states:

Logically, a codec c must define three properties:

c.compute_encoded_representation_type(decoded_representation_type), a procedure that determines the encoded representation based on the decoded representation and any codec parameters. In the case of a decoded representation that is a multi-dimensional array, the shape and data type of the encoded representation must be computable based only on the shape and data type, but not the actual element values, of the decoded representation. If the decoded_representation_type is not supported, this algorithm must fail with an error.

c.encode(decoded_value), a procedure that computes the encoded representation, and is used when writing an array.

c.decode(encoded_value, decoded_representation_type), a procedure that computes the decoded representation, and is used when reading an array

It doesn't seem like the V2 spec provided any definitions for a codec. Will external codec developers follow the numcodec protocol or the Zarr V3 specification?

d-v-b · 2025-08-10T18:08:31Z

Mostly, I'm confused about the protocol not aligning with the Zarr V3 specification. https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#id22 states:

This is deliberate. This protocol is a model of a thing that already exists (the numcodecs codec ABC). The numcodecs codec ABC is not a v3 codec, so neither is this model. There is no intention of using the Numcodec protocol as a v3-style codec in zarr python. Instead, in a later PR I will define a NumcodecWrapper class that is a v3 codec. This class will wrap the methods defined on the Numcodec protocol in a v3-codec-compatible way.

It doesn't seem like the V2 spec provided any definitions for a codec. Will external codec developers follow the numcodec protocol or the Zarr V3 specification?

Although the v2 spec does define the JSON form of a codec, it does not define the codec API. But the protocol I'm adding in this PR is narrowly scoped to modeling a particular API in numcodecs, not anything in the v2 spec (even though that numcodecs API does implement the v2 spec).
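To make the distinction concrete, the sketch below wraps a numcodecs-style codec so that it exposes the three logical properties quoted from the v3 spec. Everything here is hypothetical: `ZlibNumcodec` and `BytesCodecAdapter` are invented for illustration, representation types are reduced to plain strings, and none of this is the NumcodecWrapper planned for a later PR or the zarr.abc.codec API.

```python
from __future__ import annotations

import zlib
from typing import Any


class ZlibNumcodec:
    """Hypothetical numcodecs-style codec: encode/decode only, no representation types."""

    codec_id = "zlib"

    def encode(self, buf: Any) -> bytes:
        return zlib.compress(bytes(buf))

    def decode(self, buf: Any, out: Any | None = None) -> bytes:
        return zlib.decompress(bytes(buf))


class BytesCodecAdapter:
    """Hypothetical adapter exposing the three logical properties from the v3 spec."""

    def __init__(self, inner: ZlibNumcodec) -> None:
        self.inner = inner

    def compute_encoded_representation_type(self, decoded_representation_type: str) -> str:
        # This sketch only handles byte strings; anything else must fail with an error.
        if decoded_representation_type != "bytes":
            raise TypeError(f"unsupported representation: {decoded_representation_type}")
        return "bytes"

    def encode(self, decoded_value: bytes) -> bytes:
        return self.inner.encode(decoded_value)

    def decode(self, encoded_value: bytes, decoded_representation_type: str) -> bytes:
        # Validate the requested representation before delegating to the inner codec.
        self.compute_encoded_representation_type(decoded_representation_type)
        return self.inner.decode(encoded_value)


adapter = BytesCodecAdapter(ZlibNumcodec())
restored = adapter.decode(adapter.encode(b"zarr" * 8), "bytes")
```

The inner codec knows nothing about representation types; the adapter supplies that layer, which is the kind of separation the comment above describes.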

d-v-b · 2025-08-10T18:21:13Z

Will external codec developers follow the numcodec protocol or the Zarr V3 specification?

Reminder that we are adding this protocol for compatibility with existing codecs that were written for zarr python 2.x. Developers working on new codecs that target zarr python 3.x should almost certainly use the v3-style codec API defined in zarr.abc.codec.Codec.

normanrz · 2025-08-11T15:32:02Z

src/zarr/abc/numcodec.py

+from typing_extensions import Protocol
+
+
+class Numcodec(Protocol):


Could this work with https://docs.python.org/3/library/typing.html#typing.runtime_checkable?

I tried :( but I learned that @runtime_checkable doesn't work for protocols with non-methods (i.e., attributes), and the numcodecs.abc.Codec ABC uses codec_id as an attribute. I'm not 100% sure we need the codec_id here, but if we ever wanted to register these codecs, then it would be important.
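For reference, the standard library behavior around this limitation can be sketched as follows. With `typing.runtime_checkable`, a protocol that declares an attribute still supports `isinstance`, which only checks that the members are present, but `issubclass` raises `TypeError`, so classes cannot be tested this way. The names `HasCodecId`, `Good`, and `NoId` are hypothetical.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class HasCodecId(Protocol):
    """Protocol with one attribute member and one method member."""

    codec_id: str

    def encode(self, buf: bytes) -> bytes: ...


class Good:
    codec_id = "good"

    def encode(self, buf: bytes) -> bytes:
        return buf


class NoId:
    def encode(self, buf: bytes) -> bytes:
        return buf


# isinstance works on instances, but checks member presence only.
instance_ok = isinstance(Good(), HasCodecId)
instance_missing = isinstance(NoId(), HasCodecId)

# issubclass is rejected because the protocol has a non-method member.
subclass_error: Optional[str] = None
try:
    issubclass(Good, HasCodecId)
except TypeError as exc:
    subclass_error = str(exc)
```

The instance check also says nothing about whether `codec_id` is actually a string or whether `encode` has a compatible signature.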

add numcodec protocol

a367268

github-actions bot added the needs release notes label Jul 31, 2025

d-v-b added 2 commits July 31, 2025 23:25

add tests for numcodecs compatibility

1d424c0

changelog

41dd6ff

github-actions bot removed the needs release notes label Jul 31, 2025

d-v-b added 3 commits July 31, 2025 23:43

ignore unknown key

c435a59

remove re-implementation of get_codec

8e50ef8

Merge branch 'main' into feat/numcodecs-protocol

ef31c5b

d-v-b force-pushed the feat/numcodecs-protocol branch from e5ffc33 to ef31c5b Compare August 1, 2025 19:36

d-v-b added 3 commits August 4, 2025 12:32

Merge branch 'main' into feat/numcodecs-protocol

4ba7914

Merge branch 'main' into feat/numcodecs-protocol

ab52539

Merge branch 'main' into feat/numcodecs-protocol

95c9c8b

d-v-b mentioned this pull request Aug 4, 2025

Add v2 and v3 metadata support to codecs #3332

Draft

dstansby mentioned this pull request Aug 4, 2025

Mysterious code coverage drop #3333

Closed

d-v-b mentioned this pull request Aug 4, 2025

Imagecodecs support NASA-IMPACT/veda-odd#214

Open

Merge branch 'main' into feat/numcodecs-protocol

fcf84b3

d-v-b requested a review from a team August 4, 2025 16:45

dstansby mentioned this pull request Aug 5, 2025

Add Blosc2 codec to zarr-python HEFTIEProject/zarr-tooling#6

Open

dstansby reviewed Aug 5, 2025

Merge branch 'main' into feat/numcodecs-protocol

5b0c3ac

d-v-b added 3 commits August 5, 2025 16:15

avoid circular imports by importing lower-level routines exactly wher…

84c9780

…e needed

push numcodec prototol into abcs; remove all numcodecs.abc.Codec type…

9a2f35b

… annotations

add tests for codecjson typeguard

0d0712f

d-v-b added 11 commits August 5, 2025 19:14

add numcodec protocol

f06c6aa

add tests for numcodecs compatibility

b71e8ac

changelog

bcaa9ee

ignore unknown key

7e49f39

remove re-implementation of get_codec

4b53f5d

avoid circular imports by importing lower-level routines exactly wher…

b35e6c9

…e needed

push numcodec prototol into abcs; remove all numcodecs.abc.Codec type…

deef94a

… annotations

add tests for codecjson typeguard

f057525

avoid using zarr's buffer / ndbuffer for numcodec encode / decode

190e1b2

use Any to model input / output types of numcodec protocol

82992c5

Merge branch 'feat/numcodecs-protocol' of github.com:d-v-b/zarr-pytho…

7ea7e91

…n into feat/numcodecs-protocol

Merge branch 'main' of github.com:zarr-developers/zarr-python into fe…

413573a

…at/numcodecs-protocol

d-v-b mentioned this pull request Aug 6, 2025

Add CLI for converting v2 metadata to v3 #3257

Open

6 tasks

d-v-b added 2 commits August 6, 2025 12:22

Merge branch 'main' into feat/numcodecs-protocol

cee4389

Merge branch 'main' into feat/numcodecs-protocol

76f666c

maxrjones reviewed Aug 10, 2025

d-v-b and others added 5 commits August 10, 2025 19:49

Update src/zarr/abc/numcodec.py

c86be01

Co-authored-by: Max Jones <[email protected]>

Update src/zarr/abc/numcodec.py

dba39f5

Co-authored-by: Max Jones <[email protected]>

Update src/zarr/abc/numcodec.py

a857fc2

Co-authored-by: Max Jones <[email protected]>

Update src/zarr/abc/numcodec.py

a082222

Co-authored-by: Max Jones <[email protected]>

Update src/zarr/abc/numcodec.py

ccaaa65

Co-authored-by: Max Jones <[email protected]>

normanrz reviewed Aug 11, 2025

d-v-b added 4 commits August 13, 2025 10:39

Merge branch 'feat/numcodecs-protocol' of github.com:d-v-b/zarr-pytho…

c1991e4

…n into feat/numcodecs-protocol

fix docstrings

bb28d1d

revert changes to store imports

eedea84

remove whitespace

fcc010b
