[v3] support for object arrays #2617

Open
jhamman opened this issue Jan 2, 2025 · 6 comments

Comments

@jhamman
Member

jhamman commented Jan 2, 2025

Zarr-Python 2 supported object arrays. This functionality has not made it into Zarr-Python 3 yet (in part because there is not an obvious way to develop a v3 dtype for arbitrary Python objects).

An example demonstrating this functionality using Zarr-Python 2:

import numcodecs
import zarr  # Zarr-Python 2.x

# Object arrays need an object_codec to serialize each element.
z = zarr.empty(5, dtype=object, object_codec=numcodecs.JSON())
z[0] = 42
z[1] = 'foo'
z[2] = ['bar', 'baz', 'qux']
z[3] = {'a': 1, 'b': 2.2}
z[:]
# array([42, 'foo', list(['bar', 'baz', 'qux']), {'a': 1, 'b': 2.2}, None], dtype=object)

This issue tracks the development of object array support in Zarr-Python 3.
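
For readers unfamiliar with what the `object_codec` argument does: conceptually, the codec turns a chunk of Python objects into bytes on write and back into objects on read. The following is a minimal, stdlib-only sketch of that round trip, assuming every item is JSON-serializable. The helper names `encode_chunk` and `decode_chunk` are hypothetical, and the byte layout is not numcodecs' actual on-disk format.

```python
import json


def encode_chunk(items):
    """Serialize a list of JSON-compatible Python objects to UTF-8 bytes."""
    return json.dumps(items).encode("utf-8")


def decode_chunk(buf):
    """Inverse of encode_chunk: turn bytes back into a list of objects."""
    return json.loads(buf.decode("utf-8"))


# Same values as in the Zarr-Python 2 example; the unset slot is None.
chunk = [42, "foo", ["bar", "baz", "qux"], {"a": 1, "b": 2.2}, None]
restored = decode_chunk(encode_chunk(chunk))
```

Only types that JSON can represent survive this round trip (a tuple, for instance, would come back as a list), which hints at why a general dtype for arbitrary Python objects is hard to specify.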

@basnijholt

basnijholt commented Jan 9, 2025

I ran into this issue in pipefunc/pipefunc#523

> in part because there is not an obvious way to develop a v3 dtype for arbitrary Python objects

Do you expect that object arrays will be supported at some early v3.* release?

@d-v-b
Contributor

d-v-b commented Jan 9, 2025

my main concern with the object dtype is the danger associated with using pickle, or any other encoding of python objects that could result in arbitrary code execution. but I don't think we have reached a formal decision on object arrays in v3.
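
To make the pickle concern concrete, here is a small sketch of how unpickling can run code chosen by whoever produced the bytes. The `Payload` class is hypothetical and deliberately harmless (it only evaluates an arithmetic expression), but the same mechanism would accept any callable and arguments.

```python
import json
import pickle


class Payload:
    """Stand-in for a malicious object; the expression here is harmless."""

    def __reduce__(self):
        # pickle records "call eval('6 * 7')" and performs that call
        # when the bytes are loaded, instead of rebuilding a Payload.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())
loaded = pickle.loads(blob)  # runs eval; the result is an int, not a Payload

# JSON decoding, by contrast, only ever produces plain data types.
data = json.loads('{"a": 1, "b": [2, 3]}')
```

Anyone who can write chunks to a store could plant such bytes, so a reader that unpickles them is effectively executing untrusted code.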

@basnijholt

basnijholt commented Jan 9, 2025

I am aware of that limitation/issue. In pipefunc we register our codec that uses cloudpickle.
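
For context, a rough sketch of the "register your own codec" pattern looks like the following. This is not pipefunc's actual code: it uses the stdlib `pickle` as a stand-in for cloudpickle, and `CODEC_REGISTRY`, `register_codec`, and `PickleCodec` are hypothetical names in a toy registry. In a Zarr-Python 2 setup the equivalent would typically subclass `numcodecs.abc.Codec` and be registered with `numcodecs.register_codec`.

```python
import pickle

# Hypothetical registry mapping codec ids to codec classes.
CODEC_REGISTRY = {}


def register_codec(cls):
    """Make a codec class discoverable by its codec_id."""
    CODEC_REGISTRY[cls.codec_id] = cls
    return cls


@register_codec
class PickleCodec:
    """Toy object codec; only safe for data you wrote yourself."""

    codec_id = "toy_pickle"

    def encode(self, items):
        # Serialize the whole chunk of objects in one pickle payload.
        return pickle.dumps(list(items), protocol=pickle.HIGHEST_PROTOCOL)

    def decode(self, buf):
        # Unpickling executes whatever the payload asks for; trusted input only.
        return pickle.loads(buf)


# Look the codec up by id, as a store would when reading array metadata.
codec = CODEC_REGISTRY["toy_pickle"]()
items = [42, "foo", {"a": 1}, (1, 2)]
roundtrip = codec.decode(codec.encode(items))
```

Unlike a JSON-based codec, this preserves Python-specific types such as tuples, at the cost of the security and cross-language concerns raised above.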

@d-v-b
Contributor

d-v-b commented Jan 10, 2025

so we chatted about this in the developer meeting, the conclusion was that supporting object dtype arrays directly is not in-scope for zarr-python 3.x, because of security concerns inherent to storing arbitrary python objects, and our commitment to keep zarr a format that's accessible to a wide range of languages.

that being said, we would be interested in identifying how zarr-python 3.x could be extended in a third-party library to add features like an object dtype. Our dtypes today are not extensible. I think this could be fixed, but it would require some design work first. Is that process something you would be interested in?

@mavaylon1

@d-v-b Hi there. The NWB team would be interested in the idea of being able to extend zarr-python to add an object dtype. We would also be happy to work on this. You mentioned that currently we are not able to extend zarr-python to create new dtypes?

@d-v-b
Contributor

d-v-b commented Jan 28, 2025

correct, we haven't put together an API for user-defined dtypes in zarr python 3 yet. We definitely intend to add this feature, and we have a very promising proposal here: #2750. But I can't give you a definite timeline for when this feature would be released.
