Make `create_array` signatures consistent #2819
base: main
worth noting that this changes some defaults (we were using a default fill_value of |
That works for me. In |
…x/consistent-create-array-signature
with 642272d the default fill value is |
👍 for this in general - I left one request for a more verbose changelog entry, and one question.
Thanks for the changelog update, looks great to me. I left one more suggestion - I did a bit of testing and this doesn't seem to change the fill value that's set on the arrays (which is good, but I think worth reassuring users)
how did you check this? because for zarr v2 data, it should make a difference -- |
I didn't check for v2 🙈. If it does make a difference, that should be made very clear in the changelog (and we should think about putting this PR in a non-bugfix release since it's a breaking change?)
I'd hope that since the below PRs, the metadata of the Zarr V2 data with |
@LDeakin that was my impression as well. I think for v2 data the contents of the array metadata will be sensitive to the |
Indeed! To be honest, I think |
The nullable fill value is also clunky because zarr v2 supports the python We would have to get a measure of the impact on users, but if it doesn't inconvenience a lot of people then I would definitely support dropping the creation of arrays with a |
@@ -521,7 +521,7 @@ async def test_consolidated_metadata_v2(self):
     dtype=dtype,
     attributes={"key": "a"},
     chunks=(1,),
-    fill_value=0,
+    fill_value=None,
So to clarify, this PR is a breaking one for writing v2 data, which changes what is written as the fill value if the user doesn't specify it? And this changed test is the manifestation of that breaking change? Is this something we want to avoid breaking?
(sorry for all the questions, just trying to understand 😄 )
It's a change to how v2 metadata is written; the actual array data will be written as before. Because the signatures of the affected functions are inconsistent, there is no way to achieve the goal of this PR without changing the default behavior of some of those functions, which will result in different metadata being written to disk as a side effect.
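To make the "metadata changes, data does not" point concrete, here is a minimal sketch using only the standard library. It assumes a simplified Zarr v2 `.zarray`-style document; the helper `v2_array_metadata` and the shape, chunk, and dtype values are hypothetical and chosen only for illustration. It is not how zarr-python builds its metadata.

```python
import json


def v2_array_metadata(fill_value):
    """Build a simplified Zarr v2 ``.zarray``-style document (illustrative only)."""
    return {
        "zarr_format": 2,
        "shape": [10],
        "chunks": [1],
        "dtype": "<i4",
        "compressor": None,
        "filters": None,
        "order": "C",
        "fill_value": fill_value,
    }


# Same array parameters, two different default fill values.
old_doc = v2_array_metadata(0)
new_doc = v2_array_metadata(None)

# Only the metadata document differs, and only in one key.
changed_keys = [key for key in old_doc if old_doc[key] != new_doc[key]]
print(changed_keys)  # ['fill_value']

# On disk, None is serialized as JSON null.
print(json.dumps({"fill_value": new_doc["fill_value"]}))  # {"fill_value": null}
```

Anyone comparing metadata documents byte for byte would see this difference, while the stored chunk bytes are not part of the document at all.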
So to your questions: it's breaking if someone relied on the default metadata document produced by some, but not all, of our array creation routines. And no, we do not want to avoid breaking, because getting rid of a confusing, inconsistent API is worth the cost of the break in this case.
👍 to all that. In which case this is a backwards incompatible API change, and we promise to increment the major version number for backwards incompatible API changes. So, what do we want to do - bump the version number to v4, or re-write the versioning policy?
I'm guessing it's the latter... which is a pain, but I think we should be honest and transparent to users if we're going to put backwards incompatible API changes in minor releases.
I think we should adjust the versioning policy, and also release the changes in this PR in a minor (non-patch) release. We were already planning on making breaking changes without incrementing the major version number -- deprecated functions like `create` were slated to be removed long before a 4.x release. So I think the versioning policy needs updating.
Sorry to be a bit of a pain, but I'm going to put a request changes on this until we've resolved how we're going to bump the version number (and/or update our versioning policy) for this change. See #2819 (comment) for more context.
that works for me! would you like to open an issue about the versioning policy, or should I?
I can 👍
This PR ensures that the various invocations of `create_array` are consistent. For reference, we have 4 ways to call `create_array`:

- `zarr.core.array.create_array` (the actual function that does stuff)
- `zarr.api.synchronous.create_array` (synchronous wrapper around the async function)
- `zarr.core.group.AsyncGroup.create_array` (method on `AsyncGroup` class that invokes `create_array`)
- `zarr.core.group.Group.create_array` (synchronous wrapper around the AsyncGroup method)

All of these functions should have consistent signatures, but in main they don't, and a big part of this is missing tests. So this PR adds some tests that check that certain pairs of functions have identical parameters (we can't force the return types to match, because the async functions will return coroutines). To make the tests pass, this PR also ensures that all of the invocations of `create_array` have parameters that are consistent.

I say "consistent" because, when invoking `Group.create_array`, that method does not take a `store` argument, because we have one already from the group instance. Also, `Group.create_array` was recently given an extra keyword argument (`compressor`) that we need to deprecate. So the test that compares `zarr.core.array.create_array` with `zarr.core.group.AsyncGroup.create_array` only checks that all of `create_array` parameters are present in the group method.

closes #2810
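For readers who want a feel for the kind of check described above, here is a minimal sketch built on `inspect.signature` from the standard library. The three callables are hypothetical stand-ins with made-up parameter lists, not the real zarr functions, and the helper names `same_parameters` and `missing_parameters` are assumptions for illustration; the tests added in this PR may be structured differently.

```python
import inspect


async def create_array(store, *, name=None, shape, dtype, fill_value=None):
    """Hypothetical stand-in for the core async function."""


def create_array_sync(store, *, name=None, shape, dtype, fill_value=None):
    """Hypothetical stand-in for the synchronous wrapper."""


class Group:
    def create_array(self, name, *, shape, dtype, fill_value=None, compressor=None):
        """Hypothetical group method: no `store`, plus an extra `compressor` keyword."""


def same_parameters(a, b):
    """Strict check: same names, kinds, defaults and annotations, in the same order."""
    params_a = inspect.signature(a).parameters
    params_b = inspect.signature(b).parameters
    return list(params_a.items()) == list(params_b.items())


def missing_parameters(reference, candidate, ignore=()):
    """Loose check: names in `reference` that `candidate` does not accept."""
    ref = inspect.signature(reference).parameters
    cand = inspect.signature(candidate).parameters
    return sorted(name for name in ref if name not in ignore and name not in cand)


# The async function and its sync wrapper must match exactly (return types aside).
print(same_parameters(create_array, create_array_sync))  # True

# The group method only needs to accept everything except `store`.
print(missing_parameters(create_array, Group.create_array, ignore={"store"}))  # []
print(missing_parameters(create_array, Group.create_array))  # ['store']
```

The loose check deliberately ignores extra parameters on the group method, which is what lets a to-be-deprecated keyword such as `compressor` coexist with the consistency test.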