Skip to content

Fix specifying memory order in v2 arrays #2951

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

dstansby
Copy link
Contributor

@dstansby dstansby commented Apr 3, 2025

Fixes #2950 - still needs some tests, but opening to see if this breaks any other existing tests first.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.rst
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Apr 3, 2025
@d-v-b
Copy link
Contributor

d-v-b commented Apr 3, 2025

this looks good! and my explanation from #2950 was wrong -- this has nothing to do with the global config, it's just buggy code (that i think I wrote).

@dstansby
Copy link
Contributor Author

dstansby commented Apr 3, 2025

So I updated the test, and it now passes a bit more. But the check that the actual array data is F-order now fails, so it looks like the data still isn't (and never was) actually being written in F-order. More digging needed...

Comment on lines -197 to -202
@pytest.mark.parametrize("array_order", ["C", "F"])
@pytest.mark.parametrize("data_order", ["C", "F"])
@pytest.mark.parametrize("memory_order", ["C", "F"])
def test_v2_non_contiguous(
array_order: Literal["C", "F"], data_order: Literal["C", "F"], memory_order: Literal["C", "F"]
) -> None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@normanrz you originally wrote this test in #2679, but I don't understand why there are three different types of order, and there's no documentation, or even comments on this test explaining.

I've therefore chosen to be a little bit bold, and for at least v2 data this PR now assumes there is only one order. If we need to deal with different types of orders, I'm certainly open to that, but that increased complexity probably needs to come with some very good user docs, given in zarr-python v2 there was only one type of memory order.

Since you originally wrote this test, I wanted to see what you thought about this PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array_order = order in the metadata
data_order = order of the numpy array that is used to write
memory_order = order in the runtime config which is used for reading data, ie the order of the output numpy array

At least the first two are necessary for this test. If you want to change the behavior to always read in array_order, then you can drop the memory_order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something, but don't we just need to parametrize with one order? We pass that one order parameter to the array creation routine, and then check it's correct in both the metadata and the order the data is stored in memory, right (which is how the test works currently in this PR)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider this: You create a Zarr array with F order. Next you create a numpy array with C order. You write the numpy array in the Zarr array. This test makes sure that the data on-disk is in F order. That's why you need at least 2 orders.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, that makes sense - thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs release notes Automatically applied to PRs which haven't added release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Saving and re-opening a v2 array does not report (or save) correct memory order
3 participants