DOC: create_array(..., data=,...) #2809
Comments
thanks for this issue @DerWeh. the
I agree that the documentation should say more about this. Basically all of the non-async functions like (like |
Thanks for the clarification. Adding the As far as I know, Python's standard library uses synchronous operations for files. There are, however, libraries like aiofiles (which I haven't tried so far). If I understand you correctly, using such a library as storage backend, we could expect performance improvements? |
Hi @d-v-b 👋, I'd like to help complete the documentation for Proposed Changes:
Is this approach okay? |
that sounds good! note that the docstring changes might already be handled by this PR: #2819, but please weigh in on that PR if you would like to see anything changed there
@d-v-b 👋, Thanks for pointing me to #2819 ! I've reviewed the changes and see that it addresses parameter consistency for Observations:
Remaining Gaps:
Can I focus on updating
Would this be helpful? |
that would be great, thank you! |
Describe the issue linked to the documentation
I am very confused about the `data` argument in `create_array`. A common use case is to simply serialize an in-memory array, in which case I tend to pass it as the `data=in_memory_array` argument. However, I cannot find the `data` argument in the documentation.

Using IPython, on the other hand, `zarr.create_array` clearly has the argument, while `zarr.Group.create_array` doesn't seem to expose the interface. I am quite confused about the discrepancy. If this is intentional, please document it.

LLMs also suggest that
is more efficient than
I have no idea whether this is true or not.
`zarr.create_array(..., data=in_memory_data)` might indeed be more efficient, as it seems to be written asynchronously. But the documentation seems to be quite lacking on what the best practice is.

This might be a bit out of scope for this issue, so please tell me if it is. But from the documentation, I don't really see how to leverage the asynchronous nature of the `zarr` implementation. A common pattern I encounter is that data is generated in parallel using multiprocessing (as it is CPU bound) and persisted using `zarr` (probably disk bound). Is there a preferred pattern to use `zarr` as an asynchronous sink for the generated data? If so, it would be great to include it in the docs.

Suggested fix for documentation
No response
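
One possible shape for the "generate in parallel, persist in one place" pattern asked about above, sketched with only the standard library. This is an assumption about what such a pattern could look like, not a pattern the zarr documentation prescribes. `generate_block` and `fill_sink` are hypothetical names, and a plain list stands in for the sink; anything that supports slice assignment (for example a zarr array) could presumably take its place.

```python
from concurrent.futures import Executor, ThreadPoolExecutor, as_completed


def generate_block(index: int, size: int) -> tuple[int, list[int]]:
    """Hypothetical producer: return the block index and that block's data."""
    start = index * size
    return index, [value * value for value in range(start, start + size)]


def fill_sink(sink, n_blocks: int, size: int, executor: Executor) -> None:
    """Generate blocks on the executor and write each one as it completes.

    All writes happen in the calling process, so workers never touch the
    sink. The sink only needs to support slice assignment.
    """
    futures = [executor.submit(generate_block, i, size) for i in range(n_blocks)]
    for future in as_completed(futures):
        index, block = future.result()
        # Each block maps to its own non-overlapping slice of the sink.
        sink[index * size:(index + 1) * size] = block


# Stand-in sink: a list of 12 zeros, filled in 3 blocks of 4 values.
sink = [0] * 12
with ThreadPoolExecutor(max_workers=2) as pool:
    fill_sink(sink, n_blocks=3, size=4, executor=pool)
```

A thread pool is used here only to keep the sketch simple. For CPU-bound generation, a `ProcessPoolExecutor` could be passed to `fill_sink` instead, with the driver code placed under an `if __name__ == "__main__":` guard. With a chunked array as the sink, aligning each block with the chunk boundaries would likely avoid partial-chunk writes.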