On the ambiguity of .shape behavior #891
@ricardoV94 thanks for the questions and idea. I'd like to try to get some clarity on the actual problem first. Here is a bit of context and some thoughts:
I assume you meant …

In general, lazy implementations have more limitations than eager ones; for example, you cannot use functions from the stdlib most of the time. That's not specific to the standard though. The standard is carefully designed to not require eager behavior unless it absolutely cannot be avoided, and those few parts have warnings about value-dependent behavior. The most annoying one is …
I hope the above makes clear that this is not a case that happens in the real world, since static shapes are never unknown. Are you running into an actual problem using or implementing …?
The point was that this complicates writing code that operates on … Now, this is fine within a library, because I'm allowed to define x.shape as I want. But then what about meta-libraries that want to implement their own version of reshape? They would need to know if the library is going to do the first sort of shape or the second, so they cannot be backend agnostic. Am I misunderstanding the scope of the project?
Or, put another way: why would anyone implement x.shape as a tuple with None in their library? Is anyone doing it / interested in that format?
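To make the contrast concrete, here is a minimal, hypothetical sketch (the class and helper are invented for illustration) of how the same consumer code behaves under the two conventions:

```python
from math import prod

class StaticShapeArray:
    # Convention 1: a plain tuple with None for dimensions unknown ahead of time.
    shape = (8, None, 4)

def n_elements_per_row(x):
    # prod() needs multipliable entries: concrete ints work, symbolic graph
    # nodes would work too, but None raises a TypeError.
    return prod(x.shape[1:])

try:
    n_elements_per_row(StaticShapeArray())
except TypeError:
    # Under convention 2 (symbolic shape entries) the same expression would
    # instead build a lazy graph node and succeed.
    print("fails under the tuple-with-None convention")
```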
This is just not true? It always works eagerly because static shapes are known, and it always works lazily because …
Yes, that's a misunderstanding, unless I'm misunderstanding what you are saying - one of the key goals of this whole effort is to allow libraries to write code that's agnostic to the library and execution model that's backing the input arrays.
I don't quite understand this question, so I'll answer the one below.
It's only …
I'm not that familiar with PyTensor, so there's a chance there is something I am missing that's behind your questions. There's also a lot of history here. We are rapidly gaining more experience with lazy libraries and their strengths and limitations when used through the standard, e.g. adding support for JAX and Dask in SciPy and scikit-learn. I'm happy to set up a call and talk it through if you prefer?
I guess my question is: how do I decide which format to offer? Well, it's easy to answer that, because if I want … But importantly for me, will another library ever look for …?

For a concrete example, when adding the PyTensor backend to einops, we implemented … I had to tell the library how to do that specifically for the PyTensor backend (there's something similar for non-eager TF above). I guess for dask the equivalent would be to call … No idea how the JAX case can be used from the outside.

Maybe the point is that without the standard, a meta-library like einops will have to figure out which backend it is if it wants to make eager decisions on lazy graphs? That's why I feel this may be connected to #839, although it's about shape and not values, which is a simpler case?
I'm sure we're both missing something (me more) :) Feel free to reach out to me.
I'd say put in the actual values if you have them, and …

From what I've seen, this is only done when an algorithm has inherently value-dependent behavior, so there is no way to keep things lazy. E.g.:

```python
if unique(x).shape[0] < 5:
    small_size_algo(...)
else:
    regular_algo(...)
```

Scikit-learn has a fair amount of code like that, for example, often using … These cases are very hard to support for lazy arrays, and that's more the problem than whether you hit the "must compute or raise" point in …
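As a rough sketch of why such branches are hard for lazy arrays (using Dask here, whose `unique` result has an unknown length until computed):

```python
import numpy as np
import dask.array as da

x = da.from_array(np.array([1, 1, 2, 3, 3]), chunks=2)
u = da.unique(x)
print(u.shape)  # (nan,) -- the number of unique values is data-dependent
# A branch like `if u.shape[0] < 5:` therefore cannot be decided without
# computing: nan < 5 is simply False, silently taking one path.
```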
done!
Dask maintainer here. Dask's unknown shapes predate the Array API standard by many years. Regrettably, Dask uses a non-standard `nan` for unknown dimensions:

```python
>>> import dask.array as da
>>> a = da.random.random((5, 10))
>>> a[(a > .5).any(axis=-1)]
dask.array<getitem_variadic, shape=(nan, 10), dtype=float64, chunksize=(nan, 10), chunktype=numpy.ndarray>
```

This non-conformity (which can't be fixed easily in Dask, as it would be a breaking change) has caused a lot of bugs in array-api-compat and array-api-extra. I personally prefer …
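For what it's worth, consumer code that wants to cope with both conventions today often ends up with a helper along these lines (a sketch, not taken from any particular library):

```python
import math

def is_unknown_dim(d) -> bool:
    # Treat both the standard's None and Dask's float("nan") as "unknown".
    return d is None or (isinstance(d, float) and math.isnan(d))

def static_size(shape):
    # Number of elements if every dimension is known, else None.
    size = 1
    for d in shape:
        if is_unknown_dim(d):
            return None
        size *= d
    return size
```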
This verbiage feels written with xarray in mind, where it would make perfect sense for …

It could also be used by a lazy backend to convey size hints or constraints, obfuscated from the Array API. For example (hypothetical, no library does it today):

```python
>>> a[(a > .5).any(axis=-1)].shape
(undefined size in range [0, 5], 10)
```

I agree the verbiage is problematic, but not for the reasons you mention. It's the vagueness of "similarly to a tuple": it doesn't state that …
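A toy sketch of the kind of object described above (again purely hypothetical; the `BoundedDim` class is invented here):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BoundedDim:
    # An opaque dimension entry that is not an int but carries bounds the
    # backend knows, without exposing them through the Array API.
    lo: int
    hi: int

    def __repr__(self) -> str:
        return f"undefined size in range [{self.lo}, {self.hi}]"

print((BoundedDim(0, 5), 10))  # (undefined size in range [0, 5], 10)
```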
Will another library ever support lazy backends, which are explicitly catered for by the Array API? I've been thinking for a while about writing a variant of …
Thinking about it more, this is very problematic for any function that wishes to accept "a shape or sequence of shapes", as it makes it easy to sneak a bug into the code that identifies which of the two the user provided, which would only crop up on some esoteric backend. Namely, this is what I have now in a function I'm writing, which could fail under this provision:

```python
def f(..., shape: tuple[int | None, ...] | Sequence[tuple[int | None, ...]]):
    if isinstance(shape, tuple) and all(isinstance(s, int | None) for s in shape):
        shapes = [shape]  # pyright: ignore[reportAssignmentType]
    else:
        shapes = list(shape)
```

Would it be reasonable to change it to strict subclasses of `tuple`?
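For instance (hypothetical `WeirdShape` class), a backend whose `.shape` is tuple-like but not an actual `tuple` subclass would be misrouted by the check above into the sequence-of-shapes branch:

```python
class WeirdShape:
    # A standard-compliant shape object that "behaves similarly to a tuple".
    def __init__(self, dims):
        self._dims = tuple(dims)
    def __getitem__(self, i):
        return self._dims[i]
    def __len__(self):
        return len(self._dims)

shape = WeirdShape((3, None, 4))
print(isinstance(shape, tuple))  # False -> f() would treat it as a sequence of shapes
```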
Not just Xarray. The more common need that was brought up early on in design discussions was "how can we keep things on GPU and avoid synchronization". The answer for all Python objects typically was "you can keep it on GPU and duck type it".
Why don't you provide an abstract base class or protocol and then demand the base class in the standard? Or is the ABC just …?
Looks like RedKnot will support intersections (from reading pull requests), but yes, it's not in the typing standard yet. Anyway, they can just rewrite their wording to "abc.Sequence & Hashable" if that's what they mean.
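If the standard went that route, one sketch of what such a protocol could look like (the name and exact members are made up here):

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class ShapeLike(Protocol):
    # "Sequence & Hashable"-ish: enough to index, measure, and hash a shape.
    # Note that runtime_checkable only verifies that these methods exist.
    def __getitem__(self, index: int, /) -> int | None: ...
    def __len__(self) -> int: ...
    def __hash__(self) -> int: ...

print(isinstance((2, None, 3), ShapeLike))  # True for plain tuples
```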
Probably, maybe, yes. I think it is feasible in principle for …

```python
>>> import torch
>>> t = torch.tensor([[0, 1, 2], [0, 1, 2]])
>>> t.shape
torch.Size([2, 3])
>>> type(t.shape)
<class 'torch.Size'>
>>> isinstance(t.shape, tuple)
True
```
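A minimal sketch of the kind of tuple subclass in question (hypothetical, loosely modeled on torch.Size):

```python
class Shape(tuple):
    # A tuple subclass: isinstance(Shape(...), tuple) is True. The underlying
    # tuple storage is filled at construction, so C code reading it directly
    # sees concrete values (which is what the next comment is about).
    def __new__(cls, dims):
        return super().__new__(cls, dims)

s = Shape((2, 3))
print(isinstance(s, tuple), s[0] * s[1])  # True 6
```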
One important difference is that, as a subclass of tuple, one would no longer be able to auto-materialize the shape when the user queries it from C - only from Python.
According to #97 a library can decide to either return a `tuple[int | None, ...]` or a tuple-like object that: …

This seems like a recipe for disaster? The second option allows one to operate on shape graphs, whereas the first would fail when you try to act on `None`, say to find the size of some dimensions by doing `prod(x.shape[1:])` (forced example so that `.size` wouldn't be applicable).

In PyTensor we have the distinction between `variable.shape` and `variable.type.shape`, which correspond to those two kinds of output. They are flipped though, and it seems odd to make `variable.shape` return a tuple with `None`. It doesn't make sense to build a computation on top of a static shape, because those `None` are not linked to anything.

Besides that, we sometimes also allow users to replace variables with different static shapes, although that's arguably a bit of undefined behavior. It seems to contradict the specification that the shape must be immutable, so happy to say it's out of scope: …

Proposal

Would it make sense to separate the two kinds of shape clearly? Perhaps as `variable.shape` and `variable.static_shape`. The first should be valid for building computations on top of variable shapes, statically known or not, while the second would allow libraries to reason as much as possible about what is known (and choose to fail if the provided information is insufficient) without having to probe which kind of shape output is returned by a specific library.

This is somewhat related to #839, where a library may need as much information as possible to make a decision. Perhaps a `static_value` would also make sense, for a library to return the entries that can be known ahead of time. Anyway, that should be discussed there.

If both options make sense, I would argue that `.shape` should behave like PyTensor does.

The standard should also specify whether `library.shape(x)` should match `x.shape` or `x.static_shape`. Again, I think it should match the first.
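To illustrate how a consumer might use the proposed split (`.static_shape` here is the hypothetical attribute from this proposal, not an existing API):

```python
def batch_size_if_known(x):
    # Prefer the static information when the backend exposes it...
    static = getattr(x, "static_shape", None)
    if static is not None and static[0] is not None:
        return static[0]
    # ...otherwise the caller falls back to x.shape, which may be a
    # symbolic/lazy object usable only to build further computations,
    # not to branch on eagerly.
    return None
```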
. Again I think it should match the first.The text was updated successfully, but these errors were encountered: