Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(types) Numpy.typing.NDArray #5212

Open
wants to merge 20 commits into
base: master
Choose a base branch
from

Conversation

InvincibleRMC
Copy link
Contributor

@InvincibleRMC InvincibleRMC commented Jun 29, 2024

Description

Switches to using numpy.typing.NDArray for typing annotations over numpy.ndarray. This is because numpy.ndarray is implemented with the first argument for some future size annotation. See numpy/numpy#16544. Since the first argument is for the size annotation the typing of the numpy.ndarray is basically not being used. I also modified the eigen matrix/tensor to numpy.typing.NDArray as well even though stubgen fails to generate stubs since numpy.typing.NDArray[numpy.float64[...]] is not a valid type.

def foo(x: numpy.typing.NDArray[numpy.float64]) -> None:
    reveal_type(x)  # ndarray[Any, dtype[floating[_64Bit]]
    reveal_type(x.dtype)  # dtype[float64]

def bar(x: numpy.ndarray[numpy.float64]) -> None:
    reveal_type(x)  # ndarray[float64, Any]
    reveal_type(x.dtype)  # Any

Suggested changelog entry:

    Switched to `numpy.typing.NDArray and numpy.typing.ArrayLike` 

@rwgk
Copy link
Collaborator

rwgk commented Jun 30, 2024

I don't know enough about typing to meaningfully weigh the pros and cons of changing from ndarray to NDArray. @henryiii could you help with that?

Regarding the tuple[()] change, we have this already:

template <>
struct handle_type_name<typing::Tuple<>> {
// PEP 484 specifies this syntax for an empty tuple
static constexpr auto name = const_name("tuple[()]");
};

Is your change needed anyway? Could you send a separate, small PR for that, including a test that fails without your change?

@InvincibleRMC InvincibleRMC changed the title feat(types) Numpy.typing.NDArray and empty tuple feat(types) Numpy.typing.NDArray Jun 30, 2024
@henryiii
Copy link
Collaborator

NDArray is the public type for NDArrays, and hides the "size" related type (as described above). It also uses a different form for the DType. Just curious, what about changing it to numpy.ndarray[Any, numpy.float64]?

FYI, I get an error on your example:

$ bat tmp.py
───────┬─────────────────────────────────────────
       │ File: tmp.py
───────┼─────────────────────────────────────────
   1   │ import numpy
   2   │ from typing import Any
   3   │
   4   │ v: numpy.typing.NDArray[numpy.float64]
   5   │ reveal_type(v)
   6   │ reveal_type(v.dtype)
   7   │
   8   │ q: numpy.ndarray[Any, numpy.float64]
   9   │ reveal_type(q)
  10   │ reveal_type(q.dtype)
  11   │
  12   │
───────┴─────────────────────────────────────────
$ pipx install mypy
$ pipx inject mypy numpy
$ mypy tmp.py
tmp.py:5: note: Revealed type is "numpy.ndarray[Any, numpy.dtype[numpy.floating[numpy._typing._64Bit]]]"
tmp.py:6: note: Revealed type is "numpy.dtype[numpy.floating[numpy._typing._64Bit]]"
tmp.py:8: error: Type argument "floating[_64Bit]" of "ndarray" must be a subtype of "dtype[Any]"  [type-var]
tmp.py:9: note: Revealed type is "numpy.ndarray[Any, numpy.floating[numpy._typing._64Bit]]"
tmp.py:10: note: Revealed type is "numpy.floating[numpy._typing._64Bit]"
Found 1 error in 1 file (checked 1 source file)

@InvincibleRMC
Copy link
Contributor Author

InvincibleRMC commented Aug 12, 2024

In my example I have numpy.ndarray[numpy.float64] which is what was currently being done to shown the error. You have numpy.ndarray[Any, numpy.float64] which is also incorrect. It would need to be numpy.ndarray[Any, numpy.dtype[numpy.float64]]] to create the proper type annotation without numpy.typing. If you would prefer numpy.ndarray[Any, numpy.dtype[numpy.float64]]] I can update it to be that.

@gentlegiantJGC
Copy link
Contributor

The valid options are

  1. numpy.ndarray[Any, numpy.dtype[numpy.float64]]]
  2. numpy.typing.NDArray[numpy.float64]

I think numpy.typing.NDArray is the cleanest way to resolve this issue.

I can't find any documentation for the shape, writeable and contiguous attributes in the type hint. Is this correct?
numpy.typing.NDArray[numpy.float32[5, 6], flags.writeable, flags.f_contiguous]
I don't think these are valid for numpy.ndarray either.

@henryiii
Copy link
Collaborator

Needs update after #5486

@InvincibleRMC
Copy link
Contributor Author

@timohl I have tried some various things like to get io_name working with templated arguments. Is there something obvious I am missing?

    PYBIND11_TYPE_CASTER(type,
                         io_name("numpy.typing.ArrayLike",
                                 "numpy.typing.NDArray["
                                     + npy_format_descriptor<T>::name + "]"));

@timohl
Copy link
Contributor

timohl commented Jan 25, 2025

@timohl I have tried some various things like to get io_name working with templated arguments. Is there something obvious I am missing?

Yes, but maybe not so obvious.
io_name only supports constexpr c-string arguments.
The type of npy_format_descriptor<T>::name is some instance of the template struct descr.
This disallows nesting of io_name which would break things.
Maybe something like this could work:npy_format_descriptor<T>::name.text

I have drafted a working version using io_name with c-strings here: #5502.
With this, I was able to build Open3D (with minor modification), run pybind-stubgen, and run mypy successfully on all example code without errors about numpy types from Eigen. 🚀

We should discuss what kind of numpy type we should use.
In #5502 I used something like this:
For args: typing.Annotated[numpy.typing.ArrayLike, numpy.float32, "[m, 1]", "flags.writeable", "flags.c_contiguous"]
For return: typing.Annotated[numpy.typing.NDArray[numpy.float32], "[m, 1]", "flags.writeable", "flags.c_contiguous"]

I think a combination of typing.Annotated, numpy.typing.ArrayLike and numpy.typing.NDArray would be best to keep all available information while providing valid types.

Open questions:

  • Format of typing.Annotated arguments: I used strings to use "[m, 1]" as shape, since m is usually not defined. Other options might be to use something like nptyping or annotated-types but adding 3rd party dependencies should rather be an option in stubgen, not pybind itself in my opinion
  • Should all flags be shown in return type aswell or only in args? I think they are used to provide better error messages when a user passes an array that does not conform
  • Where to use ArrayLike or NDArray? Right now, I just added it to all args, but I have not completely checked the code of type casters for every Eigen variant if they really accept for example lists as input.

Signed-off-by: Michael Carlstrom <[email protected]>
@InvincibleRMC
Copy link
Contributor Author

@timohl I copied over your changes but, I seem to be getting some errors that have to with tools/FindCatch.cmake. To me they look unrelated but, maybe I am missing something. Any ideas?

@InvincibleRMC
Copy link
Contributor Author

Hmm it seems like maybe it just was a server issue.

@timohl
Copy link
Contributor

timohl commented Feb 10, 2025

Great to hear that it is working now!

One small style question @InvincibleRMC:
Any reason why you are using escaped quotation marks (\" instead of plain ") in multi-line strings?
To my understanding, there is no need to escape, but maybe it is preferable due to another reason?

Also, a note regarding the open question in my last comment:

Format of typing.Annotated arguments: I used strings to use "[m, 1]" as shape, since m is usually not defined. Other options might be to use something like nptyping or annotated-types but adding 3rd party dependencies should rather be an option in stubgen, not pybind itself in my opinion

My suggestion would be to use strings (as implemented in this PR) as long as there is no standard way of defining shapes/properties of numpy arrays.
Right now, the point is to inform the user that the passed array does not conform to certain requirements.
So, until numpy defines a way to specify those, strings should be able to inform without breaking type checking.

Here also an overview of several variants with mypy/pylance/ruff errors.
TLDR: f1 (this PR) reports no errors for signature and call, and contains all available info.

import typing
from typing import Any, Literal, TypeVar

import numpy


# This PR (no errors):
def f1(
    x: typing.Annotated[numpy.typing.ArrayLike, numpy.float64, "[m, 1]", "flags.writeable"],
) -> typing.Annotated[numpy.typing.NDArray[numpy.float64], "[m, 1]"]: ...


# Call f1 (no errors):
f1(numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))


# Current state
#
# Errors:
# PylancereportInvalidTypeArguments:
#   Type "float64" cannot be assigned to type variable "_ShapeT_co@ndarray"
#   Type "float64" is not assignable to upper bound "_Shape" for type variable "_ShapeT_co@ndarray"
#   "float64" is not assignable to "tuple[int, ...]"
# Ruff(F821):
#   Undefined name `m`
#   Undefined name `flags`
# PylancereportUndefinedVariable:
#   "m" is not defined
#   "flags" is not defined
# PylancereportInvalidTypeArguments:
#   Expected no type arguments for class "float64"
# mypy[type-arg]: "float64" expects no type arguments, but 2 given
# mypy[name-defined]: Name "flags" is not defined
# mypy[type-var]: Type argument "float64" of "ndarray" must be a subtype of "tuple[int, ...]"
# mypy[name-defined]: Name "m" is not defined  [name-defined]
# mypy[valid-type]: Invalid type: try using Literal[1] instead?
def f2(
    x: numpy.ndarray[numpy.float64[m, 1], flags.writeable],
) -> numpy.ndarray[numpy.float64[m, 1], flags.writeable]: ...


# Suggestion by henryiii
#
# Errors:
# PylancereportInvalidTypeArguments:
#   Type "float64" cannot be assigned to type variable "_DType_co@ndarray"
#   Type "float64" is not assignable to upper bound "dtype[Any]" for type variable "_DType_co@ndarray"
#   "float64" is not assignable to "dtype[Any]"
# mypy[type-var]: Type argument "float64" of "ndarray" must be a subtype of "dtype[Any]"
def f3(x: numpy.ndarray[Any, numpy.float64]) -> numpy.ndarray[Any, numpy.float64]: ...


# Fix of f3 by gentlegiantJGC (No errors, but no shape info):
def f4(
    x: numpy.ndarray[Any, numpy.dtype[numpy.float64]],
) -> numpy.ndarray[Any, numpy.dtype[numpy.float64]]: ...


# Call f4 (no errors):
f4(numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))


# f4 with wrong shape:
#
# Errors:
# PylancereportGeneralTypeIssues:
#   Expected class but received "tuple[Literal[3], Literal[3]]"
# mypy[syntax]:
#   error: Syntax error in type annotation
#   note: Suggestion: Use Tuple[T1, ..., Tn] instead of (T1, ..., Tn)
def f5(
    x: numpy.ndarray[(3, 3), numpy.dtype[numpy.float64]],
) -> numpy.ndarray[(3, 3), numpy.dtype[numpy.float64]]: ...


# f4 with wrong shape:
#
# Errors:
# PylancereportGeneralTypeIssues:
#   Expected class but received "Literal[3]"
# mypy[valid-type]: Invalid type: try using Literal[3] instead?
def f6(
    x: numpy.ndarray[tuple[3, 3], numpy.dtype[numpy.float64]],
) -> numpy.ndarray[tuple[3, 3], numpy.dtype[numpy.float64]]: ...


# f4 with correct shape (No errors, but only int possible, not `m`):
def f7(
    x: numpy.ndarray[tuple[Literal[3], Literal[3]], numpy.dtype[numpy.float64]],
) -> numpy.ndarray[tuple[Literal[3], Literal[3]], numpy.dtype[numpy.float64]]: ...


# Call f7:
#
# Errors:
# PylancereportArgumentType:
#   Argument of type "NDArray[Any]" cannot be assigned to parameter "x" of type "ndarray[tuple[Literal[3], Literal[3]], dtype[float64]]" in function "f7"
#   "ndarray[_Shape, dtype[Any]]" is not assignable to "ndarray[tuple[Literal[3], Literal[3]], dtype[float64]]"
#   Type parameter "_ShapeT_co@ndarray" is covariant, but "_Shape" is not a subtype of "tuple[Literal[3], Literal[3]]"
#   "tuple[int, ...]" is not assignable to "tuple[Literal[3], Literal[3]]"
#   Tuple size mismatch; expected 2 but received indeterminate
# mypy[arg-type]: Argument 1 to "f7" has incompatible type "ndarray[tuple[int, ...], dtype[float64]]"; expected "ndarray[tuple[Literal[3], Literal[3]], dtype[float64]]"
f7(numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))

# f4 with correct shape + m as TypeVar (No errors):
# Two int in TypeVar, otherwide PylancereportGeneralTypeIssues: TypeVar must have at least two constrained types
m = TypeVar("m", int, int)
def f8(
    x: numpy.ndarray[tuple[m, Literal[3]], numpy.dtype[numpy.float64]],
) -> numpy.ndarray[tuple[m, Literal[3]], numpy.dtype[numpy.float64]]: ...


# Call f8:
#
# Errors:
# PylancereportArgumentType:
#   Argument of type "NDArray[Any]" cannot be assigned to parameter "x" of type "ndarray[tuple[m@f8, Literal[3]], dtype[float64]]" in function "f8"
#   "ndarray[_Shape, dtype[Any]]" is not assignable to "ndarray[tuple[m@f8, Literal[3]], dtype[float64]]"
#   Type parameter "_ShapeT_co@ndarray" is covariant, but "_Shape" is not a subtype of "tuple[m@f8, Literal[3]]"
#   "tuple[int, ...]" is not assignable to "tuple[m@f8, Literal[3]]"
#   Tuple size mismatch; expected 2 but received indeterminate
# mypy[arg-type]: Argument 1 to "f8" has incompatible type "ndarray[tuple[int, ...], dtype[float64]]"; expected "ndarray[tuple[int, Literal[3]], dtype[float64]]"
f8(numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))

@InvincibleRMC
Copy link
Contributor Author

One small style question @InvincibleRMC: Any reason why you are using escaped quotation marks (\" instead of plain ") in multi-line strings? To my understanding, there is no need to escape, but maybe it is preferable due to another reason?

I will be honest did not know that is how it worked in multi-line strings. will change to the plain " since that seems cleaner.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants