Updated STL type hints to use collections.abc #5566

Draft: wants to merge 6 commits into master

timohl
Contributor

@timohl timohl commented Mar 16, 2025

Description

This updates type hints for the set, map, list and array STL casters to use the more generic collections.abc types in convertible arguments.

Caster         Hint for convertible args    Hint for return values / noconvert args
set_caster     collections.abc.Set          set
map_caster     collections.abc.Mapping      dict
list_caster    collections.abc.Sequence     list
array_caster   collections.abc.Sequence     list

For map_caster, this is exactly how the caster works.
Unfortunately, list_caster, set_caster and array_caster behave a bit differently with respect to noconvert:
as arguments, list_caster and array_caster always accept a sequence and set_caster always accepts an anyset (set or frozenset), even in noconvert mode, but all three also allow iterables in convert mode.

The current system only distinguishes between an input and an output type (io_name) and falls back to the output type for noconvert args.
Describing these casters accurately would require three different type hints: arg-convert, arg-noconvert, and return.
Therefore, I currently see no way to improve these type hints further without deeper changes.
So for now, I think this is a good compromise for most use cases.
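To make the limitation concrete, here is a small plain C++ model of the hint selection described above. It does not use pybind11: HintContext, TwoNames, ThreeNames and render are hypothetical names for this sketch only. TwoNames mirrors the current io_name behavior (noconvert args reuse the return name), and ThreeNames shows what a separate arg-noconvert slot would add.

```cpp
#include <string>

// Hypothetical model (not pybind11 API): where a type hint is rendered.
enum class HintContext { ConvertArg, NoConvertArg, Return };

// Current scheme, modeled after io_name: one name for convertible
// arguments and one for return values. noconvert arguments reuse the
// return name.
struct TwoNames {
    std::string arg;
    std::string ret;
};

inline std::string render(const TwoNames &n, HintContext ctx) {
    return ctx == HintContext::ConvertArg ? n.arg : n.ret;
}

// Hypothetical three-name scheme that would be needed to describe
// list_caster, set_caster and array_caster precisely.
struct ThreeNames {
    std::string convert_arg;
    std::string noconvert_arg;
    std::string ret;
};

inline std::string render(const ThreeNames &n, HintContext ctx) {
    switch (ctx) {
        case HintContext::ConvertArg:
            return n.convert_arg;
        case HintContext::NoConvertArg:
            return n.noconvert_arg;
        case HintContext::Return:
            break;
    }
    return n.ret;
}
```

For example, with TwoNames{"collections.abc.Sequence", "list"} a noconvert list_caster argument is rendered as list, although the caster accepts any sequence there; a ThreeNames entry such as {"collections.abc.Iterable", "collections.abc.Sequence", "list"} could express that. Using Iterable for the convert slot is only a loose approximation, given the additional restrictions in the casters' type checks.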

Additionally, the array_caster was updated to match the typing.Annotated style of numpy/eigen type hints.

Suggested changelog entry:

Updated STL type hints to use collections.abc

@timohl
Contributor Author

timohl commented Mar 17, 2025

The failing check is unrelated, I think (a rerun may be enough).

@timohl
Contributor Author

timohl commented Mar 17, 2025

After seeing #5498 and digging deeper into the caster code, I noticed that I have to think more about this.

These three functions restrict the casters further than I thought:

inline bool PyObjectTypeIsConvertibleToStdVector(PyObject *obj) {
    if (PySequence_Check(obj) != 0) {
        return !PyUnicode_Check(obj) && !PyBytes_Check(obj);
    }
    return (PyGen_Check(obj) != 0) || (PyAnySet_Check(obj) != 0)
           || PyObjectIsInstanceWithOneOfTpNames(
               obj, {"dict_keys", "dict_values", "dict_items", "map", "zip"});
}
inline bool PyObjectTypeIsConvertibleToStdSet(PyObject *obj) {
    return (PyAnySet_Check(obj) != 0) || PyObjectIsInstanceWithOneOfTpNames(obj, {"dict_keys"});
}
inline bool PyObjectTypeIsConvertibleToStdMap(PyObject *obj) {
    if (PyDict_Check(obj)) {
        return true;
    }
    // Implicit requirement in the conditions below:
    // A type with `.__getitem__()` & `.items()` methods must implement these
    // to be compatible with https://docs.python.org/3/c-api/mapping.html
    if (PyMapping_Check(obj) == 0) {
        return false;
    }
    PyObject *items = PyObject_GetAttrString(obj, "items");
    if (items == nullptr) {
        PyErr_Clear();
        return false;
    }
    bool is_convertible = (PyCallable_Check(items) != 0);
    Py_DECREF(items);
    return is_convertible;
}

For example, if I understand correctly, it requires the object to be a set or frozenset (or a subtype of either) or a dict_keys view.
The caster itself uses the Mapping protocol though and could easily be changed to fully allow it.
I will add some tests to better map out what is allowed and what is not, and how this relates to the type hints.
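As a starting point for such tests, the three checks can be modeled in plain C++ as a decision table. This is a hypothetical sketch: PyObjModel and the convertible_to_* functions are illustrative names, and each boolean field stands in for the CPython call named in its comment, so no Python interpreter is needed.

```cpp
#include <initializer_list>
#include <string>

// Hypothetical stand-in for a Python object: only the properties the
// three checks look at (not pybind11 or CPython API).
struct PyObjModel {
    bool is_sequence = false;         // PySequence_Check
    bool is_str = false;              // PyUnicode_Check
    bool is_bytes = false;            // PyBytes_Check
    bool is_generator = false;        // PyGen_Check
    bool is_anyset = false;           // PyAnySet_Check (set, frozenset, subclasses)
    bool is_dict = false;             // PyDict_Check
    bool is_mapping = false;          // PyMapping_Check
    bool has_callable_items = false;  // `.items` exists and is callable
    std::string tp_name;              // type name, e.g. "dict_keys", "map", "zip"
};

// Assumed to be an exact match on the object's own type name, as in
// PyObjectIsInstanceWithOneOfTpNames.
inline bool tp_name_is_one_of(const PyObjModel &o, std::initializer_list<const char *> names) {
    for (const char *n : names) {
        if (o.tp_name == n) {
            return true;
        }
    }
    return false;
}

inline bool convertible_to_vector(const PyObjModel &o) {
    if (o.is_sequence) {
        return !o.is_str && !o.is_bytes;  // str and bytes are rejected
    }
    return o.is_generator || o.is_anyset
           || tp_name_is_one_of(o, {"dict_keys", "dict_values", "dict_items", "map", "zip"});
}

inline bool convertible_to_set(const PyObjModel &o) {
    return o.is_anyset || tp_name_is_one_of(o, {"dict_keys"});
}

inline bool convertible_to_map(const PyObjModel &o) {
    if (o.is_dict) {
        return true;
    }
    if (!o.is_mapping) {
        return false;
    }
    return o.has_callable_items;
}
```

With this model, a str fails the vector check even though it is a sequence, a dict_keys view passes both the vector and the set check, and a custom mapping only passes the map check once it exposes a callable items attribute.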

Git blame directed me to #4686, which seems to have more insight.
@rwgk if you remember this PR, I would love to hear your view.
If not, I will dig into this PR on the weekend and summarize my findings here.

@rwgk
Collaborator

rwgk commented Mar 18, 2025

@timohl Did you see these already?

These three functions restrict the casters further than I thought:

I'm fine if you want to work on those functions. I think it's best to keep the current logic, which is very fast, and add more sophisticated conditions as needed only where we currently return false.

@timohl timohl marked this pull request as draft March 18, 2025 15:25
Comment on lines 176 to 183
if (!convert) {
    return false;
}
if (!(isinstance(src, module_::import("collections.abc").attr("Set"))
      && hasattr(src, "__contains__") && hasattr(src, "__iter__")
      && hasattr(src, "__len__"))) {
    return false;
}
Collaborator


Did you consider moving this code into PyObjectTypeIsConvertibleToStdSet()?

Then it would be easier to use same logic in other custom type casters.

Giving PyObjectTypeIsConvertibleToStdSet() the additional bool convert argument seems fine to me.
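One possible shape for this refactoring, again as a plain C++ model rather than pybind11 code: the cheap checks stay first and the collections.abc.Set test from the snippet under review only runs in convert mode. SetCandidate and convertible_to_std_set are hypothetical names, and the split (only set/frozenset in noconvert mode, dict_keys and other abstract sets only when converting) is an assumption based on the behavior described at the top of this PR.

```cpp
#include <string>

// Hypothetical stand-in for a Python object (not pybind11/CPython API).
struct SetCandidate {
    bool is_anyset = false;     // PyAnySet_Check
    std::string tp_name;        // type name, e.g. "dict_keys"
    bool is_abc_set = false;    // isinstance(obj, collections.abc.Set)
    bool has_contains = false;  // hasattr(obj, "__contains__")
    bool has_iter = false;      // hasattr(obj, "__iter__")
    bool has_len = false;       // hasattr(obj, "__len__")
};

// Sketch: fast checks first, slower abstract-set checks only as a
// fallback and only when conversion is allowed.
inline bool convertible_to_std_set(const SetCandidate &o, bool convert) {
    if (o.is_anyset) {
        return true;  // set/frozenset: accepted even in noconvert mode
    }
    if (!convert) {
        return false;  // everything else needs conversion
    }
    if (o.tp_name == "dict_keys") {
        return true;  // cheap type-name check kept from the current logic
    }
    // Expensive fallback, mirroring the isinstance/hasattr checks above.
    return o.is_abc_set && o.has_contains && o.has_iter && o.has_len;
}
```

Keeping the fallback behind the fast checks preserves the current cost for the common set/frozenset case, which matches the suggestion to add conditions only where the function currently returns false.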

Contributor Author


I integrated the checks, but had to change the argument from PyObject* to const handle & in order to use isinstance and hasattr.
Is that OK, or should I use the PyObject-based functions from the Python C API directly?

Honestly, I do not like how obscure those functions currently are due to all those checks.
While there is a comment explaining some intentions, I think it could be made clearer what should pass and what not.
I will add and improve comments later.

Collaborator


Is that ok

Yes. The functions came from PyCLIF, and I wanted to keep them compatible, but that's no longer a concern.

Collaborator

@rwgk rwgk left a comment


I only glanced through very quickly. Is this still in draft mode intentionally?

inline bool PyObjectTypeIsConvertibleToStdVector(PyObject *obj) {
if (PySequence_Check(obj) != 0) {
return !PyUnicode_Check(obj) && !PyBytes_Check(obj);
inline bool HandleIsConvertibleToStdVector(const handle &src) {
Collaborator


What do you think about these names?

object_is_convertible_to_std_vector
object_is_convertible_to_std_set
object_is_convertible_to_std_map

(handle happens to be the argument type here, but that's really secondary, or even subject to change.)

@timohl
Contributor Author

timohl commented Mar 25, 2025

I only glanced through very quickly. Is this still in draft mode intentionally?

I would like to improve the comments before finalizing and being ready to merge.
Unfortunately, I was pretty busy the last couple of days and could not find enough time.
I can probably get back to it tomorrow.

Also, your suggested function names sound good. I will change them.

@rwgk
Collaborator

rwgk commented Mar 25, 2025

No rush, at all, from my end. I just wanted to be sure you're not waiting for my feedback.

@InvincibleRMC
Contributor

Would it also be possible to update the Buffer type to collections.abc.Buffer? More info here.

@timohl
Contributor Author

timohl commented Mar 26, 2025

Would it also be possible to update Buffer type to collections.abc.Buffer? More info here.

Going through the code here:

template <>
struct handle_type_name<object> {
    static constexpr auto name = const_name("object");
};
template <>
struct handle_type_name<list> {
    static constexpr auto name = const_name("list");
};
template <>
struct handle_type_name<dict> {
    static constexpr auto name = const_name("dict");
};
template <>
struct handle_type_name<anyset> {
    static constexpr auto name = const_name("Union[set, frozenset]");
};
template <>
struct handle_type_name<set> {
    static constexpr auto name = const_name("set");
};
template <>
struct handle_type_name<frozenset> {
    static constexpr auto name = const_name("frozenset");
};
template <>
struct handle_type_name<str> {
    static constexpr auto name = const_name("str");
};
template <>
struct handle_type_name<tuple> {
    static constexpr auto name = const_name("tuple");
};
template <>
struct handle_type_name<bool_> {
    static constexpr auto name = const_name("bool");
};
template <>
struct handle_type_name<bytes> {
    static constexpr auto name = const_name(PYBIND11_BYTES_NAME);
};
template <>
struct handle_type_name<buffer> {
    static constexpr auto name = const_name("Buffer");
};
template <>
struct handle_type_name<int_> {
    static constexpr auto name = io_name("typing.SupportsInt", "int");
};
template <>
struct handle_type_name<iterable> {
    static constexpr auto name = const_name("Iterable");
};
template <>
struct handle_type_name<iterator> {
    static constexpr auto name = const_name("Iterator");
};
template <>
struct handle_type_name<float_> {
    static constexpr auto name = io_name("typing.SupportsFloat", "float");
};
template <>
struct handle_type_name<function> {
    static constexpr auto name = const_name("Callable");
};
template <>
struct handle_type_name<handle> {
    static constexpr auto name = handle_type_name<object>::name;
};
template <>
struct handle_type_name<none> {
    static constexpr auto name = const_name("None");
};
template <>
struct handle_type_name<sequence> {
    static constexpr auto name = const_name("Sequence");
};
template <>
struct handle_type_name<bytearray> {
    static constexpr auto name = const_name("bytearray");
};
template <>
struct handle_type_name<memoryview> {
    static constexpr auto name = const_name("memoryview");
};
template <>
struct handle_type_name<slice> {
    static constexpr auto name = const_name("slice");
};
template <>
struct handle_type_name<type> {
    static constexpr auto name = const_name("type");
};
template <>
struct handle_type_name<capsule> {
    static constexpr auto name = const_name("types.CapsuleType");
};
template <>
struct handle_type_name<ellipsis> {
    static constexpr auto name = const_name("ellipsis");
};
template <>
struct handle_type_name<weakref> {
    static constexpr auto name = const_name("weakref");
};

There are a bunch of other types that could be changed:

  • Union[set, frozenset] -> typing.Union[set, frozenset] or maybe better set | frozenset
  • Buffer -> collections.abc.Buffer
  • Iterable -> collections.abc.Iterable
  • Iterator -> collections.abc.Iterator
  • Callable -> collections.abc.Callable
  • Sequence -> collections.abc.Sequence
  • ellipsis -> types.EllipsisType

@InvincibleRMC Would you agree with those? Am I missing any?

@InvincibleRMC
Contributor

InvincibleRMC commented Mar 26, 2025

Currently, stub generators typically know that Iterable and the other types are available in the typing module. However, this doesn't apply to Buffer, since it does not exist in the typing module. If we decide it is better to make all the types explicit (in the form foo.bar.Baz), we should also update all the types found in typing.h.
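The concern can be illustrated with a toy resolver. This is a hypothetical sketch, not the logic of any real stub generator: resolve_hint and its set of known typing names are assumptions. Bare names known from typing resolve to typing.X, dotted names pass through unchanged, and a bare Buffer cannot be resolved because typing has no Buffer (collections.abc.Buffer exists since Python 3.12).

```cpp
#include <set>
#include <string>

// Hypothetical resolver for a bare or dotted type name taken from a
// generated signature. Returns the fully qualified name, or an empty
// string if the name cannot be resolved.
inline std::string resolve_hint(const std::string &name) {
    // Assumed set of bare names the tool knows to import from typing.
    static const std::set<std::string> typing_names = {
        "Callable", "Iterable", "Iterator", "Sequence", "Union"};
    if (name.find('.') != std::string::npos) {
        return name;  // already explicit, e.g. "collections.abc.Buffer"
    }
    if (typing_names.count(name) != 0) {
        return "typing." + name;
    }
    return "";  // unresolved, e.g. "Buffer" (not in the typing module)
}
```

Spelling every hint as a dotted path, as in the list of proposed changes above, would make this lookup unnecessary for every type, Buffer included.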
