GH1180 Clean up all/any methods for Series and DataFrame #1188

Merged
3 commits merged into pandas-dev:main on Apr 17, 2025

Conversation

loicdiridollou
Contributor

Technically we should also fix pd.DataFrame.all, but that would mean returning pd.Series[np.bool], and np.bool is not one of the types accepted by S1. I will leave it as is for now; there are some FIXME comments there so we remember.

Collaborator

@Dr-Irv Dr-Irv left a comment

I think you should remove the FIXME comments. Otherwise OK

@@ -1660,7 +1660,8 @@ class DataFrame(NDFrame, OpsMixin, _GetItemHack):
         bool_only: _bool | None = ...,
         skipna: _bool = ...,
         **kwargs: Any,
-    ) -> _bool: ...
+    ) -> np.bool: ...
+    # FIXME the type below is not correct, should be pd.Series[np.bool]
Collaborator

The way I look at it, we use Series[bool] to describe whatever boolean is stored inside, whether that is a Python bool, a NumPy bool, or a BooleanDtype value, so I don't think this comment is necessary.

Contributor Author

Actually, the type checker will not accept that, because pd.Series can only be subscripted with a type from S1, which contains bool (the built-in Python boolean) but not np.bool.
At runtime, however, pd.DataFrame.any returns a pd.Series of np.bool values, and pd.Series[np.bool] is not accepted because np.bool is not one of the S1 types.
Let me know if this is not clear.
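
The restriction described here can be illustrated with a small generic class. This is only a sketch under assumptions: `S1` below is a much smaller, hypothetical stand-in for the type variable used in the stubs, and `NpBool` and `MiniSeries` are hypothetical stand-ins for `np.bool` and `pd.Series`. Python does not enforce type-variable constraints at runtime, so the annotation a type checker would reject appears only as a comment.

```python
from typing import Generic, List, TypeVar


class NpBool:
    """Hypothetical stand-in for np.bool; deliberately not a subclass of bool."""

    def __init__(self, value: bool) -> None:
        self.value = value


# Hypothetical, reduced version of S1: as far as a static type checker is
# concerned, only these scalar types may be used to subscript MiniSeries.
S1 = TypeVar("S1", bool, int, float, str)


class MiniSeries(Generic[S1]):
    """Hypothetical stand-in for pd.Series, generic over S1."""

    def __init__(self, data: List[S1]) -> None:
        self.data = data


# Accepted: bool is one of the types listed for S1.
accepted: MiniSeries[bool] = MiniSeries([True, False])

# A type checker is expected to reject the following annotation, because
# NpBool is not one of the types listed for S1:
# rejected: MiniSeries[NpBool]
```

The point of the sketch is that the subscript is a static label drawn from a fixed set of types, which is why a label for the NumPy scalar cannot simply be written down.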

Collaborator

The type returned from DataFrame.any() and DataFrame.all() should be Series[_bool]. Then the tests should use that.

Contributor Author

My issue is that _bool does not include np.bool, which is the type we get at runtime. I am happy to keep the stubs as is, but that would mean the runtime type does not align with the static type.
Alternatively, we could add np.bool to _bool. I am open to suggestions.

Collaborator

I don't think it matters. We have checks like this:

    check(assert_type(df.any(), "pd.Series[bool]"), pd.Series, np.bool_)

So even though the Series contains np.bool_ values, we call its type Series[bool].

It's similar to this:

    check(assert_type(df.value_counts(), "pd.Series[int]"), pd.Series, np.integer)

What's inside the Series are NumPy integers, but we call that Series[int].
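
The two-layer pattern in those test lines can be sketched without pandas. This is a minimal sketch, not the helper from the test suite: `check` here is a simplified re-implementation written for illustration, and `FakeNpBool`, `FakeSeries`, and `any_per_column` are hypothetical stand-ins for `np.bool_`, `pd.Series`, and `DataFrame.any()`. `assert_type` only matters to a static type checker and returns its argument unchanged at runtime, which is why the scalar type stored inside is verified separately.

```python
from typing import Dict, Generic, Iterable, List, Optional, TypeVar

try:
    from typing import assert_type  # available from Python 3.11
except ImportError:
    def assert_type(val, typ):
        # Same runtime behaviour as typing.assert_type: return val unchanged.
        return val

T = TypeVar("T")


class FakeNpBool:
    """Hypothetical stand-in for np.bool_: boolean-like, not a subclass of bool."""

    def __init__(self, value: bool) -> None:
        self._value = bool(value)

    def __bool__(self) -> bool:
        return self._value


class FakeSeries(Generic[T]):
    """Hypothetical stand-in for pd.Series; T is only a static label."""

    def __init__(self, values: Iterable[object]) -> None:
        self.values: List[object] = list(values)


def check(
    actual: "FakeSeries[T]", klass: type, dtype: Optional[type] = None
) -> "FakeSeries[T]":
    """Simplified runtime check: the container class, then the first element's type."""
    if not isinstance(actual, klass):
        raise RuntimeError(f"expected {klass.__name__}, got {type(actual).__name__}")
    if dtype is not None and not isinstance(actual.values[0], dtype):
        raise RuntimeError(f"expected elements of type {dtype.__name__}")
    return actual


def any_per_column(columns: Dict[str, List[bool]]) -> "FakeSeries[bool]":
    """Statically labelled FakeSeries[bool], but FakeNpBool objects are stored inside."""
    return FakeSeries(FakeNpBool(any(col)) for col in columns.values())


# Static label: FakeSeries[bool]. Runtime element type: FakeNpBool.
result = check(
    assert_type(
        any_per_column({"a": [True, False], "b": [False, False]}),
        "FakeSeries[bool]",
    ),
    FakeSeries,
    FakeNpBool,
)
flags = [bool(v) for v in result.values]
```

In this sketch the static label and the runtime scalar type are checked by different arguments, so they are free to differ, which mirrors the convention described in the comment above.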

Contributor Author

Okay, I see your point. Let me fix the comments. Thanks for the more detailed explanation of this issue!

@loicdiridollou loicdiridollou requested a review from Dr-Irv April 17, 2025 20:54
Collaborator

@Dr-Irv Dr-Irv left a comment

@Dr-Irv Dr-Irv merged commit 1793f88 into pandas-dev:main Apr 17, 2025
13 checks passed
Development

Successfully merging this pull request may close these issues.

Wrong type hint for Series.all() and Series.any()
2 participants