Skip to content

Conversation

@dangotbanned
Copy link
Member

@dangotbanned dangotbanned commented Sep 12, 2025

Closes #1928

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

If you have comments or can explain your changes, please do so below

Note

The positive diff is from new quite exhaustive tests
narwhals.translate has shrunk a lot πŸ₯³

Show tasks

Previews

I've collected a bunch of things here that may be missed from the diff alone.
Be sure to check out the <details> πŸ˜‰

fix(typing): More gracefully handle narwhals in overloads

Any was previously being used on the first few overloads to solve a recursion issue.

But that had the downside of swallowing completions until you matched an overload with native_object:

Show before

image

But now, we have something more useful

Show after

image

Changes to Native{Series,DataFrame}

As mentioned in (#3125 (comment)), these protocols needed to be
adapted to make them disjoint across pandas, polars, pyarrow:

Class drop filter unique value_counts
pd.Series βœ”οΈ βœ”οΈ βœ”οΈ βœ”οΈ
pl.Series ❌ βœ”οΈ βœ”οΈ βœ”οΈ
pa.ChunkedArray ❌ βœ”οΈ βœ”οΈ βœ”οΈ
pd.DataFrame βœ”οΈ βœ”οΈ ❌ βœ”οΈ
pl.DataFrame βœ”οΈ βœ”οΈ βœ”οΈ ❌
pa.Table βœ”οΈ βœ”οΈ ❌ ❌

MkDocs Comparison

These changes shrink the API docs for from_native quite a lot.

However, if the specific arguments combinations are wanted here - then an options is adding the TypedDicts somewhere and enabling signature_crossrefs.
IIRC, that feature used to be insiders-only but looks to be available since https://squidfunk.github.io/mkdocs-material/blog/2025/11/11/insiders-now-free-for-everyone/

Show main

image image image

Show (636f787)

image image

Note

I'm fine with less detail here, but signature_crossrefs is an option if this is blocking πŸ™‚

Comment on lines 38 to 50
class NativeDataFrame(Sized, NativeFrame, Protocol):
def drop(self, *args: Any, **kwargs: Any) -> Any: ...

class NativeLazyFrame(NativeFrame, Protocol):
def explain(self, *args: Any, **kwargs: Any) -> Any: ...

# Needs to have something `NativeDataFrame` doesn't?
class NativeSeries(Sized, Iterable[Any], Protocol):
def filter(self, *args: Any, **kwargs: Any) -> Any: ...
# `pd.DataFrame` has this - the others don't
def value_counts(self, *args: Any, **kwargs: Any) -> Any: ...
# `pl.DataFrame` has this - the others don't
def unique(self, *args: Any, **kwargs: Any) -> Any: ...
Copy link
Member Author

@dangotbanned dangotbanned Sep 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue 1

NativeDataFrame and NativeSeries could overlap.

Although we don't define NativeDataFrame.filter - the classes that match it do:

That was leading to the strange @overload order and parameters changing the return type.

But now if we started with NativeSeries, then both series_only and allow_series will preserve that detail:

Prove it bro

def test_from_native_series_exhaustive() -> None: # noqa: PLR0914, PLR0915
pytest.importorskip("polars")
pytest.importorskip("pandas")
pytest.importorskip("pyarrow")
pytest.importorskip("typing_extensions")
import pandas as pd
import polars as pl
import pyarrow as pa
from typing_extensions import assert_type
pl_ser = pl.Series([1, 2, 3])
pd_ser = cast("pd.Series[Any]", pd.Series([1, 2, 3]))
pa_ser = cast("pa.ChunkedArray[Any]", pa.chunked_array([pa.array([1])])) # type: ignore[redundant-cast]
pl_1 = nw.from_native(pl_ser, series_only=True)
pl_2 = nw.from_native(pl_ser, allow_series=True)
pl_3 = nw.from_native(pl_ser, eager_only=True, series_only=True)
pl_4 = nw.from_native(pl_ser, eager_only=True, series_only=True, allow_series=True)
pl_5 = nw.from_native(pl_ser, eager_only=True, allow_series=True)
pl_6 = nw.from_native(pl_ser, series_only=True, allow_series=True)
pl_7 = nw.from_native(pl_ser, series_only=True, pass_through=True)
pl_8 = nw.from_native(pl_ser, allow_series=True, pass_through=True)
pl_9 = nw.from_native(pl_ser, eager_only=True, series_only=True, pass_through=True)
pl_10 = nw.from_native(
pl_ser, eager_only=True, series_only=True, allow_series=True, pass_through=True
)
pl_11 = nw.from_native(pl_ser, eager_only=True, allow_series=True, pass_through=True)
pl_12 = nw.from_native(pl_ser, series_only=True, allow_series=True, pass_through=True)
pls = pl_1, pl_2, pl_3, pl_4, pl_5, pl_6, pl_7, pl_8, pl_9, pl_10, pl_11, pl_12
assert_type(pl_1, nw.Series[pl.Series])
assert_type(pl_2, nw.Series[pl.Series])
assert_type(pl_3, nw.Series[pl.Series])
assert_type(pl_4, nw.Series[pl.Series])
assert_type(pl_5, nw.Series[pl.Series])
assert_type(pl_6, nw.Series[pl.Series])
assert_type(pl_7, nw.Series[pl.Series])
assert_type(pl_8, nw.Series[pl.Series])
assert_type(pl_9, nw.Series[pl.Series])
assert_type(pl_10, nw.Series[pl.Series])
assert_type(pl_11, nw.Series[pl.Series])
assert_type(pl_12, nw.Series[pl.Series])
pd_1 = nw.from_native(pd_ser, series_only=True)
pd_2 = nw.from_native(pd_ser, allow_series=True)
pd_3 = nw.from_native(pd_ser, eager_only=True, series_only=True)
pd_4 = nw.from_native(pd_ser, eager_only=True, series_only=True, allow_series=True)
pd_5 = nw.from_native(pd_ser, eager_only=True, allow_series=True)
pd_6 = nw.from_native(pd_ser, series_only=True, allow_series=True)
pd_7 = nw.from_native(pd_ser, series_only=True, pass_through=True)
pd_8 = nw.from_native(pd_ser, allow_series=True, pass_through=True)
pd_9 = nw.from_native(pd_ser, eager_only=True, series_only=True, pass_through=True)
pd_10 = nw.from_native(
pd_ser, eager_only=True, series_only=True, allow_series=True, pass_through=True
)
pd_11 = nw.from_native(pd_ser, eager_only=True, allow_series=True, pass_through=True)
pd_12 = nw.from_native(pd_ser, series_only=True, allow_series=True, pass_through=True)
pds = pd_1, pd_2, pd_3, pd_4, pd_5, pd_6, pd_7, pd_8, pd_9, pd_10, pd_11, pd_12
assert_type(pd_1, nw.Series["pd.Series[Any]"])
assert_type(pd_2, nw.Series["pd.Series[Any]"])
assert_type(pd_3, nw.Series["pd.Series[Any]"])
assert_type(pd_4, nw.Series["pd.Series[Any]"])
assert_type(pd_5, nw.Series["pd.Series[Any]"])
assert_type(pd_6, nw.Series["pd.Series[Any]"])
assert_type(pd_7, nw.Series["pd.Series[Any]"])
assert_type(pd_8, nw.Series["pd.Series[Any]"])
assert_type(pd_9, nw.Series["pd.Series[Any]"])
assert_type(pd_10, nw.Series["pd.Series[Any]"])
assert_type(pd_11, nw.Series["pd.Series[Any]"])
assert_type(pd_12, nw.Series["pd.Series[Any]"])
pa_1 = nw.from_native(pa_ser, series_only=True)
pa_2 = nw.from_native(pa_ser, allow_series=True)
pa_3 = nw.from_native(pa_ser, eager_only=True, series_only=True)
pa_4 = nw.from_native(pa_ser, eager_only=True, series_only=True, allow_series=True)
pa_5 = nw.from_native(pa_ser, eager_only=True, allow_series=True)
pa_6 = nw.from_native(pa_ser, series_only=True, allow_series=True)
pa_7 = nw.from_native(pa_ser, series_only=True, pass_through=True)
pa_8 = nw.from_native(pa_ser, allow_series=True, pass_through=True)
pa_9 = nw.from_native(pa_ser, eager_only=True, series_only=True, pass_through=True)
pa_10 = nw.from_native(
pa_ser, eager_only=True, series_only=True, allow_series=True, pass_through=True
)
pa_11 = nw.from_native(pa_ser, eager_only=True, allow_series=True, pass_through=True)
pa_12 = nw.from_native(pa_ser, series_only=True, allow_series=True, pass_through=True)
pas = pa_1, pa_2, pa_3, pa_4, pa_5, pa_6, pa_7, pa_8, pa_9, pa_10, pa_11, pa_12
assert_type(pa_1, nw.Series["pa.ChunkedArray[Any]"])
assert_type(pa_2, nw.Series["pa.ChunkedArray[Any]"])
assert_type(pa_3, nw.Series["pa.ChunkedArray[Any]"])
assert_type(pa_4, nw.Series["pa.ChunkedArray[Any]"])
assert_type(pa_5, nw.Series["pa.ChunkedArray[Any]"])
assert_type(pa_6, nw.Series["pa.ChunkedArray[Any]"])
assert_type(pa_7, nw.Series["pa.ChunkedArray[Any]"])
assert_type(pa_8, nw.Series["pa.ChunkedArray[Any]"])
assert_type(pa_9, nw.Series["pa.ChunkedArray[Any]"])
assert_type(pa_10, nw.Series["pa.ChunkedArray[Any]"])
assert_type(pa_11, nw.Series["pa.ChunkedArray[Any]"])
assert_type(pa_12, nw.Series["pa.ChunkedArray[Any]"])
for series in chain(pls, pds, pas):
assert isinstance(series, nw.Series)

Comment on lines 125 to 129
class OnlyEager(TypedDict, total=False):
pass_through: bool
eager_only: Required[Literal[True]]
series_only: bool
allow_series: bool | None
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue 2

There's too many combinations we need to account for in @overloads.

  • 3x different, true native_objects
  • 3x narwhals objects pretending to be native
  • 4x flags
    • They all have a default, so needs ...
    • allow_series: bool | None just adds to it

So instead of that:

  • I've narrowed it down to 1-2 keywords that define unique behavior
  • Given that a name
  • Made a TypedDict
  • Then relied on Unpack and Required to merge a whole bunch of them

@dangotbanned dangotbanned changed the title fix(typing): Completely overhaul from_native fix(typing): Overhaul @overloads in nw.from_native Sep 14, 2025
@dangotbanned
Copy link
Member Author

@MarcoGorelli this one isn't ready for a proper review yet, but I was curious about your thoughts on the idea generally?

I did some testing locally and I think depending on how these issues have progressed - docs might be one more area for experimenting

@MarcoGorelli
Copy link
Member

3x different, true native_objects

tbh I'm still not too keen on matching on the native object, i kind of feel like nw.from_native(obj: T) should return DataFrame[T] | LazyFrame[T], regardless of T

@dangotbanned
Copy link
Member Author

3x different, true native_objects

tbh I'm still not too keen on matching on the native object, i kind of feel like nw.from_native(obj: T) should return DataFrame[T] | LazyFrame[T], regardless of T

That isn't possible due to the issue from:

Which is also why Any will match the first @overload, which means (obj: Any) -> Any

Possibly these sections of the spec may be helpful?


These are the issues that come from having such a heavily polymorphic function I'm afraid 😒
I'm hoping to make the best of a tricky situation and move the overloads into something that is (IMO) more maintainable

`v1`, `v2` can extend and reuse these defs
Tests are in `TYPE_CHECKING`, due to runtime not matching (yet) see #3226
Reverts [fix(typing): More gracefully handle narwhals in overloads](0605936)
Merged 6 overloads -> 2!
return super().__getitem__(item)

def get_column(self, name: str) -> Series:
def get_column(self, name: str) -> Series[Any]:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v1.Series only had a default TypeVar to avoid downstream noise from making it generic

That shouldn't have carried over into v2

Comment on lines +511 to +520
def test_from_native_lazyframe_exhaustive() -> None: # noqa: PLR0914, PLR0915
pytest.importorskip("polars")
pytest.importorskip("typing_extensions")

import polars as pl
from typing_extensions import assert_type

pl_ldf = pl.LazyFrame(data)

pl_1 = nw.from_native(pl_ldf)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to get ahead of the obvious question:

Do we need all of this?

The answer is complicated.

IMO, the real source of all the typing issues with from_native has been that we test very little in terms of typing for such a complex function.
I think that the primary entry point for narwhals should require much more care to be sure we don't regress.

That's my opinion, but I wouldn't feel so confident in making such big changes to the typing without this in place.

I'm guessing Marco is familiar with this from numpy?

Doing this in .pyi files rather than (mostly) in a TYPE_CHECKING block is another option πŸ™‚

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue won't be resolved by this PR, but it does highlight the same gap

We have no typing coverage for the example on the README.md?

import narwhals as nw
from narwhals.typing import IntoFrameT


def agnostic_function(
    df_native: IntoFrameT,
    date_column: str,
    price_column: str,
) -> IntoFrameT:
    return (
        nw.from_native(df_native)
        .group_by(nw.col(date_column).dt.truncate("1mo"))
        .agg(nw.col(price_column).mean())
        .sort(date_column)
        .to_native()
    )

mypy complains about this already 😳

narwhals\/.py:15: error: Value of type variable "IntoLazyFrameT" of "from_native" cannot be "IntoFrameT"  [type-var]
            nw.from_native(df_native)
            ^~~~~~~~~~~~~~~~~~~~~~~~~
Found 1 error in 1 file (checked 1 source file)

@dangotbanned dangotbanned marked this pull request as ready for review November 21, 2025 12:00
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like the testing is exhaustive

thanks @dangotbanned !

@dangotbanned
Copy link
Member Author

Thanks @MarcoGorelli

looks like the testing is exhaustive

I'm exhausted just looking at them

@MarcoGorelli
Copy link
Member

aw i'm sorry for this api 😒

@dangotbanned
Copy link
Member Author

aw i'm sorry for this api 😒

Don't be silly, I'm dunking on myself here πŸ˜„

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

missing overload for nw.from_native(obj, eager_only=True, series_only=True)

3 participants