Skip to content

tie IntoFrameT to NativeFrame#3496

Open
MarcoGorelli wants to merge 27 commits intonarwhals-dev:mainfrom
MarcoGorelli:another-typing-readme
Open

tie IntoFrameT to NativeFrame#3496
MarcoGorelli wants to merge 27 commits intonarwhals-dev:mainfrom
MarcoGorelli:another-typing-readme

Conversation

@MarcoGorelli
Copy link
Copy Markdown
Member

🤔 might this be a solution to #3493 ?

Description

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

  • Related issue #<issue number>
  • Closes #<issue number>

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

@MarcoGorelli MarcoGorelli marked this pull request as ready for review March 4, 2026 08:07
dangotbanned

This comment was marked as resolved.

@MarcoGorelli
Copy link
Copy Markdown
Member Author

This one demonstrates the overlap issue 👎

yup, thanks for taking a look!

@MarcoGorelli MarcoGorelli marked this pull request as draft March 10, 2026 18:00
@MarcoGorelli MarcoGorelli marked this pull request as ready for review March 27, 2026 15:08
@MarcoGorelli MarcoGorelli marked this pull request as draft March 27, 2026 15:09
@MarcoGorelli MarcoGorelli marked this pull request as ready for review March 27, 2026 16:09
@MarcoGorelli
Copy link
Copy Markdown
Member Author

This one demonstrates the overlap issue 👎

Thanks! I think this is addressible with some extra specialised overloads, have pushed

What happens when we use incompatible parts of the *Frame APIs?:

Aren't these generally tested already? Any particular tests you'd like to see?

@akmalsoliev
Copy link
Copy Markdown
Contributor

Hey folks, any update on this one? Drove me mad for 30 minutes till I realized it is pyrefly specific error

@MarcoGorelli
Copy link
Copy Markdown
Member Author

MarcoGorelli commented Apr 3, 2026

I don't think it's pyrefly-specific, pyright flags it too

to respond to Dan's comment point by point:

  • collect(backend)

If you do collect, then you can't hope to have a IntoFrameT -> IntoFrameT function anyway

  • lazy(backend)

same

  • with_row_index()

This isn't an incompatible API, it returns Self, it's the same as using select or filter

  • join(other=<same|wrong-type|the-result-of-a-union>)

Yup, this one's problematic (as it currently is on main). It's not made worse as far as I can tell

Something I wouldn't be opposed to would be to change the LazyFrame.lazy API to be

def lazy(self) -> LazyFrame[Any]:

as, realistically, if you're calling .lazy on a lazyframe, it's because you don't know if it's lazy or eager

  • unique(maintain_order=True)

Also not an incompatible API, it returns Self

  • sink_parquet(...)

This returns None, so I'm not sure what there is to test here

@dangotbanned

This comment was marked as resolved.

@dangotbanned

This comment was marked as resolved.

@MarcoGorelli
Copy link
Copy Markdown
Member Author

sure, thanks! have added some tests for these functions, looks expected to me

Copy link
Copy Markdown
Member

@dangotbanned dangotbanned left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Marco, I'm still feeling a bit skeptical about this.
I'm happier than I was following (#3496 (review)), and (#3496 (comment)) - so I wanted to highlight that ❤️

I would feel more confident if the PR didn't introduce any of:

  • Any
  • new type ignores

It's okay to disagree with a type checker (I often do), but I think documenting the issue - and why it is okay to ignore - is very helpful for future readers that may solve it

count narwhal

Comment thread narwhals/translate.py
Comment on lines +131 to +134
def from_native(
native_object: LazyFrameT | DataFrameT, **kwds: Unpack[AllowLazy]
) -> LazyFrameT | DataFrameT: ...
@overload
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is now needed, does it mean that this is now a valid annotation?

nw.DataFrame[nw.DataFrame[Any] | nw.LazyFrame[Any]]

Suggested change
def from_native(
native_object: LazyFrameT | DataFrameT, **kwds: Unpack[AllowLazy]
) -> LazyFrameT | DataFrameT: ...
@overload

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without it, we'd get that r2_either in #3496 (review) is

  /home/marcogorelli/polars-api-compat-dev/tests/translate/from_native_test.py:818:21 - error: "assert_type" mismatch: expected "narwhals.dataframe.DataFrame[polars.dataframe.frame.DataFrame] | narwhals.dataframe.LazyFrame[polars.lazyframe.frame.LazyFrame]" but received "narwhals.dataframe.DataFrame[narwhals.dataframe.DataFrame[polars.dataframe.frame.DataFrame] | narwhals.dataframe.LazyFrame[polars.lazyframe.frame.LazyFrame]] | narwhals.dataframe.LazyFrame[narwhals.dataframe.DataFrame[polars.dataframe.frame.DataFrame] | narwhals.dataframe.LazyFrame[polars.lazyframe.frame.LazyFrame]]" (reportAssertTypeFailure)

Comment thread narwhals/translate.py
native_object: IntoLazyFrameT, **kwds: Unpack[AllowLazy]
) -> LazyFrame[IntoLazyFrameT]: ...
@overload
def from_native( # type: ignore[overload-overlap]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not feeling convinced by this or (#3496 (comment))
AFAICT, both are the same issue reported by two different type checkers.

If you remove all the ignore comments, does pyrefly not report these as an issue as well?
For ty, we'd need (astral-sh/ty#103) to get it reported - so probably nothing there yet

Comment thread tests/translate/from_native_test.py Outdated
Comment thread tests/translate/from_native_test.py Outdated
Comment thread tests/translate/from_native_test.py Outdated
Comment thread narwhals/translate.py Outdated
allow_series: bool | None,
) -> Any: ...
def from_native(
def from_native( # pyright: ignore[reportInconsistentOverload]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given that this part isn't user-facing (the overloads are used) i didn't it was too big of a deal

How each type checker deals will overlapping overloads is a bit hairy.
The typing spec says this about solving overloads at the callsite

If the return types are not equivalent, overload matching is ambiguous.
In this case, assume a return type of Any and stop.

AFAICT, there isn't a spec for overlapping definitions 😢
So the conformance tests focus on just the usage part.

This error

Both mypy and pyright are trying to tell you that:

As a parameter, IntoFrameT matches:

IntoDataFrameT | IntoLazyFrameT
IntoFrameT

But the return types do not match:

DataFrame[IntoDataFrameT] | LazyFrame[IntoLazyFrameT]
DataFrame[IntoFrameT] | LazyFrame[IntoFrameT]

  • I think mypy does the Any thing?
  • I'm more sure that pyright would synthesize a signature with a * in it for inference
  • (I'd assume) ty uses intersections

Important

The user facing problems I've experienced is that:

  1. it makes type checkers run much slower, since they need to track 2 or more overloads that matched
  2. this can make a language server produce flaky errors, since each run might take a different route

I left a screen capture in (zen-xu/pyarrow-stubs#208 (comment)) with an extreme example or how slow it can get 😓

Comment thread narwhals/dataframe.py Outdated
@MarcoGorelli
Copy link
Copy Markdown
Member Author

thanks @dangotbanned , have addressed most comments

Regarding the overlap, don't type checkers usually match the first valid overload?

At least, for

from typing import overload, Sequence, reveal_type

@overload
def foo(a: str) -> int: ...  # type: ignore[overload-overlap]

@overload
def foo(a: Sequence[str]) -> str: ...

def foo(a: str | Sequence[str]) -> int | str:
    if isinstance(a, str):
        return 0
    return 'zero'

reveal_type(foo('abc'))
reveal_type(foo(['abc']))

All of mypy, pyright, pyrefly, ty, and zuban, do exactly that. pandas and numpy both rely on this in their stubs, so I think this is unlikely to change?

(scratch) marcogorelli@DESKTOP-U8OKFP3:~/scratch$ mypy t.py
t.py:14: note: Revealed type is "int"
t.py:15: note: Revealed type is "str"
Success: no issues found in 1 source file
(scratch) marcogorelli@DESKTOP-U8OKFP3:~/scratch$ pyright t.py
/home/marcogorelli/scratch/t.py
  /home/marcogorelli/scratch/t.py:14:13 - information: Type of "foo('abc')" is "int"
  /home/marcogorelli/scratch/t.py:15:13 - information: Type of "foo(['abc'])" is "str"
0 errors, 0 warnings, 2 informations
(scratch) marcogorelli@DESKTOP-U8OKFP3:~/scratch$ pyrefly check t.py
 INFO revealed type: int [reveal-type]
  --> t.py:14:12
   |
14 | reveal_type(foo('abc'))
   |            ------------
   |
 INFO revealed type: str [reveal-type]
  --> t.py:15:12
   |
15 | reveal_type(foo(['abc']))
   |            --------------
   |
 INFO 0 errors
(scratch) marcogorelli@DESKTOP-U8OKFP3:~/scratch$ ty check t.py
info[revealed-type]: Revealed type
  --> t.py:14:13
   |
12 |     return 'zero'
13 |
14 | reveal_type(foo('abc'))
   |             ^^^^^^^^^^ `int`
15 | reveal_type(foo(['abc']))
   |

info[revealed-type]: Revealed type
  --> t.py:15:13
   |
14 | reveal_type(foo('abc'))
15 | reveal_type(foo(['abc']))
   |             ^^^^^^^^^^^^ `str`
   |

Found 2 diagnostics
(scratch) marcogorelli@DESKTOP-U8OKFP3:~/scratch$ zuban check t.py
t.py:14: note: Revealed type is "builtins.int"
t.py:15: note: Revealed type is "builtins.str"
Success: no issues found in 1 source file

@MarcoGorelli
Copy link
Copy Markdown
Member Author

just checking if you still disagree with the approach of whether there can be a way forwards here?

@dangotbanned
Copy link
Copy Markdown
Member

Hey Marco, I haven't got back to this yet - so this isn't a comment on the most recent changes - but appreciate you taking things on board 😊

just checking if you still disagree with the approach

In my last review I didn't mean to give off that impression, sorry for that 😔
It was more that I felt there were some unknowns and worried that not having the full picture now would mean we'd here again soon, fixing more from_native edge cases again.

That isn't to say my worries are correct, and I would much rather your changes turn out to be fine 🤞

I'll do my best to go through your changes this weekend, if that's okay?


One thing for now, re overloads (#3496 (comment))

I was thinking about this PR a couple of days ago when I saw this potential spec change:

I think the 2nd spec change there is related - might be worth reading to see if (#3496 (comment)) would be handled differently?

@MarcoGorelli
Copy link
Copy Markdown
Member Author

thanks! yeah no hurry at all, just checking if there was a potential way forwards or if you were firmly against it

btw, i think we're already relying on inconsistent overlapping overloads here

narwhals/narwhals/dataframe.py

Lines 1049 to 1057 in 25bdb8f

# `str` overlaps with `Sequence[str]`
# We can ignore this but we must keep this overload ordering
@overload
def __getitem__(self, item: tuple[SingleIndexSelector, SingleColSelector]) -> Any: ...
@overload
def __getitem__( # type: ignore[overload-overlap]
self, item: str | tuple[MultiIndexSelector, SingleColSelector]
) -> Series[Any]: ...

@dangotbanned
Copy link
Copy Markdown
Member

Regarding the overlap, don't type checkers usually match the first valid overload?
...
All of mypy, pyright, pyrefly, ty, and zuban, do exactly that. pandas and numpy both rely on this in their stubs, so I think this is unlikely to change?

I read through (python/typing#2250) and it looks like that would be non-conformant behavior:

It's worth noting that this is being proposed by a pyrefly maintainer.
But there looks to have been support from others in (python/typing#566 (comment)) and Spec change: ambiguous arguments in overload call evaluation

Comment thread narwhals/translate.py
Comment on lines +158 to +161
@overload
def from_native( # type: ignore[overload-overlap]
native_object: IntoFrameT, **kwds: Unpack[AllowLazy]
) -> DataFrame[IntoFrameT] | LazyFrame[IntoFrameT]: ...
Copy link
Copy Markdown
Member

@dangotbanned dangotbanned Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know for sure if it'll work, but one way to make the overload defs non-overlapping might be:

@overload
def from_native(native_object: IntoFrameT) -> DataFrame[IntoFrameT] | LazyFrame[IntoFrameT]: ...

Since AllowLazy is using total=False, that might not be enough though 🤔

@MarcoGorelli
Copy link
Copy Markdown
Member Author

I read through (python/typing#2250) and it looks like that would be non-conformant behavior:

I'll ask, but my understanding is that it wouldn't be non-conformant because if you have

@overload
def foo(a: str) -> int: ...  # type: ignore[overload-overlap]

@overload
def foo(a: Sequence[str]) -> str: ...

def foo(a: str | Sequence[str]) -> int | str:
    ...

foo('foo')

then all materialisations of the a argument match the first overload, and so the remaining overloads can be eliminated

@MarcoGorelli
Copy link
Copy Markdown
Member Author

To tie this back specifically to the spec, in step 5, it says

For all arguments, determine whether all possible materializations of the argument’s type are assignable to the corresponding parameter type for each of the remaining overloads. If so, eliminate all of the subsequent remaining overloads.

In this case, a='foo' is assignable to a: str and so the remaining overloads can be ignored

My reading of the linked issues (python/typing#566 (comment)) and Spec change: ambiguous arguments in overload call evaluation is that they're about different issues

@dangotbanned
Copy link
Copy Markdown
Member

dangotbanned commented Apr 13, 2026

My reading of the linked issues (python/typing#566 (comment)) and Spec change: ambiguous arguments in overload call evaluation is that they're about different issues

I think this is the same issue

Which pyright currently infers to the first overload, and this is what I'm trying to say is something we shouldn't rely on because it is unspecified and soon to be specified as non-conformant behavior for a type checker to do

> # Step 5 eliminates the first overload because there exists a
> # materialization of `A[Any]` that is not assignable to `A[None]`. Step 6
> # picks the second overload.
Show pre-3.12 version

from typing import TypeVar, Generic, Any, assert_type, overload

T = TypeVar("T")


class A(Generic[T]):
    x: T

    def f(self) -> T:
        return self.x


@overload
def example12(x: A[None]) -> A[None]: ...


@overload
def example12(x: A[Any]) -> A[Any]: ...


def example12(x: A[Any]) -> A[Any]:
    return x


def check_example12(x: Any):
    # Step 5 eliminates the first overload because there exists a
    # materialization of `A[Any]` that is not assignable to `A[None]`. Step 6
    # picks the second overload.
    ret = example12(x)
    assert_type(ret, A[Any])  # "assert_type" mismatch: expected "A[Any]" but received "A[None] "Pylance(reportAssertTypeFailure)"

@MarcoGorelli
Copy link
Copy Markdown
Member Author

That's dealing with a case where an argument of type Any is passed, which has a materialisation which doesn't match the first overload

In the example I gave, we're passing a string argument, in which every materialisation matches the first overload

@overload
def foo(a: str)

If you believe that the example I gave (#3496 (comment)) is non-conformant, may I please ask that you open a discussion on the typing discourse or on their github?

@dangotbanned
Copy link
Copy Markdown
Member

If you believe that the example I gave #3496 (comment) is non-conformant

This PR is not about that example, although both are related to overlapping overloads.
I'm interested in specifically (#3496 (comment)), which is where the change to IntoFrameT has caused all the new issues.

By my count, IntoDataFrameT and IntoLazyFrameT (with Unpack[AllowLazy]) each can match 2 of these 3 overloads:

I believe this is why you see # type: ignore[overload-overlap] following the second two.
It's possible (#3496 (comment)) may be a path to avoiding the overlap.

At worst, it couldn't hurt to try?


I'm not sure what issue is being surpressed on this line:

But I do know that it would also prevent other type checkers from showing a diagnostic there if they had one


So bringing it back to (#3496 (comment))
I appreciate that you've addressed most of my points and I've marked lots of things as resolved 🥳

But I currently feel:

  1. That the warnings from type checkers are correct.
    If we can find a solution that focuses on avoiding them, it would be best for us in the long-term
  2. If this is working right now (which I feel is reliant on shaky grounds), it's reasonable to think it may soon not be

@MarcoGorelli
Copy link
Copy Markdown
Member Author

This PR is not about that example

Sure, but just to check that we're on the same page - do you agree that that example is conformant or not?

By my count, IntoDataFrameT and IntoLazyFrameT (with Unpack[AllowLazy]) each can match 2 of these 3 overloads:

Yes, and all materialisations of IntoDataFrameT will match the second one (L154-L157), so the third one will be ignored (at least, that's my understanding)

It's possible (#3496 (comment)) may be a path to avoiding the overlap.
At worst, it couldn't hurt to try?

Yup, thanks 🙏 . Have tried

--git a/narwhals/translate.py b/narwhals/translate.py
index f9ef32461..5ed0bde06 100644
--- a/narwhals/translate.py
+++ b/narwhals/translate.py
@@ -148,17 +148,19 @@ def from_native(
     native_object: IntoLazyFrameT, **kwds: Unpack[AllowLazy]
 ) -> LazyFrame[IntoLazyFrameT]: ...
 @overload
-def from_native(  # type: ignore[overload-overlap]
+def from_native(
     native_object: IntoDataFrameT | IntoSeriesT, **kwds: Unpack[AllowSeries]
 ) -> DataFrame[IntoDataFrameT] | Series[IntoSeriesT]: ...
 @overload
 def from_native(
     native_object: IntoDataFrameT | IntoLazyFrameT, **kwds: Unpack[AllowLazy]
 ) -> DataFrame[IntoDataFrameT] | LazyFrame[IntoLazyFrameT]: ...
+# @overload
+# def from_native(  # type: ignore[overload-overlap]
+#     native_object: IntoFrameT, **kwds: Unpack[AllowLazy]
+# ) -> DataFrame[IntoFrameT] | LazyFrame[IntoFrameT]: ...
 @overload
-def from_native(  # type: ignore[overload-overlap]
-    native_object: IntoFrameT, **kwds: Unpack[AllowLazy]
-) -> DataFrame[IntoFrameT] | LazyFrame[IntoFrameT]: ...
+def from_native(native_object: IntoFrameT) -> DataFrame[IntoFrameT] | LazyFrame[IntoFrameT]: ...
 @overload
 def from_native(
     native_object: IntoDataFrameT | IntoLazyFrameT | IntoSeriesT, **kwds: Unpack[AllowAny]
@@ -175,7 +177,7 @@ def from_native(
     series_only: bool,
     allow_series: bool | None,
 ) -> Any: ...
-def from_native(  # type: ignore[misc]
+def from_native(
     native_object: IntoFrameT
     | IntoLazyFrameT
     | IntoDataFrameT

, and mypy reports

narwhals/translate.py:151: error: Overloaded function signatures 10 and 11 overlap with incompatible return types  [overload-overlap]
    def from_native(
    ^
narwhals/translate.py:180: error: Overloaded function implementation cannot produce return type of signature 12  [misc]
    def from_native(
    ^

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants