Commit 0a67642

docs: use IntoDataFrameT/IntoFrameT in docs (#1664)
* docs: use intoDataFrame in docs
* fixup

---------

Co-authored-by: Marco Gorelli <[email protected]>
1 parent f656fb3 commit 0a67642

4 files changed, +24 -12 lines changed

docs/pandas_like_concepts/boolean.md

+5 -2
@@ -8,12 +8,12 @@ For example, if you do `nw.col('a')*2`, then:
 
 ```python exec="1" source="above" session="boolean"
 import narwhals as nw
-from narwhals.typing import FrameT
+from narwhals.typing import IntoFrameT
 
 data = {"a": [1.4, None, 4.2]}
 
 
-def multiplication(df: FrameT) -> FrameT:
+def multiplication(df: IntoFrameT) -> IntoFrameT:
     return nw.from_native(df).with_columns((nw.col("a") * 2).alias("a*2")).to_native()
 ```
 
@@ -57,6 +57,9 @@ be a temporary legacy pandas issue which will eventually go
 away anyway.
 
 ```python exec="1" source="above" session="boolean"
+from narwhals.typing import FrameT
+
+
 def comparison(df: FrameT) -> FrameT:
     return nw.from_native(df).with_columns((nw.col("a") > 2).alias("a>2")).to_native()
 ```

docs/pandas_like_concepts/column_names.md

+5 -6
@@ -2,12 +2,11 @@
 
 Polars and PyArrow only allow for string column names. What about pandas?
 
-```python
->>> import pandas as pd
->>> pd.concat([pd.Series([1, 2], name=0), pd.Series([1, 3], name=0)], axis=1)
-   0  0
-0  1  1
-1  2  3
+```python exec="true" source="above" result="python" session="col_names"
+import pandas as pd
+
+df = pd.concat([pd.Series([1, 2], name=0), pd.Series([1, 3], name=0)], axis=1)
+print(df)
 ```
 
 Oh...not only does it let us create a dataframe with a column named `0` - it lets us

docs/pandas_like_concepts/pandas_index.md

+5 -1
@@ -20,9 +20,10 @@ Let's learn about what Narwhals promises.
 
 ```python exec="1" source="above" session="ex1"
 import narwhals as nw
+from narwhals.typing import IntoFrameT
 
 
-def my_func(df):
+def my_func(df: IntoFrameT) -> IntoFrameT:
     df = nw.from_native(df)
     df = df.with_columns(a_plus_one=nw.col("a") + 1)
     return nw.to_native(df)
@@ -51,13 +52,16 @@ df_pd = pd.DataFrame({"a": [2, 1, 3], "b": [4, 5, 6]})
 s_pd = df_pd["a"].sort_values()
 df_pd["a_sorted"] = s_pd
 ```
+
 Reading the code, you might expect that `'a_sorted'` will contain the
 values `[1, 2, 3]`.
 
 **However**, here's what actually happens:
+
 ```python exec="1" source="material-block" session="ex2" result="python"
 print(df_pd)
 ```
+
 In other words, pandas' index alignment undid the `sort_values` operation!
 
 Narwhals, on the other hand, preserves the index of the left-hand-side argument.
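
Reviewer note (not part of the commit): the index alignment described in the hunk above can be reproduced with plain pandas. This is only a minimal sketch. The `a_sorted_positional` column and the `aligned` / `positional` variables are made up for illustration and do not appear in the docs.

```python
import pandas as pd

df_pd = pd.DataFrame({"a": [2, 1, 3], "b": [4, 5, 6]})
s_pd = df_pd["a"].sort_values()  # values [1, 2, 3], index [1, 0, 2]

# Assigning a Series aligns on the index, so each value goes back to
# the row it came from and the sorted order is lost.
df_pd["a_sorted"] = s_pd
aligned = df_pd["a_sorted"].tolist()

# Assigning the raw values skips alignment and keeps the sorted order.
# (Hypothetical column name, used only in this sketch.)
df_pd["a_sorted_positional"] = s_pd.to_numpy()
positional = df_pd["a_sorted_positional"].tolist()

print(aligned)
print(positional)
```

Comparing the two columns shows that the difference comes entirely from whether pandas gets an index to align on.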

docs/pandas_like_concepts/user_warning.md

+9 -3
@@ -14,14 +14,15 @@ The pandas API most likely cannot efficiently handle the complexity of the aggre
 ```python exec="true" source="above" result="python" session="df_ex1"
 import narwhals as nw
 import pandas as pd
+from narwhals.typing import IntoFrameT
 
 data = {"a": [1, 2, 3, 4, 5], "b": [5, 4, 3, 2, 1], "c": [10, 20, 30, 40, 50]}
 
 df_pd = pd.DataFrame(data)
 
 
 @nw.narwhalify
-def approach_1(df):
+def approach_1(df: IntoFrameT) -> IntoFrameT:
 
     # Pay attention to this next line
     df = df.group_by("a").agg(d=(nw.col("b") + nw.col("c")).sum())
@@ -43,7 +44,7 @@ The pandas API most likely cannot efficiently handle the complexity of the aggre
 
 
 @nw.narwhalify
-def approach_2(df):
+def approach_2(df: IntoFrameT) -> IntoFrameT:
 
     # Pay attention to this next line
     df = df.with_columns(d=nw.col("b") + nw.col("c")).group_by("a").agg(nw.sum("d"))
@@ -54,40 +55,45 @@ The pandas API most likely cannot efficiently handle the complexity of the aggre
 print(approach_2(df_pd))
 ```
 
-
 Both Approaches shown above return the exact same result, but Approach 1 is inefficient and returns the warning message
 we showed at the top.
 
 What makes the first approach inefficient and the second approach efficient? It comes down to what the
 pandas API lets us express.
 
 ## Approach 1
+
 ```python
 # From line 11
 
 return df.group_by("a").agg((nw.col("b") + nw.col("c")).sum().alias("d"))
 ```
 
 To translate this to pandas, we would do:
+
 ```python
 df.groupby("a").apply(
     lambda df: pd.Series([(df["b"] + df["c"]).sum()], index=["d"]), include_groups=False
 )
 ```
+
 Any time you use `apply` in pandas, that's a performance footgun - best to avoid it and use vectorised operations instead.
 Let's take a look at how "approach 2" gets translated to pandas to see the difference.
 
 ## Approach 2
+
 ```python
 # Line 11 in Approach 2
 
 return df.with_columns(d=nw.col("b") + nw.col("c")).group_by("a").agg({"d": "sum"})
 ```
 
 This gets roughly translated to:
+
 ```python
 df.assign(d=lambda df: df["b"] + df["c"]).groupby("a").agg({"d": "sum"})
 ```
+
 Because we're using pandas' own API, as opposed to `apply` and a custom `lambda` function, then this is going to be much more efficient.
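
Reviewer note (not part of the commit): the two pandas translations quoted in this hunk can be checked against each other with plain pandas. This is a sketch under stated assumptions. It selects `["b", "c"]` before `apply` instead of passing `include_groups=False`, so that it runs across pandas versions. The names `via_apply` and `via_agg` are illustrative only.

```python
import pandas as pd

data = {"a": [1, 2, 3, 4, 5], "b": [5, 4, 3, 2, 1], "c": [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Approach 1 style: a Python lambda is called once per group.
via_apply = df.groupby("a")[["b", "c"]].apply(
    lambda g: pd.Series([(g["b"] + g["c"]).sum()], index=["d"])
)

# Approach 2 style: compute the column first, then use a built-in aggregation.
via_agg = df.assign(d=lambda df: df["b"] + df["c"]).groupby("a").agg({"d": "sum"})

print(via_apply["d"].tolist())
print(via_agg["d"].tolist())
```

Both frames hold the same per-group sums. The second form stays inside pandas' built-in aggregation path and does not call a Python function per group.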
 
 ## Tips for Avoiding the `UserWarning`
