Commit 9ce6f21

add docs page
1 parent: 0170195

11 files changed (+396 -0 lines)

.github/workflows/mkdocs.yml

+32
@@ -0,0 +1,32 @@
name: mkdocs

on:
  push:
    branches:
      - main
permissions:
  contents: write
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Configure Git Credentials
        run: |
          git config user.name github-actions[bot]
          git config user.email 41898282+github-actions[bot]@users.noreply.github.com
      - uses: actions/setup-python@v4
        with:
          python-version: 3.x
      - run: echo "cache_id=$(date --utc '+%V')" >> $GITHUB_ENV


      - uses: actions/cache@v3
        with:
          key: mkdocs-material-${{ env.cache_id }}
          path: .cache
          restore-keys: |
            mkdocs-material-
      - run: pip install -r docs/requirements-docs.txt -e . pandas polars

      - run: mkdocs gh-deploy --force

.gitignore

+1
@@ -2,3 +2,4 @@
 *.pyc
 todo.md
 .coverage
+site/

docs/basics/column.md

+94
@@ -0,0 +1,94 @@
# Column

In [dataframe.md](dataframe.md), you learned how to write a dataframe-agnostic function.

We only used DataFrame methods there - but what if we need to operate on its columns?

## Extracting a column


## Example 1: filter based on a column's values

```python exec="1" source="above" session="ex1"
import narwhals as nw

def my_func(df):
    df_s = nw.DataFrame(df)
    df_s = df_s.filter(nw.col('a') > 0)
    return nw.to_native(df_s)
```

=== "pandas"
    ```python exec="true" source="material-block" result="python" session="ex1"
    import pandas as pd

    df = pd.DataFrame({'a': [-1, 1, 3], 'b': [3, 5, -3]})
    print(my_func(df))
    ```

=== "Polars"
    ```python exec="true" source="material-block" result="python" session="ex1"
    import polars as pl

    df = pl.DataFrame({'a': [-1, 1, 3], 'b': [3, 5, -3]})
    print(my_func(df))
    ```
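`filter` keeps or drops whole rows: the condition `nw.col('a') > 0` is evaluated once per row, and the resulting boolean mask is applied to every column, so `'b'` is subset together with `'a'`. Here is a minimal library-free sketch of those semantics, using a dict of equal-length lists as a hypothetical stand-in for a dataframe (the `filter_rows` helper is illustrative, not a Narwhals function):

```python
def filter_rows(data, mask):
    """Hypothetical helper: keep the rows where `mask` is True, in every column."""
    return {
        col: [v for v, keep in zip(values, mask) if keep]
        for col, values in data.items()
    }


data = {'a': [-1, 1, 3], 'b': [3, 5, -3]}
# One boolean per row, playing the role of nw.col('a') > 0.
mask = [x > 0 for x in data['a']]
print(filter_rows(data, mask))  # {'a': [1, 3], 'b': [5, -3]}
```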
## Example 2: multiply a column's values by a constant

Let's write a dataframe-agnostic function which multiplies the values in column
`'a'` by 2.

```python exec="1" source="above" session="ex2"
import narwhals as nw

def my_func(df):
    df_s = nw.DataFrame(df)
    df_s = df_s.with_columns(nw.col('a') * 2)
    return nw.to_native(df_s)
```

=== "pandas"
    ```python exec="true" source="material-block" result="python" session="ex2"
    import pandas as pd

    df = pd.DataFrame({'a': [-1, 1, 3], 'b': [3, 5, -3]})
    print(my_func(df))
    ```

=== "Polars"
    ```python exec="true" source="material-block" result="python" session="ex2"
    import polars as pl

    df = pl.DataFrame({'a': [-1, 1, 3], 'b': [3, 5, -3]})
    print(my_func(df))
    ```

Note that column `'a'` was overwritten. If we had wanted to add a new column called `'c'` containing column `'a'`'s
values multiplied by 2, we could have used `.alias` to give the result a new name:

```python exec="1" source="above" session="ex2.1"
import narwhals as nw

def my_func(df):
    df_s = nw.DataFrame(df)
    df_s = df_s.with_columns((nw.col('a') * 2).alias('c'))
    return nw.to_native(df_s)
```

=== "pandas"
    ```python exec="true" source="material-block" result="python" session="ex2.1"
    import pandas as pd

    df = pd.DataFrame({'a': [-1, 1, 3], 'b': [3, 5, -3]})
    print(my_func(df))
    ```

=== "Polars"
    ```python exec="true" source="material-block" result="python" session="ex2.1"
    import polars as pl

    df = pl.DataFrame({'a': [-1, 1, 3], 'b': [3, 5, -3]})
    print(my_func(df))
    ```
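What decides between overwriting and adding is the output name: `nw.col('a') * 2` keeps the name `'a'`, so it replaces the existing column, whereas `.alias('c')` renames the result, so it becomes a new column. The rule can be sketched without any dataframe library, using a dict of lists as a hypothetical stand-in for a dataframe (the `with_column` helper is illustrative only):

```python
def with_column(data, name, values):
    """Hypothetical helper: return a copy of `data` with column `name` set to `values`.

    An existing name is overwritten in place; a new name is appended at the end.
    """
    out = dict(data)  # shallow copy, so the input is left untouched
    out[name] = values
    return out


data = {'a': [-1, 1, 3], 'b': [3, 5, -3]}
doubled = [x * 2 for x in data['a']]

print(with_column(data, 'a', doubled))  # {'a': [-2, 2, 6], 'b': [3, 5, -3]}
print(with_column(data, 'c', doubled))  # {'a': [-1, 1, 3], 'b': [3, 5, -3], 'c': [-2, 2, 6]}
```

Because Python dicts preserve insertion order, an overwritten column keeps its position and a new one goes at the end, mirroring how Polars' `with_columns` lays out its result.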

docs/basics/complete_example.md

+106
@@ -0,0 +1,106 @@
# Complete example

We're going to write a dataframe-agnostic "Standard Scaler". This class will have
`fit` and `transform` methods (like `scikit-learn` transformers), and will work
agnostically for pandas and Polars.

We'll need to write two methods:

- `fit`: find the mean and standard deviation for each column from a given training set;
- `transform`: scale a given dataset with the means and standard deviations calculated
  during `fit`.

The `fit` method is a bit complicated, so let's start with `transform`.
Suppose we've already calculated the mean and standard deviation of each column, and have
stored them in the attributes `self._means` and `self._std_devs` (dictionaries mapping each
column name to its mean and standard deviation, respectively).
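In other words, for each column $c$, `transform` computes the usual standard score. The formulas below write $x_{1c}, \dots, x_{nc}$ for the $n$ training values seen by `fit` and $x$ for a value being transformed, and assume that Narwhals' `std()` follows the pandas and Polars default of `ddof=1` (the sample standard deviation):

```latex
\mu_c = \frac{1}{n} \sum_{i=1}^{n} x_{ic}, \qquad
\sigma_c = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} \left( x_{ic} - \mu_c \right)^2}, \qquad
z = \frac{x - \mu_c}{\sigma_c}
```

Here $\mu_c$ and $\sigma_c$ are the values held in `self._means[c]` and `self._std_devs[c]`.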
## Transform method

The general strategy will be:

1. Initialise a Narwhals DataFrame by passing your dataframe to `nw.DataFrame`.
2. Express your logic using the subset of the Polars API supported by Narwhals.
3. If you need to return a dataframe to the user in its original library, call `narwhals.to_native`.

```python
import narwhals as nw

class StandardScaler:
    def transform(self, df):
        df = nw.DataFrame(df)
        df = df.with_columns(
            (nw.col(col) - self._means[col]) / self._std_devs[col]
            for col in df.columns
        )
        return nw.to_native(df)
```

Note that all the calculations here can stay lazy if the underlying library permits it.
For example, if you pass in a `polars.LazyFrame`, the return value is also a `polars.LazyFrame`, and it is
the caller's responsibility to call `.collect()` on the result if they want to materialise its values.

## Fit method

Unlike the `transform` method, `fit` cannot stay lazy, as we need to compute concrete values
for the means and standard deviations.

To be able to get `Series` out of our `DataFrame`, we'll need the `DataFrame` to be an
eager one, as Polars doesn't have a concept of lazy `Series`.
To do that, when we instantiate our `narwhals.DataFrame`, we pass `features=['eager']`,
which lets us access eager-only features.

```python
import narwhals as nw

class StandardScaler:
    def fit(self, df):
        df = nw.DataFrame(df, features=['eager'])
        # Map each column name to its mean / standard deviation.
        self._means = {col: df[col].mean() for col in df.columns}
        self._std_devs = {col: df[col].std() for col in df.columns}
```

## Putting it all together

Here is our dataframe-agnostic standard scaler:

```python exec="1" source="above" session="tute-ex1"
import narwhals as nw

class StandardScaler:
    def fit(self, df):
        df = nw.DataFrame(df, features=['eager'])
        self._means = {col: df[col].mean() for col in df.columns}
        self._std_devs = {col: df[col].std() for col in df.columns}

    def transform(self, df):
        df = nw.DataFrame(df)
        df = df.with_columns(
            (nw.col(col) - self._means[col]) / self._std_devs[col]
            for col in df.columns
        )
        return nw.to_native(df)
```

Next, let's try running it. Notice that, because `transform` doesn't pass `features=['eager']`,
we can pass a `polars.LazyFrame` to it without issues!

=== "pandas"
    ```python exec="true" source="material-block" result="python" session="tute-ex1"
    import pandas as pd

    df_train = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 7]})
    df_test = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 7]})
    scaler = StandardScaler()
    scaler.fit(df_train)
    print(scaler.transform(df_test))
    ```

=== "Polars"
    ```python exec="true" source="material-block" result="python" session="tute-ex1"
    import polars as pl

    df_train = pl.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 7]})
    df_test = pl.LazyFrame({'a': [1, 2, 3], 'b': [4, 5, 7]})
    scaler = StandardScaler()
    scaler.fit(df_train)
    print(scaler.transform(df_test).collect())
    ```
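To sanity-check the numbers printed above without Narwhals, here is a minimal plain-Python sketch of the same arithmetic. It works on a dict mapping column names to lists of numbers (a hypothetical stand-in chosen for illustration, not a Narwhals type), and uses `statistics.stdev`, the sample standard deviation, on the assumption that Narwhals' `std()` follows the pandas and Polars default of `ddof=1`:

```python
from statistics import mean, stdev


class PlainStandardScaler:
    """Hypothetical plain-Python counterpart of the StandardScaler above.

    Data is a dict mapping column name -> list of numbers.
    """

    def fit(self, data):
        # One mean and one sample standard deviation per column, keyed by
        # column name, like self._means / self._std_devs in the Narwhals version.
        self._means = {col: mean(values) for col, values in data.items()}
        self._std_devs = {col: stdev(values) for col, values in data.items()}
        return self

    def transform(self, data):
        # Scale every value with the statistics learned during fit.
        return {
            col: [(x - self._means[col]) / self._std_devs[col] for x in values]
            for col, values in data.items()
        }


train = {'a': [1, 2, 3], 'b': [4, 5, 7]}
test = {'a': [1, 2, 3], 'b': [4, 5, 7]}
scaled = PlainStandardScaler().fit(train).transform(test)
print(scaled['a'])  # [-1.0, 0.0, 1.0]
```

Column `'a'` has mean 2 and sample standard deviation 1, so it scales to `[-1.0, 0.0, 1.0]`; under the same `ddof=1` assumption, that is what the `a` column in both tabs should show.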

docs/basics/dataframe.md

+41
@@ -0,0 +1,41 @@
# DataFrame

To write a dataframe-agnostic function, the steps you'll want to follow are:

1. Initialise a Narwhals DataFrame by passing your dataframe to `nw.DataFrame`.
2. Express your logic using the subset of the Polars API supported by Narwhals.
3. If you need to return a dataframe to the user in its original library, call `narwhals.to_native`.

Let's try writing a simple example.

## Example 1: group-by and mean

Make a Python file `t.py` with the following content:

```python exec="1" source="above" session="df_ex1"
import narwhals as nw

def func(df):
    # 1. Create a Narwhals dataframe
    df_s = nw.DataFrame(df)
    # 2. Use the subset of the Polars API supported by Narwhals
    df_s = df_s.group_by('a').agg(nw.col('b').mean())
    # 3. Return a dataframe from the user's original library
    return nw.to_native(df_s)
```

Let's try it out:

=== "pandas"
    ```python exec="true" source="material-block" result="python" session="df_ex1"
    import pandas as pd

    df = pd.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})
    print(func(df))
    ```

=== "Polars"
    ```python exec="true" source="material-block" result="python" session="df_ex1"
    import polars as pl

    df = pl.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})
    print(func(df))
    ```
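The expected result is easy to work out by hand: the two rows with `a == 1` have `b` values 4 and 5 (mean 4.5), and the single row with `a == 2` has `b == 6`. Below is a minimal library-free sketch of that computation; the dict-of-lists input and the `group_by_mean` helper are hypothetical stand-ins for illustration, not part of Narwhals:

```python
from collections import defaultdict
from statistics import mean


def group_by_mean(data, key, value):
    """Hypothetical helper: mean of column `value` for each distinct value of column `key`."""
    groups = defaultdict(list)
    for k, v in zip(data[key], data[value]):
        groups[k].append(v)
    # Sorting the keys only makes the output order deterministic.
    return {k: float(mean(vs)) for k, vs in sorted(groups.items())}


data = {'a': [1, 1, 2], 'b': [4, 5, 6]}
print(group_by_mean(data, 'a', 'b'))  # {1: 4.5, 2: 6.0}
```

The sketch sorts the groups only for reproducibility. Polars' `group_by` does not guarantee group order by default, so the rows in the Polars tab may appear in a different order.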

docs/index.md

+19
@@ -0,0 +1,19 @@
# Narwhals

Extremely lightweight compatibility layer between pandas and Polars:

- ✅ No dependencies.
- ✅ Lightweight: wheel is smaller than 30 kB.
- ✅ Simple, minimal, and predictable.

No need to choose - support both with ease!

## Who's this for?

Anyone who wants to write a library, application, or service which consumes dataframes, and wants it to be
completely dataframe-agnostic.

## Let's get started!

- [Installation](installation.md)
- [Quick start](quick_start.md)

docs/installation.md

+16
@@ -0,0 +1,16 @@
# Installation

First, make sure you have [created and activated](https://docs.python.org/3/library/venv.html) a Python 3.8+ virtual environment.

Then, run:
```console
python -m pip install narwhals
```

Then, if you start the Python REPL and see something like the following (your version number may differ):
```python
>>> import narwhals
>>> narwhals.__version__
'0.4.1'
```
then the installation worked correctly!

docs/quick_start.md

+43
@@ -0,0 +1,43 @@
# Quick start

## Prerequisites

Please start by following the [installation instructions](installation.md).

Then, please install the following:

- [pandas](https://pandas.pydata.org/docs/getting_started/install.html)
- [Polars](https://pola-rs.github.io/polars/user-guide/installation/)

## Simple example

Create a Python file `t.py` with the following content:

```python
import pandas as pd
import polars as pl
import narwhals as nw


def my_function(df_any):
    df = nw.DataFrame(df_any)
    column_names = df.columns
    return column_names


df_pandas = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df_polars = pl.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

print('pandas result:', my_function(df_pandas))
print('Polars result:', my_function(df_polars))
```

If you run `python t.py` and your output looks like this:
```
pandas result: ['a', 'b']
Polars result: ['a', 'b']
```

then all your installations worked perfectly.

Let's learn about what you just did, and what Narwhals can do for you.

docs/reference.md

+12
@@ -0,0 +1,12 @@
# Reference

Here are some related projects.

## Dataframe Interchange Protocol

Standardised way of interchanging data between libraries, see
[here](https://data-apis.org/dataframe-protocol/latest/index.html).

## Array API

Array counterpart to the DataFrame API, see [here](https://data-apis.org/array-api/2022.12/index.html).

docs/requirements-docs.txt

+5
@@ -0,0 +1,5 @@
markdown-exec[ansi]
mkdocs
mkdocs-material
mkdocstrings
mkdocstrings[python]
