|  | 
|  | 1 | +# Complete example | 
|  | 2 | + | 
|  | 3 | +We're going to write a dataframe-agnostic "Standard Scaler". This class will have | 
|  | 4 | +`fit` and `transform` methods (like `scikit-learn` transformers), and will work | 
|  | 5 | +agnostically for pandas and Polars. | 
|  | 6 | + | 
|  | 7 | +We'll need to write two methods: | 
|  | 8 | + | 
|  | 9 | +- `fit`: find the mean and standard deviation of each column in a given training set; | 
|  | 10 | +- `transform`: scale a given dataset using the means and standard deviations calculated | 
|  | 11 | +  during `fit`. | 
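
Before involving any dataframe library, the two steps listed above can be sketched in plain Python on a dict of lists. This is only an illustration of the arithmetic, not the Narwhals implementation: the helper names `fit_stats` and `apply_scaling` are hypothetical, and the sketch assumes the sample standard deviation (`ddof=1`), which is the default for `std` in both pandas and Polars.

```python
from statistics import mean, stdev


def fit_stats(columns):
    """Return per-column means and sample standard deviations (ddof=1)."""
    means = {name: mean(values) for name, values in columns.items()}
    std_devs = {name: stdev(values) for name, values in columns.items()}
    return means, std_devs


def apply_scaling(columns, means, std_devs):
    """Subtract each column's mean and divide by its standard deviation."""
    return {
        name: [(value - means[name]) / std_devs[name] for value in values]
        for name, values in columns.items()
    }


# Hypothetical training data, stored as a mapping of column name to values.
train = {"a": [1, 2, 3], "b": [4, 5, 7]}
means, std_devs = fit_stats(train)
scaled = apply_scaling(train, means, std_devs)
print(scaled)
```

Scaling the training data by its own statistics should leave every column with mean 0 and standard deviation 1, which is a handy sanity check for any implementation of the scaler.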
|  | 12 | + | 
|  | 13 | +The `fit` method is a bit complicated, so let's start with `transform`. | 
|  | 14 | +Suppose we've already calculated the mean and standard deviation of each column, and have | 
|  | 15 | +stored them in attributes `self._means` and `self._std_devs`. | 
|  | 16 | + | 
|  | 17 | +## Transform method | 
|  | 18 | + | 
|  | 19 | +The general strategy will be: | 
|  | 20 | + | 
|  | 21 | +1. Initialise a Narwhals DataFrame by passing your dataframe to `nw.DataFrame`. | 
|  | 22 | +2. Express your logic using the subset of the Polars API supported by Narwhals. | 
|  | 23 | +3. If you need to return a dataframe to the user in its original library, call `narwhals.to_native`. | 
|  | 24 | + | 
|  | 25 | +```python | 
|  | 26 | +import narwhals as nw | 
|  | 27 | + | 
|  | 28 | +class StandardScaler: | 
|  | 29 | +    def transform(self, df): | 
|  | 30 | +        df = nw.DataFrame(df) | 
|  | 31 | +        df = df.with_columns( | 
|  | 32 | +            (nw.col(col) - self._means[col]) / self._std_devs[col] | 
|  | 33 | +            for col in df.columns | 
|  | 34 | +        ) | 
|  | 35 | +        return nw.to_native(df) | 
|  | 36 | +``` | 
|  | 37 | + | 
|  | 38 | +Note that all the calculations here can stay lazy if the underlying library permits it. | 
|  | 39 | +For example, if you pass in a `polars.LazyFrame`, the return value is also a `polars.LazyFrame`, | 
|  | 40 | +and it is the caller's responsibility to call `.collect()` on the result to materialise its values. | 
|  | 41 | + | 
|  | 42 | +## Fit method | 
|  | 43 | + | 
|  | 44 | +Unlike the `transform` method, `fit` cannot stay lazy, as we need to compute concrete values | 
|  | 45 | +for the means and standard deviations. | 
|  | 46 | + | 
|  | 47 | +To be able to get `Series` out of our `DataFrame`, we'll need the `DataFrame` to be an | 
|  | 48 | +eager one, as Polars doesn't have a concept of lazy `Series`. | 
|  | 49 | +To do that, when we instantiate our `narwhals.DataFrame`, we pass `features=['eager']`, | 
|  | 50 | +which lets us access eager-only features. | 
|  | 51 | + | 
|  | 52 | +```python | 
|  | 53 | +import narwhals as nw | 
|  | 54 | + | 
|  | 55 | +class StandardScaler: | 
|  | 56 | +    def fit(self, df): | 
|  | 57 | +        df = nw.DataFrame(df, features=['eager']) | 
|  | 58 | +        self._means = {col: df[col].mean() for col in df.columns} | 
|  | 59 | +        self._std_devs = {col: df[col].std() for col in df.columns} | 
|  | 60 | +``` | 
|  | 61 | + | 
|  | 62 | +## Putting it all together | 
|  | 63 | + | 
|  | 64 | +Here is our dataframe-agnostic standard scaler: | 
|  | 65 | +```python exec="1" source="above" session="tute-ex1" | 
|  | 66 | +import narwhals as nw | 
|  | 67 | + | 
|  | 68 | +class StandardScaler: | 
|  | 69 | +    def fit(self, df): | 
|  | 70 | +        df = nw.DataFrame(df, features=["eager"]) | 
|  | 71 | +        self._means = {col: df[col].mean() for col in df.columns} | 
|  | 72 | +        self._std_devs = {col: df[col].std() for col in df.columns} | 
|  | 73 | + | 
|  | 74 | +    def transform(self, df): | 
|  | 75 | +        df = nw.DataFrame(df) | 
|  | 76 | +        df = df.with_columns( | 
|  | 77 | +            (nw.col(col) - self._means[col]) / self._std_devs[col] | 
|  | 78 | +            for col in df.columns | 
|  | 79 | +        ) | 
|  | 80 | +        return nw.to_native(df) | 
|  | 81 | +``` | 
|  | 82 | + | 
|  | 83 | +Next, let's try running it. Notice how, as `transform` doesn't use | 
|  | 84 | +`features=['eager']`, we can pass a `polars.LazyFrame` to it without issues! | 
|  | 85 | + | 
|  | 86 | +=== "pandas" | 
|  | 87 | +    ```python exec="true" source="material-block" result="python" session="tute-ex1" | 
|  | 88 | +    import pandas as pd | 
|  | 89 | + | 
|  | 90 | +    df_train = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 7]}) | 
|  | 91 | +    df_test = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 7]}) | 
|  | 92 | +    scaler = StandardScaler() | 
|  | 93 | +    scaler.fit(df_train) | 
|  | 94 | +    print(scaler.transform(df_test)) | 
|  | 95 | +    ``` | 
|  | 96 | + | 
|  | 97 | +=== "Polars" | 
|  | 98 | +    ```python exec="true" source="material-block" result="python" session="tute-ex1" | 
|  | 99 | +    import polars as pl | 
|  | 100 | + | 
|  | 101 | +    df_train = pl.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 7]}) | 
|  | 102 | +    df_test = pl.LazyFrame({'a': [1, 2, 3], 'b': [4, 5, 7]}) | 
|  | 103 | +    scaler = StandardScaler() | 
|  | 104 | +    scaler.fit(df_train) | 
|  | 105 | +    print(scaler.transform(df_test).collect()) | 
|  | 106 | +    ``` | 