The mapper takes a list of tuples. The first element of each tuple is a column name from the pandas DataFrame, or a list containing one or multiple columns (we will see an example with multiple columns later). The second element is an object which will perform the transformation which will be applied to that column. The third one is optional and is a dictionary containing the transformation options, if applicable (see "custom column names for transformed features" below).
The mapper takes a list of tuples. Each tuple has three elements:

1. column name(s): The first element is a column name from the pandas DataFrame, a list containing one or more columns (we will see an example with multiple columns later), or a callable that returns the columns to select, such as one created with `make_column_selector <https://scikit-learn.org/stable/modules/generated/sklearn.compose.make_column_selector.html>`__.
2. transformer(s): The second element is an object that performs the transformation applied to that column.
3. attributes: The third element is optional and is a dictionary containing the transformation options, if applicable (see "custom column names for transformed features" below).
The difference between specifying the column selector as ``'column'`` (as a simple string) and ``['column']`` (as a list with one element) is the shape of the array that is passed to the transformer. In the first case, a 1-dimensional array will be passed, while in the second case it will be a 2-dimensional array with one column, i.e. a column vector.
This behaviour mirrors pandas' DataFrame ``__getitem__`` indexing:
@@ -88,6 +98,7 @@ This behaviour mimics the same pattern as pandas' dataframes ``__getitem__`` in
Be aware that some transformers expect a 1-dimensional input (the label-oriented ones) while some others, like ``OneHotEncoder`` or ``Imputer``, expect 2-dimensional input, with the shape ``[n_samples, n_features]``.
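The shape difference can be checked without the mapper itself. The snippet below is a minimal sketch, assuming pandas and scikit-learn are installed; the ``pet`` column and its values are made up for illustration. It selects the same column once with a string and once with a one-element list, then feeds each result to a transformer that expects that shape.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical data, used only to illustrate the two selector styles.
df = pd.DataFrame({"pet": ["cat", "dog", "dog", "fish"]})

one_dim = df["pet"].to_numpy()    # string selector: 1-dimensional array
two_dim = df[["pet"]].to_numpy()  # list selector: 2-dimensional column vector

# A label-oriented transformer works on the 1-dimensional input.
labels = LabelEncoder().fit_transform(one_dim)

# OneHotEncoder expects input of shape [n_samples, n_features].
encoded = OneHotEncoder().fit_transform(two_dim).toarray()

# Passing the 1-dimensional array to OneHotEncoder is rejected.
try:
    OneHotEncoder().fit(one_dim)
    one_dim_rejected = False
except ValueError:
    one_dim_rejected = True

print(one_dim.shape, two_dim.shape, encoded.shape, one_dim_rejected)
```

In mapper terms, this is why a label-oriented transformer is usually paired with ``'pet'`` and ``OneHotEncoder`` with ``['pet']``.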
Test the Transformation
***********************
@@ -150,6 +161,46 @@ Alternatively, you can also specify prefix and/or suffix to add to the column na
>>> mapper_alias.transformed_names_
['standard_scaled_children', 'children_raw']
Dynamic Columns
***********************
In some situations the columns are not known beforehand and we would like to select them dynamically during the fit operation. As shown below, in such situations you can either provide a custom callable or use `make_column_selector <https://scikit-learn.org/stable/modules/generated/sklearn.compose.make_column_selector.html>`__.
>>> class GetColumnsStartingWith:
...     def __init__(self, start_str):
...         self.pattern = start_str
...
...     def __call__(self, X: pd.DataFrame = None):
...         return [c for c in X.columns if c.startswith(self.pattern)]
Above we use ``make_column_selector`` to select all columns of type float, and a custom callable to select the columns whose names start with the word 'petal'.
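To see what such a selector returns on its own, it can be called directly on a DataFrame. The following is a minimal sketch using only pandas; the ``iris_like`` frame and its column names are hypothetical stand-ins, and the ``select_dtypes`` call is used here only to approximate a float-based selection, not to reproduce the mapper's internals.

```python
import pandas as pd


class GetColumnsStartingWith:
    """Callable that returns the column names starting with a given prefix."""

    def __init__(self, start_str):
        self.pattern = start_str

    def __call__(self, X):
        return [c for c in X.columns if c.startswith(self.pattern)]


# Hypothetical data, used only to illustrate the selection.
iris_like = pd.DataFrame({
    "sepal length": [5.1, 4.9],
    "petal length": [1.4, 1.4],
    "petal width": [0.2, 0.2],
    "species": ["setosa", "setosa"],
})

# The custom callable receives the DataFrame and returns a list of names.
petal_columns = GetColumnsStartingWith("petal")(iris_like)

# A dtype-based selection of the float columns, done with plain pandas.
float_columns = iris_like.select_dtypes(include=["float64"]).columns.tolist()

print(petal_columns)
print(float_columns)
```

Because the column list is computed from the DataFrame at call time, the same selector keeps working when columns matching the prefix are added or removed.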
Passing Series/DataFrames to the transformers
*********************************************
@@ -463,6 +514,11 @@ Changelog
---------
2.2.0 (2021-05-07)
******************
* Added the ability to provide callable functions instead of a static column list.
2.1.0 (2021-02-26)
******************
* Removed test for Python 3.6 and added Python 3.9