Commit d2539ba

Add documentation about using multiple transformers for the same column, as well as some caveats about passing a list or a simple string as the column selector.
1 parent: 0faead7

1 file changed, +27 -2 lines changed

README.rst (+27 -2)
@@ -60,13 +60,23 @@ Transformation Mapping
Map the Columns to Transformations
**********************************

-The mapper takes a list of pairs. The first is a column name from the pandas DataFrame (or a list of multiple columns, as we will see later). The second is an object which will perform the transformation which will be applied to that column::
+The mapper takes a list of pairs. The first element is a column name from the pandas DataFrame, or a list containing one or more column names (an example with multiple columns follows later). The second is an object that performs the transformation to be applied to the selected column(s)::

     >>> mapper = DataFrameMapper([
     ...     ('pet', sklearn.preprocessing.LabelBinarizer()),
-    ...     ('children', sklearn.preprocessing.StandardScaler())
+    ...     (['children'], sklearn.preprocessing.StandardScaler())
     ... ])

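A minimal, hypothetical sketch of what the mapper does with these pairs: each selector indexes the DataFrame like ``df[selector]``, the selection goes to the paired transformer, and the outputs are stacked side by side. ``map_columns`` and the four-row sample frame are illustrative assumptions, not sklearn-pandas' actual implementation; the sketch needs pandas, NumPy and scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelBinarizer, StandardScaler


def map_columns(df, pairs):
    """Hypothetical, simplified stand-in for a column mapper (illustration only)."""
    outputs = []
    for selector, transformer in pairs:
        # The selector is used exactly like df[selector]; the transformer
        # receives whatever that indexing returns (Series or DataFrame).
        result = np.asarray(transformer.fit_transform(df[selector]), dtype=float)
        # Reshape any 1-D result into a column vector so outputs can be stacked.
        if result.ndim == 1:
            result = result.reshape(-1, 1)
        outputs.append(result)
    # Outputs are placed side by side, in the order the pairs were given.
    return np.hstack(outputs)


# Made-up sample data standing in for the README's ``data`` frame.
data = pd.DataFrame({
    "pet": ["cat", "dog", "fish", "cat"],
    "children": [4.0, 6.0, 3.0, 3.0],
})

features = map_columns(data, [
    ("pet", LabelBinarizer()),
    (["children"], StandardScaler()),
])
print(features.shape)  # (4, 4): 3 binarized "pet" columns + 1 scaled column
```

Stacking with ``np.hstack`` is a design choice of this sketch: it preserves the order of the pairs, so the first three output columns come from ``pet`` and the last one from ``children``.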
+The difference between specifying the column selector as ``'column'`` (a simple string) and ``['column']`` (a list with one element) is the shape of the array passed to the transformer. In the first case a 1-dimensional array will be passed, while in the second case it will be a 2-dimensional array with one column, i.e. a column vector.
+
+This behaviour mirrors pandas' DataFrame ``__getitem__`` indexing:
+
+    >>> data['children'].shape
+    (8,)
+    >>> data[['children']].shape
+    (8, 1)
+
+Be aware that some transformers expect 1-dimensional input (the label-oriented ones), while others, like ``OneHotEncoder`` or ``Imputer``, expect 2-dimensional input with the shape ``[n_samples, n_features]``.
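A quick way to see this caveat in practice, assuming a recent scikit-learn release: the sketch below feeds the 1-dimensional and 2-dimensional selections to a label-oriented transformer and to ``StandardScaler``. The sample frame is made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelBinarizer, StandardScaler

# Made-up sample data; any DataFrame indexes the same way.
data = pd.DataFrame({
    "pet": ["cat", "dog", "fish", "cat"],
    "children": [4.0, 6.0, 3.0, 3.0],
})

print(data["children"].shape)    # (4,)   string selector -> 1-D Series
print(data[["children"]].shape)  # (4, 1) list selector -> 2-D column vector

# Label-oriented transformers work on the 1-D form.
pets = LabelBinarizer().fit_transform(data["pet"])
print(pets.shape)  # (4, 3): one column per distinct pet

# StandardScaler expects [n_samples, n_features]; the 2-D form works...
scaled = StandardScaler().fit_transform(data[["children"]])
print(scaled.shape)  # (4, 1)

# ...while recent scikit-learn releases reject the 1-D form.
try:
    StandardScaler().fit_transform(data["children"])
    rejected_1d = False
except ValueError:
    rejected_1d = True
print(rejected_1d)  # True
```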

Test the Transformation
***********************
@@ -112,6 +122,21 @@ Now running ``fit_transform`` will run PCA on the ``children`` and ``salary`` co
            [ -6.4],
            [-15.4]])

+Multiple transformers for the same column
+*****************************************
+
+Multiple transformers can be applied to the same column by specifying them
+in a list::
+
+    >>> mapper3 = DataFrameMapper([
+    ...     (['age'], [sklearn.preprocessing.Imputer(),
+    ...                sklearn.preprocessing.StandardScaler()])])
+    >>> data_3 = pd.DataFrame({'age': [1, np.nan, 3]})
+    >>> mapper3.fit_transform(data_3)
+    array([[-1.22474487],
+           [ 0.        ],
+           [ 1.22474487]])
+
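The doctest output above is consistent with the listed transformers being applied one after another: the missing age is replaced by the column mean (2.0), and the imputed column ``[1, 2, 3]`` is then standardized. Assuming that sequential reading, this sketch reproduces the same numbers with a plain scikit-learn ``Pipeline``. ``sklearn.preprocessing.Imputer`` has since been removed from scikit-learn, so the sketch swaps in its replacement, ``sklearn.impute.SimpleImputer``, whose default strategy is also the mean.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data_3 = pd.DataFrame({"age": [1, np.nan, 3]})

# Steps run in order: SimpleImputer (mean strategy by default) fills the
# missing age with 2.0, then StandardScaler standardizes [1, 2, 3].
pipeline = make_pipeline(SimpleImputer(), StandardScaler())

# Double brackets keep the input 2-D, mirroring the ['age'] selector.
result = pipeline.fit_transform(data_3[["age"]])
print(result)
# [[-1.22474487]
#  [ 0.        ]
#  [ 1.22474487]]
```

Keeping the selector as a list matters here too: both ``SimpleImputer`` and ``StandardScaler`` expect 2-dimensional input.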
Columns that don't need any transformation
******************************************
