Commit c374c98

Merge pull request #25 from dukebody/keep-columns-no-transformation-issue-19
Add documentation example selecting column but not applying any transformer to it
2 parents aa070c7 + 8ed1805 commit c374c98

1 file changed: 22 additions, 3 deletions

--- a/README.rst
+++ b/README.rst
@@ -73,7 +73,7 @@ Test the Transformation

 We can use the ``fit_transform`` shortcut to both fit the model and see what transformed data looks like. In this and the other examples, output is rounded to two digits with ``np.round`` to account for rounding errors on different hardware::

-    >>> np.round(mapper.fit_transform(data), 2)
+    >>> np.round(mapper.fit_transform(data.copy()), 2)
     array([[ 1. , 0. , 0. , 0.21],
            [ 0. , 1. , 0. , 1.88],
            [ 0. , 1. , 0. , -0.63],
@@ -102,7 +102,7 @@ Transformations may require multiple input columns. In these cases, the column n

 Now running ``fit_transform`` will run PCA on the ``children`` and ``salary`` columns and return the first principal component::

-    >>> np.round(mapper2.fit_transform(data), 1)
+    >>> np.round(mapper2.fit_transform(data.copy()), 1)
     array([[ 47.6],
            [-18.4],
            [ 1.6],
@@ -112,6 +112,25 @@ Now running ``fit_transform`` will run PCA on the ``children`` and ``salary`` co
            [ -6.4],
            [-15.4]])

+Columns that don't need any transformation
+******************************************
+
+Only columns that are listed in the DataFrameMapper are kept. To keep a column but don't apply any transformation to it, use `None` as transformer::
+
+    >>> mapper3 = DataFrameMapper([
+    ...     ('pet', sklearn.preprocessing.LabelBinarizer()),
+    ...     ('children', None)
+    ... ])
+    >>> np.round(mapper3.fit_transform(data.copy()))
+    array([[ 1., 0., 0., 4.],
+           [ 0., 1., 0., 6.],
+           [ 0., 1., 0., 3.],
+           [ 0., 0., 1., 3.],
+           [ 1., 0., 0., 2.],
+           [ 0., 1., 0., 3.],
+           [ 1., 0., 0., 5.],
+           [ 0., 0., 1., 4.]])
+
 Cross-Validation
 ----------------

@@ -122,7 +141,7 @@ To get around this, sklearn-pandas provides a wrapper on sklearn's ``cross_val_s
     >>> pipe = sklearn.pipeline.Pipeline([
     ...     ('featurize', mapper),
     ...     ('lm', sklearn.linear_model.LinearRegression())])
-    >>> np.round(cross_val_score(pipe, data, data.salary, 'r2'), 2)
+    >>> np.round(cross_val_score(pipe, data.copy(), data.salary, 'r2'), 2)
     array([ -1.09, -5.3 , -15.38])

 Sklearn-pandas' ``cross_val_score`` function provides exactly the same interface as sklearn's function of the same name.
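
The pass-through behaviour documented in this commit (only listed columns are kept, and a ``None`` transformer leaves a column unchanged) can be illustrated without the library. What follows is a minimal pure-Python sketch, not sklearn-pandas code: ``map_columns`` and ``one_hot`` are hypothetical helpers, and the pet labels and salary values are made up for illustration. Only the ``children`` values are taken from the first four rows of the output shown in the diff.

```python
def one_hot(values):
    """Return one row of 0/1 indicators per value, columns in sorted label order."""
    labels = sorted(set(values))
    return [[1.0 if v == label else 0.0 for label in labels] for v in values]


def map_columns(table, spec):
    """Hypothetical stand-in for a column mapper (not the sklearn-pandas API).

    table: dict mapping column name -> list of values
    spec:  list of (column name, transformer or None) pairs
    Columns not named in spec are dropped; a None transformer
    copies the column through unchanged as a single feature.
    """
    n_rows = len(next(iter(table.values())))
    out = [[] for _ in range(n_rows)]
    for name, transformer in spec:
        column = table[name]
        if transformer is None:
            features = [[v] for v in column]  # pass-through, no transformation
        else:
            features = transformer(column)
        for row, extra in zip(out, features):
            row.extend(extra)
    return out


table = {
    "pet": ["cat", "dog", "dog", "fish"],  # assumed labels
    "children": [4.0, 6.0, 3.0, 3.0],
    "salary": [90, 24, 44, 27],            # made-up values; not in spec, so dropped
}

result = map_columns(table, [("pet", one_hot), ("children", None)])
for row in result:
    print(row)
```

Under these assumptions each printed row holds three indicator values for ``pet`` followed by the untouched ``children`` value, and ``salary`` does not appear because it is not listed in the spec.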
