Commit 8ed1805

Make copies of the dataframe under test to avoid its mutation during the test.
This is probably an old scikit-learn bug, since it works without this modification in sklearn==0.16.1.
1 parent e6f58b6 commit 8ed1805
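
To illustrate why passing ``data.copy()`` protects later doctests, here is a minimal sketch. It does not reproduce the scikit-learn behaviour the commit message refers to; ``scale_in_place`` is a hypothetical stand-in for any transformer that writes into the frame it receives, and the three-row frame is made-up sample data::

```python
import pandas as pd


def scale_in_place(frame):
    """Hypothetical transformer that mutates its input frame.

    Stand-in for the in-place behaviour the commit works around;
    it is not part of sklearn-pandas or scikit-learn.
    """
    frame["salary"] = frame["salary"] / frame["salary"].max()
    return frame.values


# Made-up sample data, for illustration only.
data = pd.DataFrame({"children": [4, 6, 3], "salary": [90.0, 24.0, 44.0]})
original = data.copy()

# Passing a copy: the transformer mutates the copy, not ``data``.
scale_in_place(data.copy())
unchanged = data.equals(original)

# Passing the frame itself: ``data`` is modified for every later example.
scale_in_place(data)
mutated = not data.equals(original)

print(unchanged, mutated)
```

``DataFrame.copy()`` makes a deep copy by default, so each doctest that receives a copy starts from the same untouched ``data``.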

1 file changed, 4 additions and 4 deletions

README.rst

@@ -73,7 +73,7 @@ Test the Transformation
 
 We can use the ``fit_transform`` shortcut to both fit the model and see what transformed data looks like. In this and the other examples, output is rounded to two digits with ``np.round`` to account for rounding errors on different hardware::
 
-    >>> np.round(mapper.fit_transform(data), 2)
+    >>> np.round(mapper.fit_transform(data.copy()), 2)
     array([[ 1. , 0. , 0. , 0.21],
            [ 0. , 1. , 0. , 1.88],
            [ 0. , 1. , 0. , -0.63],
@@ -102,7 +102,7 @@ Transformations may require multiple input columns. In these cases, the column n
 
 Now running ``fit_transform`` will run PCA on the ``children`` and ``salary`` columns and return the first principal component::
 
-    >>> np.round(mapper2.fit_transform(data), 1)
+    >>> np.round(mapper2.fit_transform(data.copy()), 1)
     array([[ 47.6],
            [-18.4],
            [ 1.6],
@@ -121,7 +121,7 @@ Only columns that are listed in the DataFrameMapper are kept. To keep a column b
     ... ('pet', sklearn.preprocessing.LabelBinarizer()),
     ... ('children', None)
     ... ])
-    >>> np.round(mapper3.fit_transform(data))
+    >>> np.round(mapper3.fit_transform(data.copy()))
     array([[ 1., 0., 0., 4.],
            [ 0., 1., 0., 6.],
            [ 0., 1., 0., 3.],
@@ -141,7 +141,7 @@ To get around this, sklearn-pandas provides a wrapper on sklearn's ``cross_val_s
     >>> pipe = sklearn.pipeline.Pipeline([
     ... ('featurize', mapper),
     ... ('lm', sklearn.linear_model.LinearRegression())])
-    >>> np.round(cross_val_score(pipe, data, data.salary, 'r2'), 2)
+    >>> np.round(cross_val_score(pipe, data.copy(), data.salary, 'r2'), 2)
     array([ -1.09, -5.3 , -15.38])
 
 Sklearn-pandas' ``cross_val_score`` function provides exactly the same interface as sklearn's function of the same name.
