This module provides a bridge between `Scikit-Learn <http://scikit-learn.org/stable>`__'s machine learning methods and `pandas <https://pandas.pydata.org>`__-style Data Frames.
In particular, it provides a way to map ``DataFrame`` columns to transformations, which are later recombined into features.
@@ -89,7 +91,7 @@ Let's see an example::
The difference between specifying the column selector as ``'column'`` (as a simple string) and ``['column']`` (as a list with one element) is the shape of the array passed to the transformer. In the first case a one-dimensional array is passed, while in the second case it is a two-dimensional array with one column, i.e. a column vector.

This behaviour mirrors pandas' ``DataFrame.__getitem__`` indexing::

    >>> data['children'].shape
    (8,)
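
In practice, this means a transformer written for 2-D input rejects the 1-D form. Below is a minimal sketch that uses only pandas and scikit-learn. It assumes a hypothetical eight-row ``data`` frame with a numeric ``children`` column (the values are made up) and uses ``StandardScaler`` as an example of a transformer that requires 2-D input:

.. code-block:: python

    import pandas as pd
    from sklearn.preprocessing import StandardScaler

    # Hypothetical eight-row frame standing in for ``data``
    data = pd.DataFrame({"children": [4.0, 6.0, 3.0, 3.0, 2.0, 3.0, 5.0, 4.0]})

    one_d = data["children"].to_numpy()    # string selector -> shape (8,)
    two_d = data[["children"]].to_numpy()  # one-element list -> shape (8, 1)

    # StandardScaler expects a 2-D (n_samples, n_features) array
    scaled = StandardScaler().fit_transform(two_d)

    # Passing the 1-D array instead is rejected with a ValueError
    error = None
    try:
        StandardScaler().fit_transform(one_d)
    except ValueError as exc:
        error = exc

    print(one_d.shape, two_d.shape, scaled.shape)  # (8,) (8, 1) (8, 1)
    print(type(error).__name__)                    # ValueError
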
@@ -164,8 +166,9 @@ Alternatively, you can also specify prefix and/or suffix to add to the column na
Dynamic Columns
***********************
In some situations the columns are not known beforehand, and we would like to select them dynamically during the fit operation. As shown below, in such situations you can provide either a custom callable or use `make_column_selector <https://scikit-learn.org/stable/modules/generated/sklearn.compose.make_column_selector.html>`__.

::

    >>> class GetColumnsStartingWith:
    ...     def __init__(self, start_str):
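
The ``make_column_selector`` option mentioned above builds such a callable from a regex or from dtypes. The minimal sketch below shows what the selectors return when called on a hypothetical three-column DataFrame. ``starts_with_s`` is an illustrative hand-written callable with the same contract (DataFrame in, list of column names out). The sketch does not show how ``DataFrameMapper`` invokes either kind during ``fit``.

.. code-block:: python

    import pandas as pd
    from sklearn.compose import make_column_selector

    # Hypothetical frame mixing a text column and two numeric columns
    df = pd.DataFrame({
        "pet": ["cat", "dog", "dog", "fish"],
        "children": [4.0, 6.0, 3.0, 3.0],
        "salary": [90.0, 24.0, 44.0, 27.0],
    })

    # Regex-based selector: columns whose names start with "s"
    by_name = make_column_selector(pattern="^s")
    # Dtype-based selector: every numeric column
    numeric = make_column_selector(dtype_include="number")


    def starts_with_s(frame):
        """Hypothetical custom callable: DataFrame in, list of column names out."""
        return [col for col in frame.columns if col.startswith("s")]


    print(by_name(df))        # ['salary']
    print(numeric(df))        # ['children', 'salary']
    print(starts_with_s(df))  # ['salary']

Because each selector looks only at the frame it is given, the same definition picks up whatever matching columns exist at the time it is called.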
@@ -273,14 +276,14 @@ Dropping columns explictly
Sometimes you need to drop a specific column or a list of columns.
For this purpose, the ``drop_cols`` argument of ``DataFrameMapper`` can be used.
@@ -385,7 +388,7 @@ acceptable by ``DataFrameMapper``.
For example, consider a dataset with three categorical columns: 'col1', 'col2', and 'col3'.
To binarize each of them, one could pass the column names and the ``LabelBinarizer`` transformer class
into the generator, and then use the returned definition as the ``features`` argument for ``DataFrameMapper``::

    >>> from sklearn_pandas import gen_features
    >>> feature_def = gen_features(
@@ -407,7 +410,7 @@ into generator, and then use returned definition as ``features`` argument for ``
To override some of the transformer parameters, provide a dict with a 'class' key and the
transformer parameters. For example, consider a dataset with missing values.
The following code overrides the default imputing strategy::

    >>> from sklearn.impute import SimpleImputer
    >>> import numpy as np
@@ -451,6 +454,8 @@ Feature selection and other supervised transformations
``DataFrameMapper`` supports transformers that require both X and y arguments. An example of this is feature selection. Treating the 'pet' column as the target, we will select the column that best predicts it.

::

    >>> from sklearn.feature_selection import SelectKBest, chi2
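
For reference, the selection step itself can be reproduced with scikit-learn alone. The minimal sketch below assumes a hypothetical eight-row frame with a string 'pet' target and two non-negative numeric columns (``chi2`` requires non-negative features). ``DataFrameMapper`` is not involved here:

.. code-block:: python

    import pandas as pd
    from sklearn.feature_selection import SelectKBest, chi2

    # Hypothetical data: 'pet' is the target, the other columns are candidate features
    data = pd.DataFrame({
        "pet": ["cat", "dog", "dog", "fish", "cat", "dog", "cat", "fish"],
        "children": [4.0, 6.0, 3.0, 3.0, 2.0, 3.0, 5.0, 4.0],
        "salary": [90.0, 24.0, 44.0, 27.0, 32.0, 59.0, 36.0, 27.0],
    })
    X = data[["children", "salary"]]
    y = data["pet"]

    # Keep the single feature with the highest chi-squared score against y
    selector = SelectKBest(chi2, k=1)
    reduced = selector.fit_transform(X, y)

    # get_support() is a boolean mask over the input columns
    best = X.columns[selector.get_support()].tolist()
    print(best, reduced.shape)  # ['salary'] (8, 1)
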