README.rst (+34, -23)
@@ -2,7 +2,7 @@
 Sklearn-pandas
 ==============
 
-This module provides a bridge between [Scikit-Learn](http://scikit-learn.org/stable/)'s machine learning methods and [pandas](http://pandas.pydata.org/)-style Data Frames.
+This module provides a bridge between `Scikit-Learn<http://scikit-learn.org/stable/>`__'s machine learning methods and `pandas<http://pandas.pydata.org/>`__-style Data Frames.
 
 In particular, it provides:
 
@@ -12,40 +12,42 @@ In particular, it provides:
 Installation
 ------------
 
-You can install `sklearn-pandas` with `pip`.
+You can install ``sklearn-pandas`` with ``pip``::
 
     # pip install sklearn-pandas
 
 Tests
 -----
 
-The examples in this file double as basic sanity tests. To run them, use `doctest`, which is included with python.
+The examples in this file double as basic sanity tests. To run them, use ``doctest``, which is included with python::
 
     # python -m doctest README.md
 
 Usage
 -----
 
-### Import
+Import
+******
 
-Import what you need from the `sklearn_pandas` package. The choices are:
+Import what you need from the ``sklearn_pandas`` package. The choices are:
 
-*`DataFrameMapper`, a class for mapping pandas data frame columns to different sklearn transformations
-*`cross_val_score`, similar to `sklearn.cross_validation.cross_val_score` but working on pandas DataFrames
+* ``DataFrameMapper``, a class for mapping pandas data frame columns to different sklearn transformations
+* ``cross_val_score``, similar to `sklearn.cross_validation.cross_val_score` but working on pandas DataFrames
 
-For this demonstration, we will import both.
+For this demonstration, we will import both::
 
     >>> from sklearn_pandas import DataFrameMapper, cross_val_score
 
-For these examples, we'll also use pandas and sklearn.
+For these examples, we'll also use pandas and sklearn::
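
The Tests section in the hunk above runs the README examples through ``doctest`` from the command line. The same check can be sketched from Python with the standard library's doctest parser. This is a minimal illustration on an in-memory string, not on the project's real README; the ``SAMPLE`` text and the ``run_examples`` helper are hypothetical names introduced here.

```python
import doctest

# Hypothetical stand-in for a README fragment; not the project's actual text.
SAMPLE = """
Usage::

    >>> 1 + 1
    2
    >>> sorted(["fish", "cat", "dog"])
    ['cat', 'dog', 'fish']
"""


def run_examples(text, name="sample"):
    """Run every ``>>>`` example found in text; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # Collect the interactive examples embedded in the prose.
    test = parser.get_doctest(text, globs={}, name=name, filename=None, lineno=0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted


failed, attempted = run_examples(SAMPLE)
```

A zero ``failed`` count means every example's printed output matched the text that follows it, which is the sanity check the README describes.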
@@ -54,19 +56,21 @@ Normally you'll read the data from a file, but for demonstration purposes I'll c
 Transformation Mapping
 ----------------------
 
-### Map the Columns to Transformations
+Map the Columns to Transformations
+**********************************
 
-The mapper takes a list of pairs. The first is a column name from the pandas DataFrame (or a list of multiple columns, as we will see later). The second is an object which will perform the transformation which will be applied to that column.
+The mapper takes a list of pairs. The first is a column name from the pandas DataFrame (or a list of multiple columns, as we will see later). The second is an object which will perform the transformation which will be applied to that column::
-We can use the `fit_transform` shortcut to both fit the model and see what transformed data looks like.
+We can use the ``fit_transform`` shortcut to both fit the model and see what transformed data looks like::
 
     >>> mapper.fit_transform(data)
     array([[ 1. , 0. , 0. , 0.20851441],
@@ -78,22 +82,23 @@ We can use the `fit_transform` shortcut to both fit the model and see what trans
            [ 1. , 0. , 0. , 1.04257207],
            [ 0. , 0. , 1. , 0.20851441]])
 
-Note that the first three columns are the output of the `LabelBinarizer` (corresponding to _cat_, _dog_, and _fish_ respectively) and the fourth column is the standardized value for the number of children. In general, the columns are ordered according to the order given when the `DataFrameMapper` is constructed.
+Note that the first three columns are the output of the ``LabelBinarizer`` (corresponding to _cat_, _dog_, and _fish_ respectively) and the fourth column is the standardized value for the number of children. In general, the columns are ordered according to the order given when the ``DataFrameMapper`` is constructed.
 
-Now that the transformation is trained, we confirm that it works on new data.
+Now that the transformation is trained, we confirm that it works on new data::
-Now running `fit_transform` will run PCA on the `children` and `salary` columns and return the first principal component.
+Now running ``fit_transform`` will run PCA on the ``children`` and ``salary`` columns and return the first principal component::
 
     >>> mapper2.fit_transform(data)
     array([[ 47.62288153],
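
The note in the hunk above says the first three output columns correspond to cat, dog, and fish. A pure-Python sketch of that one-hot layout follows. It is an illustration only, not scikit-learn's implementation: the ``binarize_labels`` helper and the sample labels are hypothetical, and the sketch does not reproduce ``LabelBinarizer``'s special case of emitting a single column when there are only two classes.

```python
def binarize_labels(labels):
    """One-hot encode labels, with one column per class in sorted order."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    rows = []
    for label in labels:
        row = [0] * len(classes)
        row[index[label]] = 1  # mark the column that belongs to this label
        rows.append(row)
    return classes, rows


# Hypothetical sample labels, chosen to match the three classes named above.
classes, rows = binarize_labels(["cat", "dog", "fish", "cat"])
```

Because the classes are sorted, the column order is fixed by the label values rather than by the order in which they first appear in the data.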
@@ -108,14 +113,20 @@ Now running `fit_transform` will run PCA on the `children` and `salary` columns
 Cross-Validation
 ----------------
 
-Now that we can combine features from pandas DataFrames, we may want to use cross-validation to see whether our model works. Scikit-learn provides features for cross-validation, but they expect numpy data structures and won't work with `DataFrameMapper`.
+Now that we can combine features from pandas DataFrames, we may want to use cross-validation to see whether our model works. Scikit-learn provides features for cross-validation, but they expect numpy data structures and won't work with ``DataFrameMapper``.
 
-To get around this, sklearn-pandas provides a wrapper on sklearn's `cross_val_score` function which passes a pandas DataFrame to the estimator rather than a numpy array.
+To get around this, sklearn-pandas provides a wrapper on sklearn's ``cross_val_score`` function which passes a pandas DataFrame to the estimator rather than a numpy array::
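
Most changes in this diff follow two mechanical patterns: single-backtick inline code becomes a double-backtick literal, and a ``###`` heading becomes a title line underlined with asterisks. A rough sketch of those two rewrites is below. The ``md_to_rst_line`` helper is hypothetical and is not the tool or process used for this commit. It ignores links, lists, emphasis, and the ``::`` markers the commit adds before code blocks.

```python
import re


def md_to_rst_line(line):
    """Convert one Markdown line to a list of RST lines (hypothetical helper)."""
    heading = re.match(r"^###\s+(.*\S)\s*$", line)
    if heading:
        title = heading.group(1)
        # RST section titles are underlined; the underline must be at least
        # as long as the title text.
        return [title, "*" * len(title)]
    # Rewrite `code` as ``code``; spans that are already double-backticked
    # are left alone by the lookbehind and lookahead.
    converted = re.sub(r"(?<!`)`([^`\n]+)`(?!`)", r"``\1``", line)
    return [converted]


print(md_to_rst_line("### Import"))
print(md_to_rst_line("You can install `sklearn-pandas` with `pip`."))
```

Running the helper over already-converted text leaves it unchanged, which makes it safe to apply more than once.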