Commit d46db24

changed readme to RST
1 parent b10d6ee commit d46db24

5 files changed, +85 -25 lines changed
LICENSE

+48
@@ -0,0 +1,48 @@
+sklearn-pandas -- bridge code for cross-validation of pandas data frames
+with sklearn
+
+This software is provided 'as-is', without any express or implied
+warranty. In no event will the authors be held liable for any damages
+arising from the use of this software.
+
+Permission is granted to anyone to use this software for any purpose,
+including commercial applications, and to alter it and redistribute it
+freely, subject to the following restrictions:
+
+1. The origin of this software must not be misrepresented; you must not
+claim that you wrote the original software. If you use this software
+in a product, an acknowledgment in the product documentation would be
+appreciated but is not required.
+2. Altered source versions must be plainly marked as such, and must not be
+misrepresented as being the original software.
+3. This notice may not be removed or altered from any source distribution.
+
+Paul Butler <[email protected]>
+
+The source code of DataFrameMapper is derived from code originally written by
+Ben Hamner and released under the following license.
+
+Copyright (c) 2013, Ben Hamner
+Author: Ben Hamner ([email protected])
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright notice, this
+list of conditions and the following disclaimer.
+2. Redistributions in binary form must reproduce the above copyright notice,
+this list of conditions and the following disclaimer in the documentation
+and/or other materials provided with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+

MANIFEST.in

+2
@@ -0,0 +1,2 @@
+include LICENSE
+include README.rst

README.md → README.rst

+34-23
@@ -2,7 +2,7 @@
 Sklearn-pandas
 ==============
 
-This module provides a bridge between [Scikit-Learn](http://scikit-learn.org/stable/)'s machine learning methods and [pandas](http://pandas.pydata.org/)-style Data Frames.
+This module provides a bridge between `Scikit-Learn <http://scikit-learn.org/stable/>`__'s machine learning methods and `pandas <http://pandas.pydata.org/>`__-style Data Frames.
 
 In particular, it provides:
 
@@ -12,40 +12,42 @@ In particular, it provides:
 Installation
 ------------
 
-You can install `sklearn-pandas` with `pip`.
+You can install ``sklearn-pandas`` with ``pip``::
 
 # pip install sklearn-pandas
 
 Tests
 -----
 
-The examples in this file double as basic sanity tests. To run them, use `doctest`, which is included with python.
+The examples in this file double as basic sanity tests. To run them, use ``doctest``, which is included with python::
 
 # python -m doctest README.md
 
 Usage
 -----
 
-### Import
+Import
+******
 
-Import what you need from the `sklearn_pandas` package. The choices are:
+Import what you need from the ``sklearn_pandas`` package. The choices are:
 
-* `DataFrameMapper`, a class for mapping pandas data frame columns to different sklearn transformations
-* `cross_val_score`, similar to `sklearn.cross_validation.cross_val_score` but working on pandas DataFrames
+* ``DataFrameMapper``, a class for mapping pandas data frame columns to different sklearn transformations
+* ``cross_val_score``, similar to `sklearn.cross_validation.cross_val_score` but working on pandas DataFrames
 
-For this demonstration, we will import both.
+For this demonstration, we will import both::
 
 >>> from sklearn_pandas import DataFrameMapper, cross_val_score
 
-For these examples, we'll also use pandas and sklearn.
+For these examples, we'll also use pandas and sklearn::
 
 >>> import pandas as pd
 >>> import sklearn.preprocessing, sklearn.decomposition, \
 ... sklearn.linear_model, sklearn.pipeline, sklearn.metrics
 
-### Load some Data
+Load some Data
+**************
 
-Normally you'll read the data from a file, but for demonstration purposes I'll create a data frame from a Python dict.
+Normally you'll read the data from a file, but for demonstration purposes I'll create a data frame from a Python dict::
 
 >>> data = pd.DataFrame({'pet': ['cat', 'dog', 'dog', 'fish', 'cat', 'dog', 'cat', 'fish'],
 ... 'children': [4., 6, 3, 3, 2, 3, 5, 4],
@@ -54,19 +56,21 @@ Normally you'll read the data from a file, but for demonstration purposes I'll c
 Transformation Mapping
 ----------------------
 
-### Map the Columns to Transformations
+Map the Columns to Transformations
+**********************************
 
-The mapper takes a list of pairs. The first is a column name from the pandas DataFrame (or a list of multiple columns, as we will see later). The second is an object which will perform the transformation which will be applied to that column.
+The mapper takes a list of pairs. The first is a column name from the pandas DataFrame (or a list of multiple columns, as we will see later). The second is an object which will perform the transformation which will be applied to that column::
 
 >>> mapper = DataFrameMapper([
 ... ('pet', sklearn.preprocessing.LabelBinarizer()),
 ... ('children', sklearn.preprocessing.StandardScaler())
 ... ])
 
 
-### Test the Transformation
+Test the Transformation
+***********************
 
-We can use the `fit_transform` shortcut to both fit the model and see what transformed data looks like.
+We can use the ``fit_transform`` shortcut to both fit the model and see what transformed data looks like::
 
 >>> mapper.fit_transform(data)
 array([[ 1. , 0. , 0. , 0.20851441],
@@ -78,22 +82,23 @@ We can use the `fit_transform` shortcut to both fit the model and see what trans
 [ 1. , 0. , 0. , 1.04257207],
 [ 0. , 0. , 1. , 0.20851441]])
 
-Note that the first three columns are the output of the `LabelBinarizer` (corresponding to _cat_, _dog_, and _fish_ respectively) and the fourth column is the standardized value for the number of children. In general, the columns are ordered according to the order given when the `DataFrameMapper` is constructed.
+Note that the first three columns are the output of the ``LabelBinarizer`` (corresponding to _cat_, _dog_, and _fish_ respectively) and the fourth column is the standardized value for the number of children. In general, the columns are ordered according to the order given when the ``DataFrameMapper`` is constructed.
 
-Now that the transformation is trained, we confirm that it works on new data.
+Now that the transformation is trained, we confirm that it works on new data::
 
 >>> mapper.transform({'pet': ['cat'], 'children': [5.]})
 array([[ 1. , 0. , 0. , 1.04257207]])
 
-### Transform Multiple Columns
+Transform Multiple Columns
+**************************
 
-Transformations may require multiple input columns. In these cases, the column names can be specified in a list.
+Transformations may require multiple input columns. In these cases, the column names can be specified in a list::
 
 >>> mapper2 = DataFrameMapper([
 ... (['children', 'salary'], sklearn.decomposition.PCA(1))
 ... ])
 
-Now running `fit_transform` will run PCA on the `children` and `salary` columns and return the first principal component.
+Now running ``fit_transform`` will run PCA on the ``children`` and ``salary`` columns and return the first principal component::
 
 >>> mapper2.fit_transform(data)
 array([[ 47.62288153],
@@ -108,14 +113,20 @@ Now running `fit_transform` will run PCA on the `children` and `salary` columns
 Cross-Validation
 ----------------
 
-Now that we can combine features from pandas DataFrames, we may want to use cross-validation to see whether our model works. Scikit-learn provides features for cross-validation, but they expect numpy data structures and won't work with `DataFrameMapper`.
+Now that we can combine features from pandas DataFrames, we may want to use cross-validation to see whether our model works. Scikit-learn provides features for cross-validation, but they expect numpy data structures and won't work with ``DataFrameMapper``.
 
-To get around this, sklearn-pandas provides a wrapper on sklearn's `cross_val_score` function which passes a pandas DataFrame to the estimator rather than a numpy array.
+To get around this, sklearn-pandas provides a wrapper on sklearn's ``cross_val_score`` function which passes a pandas DataFrame to the estimator rather than a numpy array::
 
 >>> pipe = sklearn.pipeline.Pipeline([
 ... ('featurize', mapper),
 ... ('lm', sklearn.linear_model.LinearRegression())])
 >>> cross_val_score(pipe, data, data.salary, sklearn.metrics.mean_squared_error)
 array([ 2018.185 , 6.72033058, 1899.58333333])
 
-Sklearn-pandas' `cross_val_score` function provides exactly the same interface as sklearn's function of the same name.
+Sklearn-pandas' ``cross_val_score`` function provides exactly the same interface as sklearn's function of the same name.
+
+Credit
+------
+
+The code for ``DataFrameMapper`` is based on code originally written by `Ben Hamner <https://github.com/benhamner>`__.
+
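
The unchanged context line in the Tests section still reads `python -m doctest README.md`, although this commit renames the file to README.rst. Below is a minimal sketch of running doctest examples from an RST file in Python, using the standard library's `doctest.testfile`. The temporary file and its single example are stand-ins for illustration, not the project's README, and `run_rst_doctests` is a hypothetical helper name.

```python
import doctest
import os
import tempfile

# Stand-in for README.rst: an RST literal block holding one doctest example.
SAMPLE_RST = """\
Usage
-----

A literal block introduced by a double colon::

    >>> 1 + 2
    3
"""


def run_rst_doctests(path):
    """Run the doctest examples found in a text file; return (failed, attempted)."""
    # module_relative=False makes doctest treat `path` as a plain filesystem path.
    results = doctest.testfile(path, module_relative=False)
    return results.failed, results.attempted


with tempfile.TemporaryDirectory() as tmp:
    readme = os.path.join(tmp, "README.rst")
    with open(readme, "w") as handle:
        handle.write(SAMPLE_RST)
    failed, attempted = run_rst_doctests(readme)
```

For the project itself, the command-line equivalent would be `python -m doctest README.rst`, since `doctest` treats any file name not ending in `.py` as a text file of examples.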

setup.py

+1-1
@@ -3,7 +3,7 @@
 from setuptools import setup
 
 setup(name='sklearn-pandas',
-version='0.2',
+version='0.0.1',
 description='Pandas integration with sklearn',
 author='Paul Butler',
 author_email='[email protected]',
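
This hunk moves `version` from `'0.2'` to `'0.0.1'`, which is a lower version number rather than a higher one. Here is a small sketch of that comparison, assuming purely numeric dot-separated versions. `version_key` is a hypothetical helper written for this note, not a setuptools function.

```python
def version_key(version):
    """Turn a numeric dotted version such as '0.0.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


old, new = "0.2", "0.0.1"

# Tuples compare element by element, so (0, 0, 1) sorts before (0, 2).
is_downgrade = version_key(new) < version_key(old)
```

Whether this matters in practice depends on whether 0.2 had already been published anywhere, which the diff does not say.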

sklearn_pandas/__init__.py

-1
@@ -2,7 +2,6 @@
 import numpy as np
 from sklearn.base import BaseEstimator, TransformerMixin
 from sklearn import cross_validation
-import pdb
 
 def cross_val_score(estimator, X, *args, **kwargs):
 class DataFrameWrapper(object):
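
The README in this commit describes `DataFrameMapper` as taking a list of (column, transformer) pairs and ordering the output columns as the pairs are given. Below is a dependency-free sketch of that idea, with plain dicts and functions standing in for pandas and sklearn. Every name in it (`one_hot`, `as_column`, `map_columns`) is hypothetical, and it is not the library's implementation.

```python
def one_hot(values):
    """Binarize labels into one column per distinct label, sorted alphabetically."""
    labels = sorted(set(values))
    return [[1.0 if value == label else 0.0 for label in labels] for value in values]


def as_column(values):
    """Pass numeric values through as a single-column block."""
    return [[float(value)] for value in values]


def map_columns(table, features):
    """Apply each (column, transform) pair and join the resulting blocks row by row."""
    blocks = [transform(table[column]) for column, transform in features]
    # zip(*blocks) groups the i-th row of every block; sum(..., []) concatenates them.
    return [sum(row_parts, []) for row_parts in zip(*blocks)]


table = {"pet": ["cat", "dog", "fish"], "children": [4, 6, 3]}
rows = map_columns(table, [("pet", one_hot), ("children", as_column)])
```

As in the README's `fit_transform` output, the first three values of each row come from the label columns and the last from `children`. The sketch skips the scaling that `StandardScaler` would apply.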
