
Commit fa3b726

Python syntax highlighting for readme (scikit-learn-contrib#255)
1 parent 2d488f2 commit fa3b726

File tree: 1 file changed (+16, -8 lines)


README.rst

Lines changed: 16 additions & 8 deletions
@@ -9,6 +9,8 @@ Sklearn-pandas
 .. image:: https://anaconda.org/conda-forge/sklearn-pandas/badges/version.svg
    :target: https://anaconda.org/conda-forge/sklearn-pandas/
 
+.. highlight:: python
+
 This module provides a bridge between `Scikit-Learn <http://scikit-learn.org/stable>`__'s machine learning methods and `pandas <https://pandas.pydata.org>`__-style Data Frames.
 In particular, it provides a way to map ``DataFrame`` columns to transformations, which are later recombined into features.
 
@@ -89,7 +91,7 @@ Let's see an example::
 
 The difference between specifying the column selector as ``'column'`` (as a simple string) and ``['column']`` (as a list with one element) is the shape of the array that is passed to the transformer. In the first case, a one dimensional array will be passed, while in the second case it will be a 2-dimensional array with one column, i.e. a column vector.
 
-This behaviour mimics the same pattern as pandas' dataframes ``__getitem__`` indexing:
+This behaviour mimics the same pattern as pandas' dataframes ``__getitem__`` indexing::
 
 >>> data['children'].shape
 (8,)
@@ -164,8 +166,9 @@ Alternatively, you can also specify prefix and/or suffix to add to the column na
 
 Dynamic Columns
 ***********************
-In some situations the columns are not known before hand and we would like to dynamically select them during the fit operation. As shown below, in such situations you can provide either a custom callable or use `make_column_selector <https://scikit-learn.org/stable/modules/generated/sklearn.compose.make_column_selector.html>`__.
+In some situations the columns are not known before hand and we would like to dynamically select them during the fit operation. As shown below, in such situations you can provide either a custom callable or use `make_column_selector <https://scikit-learn.org/stable/modules/generated/sklearn.compose.make_column_selector.html>`__.
 
+::
 
 >>> class GetColumnsStartingWith:
 ...     def __init__(self, start_str):
@@ -273,14 +276,14 @@ Dropping columns explictly
 
 Sometimes it is required to drop a specific column/ list of columns.
 For this purpose, ``drop_cols`` argument for ``DataFrameMapper`` can be used.
-Default value is ``None``
+Default value is ``None``::
 
 >>> mapper_df = DataFrameMapper([
 ...     ('pet', sklearn.preprocessing.LabelBinarizer()),
 ...     (['children'], sklearn.preprocessing.StandardScaler())
 ... ], drop_cols=['salary'])
 
-Now running ``fit_transform`` will run transformations on 'pet' and 'children' and drop 'salary' column:
+Now running ``fit_transform`` will run transformations on 'pet' and 'children' and drop 'salary' column::
 
 >>> np.round(mapper_df.fit_transform(data.copy()), 1)
 array([[ 1. ,  0. ,  0. ,  0.2],
@@ -355,7 +358,7 @@ Applying a default transformer
 ******************************
 
 A default transformer can be applied to columns not explicitly selected
-passing it as the ``default`` argument to the mapper:
+passing it as the ``default`` argument to the mapper::
 
 >>> mapper4 = DataFrameMapper([
 ...     ('pet', sklearn.preprocessing.LabelBinarizer()),
@@ -385,7 +388,7 @@ acceptable by ``DataFrameMapper``.
 
 For example, consider a dataset with three categorical columns, 'col1', 'col2', and 'col3',
 To binarize each of them, one could pass column names and ``LabelBinarizer`` transformer class
-into generator, and then use returned definition as ``features`` argument for ``DataFrameMapper``:
+into generator, and then use returned definition as ``features`` argument for ``DataFrameMapper``::
 
 >>> from sklearn_pandas import gen_features
 >>> feature_def = gen_features(
@@ -407,7 +410,7 @@ into generator, and then use returned definition as ``features`` argument for ``
 
 If it is required to override some of transformer parameters, then a dict with 'class' key and
 transformer parameters should be provided. For example, consider a dataset with missing values.
-Then the following code could be used to override default imputing strategy:
+Then the following code could be used to override default imputing strategy::
 
 >>> from sklearn.impute import SimpleImputer
 >>> import numpy as np
@@ -451,6 +454,8 @@ Feature selection and other supervised transformations
 
 ``DataFrameMapper`` supports transformers that require both X and y arguments. An example of this is feature selection. Treating the 'pet' column as the target, we will select the column that best predicts it.
 
+::
+
 >>> from sklearn.feature_selection import SelectKBest, chi2
 >>> mapper_fs = DataFrameMapper([(['children','salary'], SelectKBest(chi2, k=1))])
 >>> mapper_fs.fit_transform(data[['children','salary']], data['pet'])
@@ -467,7 +472,7 @@ Working with sparse features
 ****************************
 
 A ``DataFrameMapper`` will return a dense feature array by default. Setting ``sparse=True`` in the mapper will return
-a sparse array whenever any of the extracted features is sparse. Example:
+a sparse array whenever any of the extracted features is sparse. Example::
 
 >>> mapper5 = DataFrameMapper([
 ...     ('pet', CountVectorizer()),
@@ -485,6 +490,8 @@ While you can use ``FunctionTransformation`` to generate arbitrary transformers,
 when pickling. Use ``NumericalTransformer`` instead, which takes the function name as a string parameter and hence
 can be easily serialized.
 
+::
+
 >>> from sklearn_pandas import NumericalTransformer
 >>> mapper5 = DataFrameMapper([
 ...     ('children', NumericalTransformer('log')),
@@ -505,6 +512,7 @@ Changing Logging level
 You can change log level to info to print time take to fit/transform features. Setting it to higher level will stop printing elapsed time.
 Below example shows how to change logging level.
 
+::
 
 >>> import logging
 >>> logging.getLogger('sklearn_pandas').setLevel(logging.INFO)
