
Commit 3006a73

DOC make documentation consistent with sklearn guideline

Parent: 792e69d
31 files changed: +231 −262 lines

doc/ensemble.rst (+1 −1)

```diff
@@ -69,7 +69,7 @@ Forest of randomized trees
 :class:`BalancedRandomForestClassifier` is another ensemble method in which
 each tree of the forest will be provided a balanced bootstrap sample
 :cite:`chen2004using`. This class provides all functionality of the
-:class:`sklearn.ensemble.RandomForestClassifier` and notably the
+:class:`~sklearn.ensemble.RandomForestClassifier` and notably the
 `feature_importances_` attributes::
 
     >>> from imblearn.ensemble import BalancedRandomForestClassifier
```
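
For context, a minimal sketch of the `feature_importances_` usage this hunk documents; the dataset here is synthetic and purely illustrative, not taken from the guide:

```python
from sklearn.datasets import make_classification
from imblearn.ensemble import BalancedRandomForestClassifier

# Synthetic imbalanced data, for illustration only.
X, y = make_classification(n_samples=1000, n_features=4,
                           weights=[0.9, 0.1], random_state=0)
clf = BalancedRandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
print(clf.feature_importances_)  # same attribute as RandomForestClassifier
```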

doc/miscellaneous.rst (+2 −2)

```diff
@@ -53,7 +53,7 @@ sampling for regression targets.
 
 We illustrate the use of such sampler to implement an outlier rejection
 estimator which can be easily used within a
-:class:`imblearn.pipeline.Pipeline`:
+:class:`~imblearn.pipeline.Pipeline`:
 :ref:`sphx_glr_auto_examples_plot_outlier_rejections.py`
 
 .. _generators:
@@ -158,7 +158,7 @@ Then, ``fit_generator`` can be called passing the generator and the step::
     ... epochs=10, verbose=0)
 
 The second possibility is to use
-:class:`imblearn.keras.BalancedBatchGenerator`. Only an instance of this class
+:class:`~imblearn.keras.BalancedBatchGenerator`. Only an instance of this class
 will be passed to ``fit_generator``::
 
     >>> from imblearn.keras import BalancedBatchGenerator
```
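
A rough sketch of that second possibility; the one-unit Keras model is a placeholder of my own, not the guide's, and assumes the era's `keras` import path and `fit_generator` API:

```python
from keras.models import Sequential
from keras.layers import Dense
from sklearn.datasets import make_classification
from imblearn.keras import BalancedBatchGenerator

X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.9, 0.1], random_state=42)
# Placeholder model: a single logistic unit (assumption, not from the docs).
model = Sequential()
model.add(Dense(1, input_dim=20, activation='sigmoid'))
model.compile(optimizer='sgd', loss='binary_crossentropy')
# Each mini-batch yielded by the generator is balanced by resampling.
training_generator = BalancedBatchGenerator(X, y, batch_size=10,
                                            random_state=42)
model.fit_generator(generator=training_generator, epochs=10, verbose=0)
```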

doc/over_sampling.rst (+4 −6)

```diff
@@ -203,18 +203,16 @@ other extra interpolation.
 ROSE (Random Over-Sampling Examples)
 ------------------------------------
 
-ROSE uses smoothed bootstrapping to draw artificial samples from the 
+ROSE uses smoothed bootstrapping to draw artificial samples from the
 feature space neighborhood around selected classes, using a multivariate
-Gaussian kernel around randomly selected samples. First, random samples are 
+Gaussian kernel around randomly selected samples. First, random samples are
 selected from original classes. Then the smoothing kernel distribution
-is computed around the samples: :math:`\hat f(x|y=Y_i) = \sum_i^{n_j} 
-p_i Pr(x|x_i)=\sum_i^{n_j} \frac{1}{n_j} Pr(x|x_i)=\sum_i^{n_j} 
+is computed around the samples: :math:`\hat f(x|y=Y_i) = \sum_i^{n_j}
+p_i Pr(x|x_i)=\sum_i^{n_j} \frac{1}{n_j} Pr(x|x_i)=\sum_i^{n_j}
 \frac{1}{n_j} K_{H_j}(x|x_i)`.
 
 Then new samples are drawn from the computed distribution.
 
-
-
 Mathematical formulation
 ========================
 
```
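
Read as an algorithm, the formula says: pick a seed :math:`x_i` uniformly (each :math:`p_i = 1/n_j`) from class :math:`j`, then draw the new point from the kernel :math:`K_{H_j}` centred on it. A minimal numpy sketch under the simplifying assumption of a fixed scalar bandwidth (the ROSE paper estimates the full bandwidth matrix :math:`H_j` from the data):

```python
import numpy as np

rng = np.random.RandomState(0)
X_minority = rng.uniform(size=(20, 2))  # samples of the class to oversample
h = 0.1                                 # assumed scalar bandwidth

def rose_sample(X, n_new, h, rng):
    # Smoothed bootstrap: seeds drawn uniformly (p_i = 1/n_j), then
    # perturbed by a Gaussian kernel centred on each seed.
    idx = rng.randint(0, X.shape[0], size=n_new)
    return X[idx] + rng.normal(scale=h, size=(n_new, X.shape[1]))

X_new = rose_sample(X_minority, n_new=50, h=h, rng=rng)
```
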
examples/applications/porto_seguro_keras_under_sampling.py (+1 −1)

```diff
@@ -63,7 +63,7 @@ def convert_float64(X):
 ###############################################################################
 # We want to standard scale the numerical features while we want to one-hot
 # encode the categorical features. In this regard, we make use of the
-# :class:`sklearn.compose.ColumnTransformer`.
+# :class:`~sklearn.compose.ColumnTransformer`.
 
 numerical_columns = [name for name in X_train.columns
                      if '_calc_' in name and '_bin' not in name]
```
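
A short sketch of the preprocessor this comment refers to; the column lists below are placeholders (the example derives them from `X_train.columns`):

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# Placeholder column names, assumed for illustration.
numerical_columns = ['ps_calc_01', 'ps_calc_02']
categorical_columns = ['ps_ind_02_cat']

preprocessor = ColumnTransformer(transformers=[
    ('num', StandardScaler(), numerical_columns),   # standard-scale numerics
    ('cat', OneHotEncoder(), categorical_columns),  # one-hot categoricals
])
```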

examples/plot_outlier_rejections.py (+3 −3)

```diff
@@ -73,14 +73,14 @@ def plot_scatter(X, y, title):
 plot_scatter(X_test, y_test, 'Testing dataset')
 
 ##############################################################################
-# How to use the :class:`imblearn.FunctionSampler`
+# How to use the :class:`~imblearn.FunctionSampler`
 ##############################################################################
 
 ##############################################################################
 # We first define a function which will use
-# :class:`sklearn.ensemble.IsolationForest` to eliminate some outliers from
+# :class:`~sklearn.ensemble.IsolationForest` to eliminate some outliers from
 # our dataset during training. The function passed to the
-# :class:`imblearn.FunctionSampler` will be called when using the method
+# :class:`~imblearn.FunctionSampler` will be called when using the method
 # ``fit_resample``.
 
 
```
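
The pattern those comments describe, condensed into a runnable sketch; the blob data and the `IsolationForest` parameters are illustrative, not the example's own values:

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest
from imblearn import FunctionSampler

X, y = make_blobs(n_samples=100, centers=2, random_state=0)

def outlier_rejection(X, y):
    # Keep only the samples IsolationForest labels as inliers (+1).
    model = IsolationForest(max_samples=100, contamination=0.1,
                            random_state=0)
    model.fit(X)
    y_pred = model.predict(X)
    return X[y_pred == 1], y[y_pred == 1]

reject_sampler = FunctionSampler(func=outlier_rejection)
X_inliers, y_inliers = reject_sampler.fit_resample(X, y)
```
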
imblearn/combine/_smote_enn.py (+6 −6)

```diff
@@ -36,15 +36,15 @@ class SMOTEENN(BaseSampler):
 
     {random_state}
 
-    smote : object, default=None
-        The :class:`imblearn.over_sampling.SMOTE` object to use. If not given,
-        a :class:`imblearn.over_sampling.SMOTE` object with default parameters
+    smote : sampler object, default=None
+        The :class:`~imblearn.over_sampling.SMOTE` object to use. If not given,
+        a :class:`~imblearn.over_sampling.SMOTE` object with default parameters
         will be given.
 
-    enn : object, default=None
-        The :class:`imblearn.under_sampling.EditedNearestNeighbours` object
+    enn : sampler object, default=None
+        The :class:`~imblearn.under_sampling.EditedNearestNeighbours` object
         to use. If not given, a
-        :class:`imblearn.under_sampling.EditedNearestNeighbours` object with
+        :class:`~imblearn.under_sampling.EditedNearestNeighbours` object with
         sampling strategy='all' will be given.
 
     {n_jobs}
```
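
How the two sampler parameters documented above are typically passed, as a sketch on synthetic data; omitting both arguments yields the defaults the docstring describes:

```python
from sklearn.datasets import make_classification
from imblearn.combine import SMOTEENN
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import EditedNearestNeighbours

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                           random_state=0)
smote_enn = SMOTEENN(
    smote=SMOTE(random_state=0),
    enn=EditedNearestNeighbours(sampling_strategy='all'),
)
X_res, y_res = smote_enn.fit_resample(X, y)
```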

imblearn/combine/_smote_tomek.py (+6 −6)

```diff
@@ -37,14 +37,14 @@ class SMOTETomek(BaseSampler):
 
     {random_state}
 
-    smote : object, default=None
-        The :class:`imblearn.over_sampling.SMOTE` object to use. If not given,
-        a :class:`imblearn.over_sampling.SMOTE` object with default parameters
+    smote : sampler object, default=None
+        The :class:`~imblearn.over_sampling.SMOTE` object to use. If not given,
+        a :class:`~imblearn.over_sampling.SMOTE` object with default parameters
         will be given.
 
-    tomek : object, default=None
-        The :class:`imblearn.under_sampling.TomekLinks` object to use. If not
-        given, a :class:`imblearn.under_sampling.TomekLinks` object with
+    tomek : sampler object, default=None
+        The :class:`~imblearn.under_sampling.TomekLinks` object to use. If not
+        given, a :class:`~imblearn.under_sampling.TomekLinks` object with
         sampling strategy='all' will be given.
 
     {n_jobs}
```
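
Since the sub-sampler arguments mirror SMOTEENN above, a different angle: a sketch of SMOTETomek with its default SMOTE and TomekLinks inside an imblearn pipeline (the classifier choice is an assumption for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from imblearn.combine import SMOTETomek
from imblearn.pipeline import Pipeline

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                           random_state=0)
# SMOTETomek() with no arguments uses the defaults described above.
model = Pipeline([
    ('resample', SMOTETomek(random_state=0)),
    ('clf', LogisticRegression(solver='lbfgs')),
])
model.fit(X, y)  # resampling happens only during fit, not at predict time
```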

imblearn/datasets/_imbalance.py (+8 −9)

```diff
@@ -26,13 +26,13 @@ def make_imbalance(
 
     Parameters
     ----------
-    X : {array-like, dataframe}, shape (n_samples, n_features)
+    X : {array-like, dataframe} of shape (n_samples, n_features)
        Matrix containing the data to be imbalanced.
 
-    y : ndarray, shape (n_samples, )
+    y : ndarray of shape (n_samples,)
        Corresponding label for each sample in X.
 
-    sampling_strategy : dict, or callable,
+    sampling_strategy : dict or callable,
        Ratio to use for resampling the data set.
 
        - When ``dict``, the keys correspond to the targeted classes. The
@@ -43,25 +43,25 @@ def make_imbalance(
          correspond to the targeted classes. The values correspond to the
          desired number of samples for each class.
 
-    random_state : int, RandomState instance or None, optional (default=None)
+    random_state : int, RandomState instance or None, default=None
        If int, random_state is the seed used by the random number generator;
        If RandomState instance, random_state is the random number generator;
        If None, the random number generator is the RandomState instance used
        by np.random.
 
-    verbose : bool, optional (default=False)
+    verbose : bool, default=False
        Show information regarding the sampling.
 
-    kwargs : dict, optional
+    kwargs : dict
        Dictionary of additional keyword arguments to pass to
        ``sampling_strategy``.
 
     Returns
     -------
-    X_resampled : {ndarray, dataframe}, shape (n_samples_new, n_features)
+    X_resampled : {ndarray, dataframe} of shape (n_samples_new, n_features)
        The array containing the imbalanced data.
 
-    y_resampled : ndarray, shape (n_samples_new)
+    y_resampled : ndarray of shape (n_samples_new)
        The corresponding label of `X_resampled`
 
     Notes
@@ -86,7 +86,6 @@ def make_imbalance(
     ... random_state=42)
     >>> print('Distribution after imbalancing: {}'.format(Counter(y_res)))
     Distribution after imbalancing: Counter({2: 30, 1: 20, 0: 10})
-
     """
     target_stats = Counter(y)
     # restrict ratio to be a dict or a callable
```
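
The docstring's own example covers the ``dict`` form of ``sampling_strategy``; a sketch of the callable form it also documents (the halving function is an assumption for illustration):

```python
from collections import Counter
from sklearn.datasets import load_iris
from imblearn.datasets import make_imbalance

def ratio_multiplier(y):
    # Callable form: receives y and returns the dict form described above.
    return {cls: int(count * 0.5) for cls, count in Counter(y).items()}

X, y = load_iris(return_X_y=True)
X_res, y_res = make_imbalance(X, y, sampling_strategy=ratio_multiplier)
print(Counter(y_res))  # each iris class reduced from 50 to 25
```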

imblearn/datasets/_zenodo.py (+10 −10)

```diff
@@ -117,42 +117,42 @@ def fetch_datasets(
 
     Parameters
     ----------
-    data_home : string, optional (default=None)
+    data_home : str, default=None
        Specify another download and cache folder for the datasets. By default
        all scikit-learn data is stored in '~/scikit_learn_data' subfolders.
 
-    filter_data : tuple of str/int or None, optional (default=None)
+    filter_data : tuple of str/int, default=None
        A tuple containing the ID or the name of the datasets to be returned.
        Refer to the above table to get the ID and name of the datasets.
 
-    download_if_missing : boolean, optional (default=True)
+    download_if_missing : bool, default=True
        If False, raise a IOError if the data is not locally available
        instead of trying to download the data from the source site.
 
-    random_state : int, RandomState instance or None, optional (default=None)
+    random_state : int, RandomState instance or None, default=None
        Random state for shuffling the dataset.
        If int, random_state is the seed used by the random number generator;
        If RandomState instance, random_state is the random number generator;
        If None, the random number generator is the RandomState instance used
        by `np.random`.
 
-    shuffle : bool, optional (default=False)
+    shuffle : bool, default=False
        Whether to shuffle dataset.
 
-    verbose : bool, optional (default=False)
+    verbose : bool, default=False
        Show information regarding the fetching.
 
     Returns
     -------
     datasets : OrderedDict of Bunch object,
        The ordered is defined by ``filter_data``. Each Bunch object ---
-       refered as dataset --- have the following attributes:
+       referred as dataset --- have the following attributes:
 
-       dataset.data : ndarray, shape (n_samples, n_features)
+       dataset.data : ndarray of shape (n_samples, n_features)
 
-       dataset.target : ndarray, shape (n_samples, )
+       dataset.target : ndarray of shape (n_samples,)
 
-       dataset.DESCR : string
+       dataset.DESCR : str
           Description of the each dataset.
 
     Notes
```
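
A sketch of the documented call; 'ecoli' is assumed here as one of the benchmark names from the table the docstring mentions:

```python
from imblearn.datasets import fetch_datasets

# Downloads from Zenodo on first call, then uses the local cache;
# pass download_if_missing=False to raise instead of downloading.
datasets = fetch_datasets(filter_data=('ecoli',))
ecoli = datasets['ecoli']
print(ecoli.data.shape, ecoli.target.shape)
```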

imblearn/ensemble/_bagging.py (+2 −3)

```diff
@@ -37,7 +37,7 @@ class BalancedBaggingClassifier(BaggingClassifier):
 
     Parameters
     ----------
-    base_estimator : object, default=None
+    base_estimator : estimator object, default=None
        The base estimator to fit on random subsets of the dataset.
        If None, then the base estimator is a decision tree.
 
@@ -130,7 +130,7 @@ class BalancedBaggingClassifier(BaggingClassifier):
     Notes
     -----
     This is possible to turn this classifier into a balanced random forest [5]_
-    by passing a :class:`sklearn.tree.DecisionTreeClassifier` with
+    by passing a :class:`~sklearn.tree.DecisionTreeClassifier` with
     `max_features='auto'` as a base estimator.
 
     See
@@ -157,7 +157,6 @@ class BalancedBaggingClassifier(BaggingClassifier):
 
     Examples
     --------
-
     >>> from collections import Counter
     >>> from sklearn.datasets import make_classification
     >>> from sklearn.model_selection import train_test_split
```
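
The balanced-random-forest trick from the Notes section above, spelled out as a sketch:

```python
from sklearn.tree import DecisionTreeClassifier
from imblearn.ensemble import BalancedBaggingClassifier

# Bagging of trees that each consider a random feature subset per split,
# i.e. the balanced random forest equivalent described in the Notes.
brf_like = BalancedBaggingClassifier(
    base_estimator=DecisionTreeClassifier(max_features='auto'),
    n_estimators=100,
    random_state=0,
)
```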

imblearn/ensemble/_easy_ensemble.py (+2 −3)

```diff
@@ -42,7 +42,7 @@ class EasyEnsembleClassifier(BaggingClassifier):
     n_estimators : int, default=10
        Number of AdaBoost learners in the ensemble.
 
-    base_estimator : object, default=AdaBoostClassifier()
+    base_estimator : estimator object, default=AdaBoostClassifier()
        The base AdaBoost classifier used in the inner ensemble. Note that you
        can set the number of inner learner by passing your own instance.
 
@@ -60,7 +60,7 @@ class EasyEnsembleClassifier(BaggingClassifier):
 
     {random_state}
 
-    verbose : int, optional (default=0)
+    verbose : int, default=0
        Controls the verbosity of the building process.
 
     Attributes
@@ -103,7 +103,6 @@ class EasyEnsembleClassifier(BaggingClassifier):
 
     Examples
     --------
-
     >>> from collections import Counter
     >>> from sklearn.datasets import make_classification
     >>> from sklearn.model_selection import train_test_split
```
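
What "set the number of inner learner by passing your own instance" looks like in practice, as a sketch (the counts are arbitrary):

```python
from sklearn.ensemble import AdaBoostClassifier
from imblearn.ensemble import EasyEnsembleClassifier

# 10 balanced bags, each boosted by an AdaBoost with 20 inner learners.
eec = EasyEnsembleClassifier(
    n_estimators=10,
    base_estimator=AdaBoostClassifier(n_estimators=20),
    random_state=0,
)
```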

imblearn/ensemble/_forest.py (+3 −3)

```diff
@@ -91,7 +91,7 @@ class BalancedRandomForestClassifier(RandomForestClassifier):
     n_estimators : int, default=100
        The number of trees in the forest.
 
-    criterion : str, default="gini"
+    criterion : {"gini", "entropy"}, default="gini"
        The function to measure the quality of a split. Supported criteria are
        "gini" for the Gini impurity and "entropy" for the information gain.
        Note: this parameter is tree-specific.
@@ -101,15 +101,15 @@ class BalancedRandomForestClassifier(RandomForestClassifier):
        all leaves are pure or until all leaves contain less than
        min_samples_split samples.
 
-    min_samples_split : int, float, default=2
+    min_samples_split : int or float, default=2
        The minimum number of samples required to split an internal node:
 
        - If int, then consider `min_samples_split` as the minimum number.
        - If float, then `min_samples_split` is a percentage and
          `ceil(min_samples_split * n_samples)` are the minimum
          number of samples for each split.
 
-    min_samples_leaf : int, float, default=1
+    min_samples_leaf : int or float, default=1
        The minimum number of samples required to be at a leaf node:
 
        - If int, then consider ``min_samples_leaf`` as the minimum number.
```
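
A sketch exercising the three parameters these hunks retype; the particular values are arbitrary:

```python
from imblearn.ensemble import BalancedRandomForestClassifier

# int form: absolute sample counts; a float such as 0.01 would instead
# mean ceil(0.01 * n_samples), per the docstring above.
clf = BalancedRandomForestClassifier(
    n_estimators=100,
    criterion='entropy',
    min_samples_split=10,
    min_samples_leaf=5,
    random_state=0,
)
```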

imblearn/ensemble/_weight_boosting.py (+1 −1)

```diff
@@ -29,7 +29,7 @@ class RUSBoostClassifier(AdaBoostClassifier):
 
     Parameters
     ----------
-    base_estimator : object, default=None
+    base_estimator : estimator object, default=None
        The base estimator from which the boosted ensemble is built.
        Support for sample weighting is required, as well as proper
        ``classes_`` and ``n_classes_`` attributes. If ``None``, then
```
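
A sketch of a base estimator satisfying the sample-weighting requirement stated above; the decision-stump choice is an assumption, not a default from this docstring:

```python
from sklearn.tree import DecisionTreeClassifier
from imblearn.ensemble import RUSBoostClassifier

# Decision stumps support sample_weight and expose classes_/n_classes_.
rusboost = RUSBoostClassifier(
    base_estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=50,
    random_state=0,
)
```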
