@@ -9,41 +9,45 @@ Introduction
API's of imbalanced-learn samplers
----------------------------------

- The available samplers follows the scikit-learn API using the base estimator
- and adding a sampling functionality through the ``sample`` method:
+ The available samplers follow the
+ `scikit-learn API <https://scikit-learn.org/stable/getting_started.html#fitting-and-predicting-estimator-basics>`_
+ using the base estimator
+ and incorporating sampling functionality via the ``sample`` method:

:Estimator:

- The base object, implements a ``fit`` method to learn from data, either::
+ The base object implements a ``fit`` method to learn from data::

      estimator = obj.fit(data, targets)

:Resampler:

- To resample a data sets, each sampler implements::
+ To resample a dataset, each sampler implements a ``fit_resample`` method::

      data_resampled, targets_resampled = obj.fit_resample(data, targets)

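The ``fit_resample`` contract above can be sketched in plain Python. This is a minimal illustration of the API shape only, assuming a random under-sampling strategy for concreteness; the function name mirrors the samplers' method, but none of this is imbalanced-learn's implementation:

```python
import random
from collections import Counter

def fit_resample(data, targets, seed=0):
    # Toy random under-sampler illustrating the (data, targets) ->
    # (data_resampled, targets_resampled) contract; NOT imbalanced-learn code.
    rng = random.Random(seed)
    counts = Counter(targets)
    n_min = min(counts.values())  # size of the smallest class
    data_resampled, targets_resampled = [], []
    for cls in counts:
        # keep n_min randomly chosen samples of every class
        idx = [i for i, t in enumerate(targets) if t == cls]
        for i in sorted(rng.sample(idx, n_min)):
            data_resampled.append(data[i])
            targets_resampled.append(cls)
    return data_resampled, targets_resampled

# 2-dimensional `data` (a list of lists) and 1-dimensional `targets`
data = [[0.0, 1.0], [0.2, 0.9], [1.1, 0.1], [1.0, 0.2], [0.9, 0.3], [1.2, 0.0]]
targets = [0, 0, 1, 1, 1, 1]

data_resampled, targets_resampled = fit_resample(data, targets)
print(sorted(Counter(targets_resampled).items()))  # [(0, 2), (1, 2)]
```

After resampling, every class holds as many samples as the original minority class, and each resampled row is drawn from the input data.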
- Imbalanced-learn samplers accept the same inputs that in scikit-learn:
+ Imbalanced-learn samplers accept the same inputs as scikit-learn estimators:

- * `data`:
-   * 2-D :class:`list`,
-   * 2-D :class:`numpy.ndarray`,
-   * :class:`pandas.DataFrame`,
-   * :class:`scipy.sparse.csr_matrix` or :class:`scipy.sparse.csc_matrix`;
- * `targets`:
-   * 1-D :class:`numpy.ndarray`,
-   * :class:`pandas.Series`.
+ * `data`, 2-dimensional array-like structures, such as:
+   * Python lists of lists :class:`list`,
+   * NumPy arrays :class:`numpy.ndarray`,
+   * Pandas dataframes :class:`pandas.DataFrame`,
+   * SciPy sparse matrices :class:`scipy.sparse.csr_matrix` or :class:`scipy.sparse.csc_matrix`;
+
+ * `targets`, 1-dimensional array-like structures, such as:
+   * NumPy arrays :class:`numpy.ndarray`,
+   * Pandas series :class:`pandas.Series`.

The output will be of the following type:

- * `data_resampled`:
-   * 2-D :class:`numpy.ndarray`,
-   * :class:`pandas.DataFrame`,
-   * :class:`scipy.sparse.csr_matrix` or :class:`scipy.sparse.csc_matrix`;
- * `targets_resampled`:
-   * 1-D :class:`numpy.ndarray`,
-   * :class:`pandas.Series`.
+ * `data_resampled`, 2-dimensional array-like structures, such as:
+   * NumPy arrays :class:`numpy.ndarray`,
+   * Pandas dataframes :class:`pandas.DataFrame`,
+   * SciPy sparse matrices :class:`scipy.sparse.csr_matrix` or :class:`scipy.sparse.csc_matrix`;
+
+ * `targets_resampled`, 1-dimensional array-like structures, such as:
+   * NumPy arrays :class:`numpy.ndarray`,
+   * Pandas series :class:`pandas.Series`.

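As a sketch of what accepting these input shapes entails, the rules above can be expressed as a small validation helper. ``check_inputs`` is hypothetical and for illustration only; it is not imbalanced-learn's actual validation code:

```python
def check_inputs(data, targets):
    # Minimal shape check mirroring the accepted-input rules:
    # `data` 2-dimensional, `targets` 1-dimensional, equal sample counts.
    # Illustrative only -- not imbalanced-learn's actual validation code.
    if not data or not all(hasattr(row, "__len__") for row in data):
        raise ValueError("`data` must be 2-dimensional")
    n_features = len(data[0])
    if any(len(row) != n_features for row in data):
        raise ValueError("rows of `data` must all have the same length")
    if any(hasattr(t, "__len__") for t in targets):
        raise ValueError("`targets` must be 1-dimensional")
    if len(data) != len(targets):
        raise ValueError("`data` and `targets` must have the same number of samples")
    return len(data), n_features

n_samples, n_features = check_inputs([[0, 1], [1, 0], [2, 2]], [0, 1, 1])
print(n_samples, n_features)  # 3 2
```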
.. topic:: Pandas in/out
@@ -62,18 +66,19 @@ The output will be of the following type:
Problem statement regarding imbalanced data sets
------------------------------------------------

- The learning phase and the subsequent prediction of machine learning algorithms
- can be affected by the problem of imbalanced data set. The balancing issue
- corresponds to the difference of the number of samples in the different
- classes. We illustrate the effect of training a linear SVM classifier with
- different levels of class balancing.
+ The learning and prediction phases of machine learning algorithms
+ can be impacted by the issue of **imbalanced datasets**. This imbalance
+ refers to the difference in the number of samples across different classes.
+ We demonstrate the effect of training a `Logistic Regression classifier
+ <https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html>`_
+ with varying levels of class balancing by adjusting the class weights.

.. image:: ./auto_examples/over-sampling/images/sphx_glr_plot_comparison_over_sampling_001.png
   :target: ./auto_examples/over-sampling/plot_comparison_over_sampling.html
   :scale: 60
   :align: center

- As expected, the decision function of the linear SVM varies greatly depending
- upon how imbalanced the data is. With a greater imbalanced ratio, the decision
- function favors the class with the larger number of samples, usually referred
- as the majority class.
+ As expected, the decision function of the Logistic Regression classifier varies significantly
+ depending on how imbalanced the data is. With a greater imbalance ratio, the decision function
+ tends to favour the class with the larger number of samples, usually referred to as the
+ **majority class**.
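The notions of imbalance ratio and majority class used above can be made concrete with a short helper. ``imbalance_ratio`` is a hypothetical function for illustration, not part of imbalanced-learn:

```python
from collections import Counter

def imbalance_ratio(targets):
    # The majority class is simply the most frequent label; the ratio
    # compares its count to that of the rarest class.
    # A hypothetical helper, not part of imbalanced-learn.
    counts = Counter(targets)
    majority_class, n_majority = counts.most_common(1)[0]
    return majority_class, n_majority / min(counts.values())

targets = [0] * 90 + [1] * 10   # 9:1 imbalance between the two classes
majority, ratio = imbalance_ratio(targets)
print(majority, ratio)  # 0 9.0
```

The larger this ratio, the more the decision function of a classifier trained without rebalancing will lean toward the majority class.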