Commit bc94b25

Rework introduction.rst (#1110)
1 parent 03078d5 commit bc94b25

1 file changed

+34
-29
lines changed

doc/introduction.rst

@@ -9,41 +9,45 @@ Introduction
 API's of imbalanced-learn samplers
 ----------------------------------

-The available samplers follows the scikit-learn API using the base estimator
-and adding a sampling functionality through the ``sample`` method:
+The available samplers follow the
+`scikit-learn API <https://scikit-learn.org/stable/getting_started.html#fitting-and-predicting-estimator-basics>`_
+using the base estimator
+and incorporating a sampling functionality via the ``fit_resample`` method:

 :Estimator:

-    The base object, implements a ``fit`` method to learn from data, either::
+    The base object implements a ``fit`` method to learn from data::

       estimator = obj.fit(data, targets)

 :Resampler:

-    To resample a data sets, each sampler implements::
+    To resample a data set, each sampler implements a ``fit_resample`` method::

       data_resampled, targets_resampled = obj.fit_resample(data, targets)

-Imbalanced-learn samplers accept the same inputs that in scikit-learn:
+Imbalanced-learn samplers accept the same inputs as scikit-learn estimators:

-* `data`:
-  * 2-D :class:`list`,
-  * 2-D :class:`numpy.ndarray`,
-  * :class:`pandas.DataFrame`,
-  * :class:`scipy.sparse.csr_matrix` or :class:`scipy.sparse.csc_matrix`;
-* `targets`:
-  * 1-D :class:`numpy.ndarray`,
-  * :class:`pandas.Series`.
+* `data`, 2-dimensional array-like structures, such as:
+  * Python's list of lists :class:`list`,
+  * Numpy arrays :class:`numpy.ndarray`,
+  * Pandas dataframes :class:`pandas.DataFrame`,
+  * Scipy sparse matrices :class:`scipy.sparse.csr_matrix` or :class:`scipy.sparse.csc_matrix`;
+
+* `targets`, 1-dimensional array-like structures, such as:
+  * Numpy arrays :class:`numpy.ndarray`,
+  * Pandas series :class:`pandas.Series`.

 The output will be of the following type:

-* `data_resampled`:
-  * 2-D :class:`numpy.ndarray`,
-  * :class:`pandas.DataFrame`,
-  * :class:`scipy.sparse.csr_matrix` or :class:`scipy.sparse.csc_matrix`;
-* `targets_resampled`:
-  * 1-D :class:`numpy.ndarray`,
-  * :class:`pandas.Series`.
+* `data_resampled`, 2-dimensional array-like structures, such as:
+  * Numpy arrays :class:`numpy.ndarray`,
+  * Pandas dataframes :class:`pandas.DataFrame`,
+  * Scipy sparse matrices :class:`scipy.sparse.csr_matrix` or :class:`scipy.sparse.csc_matrix`;
+
+* `targets_resampled`, 1-dimensional array-like structures, such as:
+  * Numpy arrays :class:`numpy.ndarray`,
+  * Pandas series :class:`pandas.Series`.

 .. topic:: Pandas in/out

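The ``fit_resample`` calling convention described in this hunk can be sketched with a toy sampler. The class ``ToyRandomOverSampler`` below is hypothetical and is not part of imbalanced-learn; it only assumes that a sampler takes ``data`` and ``targets`` and returns resampled copies of both. It works on plain Python lists so that it runs without any third-party dependency.

```python
import random
from collections import Counter


class ToyRandomOverSampler:
    """Hypothetical sampler mimicking the ``fit_resample`` convention.

    This is an illustration only, not imbalanced-learn's implementation.
    """

    def __init__(self, random_state=0):
        self.random_state = random_state

    def fit_resample(self, data, targets):
        rng = random.Random(self.random_state)
        counts = Counter(targets)
        n_majority = max(counts.values())

        # Start from a copy of the original samples.
        data_resampled = list(data)
        targets_resampled = list(targets)

        for label, count in counts.items():
            # Indices of the rows that belong to this class.
            indices = [i for i, t in enumerate(targets) if t == label]
            # Duplicate randomly chosen rows until the class reaches
            # the size of the majority class.
            for _ in range(n_majority - count):
                i = rng.choice(indices)
                data_resampled.append(data[i])
                targets_resampled.append(label)

        return data_resampled, targets_resampled


# 2-dimensional list of lists for the data, 1-dimensional list for the targets.
data = [[0.0, 1.0], [0.2, 0.9], [0.1, 1.1], [0.3, 0.8], [5.0, 5.0]]
targets = [0, 0, 0, 0, 1]

obj = ToyRandomOverSampler(random_state=0)
data_resampled, targets_resampled = obj.fit_resample(data, targets)
print(Counter(targets_resampled))
```

After resampling, both classes contain four samples, and the original rows are kept at the start of the output.
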
@@ -62,18 +66,19 @@ The output will be of the following type:
 Problem statement regarding imbalanced data sets
 ------------------------------------------------

-The learning phase and the subsequent prediction of machine learning algorithms
-can be affected by the problem of imbalanced data set. The balancing issue
-corresponds to the difference of the number of samples in the different
-classes. We illustrate the effect of training a linear SVM classifier with
-different levels of class balancing.
+The learning and prediction phases of machine learning algorithms
+can be impacted by the issue of **imbalanced datasets**. This imbalance
+refers to the difference in the number of samples across different classes.
+We demonstrate the effect of training a `Logistic Regression classifier
+<https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html>`_
+with varying levels of class balancing by adjusting their weights.

 .. image:: ./auto_examples/over-sampling/images/sphx_glr_plot_comparison_over_sampling_001.png
    :target: ./auto_examples/over-sampling/plot_comparison_over_sampling.html
    :scale: 60
    :align: center

-As expected, the decision function of the linear SVM varies greatly depending
-upon how imbalanced the data is. With a greater imbalanced ratio, the decision
-function favors the class with the larger number of samples, usually referred
-as the majority class.
+As expected, the decision function of the Logistic Regression classifier varies significantly
+depending on how imbalanced the data is. With a greater imbalance ratio, the decision function
+tends to favour the class with the larger number of samples, usually referred to as the
+**majority class**.
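
The notions of imbalance ratio and class weights used in the text can be made concrete with a short sketch. Two assumptions are made here that the text does not state: the imbalance ratio is taken to be the majority class count divided by the minority class count, and the weights follow the ``n_samples / (n_classes * class_count)`` heuristic that scikit-learn applies for ``class_weight="balanced"``. The helper names are hypothetical.

```python
from collections import Counter


def imbalance_ratio(targets):
    """Majority count divided by minority count (assumed definition)."""
    counts = Counter(targets)
    return max(counts.values()) / min(counts.values())


def balanced_class_weights(targets):
    """Weights inversely proportional to class frequencies.

    Uses n_samples / (n_classes * class_count), the heuristic behind
    scikit-learn's ``class_weight="balanced"`` option.
    """
    counts = Counter(targets)
    n_samples = len(targets)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# 90 samples in class 0 (majority) and 10 in class 1 (minority).
targets = [0] * 90 + [1] * 10

print(imbalance_ratio(targets))
print(balanced_class_weights(targets))
```

With these targets the minority class receives a weight of 5.0 and the majority class a weight below 1, which counteracts the tendency of the decision function to favour the majority class.
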
