Commit f45c0c0

DOC phrasing improvements and typos (scikit-learn#16744)
1 parent a655de5 commit f45c0c0

File tree: doc/about.rst, doc/faq.rst, doc/glossary.rst

3 files changed: +36 -35 lines changed

doc/about.rst (+2 -2)

@@ -13,7 +13,7 @@ this project as part of his thesis.
 In 2010 Fabian Pedregosa, Gael Varoquaux, Alexandre Gramfort and Vincent
 Michel of INRIA took leadership of the project and made the first public
 release, February the 1st 2010. Since then, several releases have appeared
-following a ~3 month cycle, and a thriving international community has
+following a ~ 3-month cycle, and a thriving international community has
 been leading the development.
 
 Governance
@@ -520,7 +520,7 @@ budget of the project [#f1]_.
 
 .. rubric:: Notes
 
-.. [#f1] Regarding the organization budget in particular, we might use some of
+.. [#f1] Regarding the organization budget, in particular, we might use some of
 the donated funds to pay for other project expenses such as DNS,
 hosting or continuous integration services.
 

doc/faq.rst (+4 -4)

@@ -97,7 +97,7 @@ What are the inclusion criteria for new algorithms ?
 ----------------------------------------------------
 
 We only consider well-established algorithms for inclusion. A rule of thumb is
-at least 3 years since publication, 200+ citations and wide use and
+at least 3 years since publication, 200+ citations, and wide use and
 usefulness. A technique that provides a clear-cut improvement (e.g. an
 enhanced data structure or a more efficient approximation technique) on
 a widely-used method will also be considered for inclusion.
@@ -123,7 +123,7 @@ Inclusion of a new algorithm speeding up an existing model is easier if:
 n_samples",
 - benchmarks clearly show a speed up.
 
-Also note that your implementation need not be in scikit-learn to be used
+Also, note that your implementation need not be in scikit-learn to be used
 together with scikit-learn tools. You can implement your favorite algorithm
 in a scikit-learn compatible way, upload it to GitHub and let us know. We
 will be happy to list it under :ref:`related_projects`. If you already have
@@ -135,7 +135,7 @@ interested to look at `scikit-learn-contrib
 
 Why are you so selective on what algorithms you include in scikit-learn?
 ------------------------------------------------------------------------
-Code is maintenance cost, and we need to balance the amount of
+Code comes with maintenance cost, and we need to balance the amount of
 code we have with the size of the team (and add to this the fact that
 complexity scales non linearly with the number of features).
 The package relies on core developers using their free time to
@@ -250,7 +250,7 @@ Why do I sometime get a crash/freeze with n_jobs > 1 under OSX or Linux?
 
 Several scikit-learn tools such as ``GridSearchCV`` and ``cross_val_score``
 rely internally on Python's `multiprocessing` module to parallelize execution
-onto several Python processes by passing ``n_jobs > 1`` as argument.
+onto several Python processes by passing ``n_jobs > 1`` as an argument.
 
 The problem is that Python ``multiprocessing`` does a ``fork`` system call
 without following it with an ``exec`` system call for performance reasons. Many
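For context, a minimal sketch of how ``n_jobs`` is typically passed to these tools; the dataset and estimator below are illustrative choices, not part of the diff:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # n_jobs > 1 asks these tools to spread the work over several processes.
    search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, n_jobs=2).fit(X, y)
    scores = cross_val_score(SVC(C=1), X, y, cv=5, n_jobs=2)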

doc/glossary.rst (+30 -29)

@@ -41,7 +41,7 @@ General Concepts
 contributor documentation <api_overview>`.
 
 The specific interfaces that constitute Scikit-learn's public API are
-largely documented in :ref:`api_ref`. However we less formally consider
+largely documented in :ref:`api_ref`. However, we less formally consider
 anything as public API if none of the identifiers required to access it
 begins with ``_``. We generally try to maintain :term:`backwards
 compatibility` for all objects in the public API.
@@ -106,12 +106,12 @@ General Concepts
 are documented under an estimator's *Parameters* documentation.
 
 backwards compatibility
-We generally try to maintain backwards compatibility (i.e. interfaces
+We generally try to maintain backward compatibility (i.e. interfaces
 and behaviors may be extended but not changed or removed) from release
 to release but this comes with some exceptions:
 
 Public API only
-The behaviour of objects accessed through private identifiers
+The behavior of objects accessed through private identifiers
 (those beginning ``_``) may be changed arbitrarily between
 versions.
 As documented
@@ -145,8 +145,8 @@ General Concepts
 assumed but not formally tested.
 
 Despite this informal contract with our users, the software is provided
-as is, as stated in the licence. When a release inadvertently
-introduces changes that are not backwards compatible, these are known
+as is, as stated in the license. When a release inadvertently
+introduces changes that are not backward compatible, these are known
 as software regressions.
 
 callable
@@ -647,7 +647,7 @@ General Concepts
 first axis and a fixed, finite set of :term:`features` on the second
 is called rectangular.
 
-This term excludes samples with non-vectorial structure, such as text,
+This term excludes samples with non-vectorial structures, such as text,
 an image of arbitrary size, a time series of arbitrary length, a set of
 vectors, etc. The purpose of a :term:`vectorizer` is to produce
 rectangular forms of such data.
@@ -684,7 +684,7 @@ General Concepts
 versions happen via a :ref:`SLEP <slep>` and follows the
 decision-making process outlined in :ref:`governance`.
 For all votes, a proposal must have been made public and discussed before the
-vote. Such proposal must be a consolidated document, in the form of a
+vote. Such a proposal must be a consolidated document, in the form of a
 ‘Scikit-Learn Enhancement Proposal’ (SLEP), rather than a long discussion on an
 issue. A SLEP must be submitted as a pull-request to
 `enhancement proposals <https://scikit-learn-enhancement-proposals.readthedocs.io>`_ using the
@@ -881,12 +881,12 @@ Class APIs and Estimator Types
 In a meta-estimator's :term:`fit` method, any contained estimators
 should be :term:`cloned` before they are fit (although FIXME: Pipeline
 and FeatureUnion do not do this currently). An exception to this is
-that an estimator may explicitly document that it accepts a prefitted
+that an estimator may explicitly document that it accepts a pre-fitted
 estimator (e.g. using ``prefit=True`` in
 :class:`feature_selection.SelectFromModel`). One known issue with this
-is that the prefitted estimator will lose its model if the
+is that the pre-fitted estimator will lose its model if the
 meta-estimator is cloned. A meta-estimator should have ``fit`` called
-before prediction, even if all contained estimators are prefitted.
+before prediction, even if all contained estimators are pre-fitted.
 
 In cases where a meta-estimator's primary behaviors (e.g.
 :term:`predict` or :term:`transform` implementation) are functions of
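As a rough illustration of the ``prefit=True`` pattern mentioned in this hunk (the dataset and forest are arbitrary choices):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectFromModel

    X, y = load_iris(return_X_y=True)

    # The estimator is fitted outside the meta-estimator and passed in as-is.
    clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
    selector = SelectFromModel(clf, prefit=True)
    X_reduced = selector.transform(X)  # no selector.fit call is needed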
@@ -1008,7 +1008,7 @@ Target Types
 
 binary
 A classification problem consisting of two classes. A binary target
-may represented as for a :term:`multiclass` problem but with only two
+may be represented as for a :term:`multiclass` problem but with only two
 labels. A binary decision function is represented as a 1d array.
 
 Semantically, one class is often considered the "positive" class.
@@ -1028,7 +1028,7 @@ Target Types
 
 continuous
 A regression problem where each sample's target is a finite floating
-point number, represented as a 1-dimensional array of floats (or
+point number represented as a 1-dimensional array of floats (or
 sometimes ints).
 
 :func:`~utils.multiclass.type_of_target` will return 'continuous' for
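A small sketch of how these target types are detected; the expected return values are shown as comments:

    from sklearn.utils.multiclass import type_of_target

    type_of_target([0, 1, 1, 0])      # 'binary': two distinct labels
    type_of_target([0.5, 1.2, -3.0])  # 'continuous': real-valued 1d targets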
@@ -1078,7 +1078,7 @@ Target Types
 A classification problem where each sample's target consists of
 ``n_outputs`` :term:`outputs`, each a class label, for a fixed int
 ``n_outputs > 1`` in a particular dataset. Each output has a
-fixed set of available classes, and each sample is labelled with a
+fixed set of available classes, and each sample is labeled with a
 class for each output. An output may be binary or multiclass, and in
 the case where all outputs are binary, the target is
 :term:`multilabel`.
@@ -1213,10 +1213,10 @@ Methods
 and ``transform`` separately would be less efficient than together.
 :class:`base.TransformerMixin` provides a default implementation,
 providing a consistent interface across transformers where
-``fit_transform`` is or is not specialised.
+``fit_transform`` is or is not specialized.
 
 In :term:`inductive` learning -- where the goal is to learn a
-generalised model that can be applied to new data -- users should be
+generalized model that can be applied to new data -- users should be
 careful not to apply ``fit_transform`` to the entirety of a dataset
 (i.e. training and test data together) before further modelling, as
 this results in :term:`data leakage`.
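A minimal sketch of the leakage concern described above, using a scaler as an arbitrary transformer and synthetic data:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    X = np.random.RandomState(0).normal(size=(100, 3))
    X_train, X_test = train_test_split(X, random_state=0)

    scaler = StandardScaler()
    X_train_t = scaler.fit_transform(X_train)  # statistics learned on training data only
    X_test_t = scaler.transform(X_test)        # reused, never re-fit on test data
    # Calling fit_transform on the full dataset first would leak test-set
    # information into the fitted statistics.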
@@ -1225,7 +1225,7 @@ Methods
 Primarily for :term:`feature extractors`, but also used for other
 transformers to provide string names for each column in the output of
 the estimator's :term:`transform` method. It outputs a list of
-strings, and may take a list of strings as input, corresponding
+strings and may take a list of strings as input, corresponding
 to the names of input columns from which output column names can
 be generated. By default input features are named x0, x1, ....
 
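An illustrative sketch with ``PolynomialFeatures``; note that recent scikit-learn releases expose this functionality as ``get_feature_names_out`` instead:

    from sklearn.preprocessing import PolynomialFeatures

    poly = PolynomialFeatures(degree=2).fit([[1, 2], [3, 4]])
    # Input feature names may be supplied; output names are derived from them.
    poly.get_feature_names(["a", "b"])
    # ['1', 'a', 'b', 'a^2', 'a b', 'b^2']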
@@ -1250,7 +1250,7 @@ Methods
 ``partial_fit``
 Facilitates fitting an estimator in an online fashion. Unlike ``fit``,
 repeatedly calling ``partial_fit`` does not clear the model, but
-updates it with respect to the data provided. The portion of data
+updates it with the data provided. The portion of data
 provided to ``partial_fit`` may be called a mini-batch.
 Each mini-batch must be of consistent shape, etc. In iterative
 estimators, ``partial_fit`` often only performs a single iteration.
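A minimal mini-batch sketch with ``SGDClassifier`` on synthetic data; passing ``classes`` up front means no single batch has to contain every label:

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.RandomState(0)
    clf = SGDClassifier()

    for _ in range(5):  # five mini-batches of consistent shape
        X_batch = rng.normal(size=(20, 4))
        y_batch = rng.randint(0, 2, size=20)
        clf.partial_fit(X_batch, y_batch, classes=[0, 1])  # model is updated, not reset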
@@ -1322,7 +1322,7 @@ Methods
 to facilitate numerical stability.
 
 ``predict_proba``
-A method in :term:`classifiers` and :term:`clusterers` that are able to
+A method in :term:`classifiers` and :term:`clusterers` that can
 return probability estimates for each class/cluster. Its input is
 usually only some observed data, :term:`X`.
 
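For instance, with an arbitrary probabilistic classifier and dataset:

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)
    clf = LogisticRegression(max_iter=1000).fit(X, y)

    proba = clf.predict_proba(X[:3])  # shape (3, n_classes); each row sums to 1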
@@ -1381,7 +1381,7 @@ Methods
 In a :term:`transformer`, transforms the input, usually only :term:`X`,
 into some transformed space (conventionally notated as :term:`Xt`).
 Output is an array or sparse matrix of length :term:`n_samples` and
-with number of columns fixed after :term:`fitting`.
+with the number of columns fixed after :term:`fitting`.
 
 If the estimator was not already :term:`fitted`, calling this method
 should raise a :class:`exceptions.NotFittedError`.
@@ -1405,8 +1405,8 @@ functions or non-estimator constructors.
 :term:`multioutput` (including :term:`multilabel`) tasks, the weights
 are multiplied across outputs (i.e. columns of ``y``).
 
-By default all samples have equal weight such that classes are
-effectively weighted by their their prevalence in the training data.
+By default, all samples have equal weight such that classes are
+effectively weighted by their prevalence in the training data.
 This could be achieved explicitly with ``class_weight={label1: 1,
 label2: 1, ...}`` for all class labels.
 
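Spelling the default out explicitly, as the paragraph above suggests; the labels 0 and 1 stand in for a dataset's actual class labels:

    from sklearn.linear_model import LogisticRegression

    # Equivalent to the default: every sample has weight 1, so classes are
    # effectively weighted by their prevalence in the training data.
    clf = LogisticRegression(class_weight={0: 1, 1: 1})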
@@ -1581,10 +1581,11 @@ functions or non-estimator constructors.
 in the User Guide.
 
 Where multiple metrics can be evaluated, ``scoring`` may be given
-either as a list of unique strings or a dict with names as keys and
-callables as values. Note that this does *not* specify which score
-function is to be maximised, and another parameter such as ``refit``
-may be used for this purpose.
+either as a list of unique strings or a dictionary with names as keys
+and callables as values. Note that this does *not* specify which score
+function is to be maximized, and another parameter such as ``refit``
+may be used for this purpose.
+
 
 The ``scoring`` parameter is validated and interpreted using
 :func:`metrics.check_scoring`.
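An illustrative multi-metric sketch; ``refit`` is the extra parameter that names the score used to select the final model (dataset and grid are arbitrary):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    search = GridSearchCV(
        SVC(),
        param_grid={"C": [0.1, 1, 10]},
        scoring={"accuracy": "accuracy", "macro_f1": "f1_macro"},
        refit="accuracy",  # scoring alone does not say which metric to maximize
    )
    search.fit(X, y)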
@@ -1604,9 +1605,9 @@ functions or non-estimator constructors.
 When fitting an estimator repeatedly on the same dataset, but for
 multiple parameter values (such as to find the value maximizing
 performance as in :ref:`grid search <grid_search>`), it may be possible
-to reuse aspects of the model learnt from the previous parameter value,
+to reuse aspects of the model learned from the previous parameter value,
 saving time. When ``warm_start`` is true, the existing :term:`fitted`
-model :term:`attributes` are used to initialise the new model
+model :term:`attributes` are used to initialize the new model
 in a subsequent call to :term:`fit`.
 
 Note that this is only applicable for some models and some
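A minimal sketch with a forest, one estimator for which ``warm_start`` reuses the already-grown trees when ``n_estimators`` is increased:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)

    clf = RandomForestClassifier(n_estimators=50, warm_start=True, random_state=0)
    clf.fit(X, y)                      # grows 50 trees
    clf.set_params(n_estimators=100)
    clf.fit(X, y)                      # grows only the 50 additional trees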
@@ -1701,8 +1702,8 @@ See concept :term:`sample property`.
 .. glossary::
 
 ``groups``
-Used in cross validation routines to identify samples which are
-correlated. Each value is an identifier such that, in a supporting
+Used in cross-validation routines to identify samples that are correlated.
+Each value is an identifier such that, in a supporting
 :term:`CV splitter`, samples from some ``groups`` value may not
 appear in both a training set and its corresponding test set.
 See :ref:`group_cv`.
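A minimal sketch with ``GroupKFold`` and synthetic data; the group labels stand in for, say, a per-subject identifier:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GroupKFold, cross_val_score

    rng = np.random.RandomState(0)
    X = rng.normal(size=(12, 3))
    y = np.array([0, 1] * 6)
    groups = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]

    # Samples sharing a groups value never appear in both a training fold
    # and its corresponding test fold.
    scores = cross_val_score(LogisticRegression(), X, y, groups=groups,
                             cv=GroupKFold(n_splits=3))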
