Commit 19b1256

committed
some practical hints
1 parent 499a910 commit 19b1256

1 file changed, +16 -2 lines changed
tutorial/which_algorithm.rst

Lines changed: 16 additions & 2 deletions
@@ -1,8 +1,22 @@
 How to select the right algorithm for the task
 ==============================================

+To conclude this session, here are some practical hints for selecting
+the right algorithm when facing a practical problem.

-Some practical hints for selecting the right algorithm when facing
-a practical problem.
+- If the data is high dimensional and sparse (text data), most of the time
+  linear classifiers with a bit of regularization will work well.

+- If the data is dense, low to medium dimensional: try to further reduce the
+  dimensionality with PCA for instance and try both linear and nonlinear
+  models (e.g. SVC with RBF kernel).
+
+- ``SVC`` with Gaussian RBF kernel and ``KMeans`` clustering can
+  benefit a lot from data normalization (``PCA`` or ``RandomizedPCA``
+  with ``whiten=True``). Try various values for ``n_components`` with grid
+  search to be sure not to truncate the data too hard.
+
+- There is no free lunch: the best algorithm is data dependent. If
+  you try many different models, reserve a held-out evaluation set
+  that is not used during the model selection process.
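
The hints about whitening, grid search over ``n_components``, and a held-out evaluation set can be sketched together as below. This is only an illustration under assumptions that are not in the commit: the digits dataset, the candidate values in the grid, the 75/25 split, and 3-fold cross-validation are all arbitrary choices. It also uses plain ``PCA`` rather than ``RandomizedPCA``, since the latter is no longer available as a separate class in recent scikit-learn releases.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Dense, medium-dimensional data: 8x8 digit images flattened to 64 features.
X, y = load_digits(return_X_y=True)

# Reserve a held-out set that takes no part in model selection.
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Whitened PCA followed by an RBF-kernel SVC.
pipeline = Pipeline([
    ("pca", PCA(whiten=True, random_state=0)),
    ("svc", SVC(kernel="rbf")),
])

# Illustrative grid: several n_components values so that the search can
# reveal whether the data is being truncated too hard.
param_grid = {
    "pca__n_components": [8, 16, 32],
    "svc__C": [1, 10],
}

# Cross-validated model selection on the development split only.
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X_dev, y_dev)

# The held-out set is touched once, after selection is finished.
best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print(best_params, round(test_accuracy, 3))
```

Putting ``PCA`` inside the ``Pipeline`` means the projection is refit on each cross-validation training fold, so the validation folds do not influence the whitening transform.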
