
Commit eaf0a04

Authored by Sylvain MARIE (smarie) and Chiara Marmo (cmarmo)
FIX Libsvm and liblinear rand() fix for convergence on windows targets (and improvement on all targets) (scikit-learn#13511)
* Fixed random number generation on windows. See cjlin1/liblinear#28
* Added correct PR number
* Fixed random number generator for libsvm on windows targets
* Updated comments
* Suppressed C4293 warnings using explicit cast.
* Suppressed C4293 warnings using explicit cast.
* Following code review: `myrand` is now inline, and all integer size checks for the random generation fix are now static.
* Dummy comment edit to re-trigger build
* Fixed include error
* Attempt to provide a better and faster result by using a Mersenne Twister random number generator as suggested by https://codeforces.com/blog/entry/61587
* Attempt to integrate the extra compiler arg requesting explicit C++11 support
* Attempt to integrate the extra compiler arg requesting explicit C++11 support: liblinear extension.
* The `extra_compile_args` for C++11 was not working when used in `config.add_extension`, so the liblinear library is now pre-created first with the flag.
* Fixed liblinear extension build: there was a missing library dependency.
* Fixed liblinear library dependency for extension compilation.
* Fixed liblinear library compilation: added tron
* Now correctly setting the seed in libsvm
* Now correctly setting the seed in liblinear. This is done using a new interface function `set_seed`.
* Updated what's new accordingly
* Increased tolerance when comparing `_average_precision` with `_average_precision_slow`: `_average_precision` is based on prediction ordering and has errors in case of ties (e.g. several 0.5 prediction values).
* Minor improvement of this test method for readability (no change in output)
* Minor doc update as per review
* Dropped commented line
* Reverted test readability improvement change
* Using double backticks
* Clarified all comments
* Removed useless space
* Clarified all comments, and included a `set_seed` method as per code review.
* Updated what's new about changed models as suggested by review.
* Replaced random number generator: now using a `bounded_rand_int` with the tweaked Lemire post-processor from http://www.pcg-random.org/posts/bounded-rands.html
* Updated readme rst and moved it to 0.22
* What's new moved from 0.22 to 0.23
* Updated docs
* Update doc/whats_new/v0.23.rst (Co-Authored-By: Chiara Marmo <[email protected]>)
* Dummy change to re-trigger the CI build
* Update doc/whats_new/v0.21.rst (ten review-suggestion commits)
* Update doc/whats_new/v0.22.rst
* As per code review: added LogisticRegression to the list of changes, and repeated the list of affected classes in the changelog.
* New random number generator used in libsvm / liblinear is now in an independent header file `newrand/newrand.h`.
* Update sklearn/svm/src/liblinear/linear.cpp
* Fixed dates in headers and added a header to newrand.h
* Added a sentence in the changelog to explain the impact in user-friendly terms.
* Added link to PR in readme file and removed commented lines as per code review

Co-authored-by: Sylvain MARIE <[email protected]>
Co-authored-by: Chiara Marmo <[email protected]>
1 parent e3e9137 commit eaf0a04

File tree

9 files changed (+188 −48 lines)

doc/whats_new/v0.23.rst (+27 −1)

@@ -25,6 +25,12 @@ random sampling procedures.
 - :class:`ensemble.BaggingClassifier`, :class:`ensemble.BaggingRegressor`,
   and :class:`ensemble.IsolationForest`. |Fix|
+
+- Any model using the :func:`svm.libsvm` or the :func:`svm.liblinear` solver,
+  including :class:`svm.LinearSVC`, :class:`svm.LinearSVR`,
+  :class:`svm.NuSVC`, :class:`svm.NuSVR`, :class:`svm.OneClassSVM`,
+  :class:`svm.SVC`, :class:`svm.SVR`, :class:`linear_model.LogisticRegression`.
+  |Efficiency| |Fix|

 Details are listed in the changelog below.

 (While we are trying to better inform users by providing this information, we

@@ -296,7 +302,7 @@ Changelog
 - |API| Changed the formatting of values in
   :meth:`metrics.ConfusionMatrixDisplay.plot` and
   :func:`metrics.plot_confusion_matrix` to pick the shorter format (either '2g'
-  or 'd'). :pr:`16159` by :user:`Rick Mackenbach <Rick-Mackenbach>` and
+  or 'd'). :pr:`16159` by :user:`Rick Mackenbach <Rick-Mackenbach>` and
   `Thomas Fan`_.
(this hunk only removes trailing whitespace)

 - |Enhancement| :func:`metrics.pairwise.pairwise_distances_chunked` now allows

@@ -378,6 +384,26 @@ Changelog
 :mod:`sklearn.svm`
 ..................

+- |Fix| |Efficiency| Improved ``libsvm`` and ``liblinear`` random number
+  generators used to randomly select coordinates in the coordinate descent
+  algorithms. Platform-dependent C ``rand()`` was used, which is only able to
+  generate numbers up to ``32767`` on windows platform (see this `blog
+  post <https://codeforces.com/blog/entry/61587>`) and also has poor
+  randomization power as suggested by `this presentation
+  <https://channel9.msdn.com/Events/GoingNative/2013/rand-Considered-Harmful>`.
+  It was replaced with C++11 ``mt19937``, a Mersenne Twister that correctly
+  generates 31bits/63bits random numbers on all platforms. In addition, the
+  crude "modulo" postprocessor used to get a random number in a bounded
+  interval was replaced by the tweaked Lemire method as suggested by `this blog
+  post <http://www.pcg-random.org/posts/bounded-rands.html>`.
+  Any model using the :func:`svm.libsvm` or the :func:`svm.liblinear` solver,
+  including :class:`svm.LinearSVC`, :class:`svm.LinearSVR`,
+  :class:`svm.NuSVC`, :class:`svm.NuSVR`, :class:`svm.OneClassSVM`,
+  :class:`svm.SVC`, :class:`svm.SVR`, :class:`linear_model.LogisticRegression`,
+  is affected. In particular users can expect a better convergence when the
+  number of samples (LibSVM) or the number of features (LibLinear) is large.
+  :pr:`13511` by :user:`Sylvain Marié <smarie>`.
+
 - |API| :class:`svm.SVR` and :class:`svm.OneClassSVM` attributes, `probA_` and
   `probB_`, are now deprecated as they were not useful. :pr:`15558` by
   `Thomas Fan`_.

sklearn/metrics/tests/test_ranking.py (+2 −1)

@@ -737,8 +737,9 @@ def _test_precision_recall_curve(y_true, probas_pred):
     assert_array_almost_equal(precision_recall_auc, 0.859, 3)
     assert_array_almost_equal(precision_recall_auc,
                               average_precision_score(y_true, probas_pred))
+    # `_average_precision` is not very precise in case of 0.5 ties: be tolerant
     assert_almost_equal(_average_precision(y_true, probas_pred),
-                        precision_recall_auc, decimal=3)
+                        precision_recall_auc, decimal=2)
     assert p.size == r.size
     assert p.size == thresholds.size + 1
     # Smoke test in the case of proba having only one value

sklearn/svm/setup.py (+28 −7)

@@ -16,22 +16,27 @@ def configuration(parent_package='', top_path=None):
     config.add_library('libsvm-skl',
                        sources=[join('src', 'libsvm', 'libsvm_template.cpp')],
                        depends=[join('src', 'libsvm', 'svm.cpp'),
-                                join('src', 'libsvm', 'svm.h')],
+                                join('src', 'libsvm', 'svm.h'),
+                                join('src', 'newrand', 'newrand.h')],
                        # Force C++ linking in case gcc is picked up instead
                        # of g++ under windows with some versions of MinGW
                        extra_link_args=['-lstdc++'],
+                       # Use C++11 to use the random number generator fix
+                       extra_compiler_args=['-std=c++11'],
                        )

     libsvm_sources = ['_libsvm.pyx']
     libsvm_depends = [join('src', 'libsvm', 'libsvm_helper.c'),
                       join('src', 'libsvm', 'libsvm_template.cpp'),
                       join('src', 'libsvm', 'svm.cpp'),
-                      join('src', 'libsvm', 'svm.h')]
+                      join('src', 'libsvm', 'svm.h'),
+                      join('src', 'newrand', 'newrand.h')]

     config.add_extension('_libsvm',
                          sources=libsvm_sources,
                          include_dirs=[numpy.get_include(),
-                                       join('src', 'libsvm')],
+                                       join('src', 'libsvm'),
+                                       join('src', 'newrand')],
                          libraries=['libsvm-skl'],
                          depends=libsvm_depends,
                          )

@@ -41,16 +46,30 @@ def configuration(parent_package='', top_path=None):
     if os.name == 'posix':
         libraries.append('m')

-    liblinear_sources = ['_liblinear.pyx',
-                         join('src', 'liblinear', '*.cpp')]
+    # precompile liblinear to use C++11 flag
+    config.add_library('liblinear-skl',
+                       sources=[join('src', 'liblinear', 'linear.cpp'),
+                                join('src', 'liblinear', 'tron.cpp')],
+                       depends=[join('src', 'liblinear', 'linear.h'),
+                                join('src', 'liblinear', 'tron.h'),
+                                join('src', 'newrand', 'newrand.h')],
+                       # Force C++ linking in case gcc is picked up instead
+                       # of g++ under windows with some versions of MinGW
+                       extra_link_args=['-lstdc++'],
+                       # Use C++11 to use the random number generator fix
+                       extra_compiler_args=['-std=c++11'],
+                       )

+    liblinear_sources = ['_liblinear.pyx']
     liblinear_depends = [join('src', 'liblinear', '*.h'),
+                         join('src', 'newrand', 'newrand.h'),
                          join('src', 'liblinear', 'liblinear_helper.c')]

     config.add_extension('_liblinear',
                          sources=liblinear_sources,
-                         libraries=libraries,
+                         libraries=['liblinear-skl'] + libraries,
                          include_dirs=[join('.', 'src', 'liblinear'),
+                                       join('.', 'src', 'newrand'),
                                        join('..', 'utils'),
                                        numpy.get_include()],
                          depends=liblinear_depends,

@@ -64,8 +83,10 @@ def configuration(parent_package='', top_path=None):
     config.add_extension('_libsvm_sparse', libraries=['libsvm-skl'],
                          sources=libsvm_sparse_sources,
                          include_dirs=[numpy.get_include(),
-                                       join("src", "libsvm")],
+                                       join("src", "libsvm"),
+                                       join("src", "newrand")],
                          depends=[join("src", "libsvm", "svm.h"),
+                                  join('src', 'newrand', 'newrand.h'),
                                   join("src", "libsvm",
                                        "libsvm_sparse_helper.c")])

sklearn/svm/src/liblinear/liblinear_helper.c (+1 −1)

@@ -182,7 +182,7 @@ struct parameter *set_parameter(int solver_type, double eps, double C,
     if (param == NULL)
         return NULL;

-    srand(seed);
+    set_seed(seed);
     param->solver_type = solver_type;
     param->eps = eps;
     param->C = C;

sklearn/svm/src/liblinear/linear.cpp (+43 −33)

@@ -22,6 +22,13 @@
    Modified 2015:
    - Patched liblinear for sample_weights - Manoj Kumar
      See https://github.com/scikit-learn/scikit-learn/pull/5274
+
+   Modified 2020:
+   - Improved random number generator by using a mersenne twister + tweaked
+     lemire postprocessor. This fixed a convergence issue on windows targets.
+     Sylvain Marie
+     See <https://github.com/scikit-learn/scikit-learn/pull/13511#issuecomment-481729756>
+
 */

@@ -32,6 +39,10 @@
 #include <locale.h>
 #include "linear.h"
 #include "tron.h"
+#include <climits>
+#include <random>
+#include "../newrand/newrand.h"
+
 typedef signed char schar;
 template <class T> static inline void swap(T& x, T& y) { T t=x; x=y; y=t; }
 #ifndef min

Every shuffling call to C `rand()` is replaced by `bounded_rand_int`:

@@ -615,7 +626,7 @@ int Solver_MCSVM_CS::Solve(double *w)
 		double stopping = -INF;
 		for(i=0;i<active_size;i++)
 		{
-			int j = i+rand()%(active_size-i);
+			int j = i+bounded_rand_int(active_size-i);
 			swap(index[i], index[j]);
 		}
 		for(s=0;s<active_size;s++)

@@ -888,7 +899,7 @@ static int solve_l2r_l1l2_svc(
 	for (i=0; i<active_size; i++)
 	{
-		int j = i+rand()%(active_size-i);
+		int j = i+bounded_rand_int(active_size-i);
 		swap(index[i], index[j]);
 	}

@@ -1107,7 +1118,7 @@ static int solve_l2r_l1l2_svr(
 	for(i=0; i<active_size; i++)
 	{
-		int j = i+rand()%(active_size-i);
+		int j = i+bounded_rand_int(active_size-i);
 		swap(index[i], index[j]);
 	}

@@ -1333,7 +1344,7 @@ int solve_l2r_lr_dual(const problem *prob, double *w, double eps, double Cp, dou
 	{
 		for (i=0; i<l; i++)
 		{
-			int j = i+rand()%(l-i);
+			int j = i+bounded_rand_int(l-i);
 			swap(index[i], index[j]);
 		}
 		int newton_iter = 0;

@@ -1521,7 +1532,7 @@ static int solve_l1r_l2_svc(
 	for(j=0; j<active_size; j++)
 	{
-		int i = j+rand()%(active_size-j);
+		int i = j+bounded_rand_int(active_size-j);
 		swap(index[i], index[j]);
 	}

@@ -1903,7 +1914,7 @@ static int solve_l1r_lr(
 	for(j=0; j<QP_active_size; j++)
 	{
-		int i = j+rand()%(QP_active_size-j);
+		int i = j+bounded_rand_int(QP_active_size-j);
 		swap(index[i], index[j]);
 	}

@@ -2594,7 +2605,7 @@ void cross_validation(const problem *prob, const parameter *param, int nr_fold,
 	for(i=0;i<l;i++) perm[i]=i;
 	for(i=0;i<l;i++)
 	{
-		int j = i+rand()%(l-i);
+		int j = i+bounded_rand_int(l-i);
 		swap(perm[i],perm[j]);
 	}
 	for(i=0;i<=nr_fold;i++)

The remaining hunks (@@ -1, -456, -579, -775, -793, -1009, -1025, -1253, -2234 and -3057) only fix whitespace: trailing spaces are stripped from comment blocks and a trailing blank line is removed at end of file.

sklearn/svm/src/liblinear/linear.h (+2)

@@ -49,6 +49,8 @@ struct model
 	int *n_iter;	/* no. of iterations of each class */
 };

+void set_seed(unsigned seed);
+
 struct model* train(const struct problem *prob, const struct parameter *param, BlasFunctions *blas_functions);
 void cross_validation(const struct problem *prob, const struct parameter *param, int nr_fold, double *target);
