Commit 07b4253

FIX: get the right index when tie breaking in SMOTE NC (#497)
closes #494
1 parent f6db8b3 commit 07b4253

3 files changed, 13 insertions(+), 7 deletions(-)

doc/over_sampling.rst

Lines changed: 5 additions & 5 deletions
@@ -188,11 +188,11 @@ features or a boolean mask marking these features::
 >>> print(sorted(Counter(y_resampled).items()))
 [(0, 30), (1, 30)]
 >>> print(X_resampled[-5:])
-[['B' 0.5246469549655818 0]
-['A' -0.3657680728116921 0]
-['B' 0.9344237230779993 0]
-['A' 0.3710891618824609 0]
-['A' 0.3327240726719727 0]]
+[['A' 0.5246469549655818 2]
+['B' -0.3657680728116921 2]
+['A' 0.9344237230779993 2]
+['B' 0.3710891618824609 2]
+['B' 0.3327240726719727 2]]
 
 Therefore, it can be seen that the samples generated in the first and last
 columns are belonging to the same categories originally presented without any

doc/whats_new/v0.4.rst

Lines changed: 5 additions & 1 deletion
@@ -18,10 +18,14 @@ Bug fixes
 target or multilabel targets. Imbalanced-learn does not support this case.
 By :user:`Guillaume Lemaitre <glemaitre>` in :issue:`490`.
 
-- Fix a bug in :class:`imblearn.over_sampling.SMOTENC` in which an sparse
+- Fix a bug in :class:`imblearn.over_sampling.SMOTENC` in which a sparse
 matrices were densify during ``inverse_transform``.
 By :user:`Guillaume Lemaitre <glemaitre>` in :issue:`495`.
 
+- Fix a bug in :class:`imblearn.over_sampling.SMOTE_NC` in which a the tie
+breaking was wrongly sampling.
+By :user:`Guillaume Lemaitre <glemaitre>` in :issue:`497`.
+
 Version 0.4
 ===========
 
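The changelog entry only says the tie breaking was "wrongly sampling". The following NumPy-only sketch illustrates the likely mechanism, assuming `rng` is a NumPy `RandomState`, as its name suggests. The old line in `_smote.py`, `rng.choice(col_max == col_max.max())`, passes a boolean mask to `choice`, which samples an *element* of that mask (True or False) rather than a position. When that value is used as an offset from `start_idx`, it can only select category 0 or 1, whichever category actually won the neighbor vote. The helper names `old_tie_break` and `fixed_tie_break` and the vote vectors are hypothetical, chosen for illustration.

```python
import numpy as np


def old_tie_break(col_max, rng):
    # Pre-fix expression: choice() samples an element of the boolean mask,
    # so the result is np.False_ or np.True_ (offset 0 or 1), not the
    # position of the maximum.
    return rng.choice(col_max == col_max.max())


def fixed_tie_break(col_max, rng):
    # Post-fix expression: flatnonzero turns the mask into the positions of
    # the maxima, and choice() picks one of those positions uniformly.
    return rng.choice(np.flatnonzero(np.isclose(col_max, col_max.max())))


rng = np.random.RandomState(0)

# Hypothetical neighbor votes for one categorical feature with three
# categories: category 2 wins outright.
votes = np.array([1.0, 0.0, 4.0])
old_offsets = {int(old_tie_break(votes, rng)) for _ in range(200)}
new_offsets = {int(fixed_tie_break(votes, rng)) for _ in range(200)}
print(sorted(old_offsets))  # never contains 2: only 0 (False) or 1 (True)
print(sorted(new_offsets))  # [2]

# Hypothetical tied votes: categories 0 and 2 both have three neighbors.
tied = np.array([3.0, 1.0, 3.0])
tie_offsets = {int(fixed_tie_break(tied, rng)) for _ in range(200)}
print(sorted(tie_offsets))  # both tied positions, 0 and 2, show up
```

With the old expression, the winning category's position never reaches `sample`. With the fixed one, a clear winner is always selected, and genuine ties are broken uniformly at random.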

imblearn/over_sampling/_smote.py

Lines changed: 3 additions & 1 deletion
@@ -1032,11 +1032,13 @@ def _generate_sample(self, X, nn_data, nn_num, row, col, step):
 
         categories_size = ([self.continuous_features_.size] +
                            [cat.size for cat in self.ohe_.categories_])
+
         for start_idx, end_idx in zip(np.cumsum(categories_size)[:-1],
                                       np.cumsum(categories_size)[1:]):
             col_max = all_neighbors[:, start_idx:end_idx].sum(axis=0)
             # tie breaking argmax
-            col_sel = rng.choice(col_max == col_max.max())
+            col_sel = rng.choice(np.flatnonzero(
+                np.isclose(col_max, col_max.max())))
 
             sample[start_idx:end_idx] = 0
             sample[start_idx + col_sel] = 1
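The corrected loop can be sketched outside the class as a stand-alone function. This sketch assumes, as `categories_size` in the hunk suggests, that each encoded row holds the continuous features first, followed by one one-hot block per categorical feature in the order of `ohe_.categories_`. `np.cumsum` over the block sizes gives the block boundaries, and each block is set to its neighbors' majority category. The function name `assign_categories` and the toy data are hypothetical and not part of imbalanced-learn's API.

```python
import numpy as np


def assign_categories(sample, all_neighbors, n_continuous, category_sizes, rng):
    """Set each one-hot block of ``sample`` to the neighbors' majority category.

    Hypothetical helper mirroring the loop in ``_generate_sample``; columns are
    assumed to be laid out as [continuous..., block_1..., block_2..., ...].
    Mutates ``sample`` in place and also returns it.
    """
    sizes = [n_continuous] + list(category_sizes)
    bounds = np.cumsum(sizes)
    # Each (start_idx, end_idx) pair delimits one categorical feature's block.
    for start_idx, end_idx in zip(bounds[:-1], bounds[1:]):
        # Count how many neighbors hold each category of this feature.
        col_max = all_neighbors[:, start_idx:end_idx].sum(axis=0)
        # Tie-breaking argmax: pick uniformly among the most frequent categories.
        col_sel = rng.choice(np.flatnonzero(np.isclose(col_max, col_max.max())))
        sample[start_idx:end_idx] = 0
        sample[start_idx + col_sel] = 1
    return sample


rng = np.random.RandomState(42)
# Toy layout: one continuous column, then a 2-category and a 3-category block.
all_neighbors = np.array([
    [0.1, 1, 0, 0, 0, 1],
    [0.4, 0, 1, 0, 0, 1],
    [0.3, 0, 1, 0, 1, 0],
], dtype=float)
sample = np.array([0.25, 0, 0, 0, 0, 0], dtype=float)
result = assign_categories(sample, all_neighbors, 1, [2, 3], rng)
print(result)  # continuous value kept; category 1 of block 1, category 2 of block 2
```

The first block's votes are [1, 2] and the second block's are [0, 1, 2], so there is no tie and the result is deterministic. With tied counts, `rng.choice` would pick uniformly among the tied positions. The `np.isclose` comparison presumably guards against vote sums that differ from the maximum only by floating-point rounding.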
