Drift detectors #1474

gabrieljaguiar · 2023-12-12T19:13:25Z

Recent surveys have used newer drift detectors, but River does not provide implementations of them. FHDDM and FHDDMS have been cited more than 250 times and appear in recent literature, so these methods should be available in a package like River.

MaxHalford · 2023-12-13T10:58:39Z

I'll let @smastelini handle this one 🙏

My only input is that I don't like the idea of adding more classes, especially if these are just variants of HDDM. I would rather have a single HDDM class with parameters to choose between variants. But I don't know enough about drift detection to tell whether this is possible.

smastelini · 2023-12-13T22:35:58Z

Hey @gabrieljaguiar, you told me these methods are not directly related to HDDM. Care to expand on that?

gabrieljaguiar · 2023-12-15T20:57:28Z

Hi @MaxHalford and @smastelini.

So, the only thing FHDDM and HDDM have in common is the Hoeffding inequality. While HDDM maintains the mean and standard deviation of the distribution, FHDDM maintains one (or two) window(s) of predictions and uses the inequality to determine whether a drift has occurred. FHDDM is actually closer to ADWIN than to HDDM.
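To make the contrast with HDDM concrete, here is a minimal sketch of the FHDDM test, assuming the formulation from the original FHDDM paper: inputs are 1 for a correct prediction and 0 for an error, the accuracy p_t of a window of n predictions is compared with the highest accuracy p_max observed so far, and a drift is signalled when p_max - p_t reaches the Hoeffding bound sqrt(ln(1/delta) / (2n)). The function names `hoeffding_epsilon` and `fhddm_drift` are hypothetical and not part of River.

```python
import math


def hoeffding_epsilon(n: int, delta: float) -> float:
    """Hoeffding bound for the mean of n values in [0, 1] at confidence delta."""
    return math.sqrt(math.log(1 / delta) / (2 * n))


def fhddm_drift(p_max: float, p_t: float, n: int, delta: float) -> bool:
    """Signal drift when the window accuracy p_t has fallen at least
    epsilon below the best window accuracy p_max seen so far."""
    return p_max - p_t >= hoeffding_epsilon(n, delta)


print(round(hoeffding_epsilon(n=100, delta=1e-6), 3))  # 0.263
print(fhddm_drift(0.90, 0.70, n=100, delta=1e-6))      # drop of 0.20: False
print(fhddm_drift(0.90, 0.62, n=100, delta=1e-6))      # drop of 0.28: True
```

With n = 100 and delta = 1e-6, the window accuracy has to fall roughly 26 percentage points below its peak before a drift is flagged. Only the window contents and p_max are needed, which is why the method resembles ADWIN more than HDDM.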

smastelini

Hi @gabrieljaguiar, I finished a first pass over the code and left some comments to be addressed.

The main points:

avoiding redundant code
using collections.deque: it is faster and Pythonic

On that note, I echo Max's comment about having multiple classes. You mentioned these new methods are not similar to the classic HDDM. While this might be true, FHDDM and FHDDMS share many similarities. It would be nice to unify them into a single class and use an init parameter to control which behavior to follow.

Lastly, we need to pay attention to what is exposed to the user and what is not, for instance the epsilon values and other properties. I think some of them could be made private or turned into properties if necessary.

river/drift/binary/fhddm.py

river/drift/binary/fhddm_s.py
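One way the suggested unification could look, as a minimal sketch with a hypothetical `FHDDMSketch` class rather than the code that was merged into River: a single `short_window_size` init parameter selects plain FHDDM (`None`) or a stacked FHDDMS-style behavior, internal state lives in underscore-prefixed attributes, and the long-window bound is exposed read-only through a property. `sliding_window_size` mirrors the attribute in the reviewed `fhddm.py` diff; `confidence_level`, `short_window_size`, and modeling FHDDMS as a second, shorter window over the most recent inputs that is tested whenever it is full are assumptions.

```python
import math
from collections import deque
from typing import Optional


def _push(window: deque, total: int, size: int, x: int) -> int:
    """Append x to a bounded window and return the updated running sum in O(1)."""
    window.append(x)
    total += x
    if len(window) > size:
        total -= window.popleft()
    return total


class FHDDMSketch:
    """Hypothetical single class covering FHDDM and a stacked (FHDDMS-style) variant.

    short_window_size=None -> one window of sliding_window_size inputs (FHDDM).
    short_window_size=m    -> a second window over the m most recent inputs is tested too.
    Inputs are 1 (correct prediction) or 0 (error).
    """

    def __init__(self, sliding_window_size: int = 100, confidence_level: float = 1e-6,
                 short_window_size: Optional[int] = None):
        if short_window_size is not None and not 0 < short_window_size < sliding_window_size:
            raise ValueError("short_window_size must be in (0, sliding_window_size)")
        self.sliding_window_size = sliding_window_size
        self.confidence_level = confidence_level
        self.short_window_size = short_window_size
        self.drift_detected = False
        self._reset()

    def _bound(self, n: int) -> float:
        return math.sqrt(math.log(1 / self.confidence_level) / (2 * n))

    @property
    def epsilon(self) -> float:
        """Read-only view of the long-window bound; the stored value stays private."""
        return self._eps_long

    def _reset(self) -> None:
        self._long, self._sum_long, self._max_long = deque(), 0, 0.0
        self._short, self._sum_short, self._max_short = deque(), 0, 0.0
        self._eps_long = self._bound(self.sliding_window_size)
        self._eps_short = self._bound(self.short_window_size) if self.short_window_size else None

    @staticmethod
    def _test(total: int, n: int, p_max: float, eps: float):
        p = total / n
        p_max = max(p_max, p)
        return p_max, p_max - p >= eps

    def update(self, x: int) -> bool:
        drift = False
        n = self.sliding_window_size
        self._sum_long = _push(self._long, self._sum_long, n, x)
        if len(self._long) == n:
            self._max_long, drift = self._test(self._sum_long, n, self._max_long, self._eps_long)
        m = self.short_window_size
        if m is not None:
            self._sum_short = _push(self._short, self._sum_short, m, x)
            if len(self._short) == m:
                self._max_short, short_drift = self._test(
                    self._sum_short, m, self._max_short, self._eps_short)
                drift = drift or short_drift
        if drift:
            self._reset()  # start monitoring from scratch after a detected drift
        self.drift_detected = drift
        return drift


stream = [1] * 500 + [0] * 100  # accuracy collapses at index 500
for name, det in [("FHDDM", FHDDMSketch()), ("FHDDMS", FHDDMSketch(short_window_size=25))]:
    print(name, [i for i, x in enumerate(stream) if det.update(x)])
# FHDDM [526]
# FHDDMS [513]
```

On this synthetic stream the short window reacts 13 inputs earlier, which illustrates why one parameter can switch between the two behaviors without a second class.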

smastelini · 2024-01-05T18:37:42Z

river/drift/binary/fhddm.py

+        self._sliding_window.append(x)
+
+        if len(self._sliding_window) == self.sliding_window_size:
+            n_one = self._sliding_window.count(1)


For performance reasons, keeping a running sum of the inputs (which are either 0 or 1) is a better idea. Otherwise, each insertion costs O(w), where w is the length of the window.

I am not sure I understood correctly, but if I keep only the sum, I would never know which value to remove from the sliding window when it is full.

You subtract the value returned by self._sliding_window.popleft() :D
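The running-sum idea from this exchange can be sketched as follows, using a hypothetical `RunningWindow` helper (not River's code): `deque.popleft()` returns the evicted input, so the count of ones is updated in O(1) per insertion instead of rescanning the window with `count(1)`.

```python
import random
from collections import deque


class RunningWindow:
    """Hypothetical helper: fixed-size window of 0/1 inputs with an O(1) count of ones."""

    def __init__(self, size: int):
        self.size = size
        self._window = deque()
        self.n_one = 0

    def append(self, x: int) -> None:
        self._window.append(x)
        self.n_one += x
        if len(self._window) > self.size:
            # popleft() returns the evicted input, which is exactly what to subtract
            self.n_one -= self._window.popleft()

    def count_ones_slow(self) -> int:
        """O(w) recount, equivalent to the original self._sliding_window.count(1)."""
        return self._window.count(1)


w = RunningWindow(size=50)
rng = random.Random(42)
for _ in range(1_000):
    w.append(rng.randint(0, 1))
    assert w.n_one == w.count_ones_slow()
print("running sum matches count(1) on every step")
```

The trade-off is one extra integer of state in exchange for avoiding a full scan of the window on every update.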

smastelini

Hi @gabrieljaguiar, you streamlined the code a lot! I left one additional request regarding functionality.

Apart from that, it would be nice to document how to use both versions of the detector, for example by expanding the example section. Also, please add an entry to docs/unreleased.md with the new additions.

gabrieljaguiar added 4 commits December 11, 2023 15:20

fhddm

a6585d8

fhddm

530ad13

fhddm_s

8841772

annotations

23d56c6

gabrieljaguiar requested review from MaxHalford and smastelini as code owners December 12, 2023 19:13

gabrieljaguiar added 4 commits December 12, 2023 14:17

code quality fixes

9f0fa37

code quality fix

f5f985a

code quality

a7e4154

codequality

5b0fe2f

smastelini self-assigned this Dec 13, 2023

smastelini added the New feature label Dec 13, 2023

smastelini requested changes Dec 18, 2023

gabrieljaguiar added 6 commits December 18, 2023 15:31

doc fixes and private variables

3e972d5

doc fixes

d3c38a9

using collections.deque

aee8dc7

merge detectors

6a0877e

docstring and merge detectors

accb440

fixes and merge

03ee4c1

gabrieljaguiar requested a review from smastelini January 4, 2024 16:57

smastelini reviewed Jan 5, 2024

smastelini requested changes Jan 5, 2024

gabrieljaguiar added 5 commits January 9, 2024 10:43

fhddms added to example

3210079

test fixes

84a51ef

fixes

8de7e50

fixes

649645a

fixing example

795a573

gabrieljaguiar added 4 commits January 9, 2024 11:24

example fix

6a131a7

unreleased file

5880af7

remove count

cd4c1ac

remove count

ce59424

gabrieljaguiar requested a review from smastelini January 29, 2024 22:57

smastelini approved these changes Feb 1, 2024

smastelini merged commit 1c98e9e into online-ml:main Feb 1, 2024
4 checks passed


gabrieljaguiar commented Dec 12, 2023

MaxHalford commented Dec 13, 2023

smastelini commented Dec 13, 2023

gabrieljaguiar commented Dec 15, 2023

smastelini left a comment

smastelini Jan 5, 2024

gabrieljaguiar Jan 9, 2024

smastelini Jan 15, 2024

smastelini left a comment
