Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drift detectors #1474

Merged
merged 23 commits into from
Feb 1, 2024
Merged

Drift detectors #1474

merged 23 commits into from
Feb 1, 2024

Conversation

gabrieljaguiar
Copy link
Contributor

@gabrieljaguiar gabrieljaguiar commented Dec 12, 2023

In recent surveys new drift detectors have been used, while in River we do not have acess to those implementations. FHDDM and FHDDMS have been cited more than 250 times and are present in recent literature. Therefore, the presence of those methods are needed in a package like river.

@smastelini smastelini self-assigned this Dec 13, 2023
@MaxHalford
Copy link
Member

I'll let @smastelini handle this one 🙏

My only input is that I don't like the idea of adding more classes. Especially if these are just variants of a HDDM. I would rather have a single HDDM class with parameters to choose between variants. But I don't enough about drift detection to tell if this is possible or not.

@smastelini
Copy link
Member

Hey @gabrieljaguiar, you told me these methods are not directly related to HDDM. Care to expand on that?

@gabrieljaguiar
Copy link
Contributor Author

Hi @MaxHalford and @smastelini .

So, the only common thing between FHDDM and HDDM is the Hoeffing Inequality. While HDDM maintain mean standard deviation of the distribution, FHDDM maintain 1 (or 2) window (s) of predictions and use the inequality to determine whether a drift occured or not. Actually, FHDDM would be more similar to ADWIN than HDDM.

Copy link
Member

@smastelini smastelini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @gabrieljaguiar, I finished a first pass in the code and left some comments to be addressed.

The main points:

  • avoiding redundant code
  • using collections.deque: it is faster and Pythonic

On that note, I echo Max's comment about having multiple classes. You mentioned these new methods are not similar to the classic HDDM. While this might be true, FHDDM and FHDDMS share many similarities. It would be nice to unify them and control via the init parameter which behavior to follow.

Lastly, we need to pay attention to what will be exposed to the user or not. For instance, the epsilon values and other properties. I think some of them could become private or turned into properties if necessary.

self._sliding_window.append(x)

if len(self._sliding_window) == self.sliding_window_size:
n_one = self._sliding_window.count(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For performance reasons, keeping the sum of inputs (which are either 0 or 1) is a better idea. Otherwise, each insertion will have an O(w), where w is the length of the window.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not know if I understood correctly, but if I keep the sum I would never know which one to remove from the sliding window when it is full.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will remove the value of self._sliding_window.popleft() :D

Copy link
Member

@smastelini smastelini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @gabrieljaguiar, you streamlined the code a lot! I left one additional request regarding functionality.

Apart from that, would be nice to document how to use both versions of the detector. For example, you can expand the example section. Besides that, please add an entry on docs/unreleased.md with the new additions.

@smastelini smastelini merged commit 1c98e9e into online-ml:main Feb 1, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants