-
-
Notifications
You must be signed in to change notification settings - Fork 558
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Drift detectors #1474
Drift detectors #1474
Conversation
I'll let @smastelini handle this one 🙏 My only input is that I don't like the idea of adding more classes. Especially if these are just variants of a HDDM. I would rather have a single HDDM class with parameters to choose between variants. But I don't enough about drift detection to tell if this is possible or not. |
Hey @gabrieljaguiar, you told me these methods are not directly related to HDDM. Care to expand on that? |
Hi @MaxHalford and @smastelini . So, the only common thing between FHDDM and HDDM is the Hoeffing Inequality. While HDDM maintain mean standard deviation of the distribution, FHDDM maintain 1 (or 2) window (s) of predictions and use the inequality to determine whether a drift occured or not. Actually, FHDDM would be more similar to ADWIN than HDDM. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @gabrieljaguiar, I finished a first pass in the code and left some comments to be addressed.
The main points:
- avoiding redundant code
- using
collections.deque
: it is faster and Pythonic
On that note, I echo Max's comment about having multiple classes. You mentioned these new methods are not similar to the classic HDDM. While this might be true, FHDDM and FHDDMS share many similarities. It would be nice to unify them and control via the init parameter which behavior to follow.
Lastly, we need to pay attention to what will be exposed to the user or not. For instance, the epsilon values and other properties. I think some of them could become private or turned into properties if necessary.
river/drift/binary/fhddm.py
Outdated
self._sliding_window.append(x) | ||
|
||
if len(self._sliding_window) == self.sliding_window_size: | ||
n_one = self._sliding_window.count(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For performance reasons, keeping the sum of inputs (which are either 0 or 1) is a better idea. Otherwise, each insertion will have an O(w), where w is the length of the window.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not know if I understood correctly, but if I keep the sum I would never know which one to remove from the sliding window when it is full.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will remove the value of self._sliding_window.popleft()
:D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @gabrieljaguiar, you streamlined the code a lot! I left one additional request regarding functionality.
Apart from that, would be nice to document how to use both versions of the detector. For example, you can expand the example section. Besides that, please add an entry on docs/unreleased.md
with the new additions.
In recent surveys new drift detectors have been used, while in River we do not have acess to those implementations. FHDDM and FHDDMS have been cited more than 250 times and are present in recent literature. Therefore, the presence of those methods are needed in a package like river.