-
Notifications
You must be signed in to change notification settings - Fork 0
EasyEnsembleGeneralization #4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
efec02e
to
b216fe7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of thoughts. It is a quick review I would need more time to ensure that everything that I said is programmable.
|
||
random_state = check_random_state(self.random_state) | ||
estimator_seeds = random_state.randint(MAX_INT, size=self.n_estimators) | ||
sampler_seeds = random_state.randint(MAX_INT, size=self.n_estimators) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we should use the _set_random_states
from the ensemble.base.py
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/base.py
pipelines = [] | ||
seeds = zip(estimator_seeds, sampler_seeds) | ||
|
||
for i, (estimator_seed, sampler_seed) in enumerate(seeds): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the random state is properly done before, we could do that in parallel with joblib
sampler = clone(self.base_sampler_) | ||
sampler.set_params(random_state=sampler_seed) | ||
|
||
if hasattr(self.base_estimator_, 'random_state'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should reuse make_estimator
?
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/base.py#L119
for i, (estimator_seed, sampler_seed) in enumerate(seeds): | ||
|
||
sampler = clone(self.base_sampler_) | ||
sampler.set_params(random_state=sampler_seed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should create a make_sampler
similarly to make_estimator
|
||
from ..pipeline import Pipeline | ||
from ..under_sampling import RandomUnderSampler as ROS | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use the full name since we are using it once :) Might be more intuitive and this is a burden to right it once.
@glemaitre since |
For the remaining error, it could because we creating a meta-estimator which should not go through the same common test than estimator: |
@glemaitre I can think two optionn right now:
|
We should be able to monkey patch as well. But I would go for 1. for the moment |
I didn't mention that, in purpose, cause I thought that you wont like this approach. |
If this is very small patch, I am for it ;)
|
Codecov Report
@@ Coverage Diff @@
## master #4 +/- ##
==========================================
- Coverage 98.32% 98.31% -0.01%
==========================================
Files 68 70 +2
Lines 3879 3975 +96
==========================================
+ Hits 3814 3908 +94
- Misses 65 67 +2
Continue to review full report at Codecov.
|
409737a
to
aa1f233
Compare
b71d8cd
to
cd972c5
Compare
@glemaitre I was thinking that I could proceed to the experiments using this implementation. Any suggestions are welcome.