EasyEnsembleGeneralization #4

Open
wants to merge 3 commits into
base: master

Owner

@chkoar chkoar commented Jul 20, 2017

@glemaitre I was thinking that I could proceed to the experiments using this implementation. Any suggestions are welcome.

@chkoar chkoar force-pushed the easy_ensemble_generalization branch 2 times, most recently from efec02e to b216fe7 Compare July 20, 2017 02:53
Collaborator

@glemaitre glemaitre left a comment

A couple of thoughts. This is a quick review; I would need more time to make sure that everything I suggested can actually be implemented.


random_state = check_random_state(self.random_state)
estimator_seeds = random_state.randint(MAX_INT, size=self.n_estimators)
sampler_seeds = random_state.randint(MAX_INT, size=self.n_estimators)
Collaborator

I think we should use _set_random_states from sklearn/ensemble/base.py:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/base.py
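
For reference, here is a minimal stdlib-only sketch of what that helper does, as far as I understand it: walk the parameters returned by get_params(deep=True) and give every random_state entry, nested or not, its own integer seed. The Leaf and Chain classes and the set_random_states name are hypothetical stand-ins, not scikit-learn or imbalanced-learn objects, and random.Random replaces the NumPy RandomState used in the real code.

```python
import random

MAX_INT = 2**31 - 1  # same bound as np.iinfo(np.int32).max


class Leaf:
    """Hypothetical stand-in for a sampler or classifier with a random_state."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def get_params(self, deep=True):
        return {"random_state": self.random_state}

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self


class Chain:
    """Hypothetical stand-in for a pipeline holding named steps."""

    def __init__(self, steps):
        self.steps = dict(steps)

    def get_params(self, deep=True):
        # Expose nested parameters as "<step>__<param>", like scikit-learn does.
        params = {}
        for name, step in self.steps.items():
            for key, value in step.get_params().items():
                params[f"{name}__{key}"] = value
        return params

    def set_params(self, **params):
        for key, value in params.items():
            name, _, sub_key = key.partition("__")
            self.steps[name].set_params(**{sub_key: value})
        return self


def set_random_states(estimator, rng):
    """Give every (nested) random_state parameter its own integer seed."""
    to_set = {}
    # Sorting the keys keeps the seed assignment reproducible.
    for key in sorted(estimator.get_params(deep=True)):
        if key == "random_state" or key.endswith("__random_state"):
            to_set[key] = rng.randrange(MAX_INT)
    if to_set:
        estimator.set_params(**to_set)
    return estimator


chain = Chain([("sampler", Leaf()), ("classifier", Leaf())])
set_random_states(chain, random.Random(0))
```

With one call covering both the sampler and the classifier inside the pipeline, the separate estimator_seeds and sampler_seeds arrays and the hasattr check would no longer be needed.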

pipelines = []
seeds = zip(estimator_seeds, sampler_seeds)

for i, (estimator_seed, sampler_seed) in enumerate(seeds):
Collaborator

If the random states are set properly beforehand, we could run this loop in parallel with joblib.
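
A rough sketch of the idea, assuming all seeds are drawn up front in the main thread so that each task depends only on its own seed. It uses concurrent.futures.ThreadPoolExecutor from the standard library as a stand-in for joblib, and fit_one / fit_all are hypothetical names; the "fitting" is just a seeded subsample.

```python
import random
from concurrent.futures import ThreadPoolExecutor

MAX_INT = 2**31 - 1


def fit_one(seed, data):
    """Stand-in for 'under-sample, then fit': draw a seeded subset of the data."""
    rng = random.Random(seed)
    return sorted(rng.sample(data, k=3))


def fit_all(data, n_estimators, random_state, max_workers=2):
    # Draw every seed sequentially before dispatching any work, so the
    # result does not depend on how the tasks are scheduled.
    master = random.Random(random_state)
    seeds = [master.randrange(MAX_INT) for _ in range(n_estimators)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the order of the input seeds.
        return list(pool.map(lambda seed: fit_one(seed, data), seeds))


subsets = fit_all(list(range(10)), n_estimators=4, random_state=0)
```

Because the seeds are fixed before the pool starts, running with one worker or several should give the same list of subsets.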

sampler = clone(self.base_sampler_)
sampler.set_params(random_state=sampler_seed)

if hasattr(self.base_estimator_, 'random_state'):
Collaborator

for i, (estimator_seed, sampler_seed) in enumerate(seeds):

sampler = clone(self.base_sampler_)
sampler.set_params(random_state=sampler_seed)
Collaborator

We should create a make_sampler, similar to make_estimator.
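
Something along these lines, as a sketch only. ToySampler and ToyEnsemble are hypothetical classes, copy.deepcopy stands in for sklearn.base.clone, and the _make_sampler signature is an assumption loosely modelled on the ensemble's make_estimator helper: copy the base sampler, seed the copy, and optionally append it to a fitted list.

```python
import copy
import random

MAX_INT = 2**31 - 1


class ToySampler:
    """Hypothetical sampler with one hyperparameter and a random_state."""

    def __init__(self, ratio=1.0, random_state=None):
        self.ratio = ratio
        self.random_state = random_state


class ToyEnsemble:
    """Hypothetical ensemble that builds one seeded sampler per estimator."""

    def __init__(self, base_sampler, n_estimators=3, random_state=None):
        self.base_sampler = base_sampler
        self.n_estimators = n_estimators
        self.random_state = random_state

    def _make_sampler(self, rng, append=True):
        # deepcopy is a stand-in for clone: the base sampler is never mutated.
        sampler = copy.deepcopy(self.base_sampler)
        sampler.random_state = rng.randrange(MAX_INT)
        if append:
            self.samplers_.append(sampler)
        return sampler

    def fit(self):
        rng = random.Random(self.random_state)
        self.samplers_ = []
        for _ in range(self.n_estimators):
            self._make_sampler(rng)
        return self


ensemble = ToyEnsemble(ToySampler(ratio=0.5), n_estimators=3, random_state=42).fit()
```

The loop body in fit then shrinks to a single call, and the clone-and-seed logic lives in one place.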


from ..pipeline import Pipeline
from ..under_sampling import RandomUnderSampler as ROS

Collaborator

I would use the full name since we are using it only once :) It might be more intuitive, and it is not a burden to write it once.

@chkoar
Owner Author

chkoar commented Jul 21, 2017

@glemaitre since _set_random_states recursively sets the random_state of all nested objects, it helped remove some lines.

@glemaitre
Collaborator

The remaining error could be because we are creating a meta-estimator, which should not go through the same common tests as a regular estimator:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/testing.py#L508

@chkoar
Owner Author

chkoar commented Jul 21, 2017

@glemaitre I can think of two options right now:

  1. Don't bother right now
  2. Copy the ensemble code

@glemaitre
Collaborator

We should be able to monkey patch as well, but I would go for option 1 for the moment.

@chkoar
Owner Author

chkoar commented Jul 21, 2017

@glemaitre We should be able to monkey patch as well.

I didn't mention that on purpose, because I thought you wouldn't like this approach.

@glemaitre
Collaborator

glemaitre commented Jul 21, 2017 via email

@codecov

codecov bot commented Jul 21, 2017

Codecov Report

Merging #4 into master will decrease coverage by <.01%.
The diff coverage is 97.93%.

@@            Coverage Diff             @@
##           master       #4      +/-   ##
==========================================
- Coverage   98.32%   98.31%   -0.01%     
==========================================
  Files          68       70       +2     
  Lines        3879     3975      +96     
==========================================
+ Hits         3814     3908      +94     
- Misses         65       67       +2

Impacted Files                                          Coverage Δ
imblearn/ensemble/__init__.py                           100% <100%> (ø) ⬆️
...nsemble/tests/test_easy_ensemble_generalization.py   100% <100%> (ø)
imblearn/ensemble/easy_ensemble_generalization.py       96.29% <96.29%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2c0628f...69069fb.

@chkoar chkoar force-pushed the easy_ensemble_generalization branch from 409737a to aa1f233 Compare August 7, 2017 07:23
@chkoar chkoar force-pushed the easy_ensemble_generalization branch from b71d8cd to cd972c5 Compare August 7, 2017 07:54

2 participants