Description
Describe the solution you'd like
Our model visualizers expect to wrap classifiers, regressors, or clusterers in order to visualize the model under the hood; they even perform checks to ensure the right type of estimator is passed in. Unfortunately, in many cases passing a `Pipeline` object as the model does not allow the visualizer to work, even though the model is acceptable as a pipeline, e.g. its final step is a classifier for classification score visualizers (more on this below). This is primarily because the `Pipeline` wrapper masks the attributes needed by the visualizer.
I propose that we modify the `ModelVisualizer` class to change the `ModelVisualizer.estimator` attribute into a `@property`. When setting the estimator property, we can perform a check to ensure that the `Pipeline` has a final estimator (e.g. that it is not a transformer-only pipeline). When getting the estimator property, we can return the final estimator instead of the entire `Pipeline`. This should ensure that we can use pipelines in our model visualizers.

NOTE, however, that we will still have to call `fit()`, `predict()`, and `score()` on the entire pipeline, so this is a bit more nuanced than it seems at first glance. There will probably have to be `is_pipeline()` checking and other estimator access utilities.
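A minimal sketch of what the proposed property might look like. All names here (`is_pipeline`, the setter check, the stand-in classes) are illustrative assumptions, not existing Yellowbrick API, and the duck-typed fakes let the example run without scikit-learn installed:

```python
def is_pipeline(model):
    # Duck-typed check: pipelines expose an ordered list of (name, estimator) steps.
    return hasattr(model, "steps")

class ModelVisualizer:
    def __init__(self, model):
        self.estimator = model  # routed through the property setter below

    @property
    def estimator(self):
        # Getter unwraps pipelines so fitted attributes (e.g. classes_)
        # are visible to the visualizer.
        if is_pipeline(self._model):
            return self._model.steps[-1][1]
        return self._model

    @estimator.setter
    def estimator(self, model):
        # Setter rejects pipelines with no final estimator
        # (a simplified stand-in for the transformer-pipeline check).
        if is_pipeline(model) and model.steps[-1][1] is None:
            raise TypeError("Pipeline has no final estimator")
        self._model = model

    def fit(self, X, y=None):
        # NOTE: fit/predict/score still run against the *entire* pipeline.
        self._model.fit(X, y)
        return self

class FakePipeline:
    def __init__(self, steps):
        self.steps = steps

class FakeClassifier:
    pass

oz = ModelVisualizer(FakePipeline([("tfidf", object()), ("mlp", FakeClassifier())]))
assert isinstance(oz.estimator, FakeClassifier)  # unwrapped to the final step
```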
Is your feature request related to a problem? Please describe.
Consider the following, fairly common code:
```python
from sklearn.pipeline import Pipeline
from sklearn.neural_network import MLPClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

from yellowbrick.classifier import ClassificationReport

model = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('mlp', MLPClassifier()),
])

oz = ClassificationReport(model)
oz.fit(X_train, y_train)
oz.score(X_test, y_test)
oz.poof()
```
This seems to be a valid model for a classification report; unfortunately, the classification report is not able to access the MLPClassifier's `classes_` attribute, since the `Pipeline` doesn't know how to delegate that attribute to the final estimator.
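The masking problem can be illustrated without scikit-learn at all: any wrapper that forwards only the methods (and not attribute access) hides the fitted attributes of its final step. The classes below are stand-ins, not the real sklearn objects:

```python
class FinalEstimator:
    def fit(self, X, y):
        self.classes_ = sorted(set(y))  # the fitted attribute lives here
        return self

class BareWrapper:
    # Stands in for a pipeline that forwards fit() but not attributes.
    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y):
        self.steps[-1][1].fit(X, y)
        return self

model = BareWrapper([("clf", FinalEstimator())]).fit([[0], [1]], ["a", "b"])
assert not hasattr(model, "classes_")             # masked on the wrapper
assert model.steps[-1][1].classes_ == ["a", "b"]  # visible on the final step
```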
I think the original idea for the `ScoreVisualizers` was that they would go inside of a `Pipeline`, e.g.:

```python
model = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('clf', ClassificationReport(MLPClassifier())),
])

model.fit(X, y)
model.score(X_test, y_test)
model.named_steps['clf'].poof()
```
But this makes it difficult to use more than one visualizer, e.g. both the ROCAUC and ClassificationReport visualizers on the same pipeline.
Definition of Done
- Update the `ModelVisualizer` class with pipeline helpers
- Ensure current tests pass
- Add a test to all model visualizer subclasses that passes in a pipeline as the estimator
- Add documentation about using visualizers with pipelines
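The pipeline test item might be shaped roughly like this sketch (pytest-style). The unwrapping behavior shown is the proposed one, and every class here is an illustrative stand-in rather than the real Yellowbrick or scikit-learn class:

```python
class DummyClassifier:
    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        return self

class DummyPipeline:
    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y):
        for _, step in self.steps:
            step.fit(X, y)
        return self

class Visualizer:
    # Stand-in with the proposed pipeline-unwrapping behavior.
    def __init__(self, model):
        self.model = model

    @property
    def estimator(self):
        if hasattr(self.model, "steps"):
            return self.model.steps[-1][1]
        return self.model

    def fit(self, X, y):
        self.model.fit(X, y)  # fit the whole pipeline, not just the final step
        return self

def test_visualizer_accepts_pipeline():
    oz = Visualizer(DummyPipeline([("clf", DummyClassifier())]))
    oz.fit([[0], [1]], ["a", "b"])
    # The visualizer sees the final estimator's fitted attributes.
    assert oz.estimator.classes_ == ["a", "b"]

test_visualizer_accepts_pipeline()
```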