Allow ModelVisualizers to wrap Pipeline objects #498

@bbengfort

Description

Describe the solution you'd like

Our model visualizers expect to wrap classifiers, regressors, or clusterers in order to visualize the model under the hood; they even perform checks to ensure the right kind of estimator is passed in. Unfortunately, in many cases passing a Pipeline object as the model does not allow the visualizer to work, even though the pipeline's final estimator is acceptable, e.g. it is a classifier for classification score visualizers (more on this below). This is primarily because the Pipeline wrapper masks the attributes needed by the visualizer.

I propose that we modify ModelVisualizer to make the ModelVisualizer.estimator attribute a @property. When setting the estimator property, we can check that a Pipeline has a final_estimator attribute (i.e. that it is not a transformer-only pipeline). When getting the estimator property, we can return the final estimator instead of the entire Pipeline. This should ensure that we can use pipelines in our model visualizers.

NOTE, however, that we will still have to fit(), predict(), and score() on the entire pipeline, so this is a bit more nuanced than it seems at first glance. There will probably have to be an is_pipeline() check and other estimator access utilities; a sketch of the approach follows.
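A minimal sketch of how the property might work (the is_pipeline helper, the _wrapped attribute, and the simplified ModelVisualizer stand-in are hypothetical illustrations, not the final API):

from sklearn.pipeline import Pipeline

def is_pipeline(model):
    # Hypothetical helper: True if the model is a sklearn Pipeline
    return isinstance(model, Pipeline)

class ModelVisualizer(object):
    # Simplified stand-in for yellowbrick's ModelVisualizer

    def __init__(self, model):
        self.estimator = model  # invokes the property setter below

    @property
    def estimator(self):
        # Getting returns the final estimator so attribute lookups
        # (e.g. classes_) reach the classifier, not the Pipeline wrapper
        if is_pipeline(self._wrapped):
            return self._wrapped._final_estimator
        return self._wrapped

    @estimator.setter
    def estimator(self, model):
        # Setting validates that a Pipeline terminates in an estimator,
        # i.e. that it is not a transformer-only pipeline
        if is_pipeline(model) and not hasattr(model._final_estimator, "predict"):
            raise TypeError("transformer pipelines cannot be wrapped")
        self._wrapped = model

    def fit(self, X, y=None, **kwargs):
        # fit/predict/score must still go through the entire pipeline
        # so that the transformers are applied to the data
        self._wrapped.fit(X, y, **kwargs)
        return self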

Is your feature request related to a problem? Please describe.

Consider the following, fairly common code:

from sklearn.pipeline import Pipeline
from sklearn.neural_network import MLPClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

from yellowbrick.classifier import ClassificationReport

model = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('mlp', MLPClassifier()),
])

# assumes X_train, y_train, X_test, y_test have already been defined
oz = ClassificationReport(model)
oz.fit(X_train, y_train)
oz.score(X_test, y_test)
oz.poof()

This seems to be a valid model for a classification report; unfortunately, the classification report is not able to access the MLPClassifier's classes_ attribute, since the Pipeline doesn't know how to expose that attribute of the final estimator.

I think the original idea for the ScoreVisualizers was that they would be used inside of Pipelines, e.g.

model = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('clf', ClassificationReport(MLPClassifier())),
])

model.fit(X_train, y_train)
model.score(X_test, y_test)
model.named_steps['clf'].poof()

But this makes it difficult to use more than one visualizer on the same model, e.g. both the ROCAUC visualizer and the ClassificationReport visualizer; with the proposed change, the same pipeline could simply be wrapped by each visualizer in turn, as sketched below.
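A hypothetical usage sketch, assuming the property-based estimator access described above is in place (and that the train/test splits are already defined):

from sklearn.pipeline import Pipeline
from sklearn.neural_network import MLPClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

from yellowbrick.classifier import ClassificationReport, ROCAUC

model = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('mlp', MLPClassifier()),
])

# wrap the same pipeline with each visualizer in turn
for Viz in (ClassificationReport, ROCAUC):
    oz = Viz(model)
    oz.fit(X_train, y_train)
    oz.score(X_test, y_test)
    oz.poof()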

Definition of Done

  • Update ModelVisualizer class with pipeline helpers
  • Ensure current tests pass
  • Add a test to all model visualizer subclasses that passes a pipeline in as the estimator (see the test sketch after this list)
  • Add documentation about using visualizers with pipelines
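
A minimal pytest-style sketch of what such a test might look like (the dataset, model, and assertions are illustrative only, and the oz.estimator assertion assumes the proposed property behavior):

from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

from yellowbrick.classifier import ClassificationReport

def test_classification_report_accepts_pipeline():
    X, y = make_classification(n_samples=100, random_state=42)
    model = Pipeline([
        ('scale', StandardScaler()),
        ('clf', LogisticRegression()),
    ])

    oz = ClassificationReport(model)
    oz.fit(X, y)
    oz.score(X, y)

    # with the proposed change, the visualizer reaches through the
    # Pipeline to the final estimator's learned attributes
    assert hasattr(oz.estimator, "classes_")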

Metadata

Labels

level: intermediate (python coding expertise required), priority: medium (can wait until after next release), type: feature (a new visualizer or utility for yb)
