SLEP021: Unified API for computing feature importance #86

1 change: 1 addition & 0 deletions index.rst

@@ -25,6 +25,7 @@
slep012/proposal
slep017/proposal
slep019/proposal
slep021/proposal

.. toctree::
:maxdepth: 1
192 changes: 192 additions & 0 deletions slep021/proposal.rst
@@ -0,0 +1,192 @@
.. _slep_021:

==================================================
SLEP021: Unified API to compute feature importance
==================================================

:Author: Thomas J Fan, Guillaume Lemaitre
:Status: Draft
:Type: Standards Track
:Created: 2023-03-09

Abstract
--------

This SLEP proposes a common API for computing feature importance.

Detailed description
--------------------

Motivation
~~~~~~~~~~

Data scientists rely on feature importance when inspecting a trained model.
Feature importance is a measure of how much a feature contributes to the
prediction and thus gives insight into the predictions provided by the model.

However, there is currently no single method to compute feature importance.
All available methods are built upon axioms or hypotheses that are not
necessarily respected in practice.

Some work has been done in scikit-learn to document the limitations of some of
the implemented methods. However, there is currently no common way to expose
feature importance in scikit-learn. In addition, for historical reasons, some
estimators (e.g. decision trees) provide a single feature importance that could
be perceived as the "method-to-use" to analyse the model. This is problematic
since there is no de facto standard to analyse the feature importance of a
model.

Therefore, this SLEP proposes an API for providing feature importance that is
flexible enough to switch between methods and extensible enough to add new
ones. It is a follow-up to the initial discussions in :issue:`20059`.

Current state
~~~~~~~~~~~~~

Available methods
^^^^^^^^^^^^^^^^^

The following methods are available in scikit-learn to provide some feature
importance:

- The function :func:`sklearn.inspection.permutation_importance`. It requires
a fitted estimator and a dataset; additional parameters can be provided. The
function returns a `Bunch` containing 3 attributes: the decrease in score for
each repetition, the mean, and the standard deviation across the repetitions
(see the sketch after this list). This method is therefore estimator agnostic.
- The linear estimators have a `coef_` attribute once fitted.
- The decision tree-based estimators have a `feature_importances_` attribute
once fitted.
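
As an illustration, a minimal sketch of calling
:func:`sklearn.inspection.permutation_importance` on a fitted estimator (the
`tree`, `X_val`, and `y_val` names are only assumed for the example)::

    >>> from sklearn.inspection import permutation_importance
    >>> result = permutation_importance(tree, X_val, y_val, n_repeats=10, random_state=0)
    >>> result.importances       # decrease in score, shape (n_features, n_repeats)
    >>> result.importances_mean  # mean decrease per feature
    >>> result.importances_std   # standard deviation per feature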

Use cases
^^^^^^^^^

The first usage of feature importance is to inspect a fitted model. Usually,
the feature importance will be plotted to visualize the importance of the
features::

>>> tree = DecisionTreeClassifier().fit(X_train, y_train)
>>> plt.barh(X_train.columns, tree.feature_importances_)

The analysis can be taken further by checking the variance of the feature
importance. :func:`sklearn.inspection.permutation_importance` already provides
a way to do that since it repeats the computation several times. For the
model-specific feature importance, the user can use cross-validation to get an
idea of the dispersion::

>>> cv_results = cross_validate(tree, X_train, y_train, return_estimator=True)
>>> feature_importances = [est.feature_importances_ for est in cv_results["estimator"]]
>>> plt.boxplot(feature_importances, labels=X_train.columns)

The second usage is model selection. Meta-estimators such as
:class:`sklearn.feature_selection.SelectFromModel` internally use an array of
shape `(n_features,)` to select features and retrain a model on this subset of
features.

By default, :class:`sklearn.feature_selection.SelectFromModel` relies on the
estimator to expose `coef_` or `feature_importances_`::

>>> SelectFromModel(tree).fit(X_train, y_train) # `tree` exposes `feature_importances_`

For more flexibility, a string can be provided::

>>> linear_model = make_pipeline(StandardScaler(), LogisticRegression())
>>> SelectFromModel(
... linear_model, importance_getter="named_steps.logisticregression.coef_"
... ).fit(X_train, y_train)

:class:`sklearn.feature_selection.SelectFromModel` relies by default on the
estimator to expose a `coef_` or `feature_importances_` attribute. It is also
possible to provide a string corresponding to the attribute name returning the
feature importance, which makes it possible to deal with an estimator embedded
inside a pipeline, for instance. Finally, a callable taking an estimator and
returning a NumPy array can also be provided, as shown below.
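
For instance, a minimal sketch of such a callable for the pipeline above
(reusing the `linear_model` name from the previous snippet)::

    >>> import numpy as np
    >>> def importance_getter(estimator):
    ...     # retrieve the coefficients of the final step of the fitted pipeline
    ...     coef = estimator.named_steps["logisticregression"].coef_
    ...     # aggregate the magnitude of the coefficients across classes
    ...     return np.abs(coef).sum(axis=0)
    >>> SelectFromModel(linear_model, importance_getter=importance_getter).fit(X_train, y_train)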

Current pitfalls
~~~~~~~~~~~~~~~~

From a methodological perspective, scikit-learn does not encourage good
practice. Indeed, since it provides a de facto `feature_importances_` attribute
for decision trees, it is tempting for users to believe that this method is
the best one.

In the same spirit, the :class:`sklearn.feature_selection.SelectFromModel`
meta-estimator uses `feature_importances_` or `coef_` by default for selecting
features.

In both cases, it would be better to require the user to be more explicit and
to choose a specific method to compute the feature importance for inspection
or feature selection.

Additionally, `feature_importances_` and `coef_` are statistics derived from
the training set. We already documented that the reported
`feature_importances_` can be biased towards features that the model uses to
overfit. Thus, they can negatively impact the feature selection once used in
the :class:`sklearn.feature_selection.SelectFromModel` meta-estimator.

Comment on lines +119 to +122 (Member)
also a link to the docs we're mentioning would be nice

From an API perspective, the current functionalities for feature importance
are available via functions or attributes, with no common API.

Solution
~~~~~~~~

A common API
^^^^^^^^^^^^

**Proposal 1**: Expose a parameter in `__init__` to select the method to use
to compute the feature importance. The computation would be done by a method,
e.g. `get_feature_importance`, that could take additional parameters required
by the feature importance method. This method could then be used internally by
:class:`sklearn.feature_selection.SelectFromModel`.
Comment on lines +135 to +139

Maybe I'm violating some SOLID principle, but we could incorporate feature importance agnostic techniques into some mixin like ClassifierMixin and RegressorMixin and specific methods within each class when applicable. In this sense, we would have the "main" feature importance method selected during init (defining the behavior of get_feature_importance). Still, one could always use the others because the estimator would have get_permutation_importance, get_mdi_importance, get_abs_coef_importance etc.
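
A hypothetical sketch of what Proposal 1 could look like in practice (the
parameter and method names below are only illustrative and not settled)::

    >>> tree = DecisionTreeClassifier(feature_importance="permutation")
    >>> tree.fit(X_train, y_train)
    >>> importances = tree.get_feature_importance(X_val, y_val, n_repeats=10)
    >>> # `SelectFromModel` could then call `get_feature_importance` internally
    >>> SelectFromModel(tree).fit(X_train, y_train)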


**Proposal 2**: Create a meta-estimator that takes a model and a method in
`__init__`. Its `fit` method could then compute the feature importance given
some data, and the feature importance could be made available through a fitted
attribute `feature_importances_` (or a method?). We could reuse such a
meta-estimator in :class:`sklearn.feature_selection.SelectFromModel`.
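
A hypothetical sketch of such a meta-estimator (the class and parameter names
are only illustrative)::

    >>> explainer = FeatureImportance(tree, method="permutation", n_repeats=10)
    >>> explainer.fit(X_val, y_val)
    >>> explainer.feature_importances_  # array of shape (n_features,)
    >>> SelectFromModel(explainer).fit(X_train, y_train)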

In both proposals, the methods computing the feature importance should follow
a common API. It seems that they should all at least accept a fitted
estimator, some dataset, and potentially some extra parameters.

Plotting
^^^^^^^^

Add a new :class:`sklearn.inspection.FeatureImportanceDisplay` class to
:mod:`sklearn.inspection`. Two methods could be useful for this display: (i)
:meth:`sklearn.inspection.FeatureImportanceDisplay.from_estimator` to plot a
single estimate of feature importance and (ii)
:meth:`sklearn.inspection.FeatureImportanceDisplay.from_cv_results` to plot an
estimate of the feature importance together with its variance.

The display should therefore be aware of how to retrieve the feature
importance given the estimator.
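
A hypothetical sketch of how such a display could be used (the exact
signatures are not settled)::

    >>> FeatureImportanceDisplay.from_estimator(tree, X_val, y_val, method="permutation")
    >>> cv_results = cross_validate(tree, X, y, return_estimator=True)
    >>> FeatureImportanceDisplay.from_cv_results(cv_results, X, y, method="permutation")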

Discussion
----------

Issues where feature importance has already been discussed:
:issue:`20059`, :issue:`21170`.

In the SHAP package [2]_, the API is similar to proposal 2. An `Explainer`
class takes a model, an algorithm, and some additional parameters (that may be
used by some algorithms). The Shapley values are computed and returned by the
`shap_values` method.
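
For illustration, a minimal sketch of this workflow (the exact signatures may
differ across SHAP versions)::

    >>> import shap
    >>> explainer = shap.TreeExplainer(tree)  # or shap.Explainer(tree, algorithm="auto")
    >>> shap_values = explainer.shap_values(X_test)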

References and Footnotes
------------------------

.. [1] Each SLEP must either be explicitly labeled as placed in the public
domain (see this SLEP as an example) or licensed under the `Open Publication
License`_.
.. [2] https://shap.readthedocs.io/en/latest/

.. _Open Publication License: https://www.opencontent.org/openpub/


Copyright
---------

This document has been placed in the public domain [1]_.