From 7aabb2a3f3ffffc3c936c4cedb59665397164f0c Mon Sep 17 00:00:00 2001
From: Guillaume Lemaitre
Date: Thu, 9 Mar 2023 09:42:04 +0100
Subject: [PATCH 01/11] Initial barebone draft

---
 slep021/proposal.rst | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)
 create mode 100644 slep021/proposal.rst

diff --git a/slep021/proposal.rst b/slep021/proposal.rst
new file mode 100644
index 0000000..e2dad71
--- /dev/null
+++ b/slep021/proposal.rst
@@ -0,0 +1,36 @@
+.. _slep_021:
+
+==================================================
+SLEP021: Unified API to compute feature importance
+==================================================
+
+:Author: Thomas J Fan, Guillaume Lemaitre
+:Status: Draft
+:Type: Standards Track
+:Created: 2023-03-09
+
+Abstract
+--------
+
+Detailed description
+--------------------
+
+Discussion
+----------
+
+References and Footnotes
+------------------------
+
+.. [1] Each SLEP must either be explicitly labeled as placed in the public
+   domain (see this SLEP as an example) or licensed under the `Open Publication
+   License`_.
+.. [2] `scikit-learn Governance and Decision-Making
+   <https://scikit-learn.org/stable/governance.html>`__
+
+.. _Open Publication License: https://www.opencontent.org/openpub/
+
+
+Copyright
+---------
+
+This document has been placed in the public domain. [1]_

From e753bad9a5e55d404f98f377c56ff2c3f07dd12c Mon Sep 17 00:00:00 2001
From: Guillaume Lemaitre
Date: Thu, 9 Mar 2023 10:46:26 +0100
Subject: [PATCH 02/11] draft motivation

---
 slep021/proposal.rst | 40 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/slep021/proposal.rst b/slep021/proposal.rst
index e2dad71..f3579e3 100644
--- a/slep021/proposal.rst
+++ b/slep021/proposal.rst
@@ -12,9 +12,44 @@ SLEP021: Unified API to compute feature importance
 Abstract
 --------
 
+This SLEP proposes a common API for computing feature importance.
+
 Detailed description
 --------------------
 
+Motivation
+~~~~~~~~~~
+
+Data scientists rely on feature importance when inspecting a trained model.
+Feature importance is a measure of how much a feature contributes to the
+prediction and thus gives insights on the model and the predictions it
+provides.
+
+However, there is currently no single method to compute feature importance.
+All available methods are designed upon axioms or hypotheses that are not
+necessarily respected in practice.
+
+Some work in scikit-learn has been done to provide documentation highlighting
+the limitations of some implemented methods. However, there is currently
+no common way to expose feature importance in scikit-learn. In addition, for
+some historical reasons, some estimators (e.g. decision trees) provide a single
+feature importance that could be used as the "method-to-use" to analyse the
+model. It is problematic since there is no de facto standard to analyse the
+feature importance of a model.
+
+Therefore, this SLEP proposes an API for providing feature importance that is
+flexible to switch between methods and extensible to add new methods. It
+is a follow-up of initial discussions from [2]_.
+
+Current state
+~~~~~~~~~~~~~
+
+Current pitfalls
+~~~~~~~~~~~~~~~~
+
+Solution
+~~~~~~~~
+
 Discussion
 ----------
 
@@ -24,8 +59,7 @@ References and Footnotes
 .. [1] Each SLEP must either be explicitly labeled as placed in the public
    domain (see this SLEP as an example) or licensed under the `Open Publication
    License`_.
-.. [2] `scikit-learn Governance and Decision-Making
-   <https://scikit-learn.org/stable/governance.html>`__
+.. [2]
 
 .. _Open Publication License: https://www.opencontent.org/openpub/
@@ -33,4 +67,4 @@ References and Footnotes
 Copyright
 ---------
 
-This document has been placed in the public domain. [1]_
+This document has been placed in the public domain [1]_.

From 93dcd2f99220b5f4faabae83286c1cbd559ea49e Mon Sep 17 00:00:00 2001
From: Guillaume Lemaitre
Date: Thu, 9 Mar 2023 14:33:16 +0100
Subject: [PATCH 03/11] add link

---
 slep021/proposal.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/slep021/proposal.rst b/slep021/proposal.rst
index f3579e3..1d675e2 100644
--- a/slep021/proposal.rst
+++ b/slep021/proposal.rst
@@ -59,7 +59,7 @@ References and Footnotes
 .. [1] Each SLEP must either be explicitly labeled as placed in the public
    domain (see this SLEP as an example) or licensed under the `Open Publication
    License`_.
-.. [2]
+.. [2] https://github.com/scikit-learn/scikit-learn/pull/25659#pullrequestreview-1330861709
 
 .. _Open Publication License: https://www.opencontent.org/openpub/

From bcf37e738b8d90ced457012adf044f0268ee3236 Mon Sep 17 00:00:00 2001
From: Guillaume Lemaitre
Date: Thu, 9 Mar 2023 14:34:36 +0100
Subject: [PATCH 04/11] wrong link

---
 slep021/proposal.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/slep021/proposal.rst b/slep021/proposal.rst
index 1d675e2..d4d642b 100644
--- a/slep021/proposal.rst
+++ b/slep021/proposal.rst
@@ -59,7 +59,7 @@ References and Footnotes
 .. [1] Each SLEP must either be explicitly labeled as placed in the public
    domain (see this SLEP as an example) or licensed under the `Open Publication
    License`_.
-.. [2] https://github.com/scikit-learn/scikit-learn/pull/25659#pullrequestreview-1330861709
+.. [2] https://github.com/scikit-learn/scikit-learn/issues/20059#issuecomment-869811256
 
 .. _Open Publication License: https://www.opencontent.org/openpub/

From 0459715f019b9c8f76cf3a9ca038c6db6c7d1823 Mon Sep 17 00:00:00 2001
From: Guillaume Lemaitre
Date: Thu, 9 Mar 2023 16:45:12 +0100
Subject: [PATCH 05/11] add available methods and first use case

---
 index.rst            |  1 +
 slep021/proposal.rst | 41 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/index.rst b/index.rst
index ff7d43c..2a9ee3d 100644
--- a/index.rst
+++ b/index.rst
@@ -25,6 +25,7 @@
    slep012/proposal
    slep017/proposal
    slep019/proposal
+   slep021/proposal
 
 .. toctree::
    :maxdepth: 1

diff --git a/slep021/proposal.rst b/slep021/proposal.rst
index d4d642b..fe350b5 100644
--- a/slep021/proposal.rst
+++ b/slep021/proposal.rst
@@ -39,11 +39,46 @@ feature importance of a model.
 
 Therefore, this SLEP proposes an API for providing feature importance that is
 flexible to switch between methods and extensible to add new methods. It
-is a follow-up of initial discussions from [2]_.
+is a follow-up of initial discussions from :issue:`20059`.
 
 Current state
 ~~~~~~~~~~~~~
 
+Available methods
+^^^^^^^^^^^^^^^^^
+
+The following methods are available in scikit-learn to provide some feature
+importance:
+
+- The function :func:`sklearn.inspection.permutation_importance`. It takes
+  a fitted estimator and a dataset. Additional parameters can be provided. The
+  method returns a `Bunch` containing 3 attributes: all decreases in score for
+  all repetitions, the mean, and the standard deviation across the repeats.
+  This method is therefore estimator agnostic (see the sketch after this list).
+- The linear estimators have a `coef_` attribute once fitted.
+- The decision tree-based estimators have a `feature_importances_` attribute
+  once fitted.
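+
+As a minimal sketch of the permutation importance flow (assuming a fitted
+`tree` and a held-out `X_test`, `y_test`; the parameters are illustrative)::
+
+    >>> from sklearn.inspection import permutation_importance
+    >>> result = permutation_importance(tree, X_test, y_test, n_repeats=10)
+    >>> result.importances.shape  # (n_features, n_repeats)
+    >>> result.importances_mean  # mean decrease in score per feature
+    >>> result.importances_std  # standard deviation across the repeats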
+
+Use cases
+^^^^^^^^^
+
+The first usage of feature importance is to inspect a fitted model. Usually,
+the feature importance will be plotted to visualize the importance of the
+features::
+
+    >>> tree = DecisionTreeClassifier().fit(X_train, y_train)
+    >>> plt.barh(X_train.columns, tree.feature_importances_)
+
+The analysis can be taken further by checking the variance of the feature
+importance. :func:`sklearn.inspection.permutation_importance` already provides
+a way to do that since it repeats the computation several times. For the model
+specific feature importance, the user can use cross-validation to get an idea
+of the dispersion::
+
+    >>> cv_results = cross_validate(tree, X_train, y_train, return_estimator=True)
+    >>> feature_importances = [est.feature_importances_ for est in cv_results["estimator"]]
+    >>> plt.boxplot(feature_importances, labels=X_train.columns)
+
 Current pitfalls
 ~~~~~~~~~~~~~~~~
 
@@ -53,13 +88,15 @@ Solution
 Discussion
 ----------
 
+Issues where some aspects of feature importance have been discussed:
+:issue:`20059`, :issue:`21170`.
+
 References and Footnotes
 ------------------------
 
 .. [1] Each SLEP must either be explicitly labeled as placed in the public
    domain (see this SLEP as an example) or licensed under the `Open Publication
    License`_.
-.. [2] https://github.com/scikit-learn/scikit-learn/issues/20059#issuecomment-869811256
 
 .. _Open Publication License: https://www.opencontent.org/openpub/

From 341c79a4bd1dede62a75f433f62c9002b699f531 Mon Sep 17 00:00:00 2001
From: Guillaume Lemaitre
Date: Thu, 9 Mar 2023 17:09:16 +0100
Subject: [PATCH 06/11] iter

---
 slep021/proposal.rst | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/slep021/proposal.rst b/slep021/proposal.rst
index fe350b5..6419a3c 100644
--- a/slep021/proposal.rst
+++ b/slep021/proposal.rst
@@ -79,12 +79,49 @@ of the dispersion::
     >>> feature_importances = [est.feature_importances_ for est in cv_results["estimator"]]
     >>> plt.boxplot(feature_importances, labels=X_train.columns)
 
+The second usage concerns model selection. Meta-estimators such as
+:class:`sklearn.feature_selection.SelectFromModel` internally use an array of
+length `(n_features,)` to select features and retrain a model on this subset of
+features.
+
+By default, :class:`sklearn.feature_selection.SelectFromModel` relies on the
+estimator to expose `coef_` or `feature_importances_`::
+
+    >>> SelectFromModel(tree).fit(X_train, y_train)  # `tree` exposes `feature_importances_`
+
+For more flexbilibity, a string can be provided::
+
+    >>> linear_model = make_pipeline(StandardScaler(), LogisticRegression())
+    >>> SelectFromModel(
+    ...     linear_model, importance_getter="named_steps.logisticregression.coef_"
+    ... ).fit(X_train, y_train)
+
+:class:`sklearn.feature_selection.SelectFromModel` relies by default on
+the estimator to expose a `coef_` or `feature_importances_` attribute. It is
+also possible to provide a string corresponding to the attribute name returning
+the feature importance. It allows dealing with an estimator embedded inside a
+pipeline, for instance. Finally, a callable taking an estimator and returning
+a NumPy array can also be provided.
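+
+As a minimal sketch of the callable variant (assuming a binary problem and the
+`linear_model` pipeline above; the helper below is purely illustrative)::
+
+    >>> import numpy as np
+    >>> def importance_from_coef(estimator):
+    ...     # `estimator[-1]` is the fitted `LogisticRegression` step
+    ...     return np.abs(estimator[-1].coef_).ravel()
+    >>> SelectFromModel(
+    ...     linear_model, importance_getter=importance_from_coef
+    ... ).fit(X_train, y_train)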
+
 Current pitfalls
 ~~~~~~~~~~~~~~~~
 
 Solution
 ~~~~~~~~
 
+Plotting
+~~~~~~~~
+
+Add a new :class:`sklearn.inspection.FeatureImportanceDisplay` class to
+:mod:`sklearn.inspection`. Two methods could be useful for this display: (i)
+:meth:`sklearn.inspection.FeatureImportanceDisplay.from_estimator` to plot
+a single estimate of feature importance and (ii)
+:meth:`sklearn.inspection.FeatureImportanceDisplay.from_cv_results` to plot
+an estimate of the feature importance together with the variance.
+
+The display should therefore be aware of how to retrieve the feature importance
+given the esimator.
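+
+As an illustration, a hypothetical usage could look as follows (the class and
+the signatures below are part of the proposal, not of the existing API)::
+
+    >>> FeatureImportanceDisplay.from_estimator(tree, X_test, y_test)
+    >>> FeatureImportanceDisplay.from_cv_results(cv_results, X_train, y_train)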
+
 Discussion
 ----------

From f95a7a3d483e9ea2f8fe3b6ad7dac0bfa8c9d124 Mon Sep 17 00:00:00 2001
From: Guillaume Lemaitre
Date: Fri, 10 Mar 2023 14:48:38 +0100
Subject: [PATCH 07/11] iter

---
 slep021/proposal.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/slep021/proposal.rst b/slep021/proposal.rst
index 6419a3c..7127b50 100644
--- a/slep021/proposal.rst
+++ b/slep021/proposal.rst
@@ -110,7 +110,7 @@ Solution
 ~~~~~~~~
 
 Plotting
-~~~~~~~~
+^^^^^^^^
 
 Add a new :class:`sklearn.inspection.FeatureImportanceDisplay` class to
 :mod:`sklearn.inspection`. Two methods could be useful for this display: (i)
@@ -120,7 +120,7 @@ a single estimate of feature importance and (ii)
 :meth:`sklearn.inspection.FeatureImportanceDisplay.from_cv_results` to plot
 an estimate of the feature importance together with the variance.
 
 The display should therefore be aware of how to retrieve the feature importance
-given the esimator.
+given the estimator.
 
 Discussion
 ----------

From 7f1fcb507666f92af1903dd5dd70e99320f9cffd Mon Sep 17 00:00:00 2001
From: Guillaume Lemaitre
Date: Fri, 10 Mar 2023 15:02:28 +0100
Subject: [PATCH 08/11] iter

---
 slep021/proposal.rst | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/slep021/proposal.rst b/slep021/proposal.rst
index 7127b50..eed0c95 100644
--- a/slep021/proposal.rst
+++ b/slep021/proposal.rst
@@ -89,7 +89,7 @@ estimator to expose `coef_` or `feature_importances_`::
 
     >>> SelectFromModel(tree).fit(X_train, y_train)  # `tree` exposes `feature_importances_`
 
-For more flexbilibity, a string can be provided::
+For more flexibility, a string can be provided::
 
     >>> linear_model = make_pipeline(StandardScaler(), LogisticRegression())
     >>> SelectFromModel(
     ...     linear_model, importance_getter="named_steps.logisticregression.coef_"
     ... ).fit(X_train, y_train)
@@ -106,6 +106,22 @@ a NumPy array can also be provided.
 Current pitfalls
 ~~~~~~~~~~~~~~~~
 
+From a methodological perspective, scikit-learn does not encourage good
+practice. Indeed, since it provides a de facto `feature_importances_` attribute
+for decision trees, it is tempting for users to believe that this method is
+the best one.
+
+In the same spirit, the :class:`sklearn.feature_selection.SelectFromModel`
+meta-estimator relies de facto on `feature_importances_` or `coef_` for
+selecting features.
+
+In both cases, it would be better to request the user to be more explicit and
+to choose a specific method to compute the feature importance for
+inspection or feature selection.
+
+From an API perspective, the current functionalities for feature importance are
+available via functions or attributes, with no common API.
+
 Solution
 ~~~~~~~~

From 487e7620f12e6cec690bfdf9adf815b9aa5a8a Mon Sep 17 00:00:00 2001
From: Guillaume Lemaitre
Date: Fri, 10 Mar 2023 15:39:41 +0100
Subject: [PATCH 09/11] iter

---
 slep021/proposal.rst | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/slep021/proposal.rst b/slep021/proposal.rst
index eed0c95..eb0b28f 100644
--- a/slep021/proposal.rst
+++ b/slep021/proposal.rst
@@ -119,12 +119,38 @@
 In both cases, it would be better to request the user to be more explicit and
 to choose a specific method to compute the feature importance for
 inspection or feature selection.
 
+Additionally, `feature_importances_` and `coef_` are statistics derived from
+the training set. We already documented that the reported
+`feature_importances_` can be biased towards features used by the
+model to overfit. Thus, it can negatively impact the feature
+selection once used in the :class:`sklearn.feature_selection.SelectFromModel`
+meta-estimator.
+
 From an API perspective, the current functionalities for feature importance are
 available via functions or attributes, with no common API.
 
 Solution
 ~~~~~~~~
 
+A common API
+^^^^^^^^^^^^
+
+**Proposal 1**: Expose a parameter in `__init__` to select the method to use
+to compute the feature importance. The computation will be done using a method,
+e.g. `get_feature_importance`, that could take additional parameters requested
+by the feature importance method. This method could therefore be used
+internally by :class:`sklearn.feature_selection.SelectFromModel`.
+
+**Proposal 2**: Create a meta-estimator that takes a model and a method in
+`__init__`. Then, a method `fit` could compute the feature importance given
+some data. Then, the feature importance could be available through a fitted
+attribute `feature_importances_` (or a method?). We could reuse such a
+meta-estimator in the :class:`sklearn.feature_selection.SelectFromModel`.
+
+Then, we should rely on a common API for the methods computing the feature
+importance. It seems that they should all at least accept a fitted estimator,
+some dataset, and potentially some extra parameters.
+
 Plotting
 ^^^^^^^^
@@ -144,12 +170,18 @@ Discussion
 ----------
 
 Issues where some aspects of feature importance have been discussed:
 :issue:`20059`, :issue:`21170`.
 
+In the SHAP package [2]_, the API is similar to proposal 2. A class `Explainer`
+takes a model, an algorithm, and some additional parameters (that could be
+used by some algorithm). The computation of the Shapley values is done and
+returned using the method `shap_values`.
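+
+For reference, a minimal sketch of that flow (based on the SHAP documentation;
+only illustrative here)::
+
+    >>> import shap
+    >>> explainer = shap.TreeExplainer(model)  # the algorithm is tied to the class
+    >>> shap_values = explainer.shap_values(X)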
+
 References and Footnotes
 ------------------------
 
 .. [1] Each SLEP must either be explicitly labeled as placed in the public
    domain (see this SLEP as an example) or licensed under the `Open Publication
    License`_.
+.. [2] https://shap.readthedocs.io/en/latest/
 
 .. _Open Publication License: https://www.opencontent.org/openpub/

From 71fed85d5480d28b2c15d27669c927adb996a0cf Mon Sep 17 00:00:00 2001
From: Guillaume Lemaitre
Date: Wed, 15 Mar 2023 16:01:11 +0100
Subject: [PATCH 10/11] some rephrasing and new proposal

---
 slep021/proposal.rst | 48 ++++++++++++++++++++++++++------------------
 1 file changed, 29 insertions(+), 19 deletions(-)

diff --git a/slep021/proposal.rst b/slep021/proposal.rst
index eb0b28f..a11a72b 100644
--- a/slep021/proposal.rst
+++ b/slep021/proposal.rst
@@ -21,21 +21,16 @@ Motivation
 ~~~~~~~~~~
 
 Data scientists rely on feature importance when inspecting a trained model.
-Feature importance is a measure of how much a feature contributes to the
-prediction and thus gives insights on the model and the predictions it
-provides.
-
-However, there is currently no single method to compute feature importance.
-All available methods are designed upon axioms or hypotheses that are not
-necessarily respected in practice.
-
-Some work in scikit-learn has been done to provide documentation highlighting
-the limitations of some implemented methods. However, there is currently
-no common way to expose feature importance in scikit-learn. In addition, for
-some historical reasons, some estimators (e.g. decision trees) provide a single
-feature importance that could be used as the "method-to-use" to analyse the
-model. It is problematic since there is no de facto standard to analyse the
-feature importance of a model.
+However, there is currently no single algorithm providing **the** feature
+importance. In practice, several algorithms are available, all having their
+pros and cons.
+
+In scikit-learn, there are different ways to compute and inspect feature
+importances. Some models, e.g. some tree-based models, expose a
+`feature_importances_` attribute upon `fit`, and we also have utilities such as
+the `permutation_importance` function to compute a different type of feature
+importance. There has been some work documenting their limitations, but we have
+not provided a nice API to implement alternatives.
 
 Therefore, this SLEP proposes an API for providing feature importance that is
 flexible to switch between methods and extensible to add new methods. It
@@ -55,7 +50,9 @@ importance:
   method returns a `Bunch` containing 3 attributes: all decreases in score for
   all repetitions, the mean, and the standard deviation across the repeats.
   This method is therefore estimator agnostic (see the sketch after this list).
-- The linear estimators have a `coef_` attribute once fitted.
+- The linear estimators have a `coef_` attribute once fitted, which is
+  sometimes used as their corresponding importance. We documented the
+  limitations when it comes to interpreting those coefficients.
 - The decision tree-based estimators have a `feature_importances_` attribute
   once fitted.
@@ -79,7 +76,7 @@ of the dispersion::
     >>> feature_importances = [est.feature_importances_ for est in cv_results["estimator"]]
     >>> plt.boxplot(feature_importances, labels=X_train.columns)
 
-The second usage concerns model selection. Meta-estimators such as
+The second usage concerns feature selection. Meta-estimators such as
 :class:`sklearn.feature_selection.SelectFromModel` internally use an array of
@@ -144,13 +141,26 @@ A common API
 **Proposal 2**: Create a meta-estimator that takes a model and a method in
 `__init__`. Then, a method `fit` could compute the feature importance given
 some data. Then, the feature importance could be available through a fitted
-attribute `feature_importances_` (or a method?). We could reuse such a
-meta-estimator in the :class:`sklearn.feature_selection.SelectFromModel`.
+attribute `feature_importances_` or a method `get_feature_importance`. We could
+reuse such a meta-estimator in the
+:class:`sklearn.feature_selection.SelectFromModel`.
 
 Then, we should rely on a common API for the methods computing the feature
 importance. It seems that they should all at least accept a fitted estimator,
 some dataset, and potentially some extra parameters.
 
+**Proposal 3**: Similarly to proposal 2 and taking inspiration from the
+SHAP package [2]_, we could create a class `Explainer` providing a
+`get_feature_importance` method given some data.
+
+Currently, scikit-learn provides only global feature importance. The previous
+API could be extended with a `get_samples_importance` method to compute an
+explanation per sample if the given method supports it (e.g. Shapley values).
+
+**Proposal 4**: Create a meta-estimator `FeatureImportanceCalculator` that
+could be passed to plotting displays or to an
+`estimator.get_feature_importance` method.
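+
+To make these proposals concrete, a hypothetical user-facing sketch could look
+as follows (none of the names or signatures below exist yet; they only
+illustrate proposals 1 and 2)::
+
+    >>> # Proposal 1: the method is selected on the estimator itself
+    >>> tree = DecisionTreeClassifier(feature_importance="permutation")
+    >>> tree.fit(X_train, y_train).get_feature_importance(X_test, y_test)
+    >>> # Proposal 2: a meta-estimator wrapping a model and a method
+    >>> fi = FeatureImportance(tree, method="permutation").fit(X_test, y_test)
+    >>> fi.feature_importances_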
+
 Plotting
 ^^^^^^^^

From 934904c8d8907a8f04c8bfc9af9513f8187c7fcf Mon Sep 17 00:00:00 2001
From: Guillaume Lemaitre
Date: Thu, 16 May 2024 22:05:46 +0200
Subject: [PATCH 11/11] add related method for feature importances

---
 slep021/proposal.rst | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/slep021/proposal.rst b/slep021/proposal.rst
index a11a72b..8824ed6 100644
--- a/slep021/proposal.rst
+++ b/slep021/proposal.rst
@@ -185,6 +185,18 @@
 takes a model, an algorithm, and some additional parameters (that could be
 used by some algorithm). The computation of the Shapley values is done and
 returned using the method `shap_values`.
 
+Related issues
+--------------
+
+Some discussions happened in the past. In this section, we aggregate the issues
+related to this topic:
+
+- :issue:`15132`: proposal to add `feature_importances_` to the
+  `HistGradientBoosting` classifier and regressor models.
+- :issue:`18223`: proposal to implement the PIMP feature importance.
+- :issue:`18603`: implement OOB permutation importance for `RandomForest`.
+- :issue:`21170`: implement variable importances for linear models.
+
 References and Footnotes
 ------------------------