# Partial-dependence Profiles

**Sections:**

- Overview
- Intuition
- Method
- Feature Importance
- Example: Apartment-Prices Data
- Pros and Cons
- Code Snippets in R

## Overview {.unnumbered}

- This chapter focuses on partial-dependence plots (PDPs), also referred to as partial-dependence (PD) profiles.
- One of the most popular techniques for explaining a model globally
- Implemented in several packages, such as `DALEX`, `iml`, and `pdp` in R and `PDPbox` in Python
- Core idea: a PD profile shows how the expected (average) model prediction changes as a function of a single explanatory variable
- Can be computed over all observations or for subsets of interest
- Useful for comparing multiple models:
    - Agreement between the profiles of different models increases confidence in them
    - Disagreement may suggest a way to improve a model
    - Evaluation of model behavior at the boundaries of a variable's range

## Intuition {.unnumbered}

- A PD profile is constructed by taking the arithmetic average of the individual ceteris-paribus (CP) profiles.
- Recall from the previous chapter that a CP profile shows how the model's prediction for a single instance changes as the value of one explanatory variable varies.
- When the model is additive, the CP profiles of all instances are parallel, i.e., they have the same shape and differ only by a vertical shift.
- If the model includes interactions, CP profiles may not be parallel.

*Example plots using the random forest model on the titanic dataset*

Note: A chart with all CP profiles plotted together is referred to as an individual conditional expectation (ICE) plot.
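
To see why additivity makes CP profiles parallel, the base-R sketch below computes CP profiles for two linear models fitted to the built-in `mtcars` data: one additive, one with a `wt:hp` interaction. The data, the models, and the helper `cp_profiles()` are illustrative assumptions; they are not the random forest on `titanic` used in the EMA examples, and `cp_profiles()` is not a DALEX function.

```{r 17-sketch-cp-parallel}
# Hypothetical helper (not part of DALEX): returns a matrix with one CP profile
# per row; entry [i, k] is the prediction for observation i with `variable`
# set to grid[k] and all other variables kept at their observed values.
cp_profiles <- function(model, data, variable, grid) {
  profiles <- vapply(seq_len(nrow(data)), function(i) {
    obs <- data[rep(i, length(grid)), , drop = FALSE]
    obs[[variable]] <- grid
    unname(predict(model, newdata = obs))
  }, numeric(length(grid)))
  t(profiles)  # vapply() puts grid values in rows, so transpose
}

additive <- lm(mpg ~ wt + hp, data = mtcars)  # no interaction
interact <- lm(mpg ~ wt * hp, data = mtcars)  # includes a wt:hp interaction
wt_grid <- seq(2, 5, by = 0.5)

cp_add <- cp_profiles(additive, mtcars, "wt", wt_grid)
cp_int <- cp_profiles(interact, mtcars, "wt", wt_grid)

# Shift every profile so it starts at 0; parallel profiles then coincide, so
# their spread (sd across observations) at each grid point is numerically 0.
spread_add <- apply(cp_add - cp_add[, 1], 2, sd)
spread_int <- apply(cp_int - cp_int[, 1], 2, sd)

# ICE plot for the interaction model, with their mean (the PD profile) in bold
matplot(wt_grid, t(cp_int), type = "l", lty = 1, col = "grey",
        xlab = "wt", ylab = "Predicted mpg")
lines(wt_grid, colMeans(cp_int), lwd = 3)
```

For the additive model every profile is a shifted copy of the others, so `spread_add` is zero up to rounding; with the interaction, the slope in `wt` depends on `hp`, and `spread_int` grows along the grid.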

## Method {.unnumbered}

### Basic Equations {.unnumbered}

The PD profile of model $f()$ for explanatory variable $X^j$, evaluated at value $z$, is defined as

$$g_{PD}^{j}(z) = E_{\underline{X}^{-j}}\{f(\underline{X}^{j|=z})\},$$

where $\underline{X}^{j|=z}$ denotes the vector of explanatory variables with $X^j$ set to $z$, and the expectation is taken over the joint distribution of $\underline{X}^{-j}$, i.e., of all explanatory variables other than $X^j$.

We rarely know the true distribution of $\underline{X}^{-j}$, so we typically estimate it with the empirical distribution of the training data:

$$\hat g_{PD}^{j}(z) = \frac{1}{n} \sum_{i=1}^{n} f(\underline{x}_i^{j|=z}).$$

In other words, the estimated PD profile at $z$ is the mean of the $n$ CP profiles for $X^j$ evaluated at $z$.
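
The estimator above translates almost line by line into code. Below is a minimal base-R sketch, assuming any model with a `predict()` method that accepts `newdata`; the helper `pd_profile()`, the `mtcars` data, and the linear model are illustrative choices, not part of the EMA text or of DALEX.

```{r 17-sketch-pd-estimator}
# Hypothetical helper (not a DALEX function): estimates the PD profile as
#   hat g_PD^j(z) = (1/n) * sum_i f(x_i^{j|=z}),
# i.e. set variable j to z in every row, predict, and average.
pd_profile <- function(model, data, variable, grid) {
  vapply(grid, function(z) {
    data_z <- data
    data_z[[variable]] <- z
    mean(predict(model, newdata = data_z))
  }, numeric(1))
}

fit <- lm(mpg ~ wt * hp, data = mtcars)
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 20)
pd_wt <- pd_profile(fit, mtcars, "wt", wt_grid)

plot(wt_grid, pd_wt, type = "l", xlab = "wt", ylab = "Average predicted mpg")
```

Because this particular model is linear in `hp`, averaging over the observations gives the same curve as predicting at the mean of `hp`; for a nonlinear model such as a random forest there is no such shortcut, which is why the average over all $n$ rows is computed.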

### Clustered partial-dependence profiles {.unnumbered}

- The mean of the CP profiles may not be a good summary if the profiles are not parallel.
- An alternative is to group the CP profiles into clusters and compute a PD profile for each cluster:
    - Use K-means or hierarchical clustering to identify the clusters
    - Use the Euclidean distance between CP profiles to measure how similar two instances are

*Example clustered PDP using the random forest model on the titanic dataset*
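
A minimal base-R sketch of the clustering idea, assuming a linear model with a `wt:hp` interaction on `mtcars` (so the CP profiles are not parallel): the CP profiles are clustered with `hclust()` on Euclidean distances (its default complete linkage) and then averaged within each cluster. DALEX's own implementation may differ in details, so treat this only as an illustration of the technique.

```{r 17-sketch-clustered-pd}
# Illustrative sketch (base R, mtcars): cluster CP profiles, then average
# the profiles within each cluster.
fit <- lm(mpg ~ wt * hp, data = mtcars)  # interaction => non-parallel profiles
wt_grid <- seq(2, 5, by = 0.25)

# CP profile matrix: row i holds the predictions for observation i as wt varies
cp <- t(sapply(seq_len(nrow(mtcars)), function(i) {
  obs <- mtcars[rep(i, length(wt_grid)), ]
  obs$wt <- wt_grid
  unname(predict(fit, newdata = obs))
}))

hc <- hclust(dist(cp))        # dist() uses Euclidean distance by default
cluster <- cutree(hc, k = 3)  # cut the dendrogram into 3 groups of profiles

# One averaged (PD-style) profile per cluster; rows = clusters
pd_by_cluster <- t(sapply(1:3, function(g) {
  colMeans(cp[cluster == g, , drop = FALSE])
}))

matplot(wt_grid, t(pd_by_cluster), type = "l", lty = 1, lwd = 2,
        xlab = "wt", ylab = "Average predicted mpg")
```

Weighting each cluster's profile by the cluster size and summing recovers the overall PD profile, so the clusters split the ordinary average into interpretable parts.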

### Grouped partial-dependence profiles {.unnumbered}

- Grouped PD profiles are useful when we can explicitly identify a variable that influences the shape of the CP profiles for the explanatory variable of interest; a separate PD profile is then computed for each level of that grouping variable.
- The obvious use case is a model that includes an interaction between the variable of interest and another variable.

*Example grouped PDP using the random forest model on the titanic dataset*
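
Grouped PD profiles are PD profiles computed separately within each level of the grouping variable. The sketch below assumes a linear model on `mtcars` in which the effect of `wt` differs by transmission type; the grouping variable `am`, the model, and the helper `pd_for()` are illustrative choices, not taken from the EMA example.

```{r 17-sketch-grouped-pd}
# Illustrative sketch: average CP profiles within each level of a grouping
# variable (here transmission type `am` in mtcars).
cars <- transform(mtcars, am = factor(am, levels = c(0, 1),
                                      labels = c("automatic", "manual")))
fit <- lm(mpg ~ wt * am, data = cars)  # slope of wt may differ by am
wt_grid <- seq(2, 5, by = 0.25)

# PD profile for wt computed on one subset of the data
pd_for <- function(data) {
  vapply(wt_grid, function(z) {
    data$wt <- z
    mean(predict(fit, newdata = data))
  }, numeric(1))
}

grouped_pd <- sapply(split(cars, cars$am), pd_for)  # one column per group

matplot(wt_grid, grouped_pd, type = "l", lty = 1, lwd = 2, col = 1:2,
        xlab = "wt", ylab = "Average predicted mpg")
legend("topright", legend = colnames(grouped_pd), lty = 1, lwd = 2, col = 1:2)
```

Because the model lets the slope of `wt` differ between automatic and manual cars, the two grouped profiles have different slopes, a difference that a single overall PD profile would average away.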

### Contrastive partial-dependence profiles {-}

We can plot the PD profiles of several models on the same chart to compare how each model's predictions depend on the same explanatory variable.

*Example contrastive PDP on the titanic dataset*
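
As a minimal illustration, the sketch below overlays the PD profiles of two models fitted to `mtcars`: a linear model and one with an added quadratic term in `wt`. The models, the data, and the helper `pd_profile()` are hypothetical choices made for illustration only.

```{r 17-sketch-contrastive-pd}
# Hypothetical helper: PD profile = average prediction with `variable` set to z
pd_profile <- function(model, data, variable, grid) {
  vapply(grid, function(z) {
    data[[variable]] <- z  # modifies a local copy of `data`
    mean(predict(model, newdata = data))
  }, numeric(1))
}

fit_linear    <- lm(mpg ~ wt + hp, data = mtcars)
fit_quadratic <- lm(mpg ~ wt + I(wt^2) + hp, data = mtcars)
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 25)

# One column per model, evaluated on the same grid
pd_both <- cbind(linear    = pd_profile(fit_linear, mtcars, "wt", wt_grid),
                 quadratic = pd_profile(fit_quadratic, mtcars, "wt", wt_grid))

matplot(wt_grid, pd_both, type = "l", lty = 1, lwd = 2, col = 1:2,
        xlab = "wt", ylab = "Average predicted mpg")
legend("topright", legend = colnames(pd_both), lty = 1, lwd = 2, col = 1:2)
```

Where the two curves agree, both models describe the same relationship; where they diverge, the disagreement points to model behavior worth investigating.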

## Feature Importance {-}

This section references Section 8.1.1 of Christoph Molnar's [Interpretable Machine Learning](https://christophm.github.io/interpretable-ml-book/pdp.html) book.

Partial-dependence-based feature importance can be measured as follows:

$$I(x_S) = \sqrt{\frac{1}{K-1}\sum_{k=1}^{K}\left(\hat{f}_S(x^{(k)}_S) - \frac{1}{K}\sum_{k=1}^{K} \hat{f}_S(x^{(k)}_S)\right)^2}$$

where $x^{(k)}_S$, $k = 1, \dots, K$, are the $K$ unique values of feature $X_S$, and $\hat{f}_S$ is the estimated PD function (Molnar's notation for $\hat g_{PD}^{j}$).

The formula computes the standard deviation of the PD profile values around their mean.

Main idea: a flat PD profile indicates a feature that is not important; the more the profile varies, the more important the feature.

Limitations:

- Only captures main effects and ignores feature interactions; a feature that matters only through interactions can still have a flat PD profile.
- Defined over the unique values of the explanatory variable: a value observed for just one instance gets the same weight as a value observed for many instances.
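
The importance measure can be sketched in a few lines of base R. This is an illustration under our own assumptions (a linear model on `mtcars` and a hypothetical helper `pd_importance()`); R's `sd()` uses the $1/(K-1)$ denominator, matching the formula.

```{r 17-sketch-pd-importance}
# Illustrative sketch of PD-based importance: the standard deviation of the
# PD profile evaluated at the K unique values of a feature.
pd_importance <- function(model, data, variable) {
  grid <- sort(unique(data[[variable]]))  # the K unique values x_S^(k)
  pd <- vapply(grid, function(z) {
    data[[variable]] <- z                 # local copy; caller's data unchanged
    mean(predict(model, newdata = data))
  }, numeric(1))
  sd(pd)  # sqrt(sum((pd - mean(pd))^2) / (K - 1))
}

fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
importance <- sapply(c("wt", "hp", "qsec"),
                     function(v) pd_importance(fit, mtcars, v))
sort(importance, decreasing = TRUE)
```

Since the grid is the set of unique values, each value counts once regardless of how many cars share it, which is the second limitation listed above. For this additive linear model, a feature's importance reduces to the absolute value of its coefficient times the standard deviation of its unique values.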

## Example: apartment-prices data {.unnumbered}

In this section, a random forest model is used to predict the price per square meter of an apartment. We focus on two explanatory variables: *surface* and *construction year*.

### Partial-dependence profiles {-}

### Clustered partial-dependence profiles {-}

### Grouped partial-dependence profiles {-}

Here, *district* is used as the grouping variable to check whether the relationship between the model's predictions and construction year or surface is similar across districts.

### Contrastive partial-dependence profiles {-}

Below, we compare the PD profiles of two predictive models: a basic linear regression model and the random forest model.

## Pros and Cons {.unnumbered}

| Pros | Cons |
|------------------------------|------------------------------------------|
| Popular and well understood in the data-science community | At most two features can be shown in a single plot |
| Simple, intuitive way to summarize the effect of a feature on the target variable | Assumes independence between features; problematic with correlated explanatory variables |
| Multiple software packages in R and Python; also easy to implement from scratch | Heterogeneous effects may be hidden in a basic PD plot; grouped or clustered profiles may be needed for better insight |
| Can be used to assess variable importance | Can be time-consuming to compute for medium and large datasets; subsampling is often needed |
| The calculation has a causal interpretation for the model of interest (not necessarily for the real world) | Can be misleading in regions where the training data are sparse |

**Note:**

- The pros and cons above draw on both the EMA text and Christoph Molnar's *Interpretable Machine Learning* book: <https://christophm.github.io/interpretable-ml-book/pdp.html>

## Code Snippets in R {.unnumbered}

We use the titanic dataset and a random forest model, both retrieved with `archivist::aread()`.

```{r 17-load-data, warning=FALSE, message=FALSE, results='hide'}
library("DALEX")
library("randomForest")  # needed to predict with the stored random forest

# Data and pre-trained model from the EMA book's archivist repository
titanic_imputed <- archivist::aread("pbiecek/models/27e5c")
titanic_rf <- archivist::aread("pbiecek/models/4e0fc")

# Column 9 is the target `survived` ("no"/"yes"); pass it as a logical outcome
explainer_rf <- DALEX::explain(model = titanic_rf,
                               data = titanic_imputed[, -9],
                               y = titanic_imputed$survived == "yes",
                               label = "Random Forest")
```

### Partial-dependence profiles {-}

```{r pdp, warning=FALSE, message=FALSE}
library("ggplot2")
# PD profile for `age`, averaged over the CP profiles of sampled observations
pdp_rf <- model_profile(explainer = explainer_rf, variables = "age")
plot(pdp_rf) + ggtitle("Partial-dependence profile for age")
```

- Only the `explainer` and `variables` arguments are required.
- The optional argument `N` sets the number of observations sampled for the calculation (default: 100; `N = NULL` uses all observations).
- The `groups` argument specifies a grouping variable for grouped PD profiles.
- Setting the `k` argument (number of clusters) produces clustered PD profiles; hierarchical clustering is used under the hood.

To overlay the individual CP profiles (i.e., an ICE plot) on the PD profile, pass `geom = "profiles"` to `plot()`:

```{r pdp-ice}
# geom = "profiles" draws the CP profiles behind the PD profile
plot(pdp_rf, geom = "profiles") +
  ggtitle("Ceteris-paribus and partial-dependence profiles for age")
```

### Clustered partial-dependence profiles {.unnumbered}

Clustering is done with the `hclust()` function; here we ask for three clusters (`k = 3`):

```{r clustered-pdp, warning=FALSE, message=FALSE}
# k = 3: group the CP profiles into three clusters and average within each
pdp_rf_clust <- model_profile(explainer = explainer_rf,
                              variables = "age", k = 3)

plot(pdp_rf_clust, geom = "profiles") +
  ggtitle("Clustered partial-dependence profiles for age")
```

### Grouped partial-dependence profiles {.unnumbered}

Below, we group by `gender`:

```{r grouped-pdp, message=FALSE, warning=FALSE}
# One PD profile for age per level of `gender`
pdp_rf_gender <- model_profile(explainer = explainer_rf,
                               variables = "age", groups = "gender")

plot(pdp_rf_gender, geom = "profiles") +
  ggtitle("Partial-dependence profiles for age, grouped by gender")
```

### Contrastive partial-dependence profiles {.unnumbered}

```{r contrastive-pdp, warning=FALSE, message=FALSE, results='hide'}
library("rms")  # the stored logistic regression model was fitted with rms::lrm()
titanic_lmr <- archivist::aread("pbiecek/models/58b24")
explainer_lmr <- DALEX::explain(model = titanic_lmr,
                                data = titanic_imputed[, -9],
                                y = titanic_imputed$survived == "yes",
                                label = "Logistic Regression")

# PD profiles for age from both models, overlaid in a single plot
pdp_lmr <- model_profile(explainer = explainer_lmr, variables = "age")
pdp_rf <- model_profile(explainer = explainer_rf, variables = "age")

plot(pdp_rf, pdp_lmr) +
  ggtitle("Partial-dependence profiles for age for two models")
```

## Meeting Videos {.unnumbered}

### Cohort 1 {.unnumbered}

`r knitr::include_url("https://www.youtube.com/embed/URL")`

<details>

<summary>Meeting chat log</summary>

```
LOG
```

</details>