Commit 9b0e405

add ch 17 pdp profiles (#22)
* add ch 17 pdp
* Rename duplicate chunks.

---------

Co-authored-by: Jon Harmon <[email protected]>
1 parent e833505 commit 9b0e405

10 files changed, +216 -10 lines changed

09_local-interpretable-model-agnostic-explanations-lime.Rmd

+1-1
@@ -168,7 +168,7 @@ Solution:
168168
* Examples below use K-LASSO for glass-box model
169169

170170

171-
```{r load-data, warning=FALSE,message=FALSE}
171+
```{r 09-load-data, warning=FALSE,message=FALSE}
172172
# core libraries
173173
library(randomForest)
174174
library(DALEX)

17_partial-dependence-profiles.Rmd

+215-9
@@ -1,24 +1,230 @@
11
# Partial-dependence Profiles
22

3-
**Learning objectives:**
3+
**Sections:**
44

5-
- THESE ARE NICE TO HAVE BUT NOT ABSOLUTELY NECESSARY
5+
- Overview\
6+
- Intuition\
7+
- Method\
8+
- Feature Importance\
9+
- Example Apartment Prices\
10+
- Pros and Cons\
11+
- R Examples
612

7-
## SLIDE 1 {-}
13+
## Overview {.unnumbered}
814

9-
- ADD SLIDES AS SECTIONS (`##`).
10-
- TRY TO KEEP THEM RELATIVELY SLIDE-LIKE; THESE ARE NOTES, NOT THE BOOK ITSELF.
15+
- This chapter focuses on partial-dependence plots (PDPs), also referred to as partial-dependence (PD) profiles.\
16+
- Extremely popular technique as global model explainer\
17+
- Available in multiple packages such as `DALEX`, `iml`, `pdp`, and `PDPbox`\
18+
- Core idea: PDP plots show how model predictions change as a function of an explanatory variable\
19+
- Can be computed on all observations or on key subsets\
20+
- Useful for comparing multiple models:
21+
- Agreement between profiles establishes confidence
22+
- Disagreement may suggest a way to improve a model\
23+
- Evaluation of model performance at boundaries
1124

12-
## Meeting Videos {-}
25+
## Intuition {.unnumbered}
1326

14-
### Cohort 1 {-}
27+
- PD profile is constructed by taking arithmetic average of individual ceteris-paribus (CP) profiles.
28+
- From previous chapter, recall that CP profiles show instance level model prediction behavior based on varying values of an explanatory variable
29+
- When model is additive, CP profiles for all instances are parallel with the same shape.
30+
- If model includes interactions, CP profiles may not be parallel
31+
32+
*Example plots using rf model on titanic dataset* ![Source: Figure 17.1](img/17-partial-dependence/overview-plots.png)
33+
34+
Note: A chart with all CP profiles plotted together is referred to as an individual conditional expectation (ICE) plot.
35+
36+
## Method {.unnumbered}
37+
38+
### Basic Equations {.unnumbered}
39+
40+
Mathematical representation of PD profile value for model *f()*, variable *j* at value *z*: $$g_{PD}^{j}(z) = E_{\underline{X}^{-j}}\{f(X^{j|=z})\}$$
41+
42+
where $\underline{X}^{-j}$ refers to the joint distribution of all explanatory variables other than $X^j$
43+
44+
We rarely know true distribution of $\underline{X}^{-j}$, so we typically estimate using the empirical distribution in our training data:
45+
46+
$$\hat g_{PD}^{j}(z) = \frac{1}{n} \sum_{i=1}^{n} f(\underline{x}_i^{j|=z}).$$
47+
48+
The estimator above is simply the mean of the CP profiles for $X^j$, evaluated at each value $z$.
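
A minimal base-R sketch of this estimator. The toy data frame `toy`, the linear model `fit`, and the helper `cp_profiles()` are illustrative assumptions and are not part of the chapter: compute every CP profile on a grid, then average column-wise.

```r
# Sketch only: PD profile as the pointwise mean of CP profiles.
# `toy`, `fit`, and `cp_profiles()` are hypothetical names.
set.seed(1)
n <- 200
toy <- data.frame(x1 = runif(n, 0, 10), x2 = rnorm(n))
toy$y <- 2 * toy$x1 + toy$x1 * toy$x2 + rnorm(n, sd = 0.1)
fit <- lm(y ~ x1 * x2, data = toy)

# Returns an n x length(grid) matrix:
# entry [i, k] is the prediction for observation i with `variable` set to grid[k].
cp_profiles <- function(model, data, variable, grid) {
  sapply(grid, function(z) {
    data_z <- data
    data_z[[variable]] <- z
    predict(model, newdata = data_z)
  })
}

grid <- seq(0, 10, length.out = 21)
cp <- cp_profiles(fit, toy, "x1", grid)

# The PD estimate at each grid value z is the average over observations.
pd <- colMeans(cp)
```

Each row of `cp` is one CP profile; plotting all rows together gives the ICE plot, and `pd` is the line drawn through their mean.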
49+
50+
### Clustered partial-dependence profiles {.unnumbered}
51+
52+
- Mean of CP profiles might not be a good representation if profiles are not parallel.\
53+
- Alternative approach would be to create multiple clusters of CP profiles:
54+
- Use K-means or hierarchical clustering to identify clusters\
55+
- Can use Euclidean distance between CP profiles for identifying similar instances
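
A minimal base-R sketch of the clustering idea, assuming a toy model with an interaction (the data, the model, and all object names are hypothetical). It uses `dist()` for Euclidean distances between CP profiles and `hclust()` for the clustering, then averages the profiles within each cluster.

```r
# Sketch only: cluster CP profiles, then average within each cluster.
set.seed(2)
n <- 150
toy <- data.frame(x1 = runif(n, 0, 10), g = rbinom(n, 1, 0.5))
toy$y <- ifelse(toy$g == 1, 1, -1) * toy$x1 + rnorm(n, sd = 0.1)
fit <- lm(y ~ x1 * g, data = toy)

# CP profiles: one row per observation, one column per grid value.
grid <- seq(0, 10, length.out = 11)
cp <- sapply(grid, function(z) {
  data_z <- toy
  data_z$x1 <- z
  predict(fit, newdata = data_z)
})

# Euclidean distance between rows (profiles), hierarchical clustering, 2 clusters.
clusters <- cutree(hclust(dist(cp)), k = 2)

# One averaged profile per cluster (rows = clusters, columns = grid values).
clustered_pd <- t(sapply(split(seq_len(n), clusters),
                         function(idx) colMeans(cp[idx, , drop = FALSE])))
```

In this toy setup the two clusters coincide with the two values of `g`, because the sign of the `x1` effect depends on `g`.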
56+
57+
*Example clustered PDP using rf model on titanic dataset* ![Source: Figure 17.2](img/17-partial-dependence/clustered-pdp.png)
58+
59+
### Grouped partial-dependence profiles {.unnumbered}
60+
61+
- We can use grouped PDPs if we can explicitly identify features that influence the shape of the CP profile for the explanatory variable of interest
62+
- Obvious use case is when model includes interaction between variable of interest and another one.
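
A minimal base-R sketch of a grouped PD profile, assuming a toy model in which the effect of `x1` flips sign with a binary variable `g` (all names and data are hypothetical). The CP profiles are averaged separately within each level of the grouping variable.

```r
# Sketch only: average CP profiles within the levels of a known grouping variable.
set.seed(3)
n <- 150
toy <- data.frame(x1 = runif(n, 0, 10), g = rbinom(n, 1, 0.5))
toy$y <- ifelse(toy$g == 1, 1, -1) * toy$x1 + rnorm(n, sd = 0.1)
fit <- lm(y ~ x1 * g, data = toy)

grid <- seq(0, 10, length.out = 11)
cp <- sapply(grid, function(z) {
  data_z <- toy
  data_z$x1 <- z
  predict(fit, newdata = data_z)
})

# Overall PD profile: averages over both groups and hides the opposing effects.
overall_pd <- colMeans(cp)

# Grouped PD profiles: one row per level of g ("0" and "1").
grouped_pd <- t(sapply(split(seq_len(n), toy$g),
                       function(idx) colMeans(cp[idx, , drop = FALSE])))
```

Here the profile for `g = 1` rises with `x1` while the profile for `g = 0` falls, a pattern the overall average largely cancels out.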
63+
64+
*Example grouped PDP using rf model on titanic dataset* ![Source: Figure 17.3](img/17-partial-dependence/grouped-pdp.png)
65+
66+
### Contrastive partial-dependence profiles {-}
67+
68+
We can plot PD profiles for multiple models together on same chart.
69+
70+
*Example contrastive PDP comparing models on titanic dataset* ![Source: Figure 17.4](img/17-partial-dependence/contrastive-pdp.png)
71+
72+
## Feature Importance {-}
73+
74+
This section references Section 8.1.1 of Christoph Molnar's [Interpretable Machine Learning](https://christophm.github.io/interpretable-ml-book/pdp.html) book
75+
76+
Can measure partial dependence-based feature importance as follows:
77+
78+
$$I(x_S) = \sqrt{\frac{1}{K-1}\sum_{k=1}^K\left(\hat{f}_S(x^{(k)}_S) - \frac{1}{K}\sum_{k=1}^K \hat{f}_S(x^{(k)}_S)\right)^2}$$
79+
where $x^{(k)}_S$ are K unique values of feature $X_S$
80+
81+
Formula calculates variation of PD profile values around average PD value.
82+
83+
Main idea: A flat PD profile indicates a feature that is not important.
84+
85+
Limitations:
86+
87+
- Only captures main effects, ignores feature interactions
88+
- Defined over the unique values of the explanatory variable. A value that occurs in just one instance is given the same weight as a value that occurs in many instances.
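
A minimal base-R sketch of the importance formula: it is the sample standard deviation of the PD profile values over the K grid points. The function name `pd_importance()` and the two example profiles are hypothetical.

```r
# Sketch only: PD-based importance as the spread of PD values around their mean.
pd_importance <- function(pd_values) {
  K <- length(pd_values)
  sqrt(sum((pd_values - mean(pd_values))^2) / (K - 1))
}

flat  <- rep(0.5, 10)               # flat profile: feature has no visible effect
steep <- seq(0, 1, length.out = 10) # profile that changes with the feature

imp_flat  <- pd_importance(flat)
imp_steep <- pd_importance(steep)   # agrees with sd(steep)
```

A flat profile yields an importance of zero, which matches the main idea stated earlier in this section.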
89+
90+
91+
92+
93+
94+
95+
## Example: apartment-prices data {.unnumbered}
96+
97+
In this section, we use a random forest model to predict price per square meter for an apartment. We focus on two variables, *surface* and *construction year*.
98+
99+
### Partial-dependence profiles {-}
100+
![Source: Figure 17.5](img/17-partial-dependence/apt-pdp.png)
101+
102+
### Clustered partial-dependence profiles {-}
103+
![Source: Figure 17.6](img/17-partial-dependence/apt-clustered-pdp.png)
104+
105+
### Grouped partial-dependence profiles {-}
106+
107+
In this example, we use *district* as the grouping variable to see whether the relationship between the model's predictions and construction year or surface is similar across geographic areas.
108+
109+
![Source: Figure 17.7](img/17-partial-dependence/apt-grouped-pdp.png)\
110+
111+
### Contrastive partial-dependence profiles {-}
112+
113+
114+
Below we compare the PDP output from two predictive models: a basic linear regression model and the random forest model.
115+
116+
![Source: Figure 17.8](img/17-partial-dependence/apt-contrastive-pdp.png)
117+
118+
## Pros and Cons {.unnumbered}
119+
120+
| Pros | Cons |
121+
|------------------------------|------------------------------------------|
122+
| Popular, well-understood in DS community | Maximum number of features in plot is two |
123+
| Simple, intuitive way to summarize effect of feature on target variable | Assumption of independence; problematic with correlated explanatory variables |
124+
| Multiple software packages in R and Python; also easy to implement from scratch | Heterogeneous effects may be hidden in basic PDP plot; may need grouped or clustered profiles for better insight |
125+
| Can be used to assess variable importance | Can be time-consuming to run for medium and large datasets; likely need to use samples |
126+
| Calculation has a causal interpretation for the model of interest. | Can be misleading in areas where data are sparse in the training sample |
127+
128+
129+
**Note:**
130+
131+
- Pros/Cons above reference both the EMA text and Christoph Molnar's Interpretable ML book: <https://christophm.github.io/interpretable-ml-book/pdp.html>
132+
133+
## Code Snippets in R {.unnumbered}
134+
135+
We use the titanic dataset and a random forest model.
136+
137+
```{r 17-load-data, warning=FALSE, message=FALSE, results='hide'}
138+
library("DALEX")
139+
library("randomForest")
140+
titanic_imputed <- archivist::aread("pbiecek/models/27e5c")
141+
titanic_rf <- archivist::aread("pbiecek/models/4e0fc")
142+
explainer_rf <- DALEX::explain(model = titanic_rf,
143+
data = titanic_imputed[, -9],
144+
y = titanic_imputed$survived,
145+
label = "Random Forest")
146+
```
147+
148+
### Partial-dependence profiles {-}
149+
150+
```{r pdp, warning=FALSE, message=FALSE}
151+
library("ggplot2")
152+
pdp_rf <- model_profile(explainer = explainer_rf, variables = "age")
153+
plot(pdp_rf) + ggtitle("Partial-dependence profile for age")
154+
155+
```
156+
157+
- Only need to supply the `explainer` and `variables` arguments
158+
- Optional argument `N` allows you to vary the sample size used for calculation, default is 100
159+
- We can pass a grouping variable via the `groups` argument to create a grouped PDP
160+
- We can also create clustered PDP by specifying the `k` argument for number of clusters. Uses hierarchical clustering under the hood.
161+
162+
We can include CP profiles (i.e. an ICE plot) with an additional argument to `plot()`:
163+
164+
```{r pdp-ice}
165+
plot(pdp_rf, geom = "profiles") +
166+
ggtitle("Ceteris-paribus and partial-dependence profiles for age")
167+
168+
```
169+
170+
### Clustered partial-dependence profiles {.unnumbered}
171+
172+
This uses the `hclust()` function:
173+
174+
```{r clustered-pdp, warning=FALSE,message=FALSE}
175+
pdp_rf_clust <- model_profile(explainer = explainer_rf,
176+
variables = "age", k = 3)
177+
178+
plot(pdp_rf_clust, geom = "profiles") +
179+
ggtitle("Clustered partial-dependence profiles for age")
180+
181+
```
182+
183+
### Grouped partial-dependence profiles {-}
184+
185+
Below we group by `gender`:
186+
187+
```{r grouped-pdp, message=FALSE, warning=FALSE}
188+
pdp_rf_gender <- model_profile(explainer = explainer_rf,
189+
variables = "age", groups = "gender")
190+
191+
plot(pdp_rf_gender, geom = "profiles") +
192+
ggtitle("Partial-dependence profiles for age, grouped by gender")
193+
194+
```
195+
196+
### Contrastive partial-dependence profiles {.unnumbered}
197+
198+
```{r contrastive-pdp, warning=FALSE,message=FALSE,results='hide'}
199+
library("rms")
200+
titanic_lmr <- archivist::aread("pbiecek/models/58b24")
201+
explainer_lmr <- DALEX::explain(model = titanic_lmr,
202+
data = titanic_imputed[, -9],
203+
y = titanic_imputed$survived,
204+
label = "Logistic Regression")
205+
206+
pdp_lmr <- model_profile(explainer = explainer_lmr, variables = "age")
207+
pdp_rf <- model_profile(explainer = explainer_rf, variables = "age")
208+
209+
plot(pdp_rf, pdp_lmr) +
210+
ggtitle("Partial-dependence profiles for age for two models")
211+
212+
213+
```
214+
215+
216+
## Meeting Videos {.unnumbered}
217+
218+
### Cohort 1 {.unnumbered}
15219

16220
`r knitr::include_url("https://www.youtube.com/embed/URL")`
17221

18222
<details>
19-
<summary> Meeting chat log </summary>
20223

21-
```
224+
<summary>Meeting chat log</summary>
225+
226+
```
22227
LOG
23228
```
229+
24230
</details>
img/17-partial-dependence/apt-pdp.png
