rscore <- auroc$predicted_calibrated # vector of already calibrated model probabilities
truth <- as.numeric(auroc$actual) # vector of 0s or 1s
```
# Predictiveness curves
Predictiveness curves are an insightful visualization to assess the inherent ability of prognostic models to provide predictions to individual patients. Cumulative versions of predictiveness curves represent positive predictive values (PPV) and 1 - negative predictive values (1 - NPV) and are also informative if the eventual goal is to use a cutoff for clinical decision making.
You can use the `riskProfile` function to visualize and assess all of these quantities.
Here is an example:
Note that the method `"asis"` (shown below on the graphs) means that the score (or model probabilities, in our case) is taken as is, i.e. without any estimation.

```{r fig.height=5, fig.width=8}
p <- riskProfile(outcome = truth, score = rscore, include = "PC")
p$plot
```
Ideally, all subjects in the population that have the condition (=> prevalence) are marked as having the condition (predicted risk = 1) and all subjects without the condition (=> 1 - prevalence) are marked as not having the condition (predicted risk = 0).

This implies that the ideal predictiveness curve is 0 for all subjects not having the condition, and then it jumps at `1 - prevalence` to 1 for all the subjects having the condition (see the gray line).

In reality, the curves are not step functions. The flatter the curves become, the less discrimination, and therefore utility, the model has.
One can also investigate the tails of the predictiveness curve. The model is more useful if these regions have very low or very high predicted risks relative to the rest of the data.
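For intuition only (this is not how `riskProfile` estimates the curve), an empirical predictiveness curve can be sketched in base R by plotting the sorted predicted risks against their percentiles; the `truth` and `rscore` vectors below are made-up toy data so the sketch runs standalone:

```r
# Minimal sketch of a predictiveness curve (toy data, illustrative only)
set.seed(1)
truth  <- rbinom(200, 1, 0.3)                              # toy 0/1 outcomes
rscore <- plogis(qlogis(0.25) + 1.5 * truth + rnorm(200))  # toy probabilities
prev  <- mean(truth)
pctl  <- ppoints(length(rscore))          # risk percentiles (x-axis)
emp   <- sort(rscore)                     # predicted risk by percentile
ideal <- ifelse(pctl <= 1 - prev, 0, 1)   # ideal curve jumps at 1 - prevalence
plot(pctl, emp, type = "l",
     xlab = "Risk percentile", ylab = "Predicted risk")
lines(pctl, ideal, col = "gray")          # ideal step function
```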
## Positive / Negative predictive values
Now let's plot PPV and 1-NPV:
```{r fig.height=5, fig.width=8}
p <- riskProfile(outcome = truth, score = rscore, include = c("PPV", "1-NPV"))
p$plot
```
Again, in the ideal case, both should be as close to the gray lines as possible.
In an ideal scenario:
- in terms of PPV: If all the subjects with the condition are predicted perfectly, then PPV = TP / PP = 1 (TP = true positive, PP = predicted positive).
Hence, all the subjects with the condition must be higher than `1 - prevalence` on the prediction percentile for PPV = 1.
- in terms of 1-NPV: If all the subjects without the condition are predicted perfectly, then NPV = TN / PN = 1 (TN = true negative, PN = predicted negative).
Hence, all the subjects without the condition must be lower than `1 - prevalence` on the risk percentile for 1-NPV = 0.
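For intuition, PPV and 1 - NPV at a single fixed cutoff can be computed directly from the confusion-matrix counts. This base-R sketch uses made-up data and a hypothetical cutoff of 0.5; it is not the data-driven estimation that `riskProfile` performs:

```r
# PPV and 1 - NPV at one cutoff (toy data, illustrative only)
set.seed(1)
truth  <- rbinom(200, 1, 0.3)         # toy 0/1 outcomes
rscore <- runif(200)                  # toy scores
cutoff <- 0.5                         # hypothetical decision cutoff
pred <- as.integer(rscore >= cutoff)  # predicted positive above cutoff
TP <- sum(pred == 1 & truth == 1)     # true positives
PP <- sum(pred == 1)                  # predicted positives
TN <- sum(pred == 0 & truth == 0)     # true negatives
PN <- sum(pred == 0)                  # predicted negatives
ppv           <- TP / PP              # PPV = TP / PP
one_minus_npv <- 1 - TN / PN          # 1 - NPV
```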
## Output settings
Note that:
- most importantly, in case of a biomarker or if the model probabilities are not calibrated well, you can use a smoother; see the `methods` argument and the last section of the vignette.
- the prevalence can be adjusted by setting it in `prev.adj`.
- you can also plot "NPV" by adjusting the `include` parameter.
- you can also access the underlying data:
```{r}
p <- riskProfile(outcome = truth, score = rscore, include = c("PPV", "1-NPV"))
head(p$data)
```
In an ideal scenario, the fitted curves should be identical to the identity line.
In reality, the closer they are to the identity line, the better.
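For intuition only (this is not what `calibrationProfile` does internally), a crude binned calibration check can be sketched in base R: bin the scores into deciles and compare the observed event rate with the mean predicted risk per bin; points near the identity line indicate good calibration. The data below are made up:

```r
# Crude decile-binned calibration sketch (toy data, illustrative only)
set.seed(1)
truth  <- rbinom(500, 1, 0.3)   # toy 0/1 outcomes
rscore <- runif(500)            # toy (poorly calibrated) scores
bins <- cut(rscore, quantile(rscore, seq(0, 1, 0.1)), include.lowest = TRUE)
obs  <- tapply(truth,  bins, mean)  # observed event rate per decile
pred <- tapply(rscore, bins, mean)  # mean predicted risk per decile
plot(pred, obs, xlim = c(0, 1), ylim = c(0, 1),
     xlab = "Mean predicted risk", ylab = "Observed event rate")
abline(0, 1, col = "gray")          # identity line = perfect calibration
```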
Note that you can also quantify calibration through the discrimination and miscalibration indices; see this [blog post](https://stats4datascience.com/posts/three_metrics_binary/) and the [modsculpt](https://github.com/Genentech/modsculpt) R package (metrics functions).
## Output settings
Note that:
- in case of a biomarker or if the model probabilities are not calibrated well, you can use a smoother; see the `methods` argument and the last section of the vignette. In this case, `"asis"` is not allowed.
- use the `include` argument to specify what additional quantities to show:
  - `"loess"`: Adds non-parametric Loess fit.
  - `"citl"`: Adds "Calibration in the Large", an overall mean of outcome and score.
  - `"rug"`: Adds "rug", i.e. ticks on the x-axis showing the individual data points (top axis shows score for outcome == 1, bottom axis shows score for outcome == 0).
  - `"datapoints"`: Similar to rug, just shows jittered points instead of ticks.
- use `margin.type` to add a marginal plot through `ggExtra::ggMarginal`. You can select one of `c("density", "histogram", "boxplot", "violin", "densigram")`. It adds the selected 1d graph on top of the calibration plot and is suitable for investigating the score.
- you can again access the underlying data with `p$data`.
# Sensitivity and specificity
Ultimately, we provide a sensitivity and specificity plot as a function of the score (the cutoffs are data-driven, i.e. every observed score value is taken as a cutoff). This graph may inform you of the most suitable cutoff for your model, although we usually recommend reporting the whole score range, not just the binary decisions.
You can use the `sensSpec` function to visualize and assess sensitivity and specificity.
Again, the ideal scenario would be a model that follows the gray lines.
Since there is a trade-off between sensitivity and specificity, the graph may guide you as to which threshold (or thresholds) to choose, depending on whether one is more important than the other.
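As a rough base-R illustration of the idea (made-up data; `sensSpec` handles the estimation for you), sensitivity and specificity can be traced over every observed score value used as a cutoff:

```r
# Sensitivity/specificity as functions of a data-driven cutoff (toy data)
set.seed(1)
truth  <- rbinom(200, 1, 0.3)  # toy 0/1 outcomes
rscore <- runif(200)           # toy scores
cuts <- sort(unique(rscore))   # every observed score value as a cutoff
sens <- vapply(cuts, function(k) mean(rscore[truth == 1] >= k), numeric(1))
spec <- vapply(cuts, function(k) mean(rscore[truth == 0] <  k), numeric(1))
plot(cuts, sens, type = "l", xlab = "Cutoff", ylab = "Value")
lines(cuts, spec, lty = 2)     # dashed line: specificity
```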
## Output settings
Note that:
- in case of a biomarker or if the model probabilities are not calibrated well, you can use a smoother; see the `methods` argument and the last section of the vignette.
- you can again access the underlying data with `p$data`.
# Adjusting the graphs
Otherwise, you can use the `$data` element to construct your own graph as well.
# Estimations in stats4phc
For all the plotting functions in this package, you can define an estimation function, which will be applied to the given score. In `calibrationProfile`, this serves as a calibration curve. In `riskProfile`, this smooths the given score. This is always driven by the `methods` argument, which is available in each of the plotting functions.