@@ -126,9 +126,9 @@ data.
<details>
<summary>Creating the dataset using `{epidatr}` and `{epiprocess}`</summary>

- This dataset can be found in the package as <TODO DOESN'T EXIST>; we demonstrate
- some of the typically ubiquitous cleaning operations needed to be able to
- forecast.
+ This dataset can be found in the package as `covid_case_death_rates`; we
+ demonstrate some of the nearly ubiquitous cleaning operations needed before
+ forecasting.
First we pull both JHU-CSSE cases and deaths from the
[`{epidatr}`](https://cmu-delphi.github.io/epidatr/) package:
@@ -152,26 +152,34 @@ deaths <- pub_covidcast(
geo_values = "*"
) |>
select(geo_value, time_value, death_rate = value)
+ ```
+
+ Since visualizing the results on every geography is somewhat overwhelming,
+ we'll only train on a subset of 4 states.
+ ```{r date, warning = FALSE}
+ used_locations <- c("ca", "ma", "ny", "tx")
cases_deaths <-
full_join(cases, deaths, by = c("time_value", "geo_value")) |>
+ filter(geo_value %in% used_locations) |>
as_epi_df(as_of = as.Date("2022-01-01"))
- plot_locations <- c("ca", "ma", "ny", "tx")
# plotting the data as it was downloaded
cases_deaths |>
- filter(geo_value %in% plot_locations) |>
- pivot_longer(cols = c("case_rate", "death_rate"), names_to = "source") |>
- ggplot(aes(x = time_value, y = value)) +
- geom_line() +
- facet_grid(source ~ geo_value, scale = "free") +
+ autoplot(
+ case_rate,
+ death_rate,
+ .color_by = "none"
+ ) +
+ facet_grid(.response_name ~ geo_value, scales = "free") +
scale_x_date(date_breaks = "3 months", date_labels = "%Y %b") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
```

As with almost any dataset, there is some cleaning we need to do to make it
usable; we'll use some utilities from
- [`{epiprocess}`](https://cmu-delphi.github.io/epiprocess/) for this. First, to
- eliminate some of the noise coming from daily reporting, we do 7 day averaging
- over a trailing window[^1]:
+ [`{epiprocess}`](https://cmu-delphi.github.io/epiprocess/) for this.
+
+ First, to reduce the noise coming from daily reporting, we take a 7-day average
+ over a trailing window[^1]:

[^1]: This ensures that any given day of the processed time series depends only
on the previous week, which means that we avoid leaking future
@@ -199,10 +207,12 @@ cases_deaths <-
group_by(geo_value) |>
mutate(
outlr_death_rate = detect_outlr_rm(
- time_value, death_rate, detect_negatives = TRUE
+ time_value, death_rate,
+ detect_negatives = TRUE
),
outlr_case_rate = detect_outlr_rm(
- time_value, case_rate, detect_negatives = TRUE
+ time_value, case_rate,
+ detect_negatives = TRUE
)
) |>
unnest(cols = starts_with("outlr"), names_sep = "_") |>
@@ -212,7 +222,6 @@ cases_deaths <-
case_rate = outlr_case_rate_replacement
) |>
select(geo_value, time_value, case_rate, death_rate)
- cases_deaths
```
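As we understand the {epiprocess} documentation, `detect_outlr_rm()` flags points that lie too far from a rolling median, with the distance measured in multiples of the rolling interquartile range, and `detect_negatives = TRUE` additionally treats negative values as outliers. The base-R sketch below only illustrates that idea under simplifying assumptions; it is not the {epiprocess} implementation. The helper `flag_outliers_rm()`, its 7-day centred window, the multiplier of 2 and the toy series are all hypothetical. Its `replacement` column plays the same role as the one the chunk above unnests into `outlr_case_rate_replacement`.

```r
# Simplified rolling-median outlier rule: an illustrative sketch, NOT the
# {epiprocess} implementation. flag_outliers_rm() and its defaults
# (7-day centred window, multiplier of 2) are hypothetical choices.
flag_outliers_rm <- function(y, n = 7, multiplier = 2, detect_negatives = TRUE) {
  half <- n %/% 2
  len <- length(y)
  lower <- upper <- replacement <- numeric(len)
  for (i in seq_len(len)) {
    # Centred window of up to n days, truncated at both ends of the series
    idx <- max(1, i - half):min(len, i + half)
    med <- stats::median(y[idx], na.rm = TRUE)
    spread <- stats::IQR(y[idx], na.rm = TRUE)
    lower[i] <- med - multiplier * spread
    upper[i] <- med + multiplier * spread
    # With detect_negatives, the lower bound is at least 0, so any negative
    # value falls below it and is flagged
    if (detect_negatives) lower[i] <- max(lower[i], 0)
    is_outlier <- !is.na(y[i]) && (y[i] < lower[i] || y[i] > upper[i])
    # Flagged points are replaced by the rolling median; others are kept as-is
    replacement[i] <- if (is_outlier) med else y[i]
  }
  data.frame(lower = lower, upper = upper, replacement = replacement)
}

# Hypothetical toy series with one spike (80) and one negative value (-3)
toy <- c(5, 6, 5, 7, 6, 80, 6, 5, 7, 6, -3, 6)
cleaned <- flag_outliers_rm(toy)
# Both the spike and the negative value are replaced by their rolling median (6)
print(cleaned$replacement)
```

Unlike the trailing average used for smoothing, this sketch uses a centred window, so each day's bounds also depend on a few later days.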
</details>
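The 7-day trailing average described in the dataset-creation details, and the no-leakage property claimed in its footnote, can be illustrated in base R. This is a minimal sketch assuming a single location's daily series with no missing days; the vignette itself uses {epiprocess} utilities for this step, and the helper name `trailing_mean_7()` and the toy numbers are hypothetical.

```r
# Minimal sketch: trailing 7-day mean of one location's daily series.
# trailing_mean_7() is a hypothetical helper, not part of {epiprocess}.
trailing_mean_7 <- function(x) {
  # sides = 1 makes the filter one-sided: the value for day t averages
  # x[t - 6], ..., x[t] and never looks at days after t.
  # stats:: is spelled out because dplyr::filter() masks stats::filter().
  as.numeric(stats::filter(x, rep(1 / 7, 7), sides = 1))
}

daily <- c(10, 12, 9, 30, 11, 10, 13, 12, 8, 14)
smoothed <- trailing_mean_7(daily)
# The first 6 values are NA: a full week of history is not yet available.
print(smoothed)

# Revising the last observation leaves every earlier smoothed value unchanged,
# i.e. day t never depends on data from after day t.
daily_revised <- daily
daily_revised[10] <- 1000
print(identical(trailing_mean_7(daily_revised)[1:9], smoothed[1:9]))
```

A one-sided window is what keeps the smoothed series usable as a forecasting input; a centred window would mix in values from after day t.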
@@ -224,14 +233,13 @@ of the states, noting the actual forecast date:
```{r plot_locs}
forecast_date_label <-
tibble(
- geo_value = rep(plot_locations, 2),
- source = c(rep("case_rate", 4), rep("death_rate", 4)),
- dates = rep(forecast_date - 7 * 2, 2 * length(plot_locations)),
+ geo_value = rep(used_locations, 2),
+ .response_name = c(rep("case_rate", 4), rep("death_rate", 4)),
+ dates = rep(forecast_date - 7 * 2, 2 * length(used_locations)),
heights = c(rep(150, 4), rep(1.0, 4))
)
processed_data_plot <-
cases_deaths |>
- filter(geo_value %in% plot_locations) |>
pivot_longer(cols = c("case_rate", "death_rate"), names_to = "source") |>
ggplot(aes(x = time_value, y = value)) +
geom_line() +
@@ -292,36 +300,37 @@ data narrowed somewhat
narrow_data_plot <-
cases_deaths |>
filter(time_value > "2021-04-01") |>
- filter(geo_value %in% plot_locations) |>
- pivot_longer(cols = c("case_rate", "death_rate"), names_to = "source") |>
- ggplot(aes(x = time_value, y = value)) +
- geom_line() +
- facet_grid(source ~ geo_value, scale = "free") +
+ autoplot(
+ case_rate,
+ death_rate,
+ .color_by = "none"
+ ) +
+ facet_grid(.response_name ~ geo_value, scales = "free") +
geom_vline(aes(xintercept = forecast_date)) +
geom_text(
data = forecast_date_label,
aes(x = dates, label = "forecast\ndate", y = heights),
size = 3, hjust = "right"
) +
scale_x_date(date_breaks = "3 months", date_labels = "%Y %b") +
- theme(axis.text.x = element_text(angle = 90, hjust = 1))
+ theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
+ ylim(0, NA)
```

We now overlay the prediction bands and the median prediction on that
plot:

```{r plotting_forecast, warning=FALSE}
epiworkflow <- four_week_ahead$epi_workflow
+
restricted_predictions <-
four_week_ahead$predictions |>
- filter(geo_value %in% plot_locations) |>
rename(time_value = target_date, value = .pred) |>
- mutate(source = "death_rate")
+ mutate(.response_name = "death_rate")
forecast_plot <-
narrow_data_plot |>
epipredict:::plot_bands(
- restricted_predictions,
- levels = 0.9
+ restricted_predictions
) +
geom_point(
data = restricted_predictions,
@@ -351,5 +360,6 @@ A couple of things to note:
If you encounter a bug or have a feature request, feel free to file an [issue on
our GitHub page](https://github.com/cmu-delphi/epipredict/issues).
For other
- questions, feel free to contact [Daniel]([email protected]), [David]([email protected]), [Dmitry]([email protected]), or
- [Logan]([email protected]), either via email or on the Insightnet slack.
+ questions, feel free to reach out to the authors via this [contact
+ form](https://docs.google.com/forms/d/e/1FAIpQLScqgT1fKZr5VWBfsaSp-DNaN03aV6EoZU4YljIzHJ1Wl_zmtg/viewform),
+ by email, or on the Insightnet Slack.