Skip to content

Commit 8044b98

Browse files
committed
using new autoplot
1 parent 30eb7f8 commit 8044b98

File tree

4 files changed

+70
-123
lines changed

4 files changed

+70
-123
lines changed

README.Rmd

+19-44
Original file line numberDiff line numberDiff line change
@@ -240,10 +240,12 @@ forecast_date_label <-
240240
)
241241
processed_data_plot <-
242242
cases_deaths |>
243-
pivot_longer(cols = c("case_rate", "death_rate"), names_to = "source") |>
244-
ggplot(aes(x = time_value, y = value)) +
245-
geom_line() +
246-
facet_grid(source ~ geo_value, scale = "free") +
243+
autoplot(
244+
case_rate,
245+
death_rate,
246+
.color_by = "none"
247+
) +
248+
facet_grid(.response_name ~ geo_value, scale = "free") +
247249
geom_vline(aes(xintercept = forecast_date)) +
248250
geom_text(
249251
data = forecast_date_label,
@@ -292,61 +294,34 @@ Plotting the prediction intervals on our subset above[^2]:
292294

293295
<details>
294296
<summary> Plot </summary>
295-
296-
This is the same kind of plot as `processed_data_plot` above, but with the past
297-
data narrowed somewhat
298-
299-
```{r}
300-
narrow_data_plot <-
301-
cases_deaths |>
302-
filter(time_value > "2021-04-01") |>
303-
autoplot(
304-
case_rate,
305-
death_rate,
306-
.color_by = "none"
307-
) +
308-
facet_grid(.response_name ~ geo_value, scale = "free") +
309-
geom_vline(aes(xintercept = forecast_date)) +
310-
geom_text(
311-
data = forecast_date_label,
312-
aes(x = dates, label = "forecast\ndate", y = heights),
313-
size = 3, hjust = "right"
314-
) +
315-
scale_x_date(date_breaks = "3 months", date_labels = "%Y %b") +
316-
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
317-
ylim(0, NA)
318-
```
319-
320-
Putting that together with a plot of the bands, and a plot of the median
321-
prediction.
322-
323297
```{r plotting_forecast, warning=FALSE}
324298
epiworkflow <- four_week_ahead$epi_workflow
325-
326299
restricted_predictions <-
327300
four_week_ahead$predictions |>
328301
rename(time_value = target_date, value = .pred) |>
329302
mutate(.response_name = "death_rate")
330303
forecast_plot <-
331-
narrow_data_plot |>
332-
epipredict:::plot_bands(
333-
restricted_predictions
304+
four_week_ahead |>
305+
autoplot(plot_data = cases_deaths) +
306+
geom_vline(aes(xintercept = forecast_date)) +
307+
geom_text(
308+
data = forecast_date_label %>% filter(.response_name == "death_rate"),
309+
aes(x = dates, label = "forecast\ndate", y = heights),
310+
size = 3, hjust = "right"
334311
) +
335-
geom_point(
336-
data = restricted_predictions,
337-
aes(y = .data$value)
338-
)
312+
scale_x_date(date_breaks = "3 months", date_labels = "%Y %b") +
313+
theme(axis.text.x = element_text(angle = 90, hjust = 1))
339314
```
340315
</details>
341316

342317
```{r show-single-forecast, warning=FALSE, echo=FALSE}
343318
forecast_plot
344319
```
345320

346-
The yellow dot gives the median prediction, while the red interval gives the
347-
5-95% inter-quantile range.
321+
The black dot gives the median prediction, while the blue intervals give the
322+
25-75%, the 10-90%, and 2.5-97.5% inter-quantile ranges.
348323
For this particular day and these locations, the forecasts are relatively
349-
accurate, with the true data being within the 25-75% interval.
324+
accurate, with the true data being at least within the 10-90% interval.
350325
A couple of things to note:
351326

352327
1. Our methods are primarily direct forecasters; this means we don't need to
@@ -362,4 +337,4 @@ our github page](https://github.com/cmu-delphi/epipredict/issues).
362337
For other
363338
questions, feel free to reach out to the authors, either via this [contact
364339
form](https://docs.google.com/forms/d/e/1FAIpQLScqgT1fKZr5VWBfsaSp-DNaN03aV6EoZU4YljIzHJ1Wl_zmtg/viewform),
365-
email or the Insightnet slack.
340+
email, or the Insightnet slack.

README.md

+51-79
Original file line numberDiff line numberDiff line change
@@ -59,7 +59,7 @@ from JHU data.
5959
Creating the dataset using `{epidatr}` and `{epiprocess}`
6060
</summary>
6161

62-
This dataset can be found in the package as \<TODO DOESN’T EXIST\>; we
62+
This dataset can be found in the package as `covid_case_death_rates`; we
6363
demonstrate some of the typically ubiquitous cleaning operations needed
6464
to be able to forecast. First we pull both jhu-csse cases and deaths
6565
from [`{epidatr}`](https://cmu-delphi.github.io/epidatr/) package:
@@ -84,26 +84,35 @@ deaths <- pub_covidcast(
8484
geo_values = "*"
8585
) |>
8686
select(geo_value, time_value, death_rate = value)
87+
```
88+
89+
Since visualizing the results on every geography is somewhat
90+
overwhelming, we’ll only train on a subset of 5.
91+
92+
``` r
93+
used_locations <- c("ca", "ma", "ny", "tx")
8794
cases_deaths <-
8895
full_join(cases, deaths, by = c("time_value", "geo_value")) |>
96+
filter(geo_value %in% used_locations) |>
8997
as_epi_df(as_of = as.Date("2022-01-01"))
90-
plot_locations <- c("ca", "ma", "ny", "tx")
9198
# plotting the data as it was downloaded
9299
cases_deaths |>
93-
filter(geo_value %in% plot_locations) |>
94-
pivot_longer(cols = c("case_rate", "death_rate"), names_to = "source") |>
95-
ggplot(aes(x = time_value, y = value)) +
96-
geom_line() +
97-
facet_grid(source ~ geo_value, scale = "free") +
100+
autoplot(
101+
case_rate,
102+
death_rate,
103+
.color_by = "none"
104+
) +
105+
facet_grid(.response_name ~ geo_value, scale = "free") +
98106
scale_x_date(date_breaks = "3 months", date_labels = "%Y %b") +
99107
theme(axis.text.x = element_text(angle = 90, hjust = 1))
100108
```
101109

102-
<img src="man/figures/README-case_death-1.png" width="90%" style="display: block; margin: auto;" />
110+
<img src="man/figures/README-date-1.png" width="90%" style="display: block; margin: auto;" />
103111

104112
As with basically any dataset, there is some cleaning that we will need
105113
to do to make it actually usable; we’ll use some utilities from
106114
[`{epiprocess}`](https://cmu-delphi.github.io/epiprocess/) for this.
115+
107116
First, to eliminate some of the noise coming from daily reporting, we do
108117
7 day averaging over a trailing window[^1]:
109118

@@ -129,10 +138,12 @@ cases_deaths <-
129138
group_by(geo_value) |>
130139
mutate(
131140
outlr_death_rate = detect_outlr_rm(
132-
time_value, death_rate, detect_negatives = TRUE
141+
time_value, death_rate,
142+
detect_negatives = TRUE
133143
),
134144
outlr_case_rate = detect_outlr_rm(
135-
time_value, case_rate, detect_negatives = TRUE
145+
time_value, case_rate,
146+
detect_negatives = TRUE
136147
)
137148
) |>
138149
unnest(cols = starts_with("outlr"), names_sep = "_") |>
@@ -142,22 +153,6 @@ cases_deaths <-
142153
case_rate = outlr_case_rate_replacement
143154
) |>
144155
select(geo_value, time_value, case_rate, death_rate)
145-
cases_deaths
146-
#> An `epi_df` object, 32,424 x 4 with metadata:
147-
#> * geo_type = state
148-
#> * time_type = day
149-
#> * as_of = 2022-01-01
150-
#>
151-
#> # A tibble: 32,424 × 4
152-
#> geo_value time_value case_rate death_rate
153-
#> * <chr> <date> <dbl> <dbl>
154-
#> 1 ak 2020-06-01 2.31 0
155-
#> 2 ak 2020-06-02 1.94 0
156-
#> 3 ak 2020-06-03 2.63 0
157-
#> 4 ak 2020-06-04 2.59 0
158-
#> 5 ak 2020-06-05 2.43 0
159-
#> 6 ak 2020-06-06 2.35 0
160-
#> # ℹ 32,418 more rows
161156
```
162157

163158
</details>
@@ -173,18 +168,19 @@ Plot
173168
``` r
174169
forecast_date_label <-
175170
tibble(
176-
geo_value = rep(plot_locations, 2),
177-
source = c(rep("case_rate", 4), rep("death_rate", 4)),
178-
dates = rep(forecast_date - 7 * 2, 2 * length(plot_locations)),
171+
geo_value = rep(used_locations, 2),
172+
.response_name = c(rep("case_rate", 4), rep("death_rate", 4)),
173+
dates = rep(forecast_date - 7 * 2, 2 * length(used_locations)),
179174
heights = c(rep(150, 4), rep(1.0, 4))
180175
)
181176
processed_data_plot <-
182177
cases_deaths |>
183-
filter(geo_value %in% plot_locations) |>
184-
pivot_longer(cols = c("case_rate", "death_rate"), names_to = "source") |>
185-
ggplot(aes(x = time_value, y = value)) +
186-
geom_line() +
187-
facet_grid(source ~ geo_value, scale = "free") +
178+
autoplot(
179+
case_rate,
180+
death_rate,
181+
.color_by = "none"
182+
) +
183+
facet_grid(.response_name ~ geo_value, scale = "free") +
188184
geom_vline(aes(xintercept = forecast_date)) +
189185
geom_text(
190186
data = forecast_date_label,
@@ -216,7 +212,7 @@ four_week_ahead <- arx_forecaster(
216212
four_week_ahead
217213
#> ══ A basic forecaster of type ARX Forecaster ════════════════════════════════
218214
#>
219-
#> This forecaster was fit on 2025-01-24 15:31:46.
215+
#> This forecaster was fit on 2025-01-27 16:36:10.
220216
#>
221217
#> Training data was an <epi_df> with:
222218
#> • Geography: state,
@@ -226,8 +222,8 @@ four_week_ahead
226222
#>
227223
#> ── Predictions ──────────────────────────────────────────────────────────────
228224
#>
229-
#> A total of 56 predictions are available for
230-
#> • 56 unique geographic regions,
225+
#> A total of 4 predictions are available for
226+
#> • 4 unique geographic regions,
231227
#> • At forecast date: 2021-08-01,
232228
#> • For target date: 2021-08-29,
233229
#>
@@ -246,58 +242,34 @@ Plotting the prediction intervals on our subset above[^3]:
246242
Plot
247243
</summary>
248244

249-
This is the same kind of plot as `processed_data_plot` above, but with
250-
the past data narrowed somewhat
251-
252245
``` r
253-
narrow_data_plot <-
254-
cases_deaths |>
255-
filter(time_value > "2021-04-01") |>
256-
filter(geo_value %in% plot_locations) |>
257-
pivot_longer(cols = c("case_rate", "death_rate"), names_to = "source") |>
258-
ggplot(aes(x = time_value, y = value)) +
259-
geom_line() +
260-
facet_grid(source ~ geo_value, scale = "free") +
246+
epiworkflow <- four_week_ahead$epi_workflow
247+
restricted_predictions <-
248+
four_week_ahead$predictions |>
249+
rename(time_value = target_date, value = .pred) |>
250+
mutate(.response_name = "death_rate")
251+
forecast_plot <-
252+
four_week_ahead |>
253+
autoplot(plot_data = cases_deaths) +
261254
geom_vline(aes(xintercept = forecast_date)) +
262255
geom_text(
263-
data = forecast_date_label,
256+
data = forecast_date_label %>% filter(.response_name == "death_rate"),
264257
aes(x = dates, label = "forecast\ndate", y = heights),
265258
size = 3, hjust = "right"
266259
) +
267260
scale_x_date(date_breaks = "3 months", date_labels = "%Y %b") +
268261
theme(axis.text.x = element_text(angle = 90, hjust = 1))
269262
```
270263

271-
Putting that together with a plot of the bands, and a plot of the median
272-
prediction.
273-
274-
``` r
275-
epiworkflow <- four_week_ahead$epi_workflow
276-
restricted_predictions <-
277-
four_week_ahead$predictions |>
278-
filter(geo_value %in% plot_locations) |>
279-
rename(time_value = target_date, value = .pred) |>
280-
mutate(source = "death_rate")
281-
forecast_plot <-
282-
narrow_data_plot |>
283-
epipredict:::plot_bands(
284-
restricted_predictions,
285-
levels = 0.9
286-
) +
287-
geom_point(
288-
data = restricted_predictions,
289-
aes(y = .data$value)
290-
)
291-
```
292-
293264
</details>
294265

295266
<img src="man/figures/README-show-single-forecast-1.png" width="90%" style="display: block; margin: auto;" />
296267

297-
The yellow dot gives the median prediction, while the red interval gives
298-
the 5-95% inter-quantile range. For this particular day and these
299-
locations, the forecasts are relatively accurate, with the true data
300-
being within the 25-75% interval. A couple of things to note:
268+
The black dot gives the median prediction, while the blue intervals give
269+
the 25-75%, the 10-90%, and 2.5-97.5% inter-quantile ranges. For this
270+
particular day and these locations, the forecasts are relatively
271+
accurate, with the true data being within the 25-75% interval. A couple
272+
of things to note:
301273

302274
1. Our methods are primarily direct forecasters; this means we don’t
303275
need to predict 1, 2,…, 27 days ahead to then predict 28 days ahead
@@ -312,10 +284,10 @@ being within the 25-75% interval. A couple of things to note:
312284
If you encounter a bug or have a feature request, feel free to file an
313285
[issue on our github
314286
page](https://github.com/cmu-delphi/epipredict/issues). For other
315-
questions, feel free to contact [Daniel]([email protected]),
316-
[David]([email protected]), [Dmitry]([email protected]), or
317-
[Logan]([email protected]), either via email or on the Insightnet
318-
slack.
287+
questions, feel free to reach out to the authors, either via this
288+
[contact
289+
form](https://docs.google.com/forms/d/e/1FAIpQLScqgT1fKZr5VWBfsaSp-DNaN03aV6EoZU4YljIzHJ1Wl_zmtg/viewform),
290+
email, or the Insightnet slack.
319291

320292
[^1]: This makes it so that any given day of the processed timeseries
321293
only depends on the previous week, which means that we avoid leaking
12.8 KB
Loading
26.9 KB
Loading

0 commit comments

Comments
 (0)