
Commit 53a3bf9

linewrapping

1 parent 6dfa27b

File tree

1 file changed (+63 -22 lines)

README.Rmd

@@ -105,28 +105,40 @@ pak::pkg_install("cmu-delphi/epipredict@main")
 # Dev version
 pak::pkg_install("cmu-delphi/epipredict@dev")
 ```
-The documentation for the stable version is at <https://cmu-delphi.github.io/epipredict>, while the development version is at <https://cmu-delphi.github.io/epipredict/dev>.
+
+The documentation for the stable version is at
+<https://cmu-delphi.github.io/epipredict>, while the development version is at
+<https://cmu-delphi.github.io/epipredict/dev>.
 
 
 ## Motivating example
 
-To demonstrate the kind of forecast epipredict can make, say we're predicting COVID deaths per 100k for each state on
+To demonstrate the kind of forecast epipredict can make, say we're predicting
+COVID deaths per 100k for each state on
+
 ```{r fc_date}
 forecast_date <- as.Date("2021-08-01")
 ```
-Below the fold, we construct this dataset as an `epiprocess::epi_df` from JHU data.
+
+Below the fold, we construct this dataset as an `epiprocess::epi_df` from JHU
+data.
 
 <details>
 <summary> Creating the dataset using `{epidatr}` and `{epiprocess}` </summary>
-This dataset can be found in the package as <TODO DOESN'T EXIST>; we demonstrate some of the typically ubiquitous cleaning operations needed to be able to forecast.
-First we pull both jhu-csse cases and deaths from [`{epidatr}`](https://cmu-delphi.github.io/epidatr/) package:
-```{r case_death}
+
+This dataset can be found in the package as <TODO DOESN'T EXIST>; we demonstrate
+some of the typically ubiquitous cleaning operations needed to be able to
+forecast.
+First we pull both jhu-csse cases and deaths from
+[`{epidatr}`](https://cmu-delphi.github.io/epidatr/) package:
+
+```{r case_death, warning = FALSE}
 cases <- pub_covidcast(
   source = "jhu-csse",
   signals = "confirmed_incidence_prop",
   time_type = "day",
   geo_type = "state",
-  time_values = epirange(20200601, 20220101),
+  time_values = epirange(20200601, 20211231),
   geo_values = "*"
 ) |>
   select(geo_value, time_value, case_rate = value)
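A note on the `epirange()` change in this hunk (and the matching one in the next hunk): Epidata date ranges are inclusive on both ends, so ending the pull at `20211231` keeps the training data inside 2021 rather than also picking up 2022-01-01. A minimal sketch of the helper in isolation, assuming only that `{epidatr}` is attached:

```r
library(epidatr)

# epirange() builds an inclusive YYYYMMDD date range for the Epidata API;
# the new end date stops the pull at 2021-12-31 instead of reaching into 2022.
epirange(20200601, 20211231)
```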
@@ -136,7 +148,7 @@ deaths <- pub_covidcast(
   signals = "deaths_incidence_prop",
   time_type = "day",
   geo_type = "state",
-  time_values = epirange(20200601, 20220101),
+  time_values = epirange(20200601, 20211231),
   geo_values = "*"
 ) |>
   select(geo_value, time_value, death_rate = value)
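The step that combines these two pulls into the `cases_deaths` object sits outside this diff. A minimal sketch of what that construction might look like, assuming the column names above and that `{dplyr}` and `{epiprocess}` are attached; the `as_of` choice is an assumption and the README's actual chunk may differ:

```r
library(dplyr)
library(epiprocess)

# Hypothetical sketch: align the two signals on location and date, then convert
# to an epi_df keyed by geo_value and time_value.
cases_deaths <- cases |>
  full_join(deaths, by = c("geo_value", "time_value")) |>
  as_epi_df(as_of = forecast_date)
```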
@@ -154,10 +166,16 @@ cases_deaths |>
   scale_x_date(date_breaks = "3 months", date_labels = "%Y %b") +
   theme(axis.text.x = element_text(angle = 90, hjust = 1))
 ```
-As with basically any dataset, there is some cleaning that we will need to do to make it actually usable; we'll use some utilities from [`{epiprocess}`](https://cmu-delphi.github.io/epiprocess/) for this.
-First, to eliminate some of the noise coming from daily reporting, we do 7 day averaging over a trailing window[^1]:
 
-[^1]: This makes it so that any given day of the processed timeseries only depends on the previous week, which means that we avoid leaking future values when making a forecast.
+
+As with basically any dataset, there is some cleaning that we will need to do to
+make it actually usable; we'll use some utilities from
+[`{epiprocess}`](https://cmu-delphi.github.io/epiprocess/) for this. First, to
+eliminate some of the noise coming from daily reporting, we do 7 day averaging
+over a trailing window[^1]:
+
+[^1]: This makes it so that any given day of the processed timeseries only
+    depends on the previous week, which means that we avoid leaking future
+    values when making a forecast.
 
 ```{r smooth}
 cases_deaths <-
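The body of the `{r smooth}` chunk is not part of this hunk. The idea the footnote describes, a trailing 7-day mean so that each processed day depends only on the preceding week, can be sketched with `{slider}`; the README itself uses `{epiprocess}` utilities for this step, so treat the columns overwritten and the window choice below as assumptions:

```r
library(dplyr)
library(slider)

# Illustrative only: a trailing (right-aligned) 7-day mean per location. Each
# smoothed value uses the current day plus the 6 days before it, never the future.
cases_deaths <- cases_deaths |>
  group_by(geo_value) |>
  arrange(time_value, .by_group = TRUE) |>
  mutate(
    case_rate  = slide_dbl(case_rate, mean, .before = 6),
    death_rate = slide_dbl(death_rate, mean, .before = 6)
  ) |>
  ungroup()
```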
@@ -174,13 +192,18 @@ cases_deaths <-
 ```
 
 Then trimming outliers, most especially negative values:
+
 ```{r outlier}
 cases_deaths <-
   cases_deaths |>
   group_by(geo_value) |>
   mutate(
-    outlr_death_rate = detect_outlr_rm(time_value, death_rate, detect_negatives = TRUE),
-    outlr_case_rate = detect_outlr_rm(time_value, case_rate, detect_negatives = TRUE)
+    outlr_death_rate = detect_outlr_rm(
+      time_value, death_rate, detect_negatives = TRUE
+    ),
+    outlr_case_rate = detect_outlr_rm(
+      time_value, case_rate, detect_negatives = TRUE
+    )
   ) |>
   unnest(cols = starts_with("outlr"), names_sep = "_") |>
   ungroup() |>
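After the `unnest()` above, each detector is expected to contribute prefixed helper columns (for example a suggested replacement value alongside the detection thresholds). A hedged sketch of how the flagged values, including the negatives, might then be swapped out; the exact column names produced by `names_sep` and the decision to overwrite in place are assumptions, not necessarily what the README does next:

```r
library(dplyr)

# Illustrative only: replace flagged values with the detector's suggested
# replacement (assumed column names), then drop the helper columns.
cases_deaths <- cases_deaths |>
  mutate(
    death_rate = outlr_death_rate_replacement,
    case_rate  = outlr_case_rate_replacement
  ) |>
  select(-starts_with("outlr_"))
```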
@@ -215,7 +238,9 @@ processed_data_plot <-
   facet_grid(source ~ geo_value, scale = "free") +
   geom_vline(aes(xintercept = forecast_date)) +
   geom_text(
-    data = forecast_date_label, aes(x = dates, label = "forecast\ndate", y = heights), size = 3, hjust = "right"
+    data = forecast_date_label,
+    aes(x = dates, label = "forecast\ndate", y = heights),
+    size = 3, hjust = "right"
   ) +
   scale_x_date(date_breaks = "3 months", date_labels = "%Y %b") +
   theme(axis.text.x = element_text(angle = 90, hjust = 1))
@@ -225,7 +250,9 @@ processed_data_plot <-
 processed_data_plot
 ```
 
-To make a forecast, we will use a "canned" simple auto-regressive forecaster to predict the death rate four weeks into the future using lagged[^3] deaths and cases
+To make a forecast, we will use a "canned" simple auto-regressive forecaster to
+predict the death rate four weeks into the future using lagged[^3] deaths and
+cases
 
 [^3]: lagged by 3 in this context meaning using the value from 3 days ago.
 
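The call that actually produces `four_week_ahead` is outside this diff. A minimal sketch of what a canned four-week-ahead forecast can look like with `arx_forecaster()`, treating the outcome/predictor column names, the training cutoff, and the reliance on a `$predictions` element as assumptions:

```r
library(dplyr)
library(epipredict)

# Illustrative only: predict death_rate 28 days ahead from lagged case and
# death rates, training on data available up to the forecast date.
four_week_ahead <- arx_forecaster(
  cases_deaths |> filter(time_value <= forecast_date),
  outcome = "death_rate",
  predictors = c("case_rate", "death_rate"),
  args_list = arx_args_list(ahead = 28)
)

four_week_ahead$predictions
```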
@@ -251,11 +278,16 @@ predicted values (and prediction intervals) for each location 28 days after the
 forecast date.
 Plotting the prediction intervals on our subset above[^2]:
 
-[^2]: Alternatively, you could call `auto_plot(four_week_ahead)` to get the full collection of forecasts. This is too busy for the space we have for plotting here.
+[^2]: Alternatively, you could call `auto_plot(four_week_ahead)` to get the full
+    collection of forecasts. This is too busy for the space we have for plotting
+    here.
 
 <details>
 <summary> Plot </summary>
-This is the same kind of plot as `processed_data_plot` above, but with the past data narrowed somewhat
+
+This is the same kind of plot as `processed_data_plot` above, but with the past
+data narrowed somewhat
+
 ```{r}
 narrow_data_plot <-
   cases_deaths |>
@@ -267,13 +299,17 @@ narrow_data_plot <-
   facet_grid(source ~ geo_value, scale = "free") +
   geom_vline(aes(xintercept = forecast_date)) +
   geom_text(
-    data = forecast_date_label, aes(x = dates, label = "forecast\ndate", y = heights), size = 3, hjust = "right"
+    data = forecast_date_label,
+    aes(x = dates, label = "forecast\ndate", y = heights),
+    size = 3, hjust = "right"
   ) +
   scale_x_date(date_breaks = "3 months", date_labels = "%Y %b") +
   theme(axis.text.x = element_text(angle = 90, hjust = 1))
 ```
 
-Putting that together with a plot of the bands, and a plot of the median prediction.
+Putting that together with a plot of the bands, and a plot of the median
+prediction.
+
 ```{r plotting_forecast, warning=FALSE}
 epiworkflow <- four_week_ahead$epi_workflow
 restricted_predictions <-
@@ -288,15 +324,20 @@ forecast_plot <-
     levels = 0.9,
     fill = primary
   ) +
-  geom_point(data = restricted_predictions, aes(y = .data$value), color = secondary)
+  geom_point(data = restricted_predictions,
+             aes(y = .data$value),
+             color = secondary)
 ```
 </details>
 
 ```{r show-single-forecast, warning=FALSE, echo=FALSE}
 forecast_plot
 ```
-The yellow dot gives the median prediction, while the red interval gives the 5-95% inter-quantile range.
-For this particular day and these locations, the forecasts are relatively accurate, with the true data being within the 25-75% interval.
+
+The yellow dot gives the median prediction, while the red interval gives the
+5-95% inter-quantile range.
+For this particular day and these locations, the forecasts are relatively
+accurate, with the true data being within the 25-75% interval.
 A couple of things to note:
 
 1. Our methods are primarily direct forecasters; this means we don't need to