Skip to content

Commit 9836aff

Browse files
committed
training on only the shown subset
1 parent 73cbbed commit 9836aff

File tree

1 file changed

+41
-31
lines changed

1 file changed

+41
-31
lines changed

README.Rmd

+41-31
Original file line numberDiff line numberDiff line change
@@ -126,9 +126,9 @@ data.
126126
<details>
127127
<summary> Creating the dataset using `{epidatr}` and `{epiprocess}` </summary>
128128

129-
This dataset can be found in the package as <TODO DOESN'T EXIST>; we demonstrate
130-
some of the typically ubiquitous cleaning operations needed to be able to
131-
forecast.
129+
This dataset can be found in the package as `covid_case_death_rates`; we
130+
demonstrate some of the typically ubiquitous cleaning operations needed to be
131+
able to forecast.
132132
First we pull both jhu-csse cases and deaths from
133133
[`{epidatr}`](https://cmu-delphi.github.io/epidatr/) package:
134134

@@ -152,26 +152,34 @@ deaths <- pub_covidcast(
152152
geo_values = "*"
153153
) |>
154154
select(geo_value, time_value, death_rate = value)
155+
```
156+
157+
Since visualizing the results on every geography is somewhat overwhelming,
158+
we'll only train on a subset of 5.
159+
```{r date, warning = FALSE}
160+
used_locations <- c("ca", "ma", "ny", "tx")
155161
cases_deaths <-
156162
full_join(cases, deaths, by = c("time_value", "geo_value")) |>
163+
filter(geo_value %in% used_locations) |>
157164
as_epi_df(as_of = as.Date("2022-01-01"))
158-
plot_locations <- c("ca", "ma", "ny", "tx")
159165
# plotting the data as it was downloaded
160166
cases_deaths |>
161-
filter(geo_value %in% plot_locations) |>
162-
pivot_longer(cols = c("case_rate", "death_rate"), names_to = "source") |>
163-
ggplot(aes(x = time_value, y = value)) +
164-
geom_line() +
165-
facet_grid(source ~ geo_value, scale = "free") +
167+
autoplot(
168+
case_rate,
169+
death_rate,
170+
.color_by = "none"
171+
) +
172+
facet_grid(.response_name ~ geo_value, scale = "free") +
166173
scale_x_date(date_breaks = "3 months", date_labels = "%Y %b") +
167174
theme(axis.text.x = element_text(angle = 90, hjust = 1))
168175
```
169176

170177
As with basically any dataset, there is some cleaning that we will need to do to
171178
make it actually usable; we'll use some utilities from
172-
[`{epiprocess}`](https://cmu-delphi.github.io/epiprocess/) for this. First, to
173-
eliminate some of the noise coming from daily reporting, we do 7 day averaging
174-
over a trailing window[^1]:
179+
[`{epiprocess}`](https://cmu-delphi.github.io/epiprocess/) for this.
180+
181+
First, to eliminate some of the noise coming from daily reporting, we do 7 day
182+
averaging over a trailing window[^1]:
175183

176184
[^1]: This makes it so that any given day of the processed timeseries only
177185
depends on the previous week, which means that we avoid leaking future
@@ -199,10 +207,12 @@ cases_deaths <-
199207
group_by(geo_value) |>
200208
mutate(
201209
outlr_death_rate = detect_outlr_rm(
202-
time_value, death_rate, detect_negatives = TRUE
210+
time_value, death_rate,
211+
detect_negatives = TRUE
203212
),
204213
outlr_case_rate = detect_outlr_rm(
205-
time_value, case_rate, detect_negatives = TRUE
214+
time_value, case_rate,
215+
detect_negatives = TRUE
206216
)
207217
) |>
208218
unnest(cols = starts_with("outlr"), names_sep = "_") |>
@@ -212,7 +222,6 @@ cases_deaths <-
212222
case_rate = outlr_case_rate_replacement
213223
) |>
214224
select(geo_value, time_value, case_rate, death_rate)
215-
cases_deaths
216225
```
217226
</details>
218227

@@ -224,14 +233,13 @@ of the states, noting the actual forecast date:
224233
```{r plot_locs}
225234
forecast_date_label <-
226235
tibble(
227-
geo_value = rep(plot_locations, 2),
228-
source = c(rep("case_rate", 4), rep("death_rate", 4)),
229-
dates = rep(forecast_date - 7 * 2, 2 * length(plot_locations)),
236+
geo_value = rep(used_locations, 2),
237+
.response_name = c(rep("case_rate", 4), rep("death_rate", 4)),
238+
dates = rep(forecast_date - 7 * 2, 2 * length(used_locations)),
230239
heights = c(rep(150, 4), rep(1.0, 4))
231240
)
232241
processed_data_plot <-
233242
cases_deaths |>
234-
filter(geo_value %in% plot_locations) |>
235243
pivot_longer(cols = c("case_rate", "death_rate"), names_to = "source") |>
236244
ggplot(aes(x = time_value, y = value)) +
237245
geom_line() +
@@ -292,36 +300,37 @@ data narrowed somewhat
292300
narrow_data_plot <-
293301
cases_deaths |>
294302
filter(time_value > "2021-04-01") |>
295-
filter(geo_value %in% plot_locations) |>
296-
pivot_longer(cols = c("case_rate", "death_rate"), names_to = "source") |>
297-
ggplot(aes(x = time_value, y = value)) +
298-
geom_line() +
299-
facet_grid(source ~ geo_value, scale = "free") +
303+
autoplot(
304+
case_rate,
305+
death_rate,
306+
.color_by = "none"
307+
) +
308+
facet_grid(.response_name ~ geo_value, scale = "free") +
300309
geom_vline(aes(xintercept = forecast_date)) +
301310
geom_text(
302311
data = forecast_date_label,
303312
aes(x = dates, label = "forecast\ndate", y = heights),
304313
size = 3, hjust = "right"
305314
) +
306315
scale_x_date(date_breaks = "3 months", date_labels = "%Y %b") +
307-
theme(axis.text.x = element_text(angle = 90, hjust = 1))
316+
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
317+
ylim(0, NA)
308318
```
309319

310320
Putting that together with a plot of the bands, and a plot of the median
311321
prediction.
312322

313323
```{r plotting_forecast, warning=FALSE}
314324
epiworkflow <- four_week_ahead$epi_workflow
325+
315326
restricted_predictions <-
316327
four_week_ahead$predictions |>
317-
filter(geo_value %in% plot_locations) |>
318328
rename(time_value = target_date, value = .pred) |>
319-
mutate(source = "death_rate")
329+
mutate(.response_name = "death_rate")
320330
forecast_plot <-
321331
narrow_data_plot |>
322332
epipredict:::plot_bands(
323-
restricted_predictions,
324-
levels = 0.9
333+
restricted_predictions
325334
) +
326335
geom_point(
327336
data = restricted_predictions,
@@ -351,5 +360,6 @@ A couple of things to note:
351360
If you encounter a bug or have a feature request, feel free to file an [issue on
352361
our github page](https://github.com/cmu-delphi/epipredict/issues).
353362
For other
354-
questions, feel free to contact [Daniel]([email protected]), [David]([email protected]), [Dmitry]([email protected]), or
355-
[Logan]([email protected]), either via email or on the Insightnet slack.
363+
questions, feel free to reach out to the authors, either via this [contact
364+
form](https://docs.google.com/forms/d/e/1FAIpQLScqgT1fKZr5VWBfsaSp-DNaN03aV6EoZU4YljIzHJ1Wl_zmtg/viewform),
365+
email or the Insightnet slack.

0 commit comments

Comments
 (0)