README.Rmd (+123 -34)

## Goals for `epipredict`

**We hope to provide:**

1. A set of basic, easy-to-use forecasters that work out of the box. You should be able to customize them to a limited extent. For the basic forecasters, we currently provide:
    * Baseline flatline forecaster
    * Autoregressive forecaster
    * Autoregressive classifier
    * CDC FluSight flatline forecaster
2. A framework for creating custom forecasters out of modular components. There are four types of components:
    * Preprocessor: do things to the data before model training
    * Trainer: train a model on data, resulting in a fitted model object
    * Predictor: make predictions using a fitted model object
    * Postprocessor: do things to the predictions before returning them

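To make the four component types concrete, here is a toy, dependency-free sketch of a flatline-style forecaster split into those four stages. The functions `preprocess()`, `train()`, `predict_flatline()`, and `postprocess()` are hypothetical illustrations, not part of the `epipredict` API, and the "model" (carry the last observed value forward) is deliberately simplified.

```r
# Toy sketch of the four component types; these are NOT epipredict functions.

# Preprocessor: do things to the data before training (here: drop missing values).
preprocess <- function(y) {
  y[!is.na(y)]
}

# Trainer: produce a fitted model object. This flatline "model" only
# remembers the most recent observed value.
train <- function(y) {
  list(last = y[length(y)])
}

# Predictor: use the fitted object to predict each of the next `ahead` days
# (a flatline forecast repeats the same value).
predict_flatline <- function(fit, ahead) {
  rep(fit$last, ahead)
}

# Postprocessor: adjust predictions before returning them
# (here: round to 2 decimals and clip at zero, since rates cannot be negative).
postprocess <- function(pred) {
  pmax(round(pred, 2), 0)
}

y <- c(1.234, NA, 2.5, 3.456)
fit <- train(preprocess(y))
forecast <- postprocess(predict_flatline(fit, ahead = 3))
forecast
```

Keeping the stages separate means any one of them (say, the trainer) can be swapped without touching the others, which is the motivation for a modular framework.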
<details>
<summary> Creating the dataset using `{epidatr}` and `{epiprocess}` </summary>

This dataset can be found in the package as <TODO DOESN'T EXIST>; we demonstrate some of the cleaning operations that are typically needed before we can forecast.

First, we pull both the `jhu-csse` cases and deaths from the [`{epidatr}`](https://cmu-delphi.github.io/epidatr/) package:

```{r case_death}
library(dplyr)
library(epidatr)

# Daily confirmed COVID-19 cases per 100,000 people, by state
cases <- pub_covidcast(
  source = "jhu-csse",
  signals = "confirmed_incidence_prop",
  time_type = "day",
  geo_type = "state",
  time_values = epirange(20200601, 20220101),
  geo_values = "*"
) |>
  select(geo_value, time_value, case_rate = value)

# Daily COVID-19 deaths per 100,000 people, by state
deaths <- pub_covidcast(
  source = "jhu-csse",
  signals = "deaths_incidence_prop",
  time_type = "day",
  geo_type = "state",
  time_values = epirange(20200601, 20220101),
  geo_values = "*"
) |>
  select(geo_value, time_value, death_rate = value)

# Combine the two signals, keeping every state-day present in either one
cases_deaths <-
  full_join(cases, deaths, by = c("time_value", "geo_value")) |>
```
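`full_join()` keeps every `(geo_value, time_value)` pair that appears in either table and fills the missing column with `NA`, so days on which only one signal was reported are not silently dropped. The base-R sketch below shows that behavior on made-up toy data, using `merge(..., all = TRUE)` as a stand-in for `dplyr::full_join()`. The `toy_*` objects are hypothetical and deliberately named so they do not overwrite `cases` or `deaths`.

```r
# Made-up toy data; not the JHU-CSSE signals.
toy_cases <- data.frame(
  geo_value = c("ca", "ca", "ny"),
  time_value = as.Date(c("2021-01-01", "2021-01-02", "2021-01-01")),
  case_rate = c(10, 12, 8)
)
toy_deaths <- data.frame(
  geo_value = c("ca", "ny", "ny"),
  time_value = as.Date(c("2021-01-01", "2021-01-01", "2021-01-02")),
  death_rate = c(0.1, 0.2, 0.3)
)

# all = TRUE makes this a full outer join, like dplyr::full_join().
toy_joined <- merge(
  toy_cases, toy_deaths,
  by = c("time_value", "geo_value"),
  all = TRUE
)
toy_joined
# ca on 2021-01-02 has no death_rate and ny on 2021-01-02 has no case_rate;
# both rows are kept, with NA in the missing column.
```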

As with most real-world datasets, this one needs some cleaning before it is usable; we'll use some utilities from [`{epiprocess}`](https://cmu-delphi.github.io/epiprocess/) for this.

First, to reduce the noise coming from daily reporting, we take a 7-day average over a trailing window[^1]:
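A trailing 7-day average replaces each day's value with the mean of that day and the six days before it, computed separately for each location. As a hedged, dependency-free illustration (not the `{epiprocess}` code used here), the sketch below implements the same idea with `stats::filter()` and `ave()`. The helper `trailing_mean()` and the toy data are hypothetical; `stats::filter()` is called with its namespace because `dplyr::filter()` masks it once dplyr is attached.

```r
# Hypothetical helper: trailing mean over `window` days. The value at day t
# uses days t - window + 1, ..., t only; the first window - 1 entries are NA
# because a full window is not yet available.
trailing_mean <- function(x, window = 7) {
  as.numeric(stats::filter(x, rep(1 / window, window), sides = 1))
}

# Toy data for two locations; rows must be sorted by time within location.
toy <- data.frame(
  geo_value = rep(c("ca", "ny"), each = 10),
  time_value = rep(as.Date("2021-01-01") + 0:9, times = 2),
  case_rate = c(1:10, 10 * (1:10))
)
toy <- toy[order(toy$geo_value, toy$time_value), ]

# ave() applies the smoother separately within each geo_value.
toy$case_rate_7d <- ave(toy$case_rate, toy$geo_value, FUN = trailing_mean)
toy
```

Because `sides = 1` only looks backward, changing a later value never alters an earlier smoothed value, which is the no-leakage property the trailing window is chosen for.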

[^1]: With a trailing window, each day of the processed time series depends only on that day and the six days before it, so no future values leak into a forecast.

**Target audiences:**

* Basic. Has data, calls forecaster with default arguments.
* Intermediate. Wants to examine changes to the arguments, take advantage of

To create and train a simple autoregressive forecaster that predicts the death rate two weeks into the future from past (lagged) deaths and cases, we could use the following function.

After having downloaded and cleaned the data in `cases_deaths`, we plot a subset

To make a forecast, we will use a "canned" simple autoregressive forecaster to predict the death rate four weeks into the future using lagged[^3] deaths and cases.

[^3]: In this context, "lagged by 3" means using the value from 3 days earlier.
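To make the idea of a lag concrete: lagging a daily series by `k` shifts it so that row `t` holds the value from `k` days earlier, while the forecasting target is the series shifted the other way, `ahead` days later. The minimal base-R sketch below uses hypothetical helpers `lag_by()` and `lead_by()` and assumes the rows are consecutive days for a single location with no gaps.

```r
# Hypothetical helpers; assume x holds consecutive days for one location
# and that k, h are smaller than length(x).

# lag_by(x, k)[t] == x[t - k], with NA where t - k < 1.
lag_by <- function(x, k) {
  if (k == 0) return(x)
  c(rep(NA, k), head(x, -k))
}

# lead_by(x, h)[t] == x[t + h], with NA where t + h is past the end of the data.
lead_by <- function(x, h) {
  if (h == 0) return(x)
  c(tail(x, -h), rep(NA, h))
}

death_rate <- c(0.5, 0.7, 0.6, 0.9, 1.1, 1.0)
data.frame(
  death_rate,
  lag3 = lag_by(death_rate, 3),    # value from 3 days ago (a predictor)
  ahead2 = lead_by(death_rate, 2)  # value 2 days later (a target)
)
```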

In this case, we have used a number of different lags for the case rate, while using only three weekly lags for the death rate (as predictors). The result is both a fitted model object, which could be used at any time in the future to create different forecasts, and a set of predicted values (with prediction intervals) for each location 14 days after the last available time value in the data.

```{r print-model}
two_week_ahead$epi_workflow
```
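The point that a fitted model object can be reused to create new forecasts can be illustrated without `{epipredict}`. The hedged base-R sketch below fits a direct 14-day-ahead linear regression once on simulated data, then reuses it to produce a point forecast plus a simple interval obtained by adding empirical quantiles of the training residuals. The simulated series, the object names, and this residual-quantile interval are assumptions for illustration, not a description of how `two_week_ahead` is computed.

```r
# Simulated data; not the JHU-CSSE signals. Assume y responds to x with a
# 14-day delay, so today's x is informative about y two weeks out.
set.seed(1)
n <- 120
ahead <- 14
x <- cumsum(rnorm(n))
y <- 0.8 * c(rep(0, ahead), head(x, -ahead)) + rnorm(n, sd = 0.1)

# Pair each day's x with the y observed `ahead` days later.
train_df <- data.frame(x_now = head(x, -ahead), y_future = tail(y, -ahead))

# Fit once; `fit` is the reusable fitted model object.
fit <- lm(y_future ~ x_now, data = train_df)

# Reuse the same fitted object for any new predictor value. The interval adds
# empirical quantiles of the training residuals to the point forecast.
forecast_with_interval <- function(fit, x_now, probs = c(0.05, 0.95)) {
  point <- unname(predict(fit, newdata = data.frame(x_now = x_now)))
  offsets <- quantile(residuals(fit), probs, names = FALSE)
  c(lower = point + offsets[1], point = point, upper = point + offsets[2])
}

forecast_with_interval(fit, x_now = x[n])  # forecast for day n + 14
forecast_with_interval(fit, x_now = 2)     # a different, later forecast
```

In-sample residuals tend to understate true forecast error, so intervals built this way are best read as a rough illustration rather than calibrated uncertainty.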

In this case, we have used lags of 0 to 3 days, one week, and two weeks for the case rate, while using only lags of zero, one, and two weeks for the death rate (as predictors).
The result `four_week_ahead` is both a fitted model object, which could be used at any time in the future to create different forecasts, and a set of predicted values (with prediction intervals) for each location 28 days after the forecast date.
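The lag structure described above can be sketched as a direct regression: the death rate 28 days ahead is regressed on the case rate at lags of 0, 1, 2, 3, 7, and 14 days and on the death rate at lags of 0, 7, and 14 days, all measured relative to the same day. The sketch below builds that design for a single location in base R and fits it with `lm()`. The helper `make_arx_design()`, the simulated data, and the use of ordinary least squares are assumptions for illustration, not `epipredict`'s implementation; the target date is simply the forecast date plus 28 days.

```r
# Hypothetical helper: shift x back by k steps (row t gets x[t - k]).
lag_by <- function(x, k) if (k == 0) x else c(rep(NA, k), head(x, -k))

# One row per day t: lagged predictors at t, target = death_rate[t + ahead].
make_arx_design <- function(case_rate, death_rate, case_lags, death_lags, ahead) {
  predictors <- c(
    setNames(lapply(case_lags, function(k) lag_by(case_rate, k)),
             paste0("case_lag", case_lags)),
    setNames(lapply(death_lags, function(k) lag_by(death_rate, k)),
             paste0("death_lag", death_lags))
  )
  design <- as.data.frame(predictors)
  design$target <- c(tail(death_rate, -ahead), rep(NA, ahead))
  design
}

# Simulated single-location data (not real JHU-CSSE values): deaths follow
# cases with exactly a 28-day delay, so case_lag0 is the informative predictor.
set.seed(42)
n <- 200
time_value <- as.Date("2021-01-01") + 0:(n - 1)
case_rate <- 50 + cumsum(rnorm(n))
death_rate <- c(rep(0.2, 28), 0.01 * head(case_rate, -28))

design <- make_arx_design(case_rate, death_rate,
                          case_lags = c(0:3, 7, 14),
                          death_lags = c(0, 7, 14),
                          ahead = 28)

# Rows with any NA (early days without full lags, late days without a target)
# are dropped by lm()'s default na.action.
fit <- lm(target ~ ., data = design)

# Forecast from the predictors observed on the last available day.
forecast_date <- max(time_value)
point <- predict(fit, newdata = design[n, ])
target_date <- forecast_date + 28
```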

Plotting the prediction intervals on our subset above[^2]:

[^2]: Alternatively, you could call `autoplot(four_week_ahead)` to get the full collection of forecasts, but that plot is too busy for the space we have here.

<details>
<summary> Plot </summary>

This is the same kind of plot as `processed_data_plot` above, but with the past data narrowed somewhat