find `Rscript inst/pkgdown-watch.R` helpful for keeping a live-updating version of the website. Note that you need to have the packages `c("pkgdown", "servr", "devtools", "here", "cli", "fs")` installed.
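If you are unsure which of these packages you still need, a quick check like the following can help. It is a convenience sketch, not part of the repository's tooling, and it only compares the list above against the packages installed in your current library paths.

```r
# Packages needed by inst/pkgdown-watch.R, as listed above
required_pkgs <- c("pkgdown", "servr", "devtools", "here", "cli", "fs")

# Compare against the packages installed in the current library paths
missing_pkgs <- setdiff(required_pkgs, rownames(utils::installed.packages()))

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install them from CRAN
} else {
  message("All packages needed by inst/pkgdown-watch.R are installed.")
}
```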

### Index/homepage

Because we are using an `.Rmd` file to make the index, figures are sometimes not added or updated. When in doubt, run `pkgdown::clean_cache()` and `pkgdown::clean_site()`, and delete the following directories/files (paths relative to the project directory):

**Note:** This package is currently in development and may not work as expected. Please file bug reports as issues in this repo, and we will do our best to address them quickly.

Epipredict is a framework for building transformation and forecasting pipelines
for epidemiological and other panel time-series datasets.
In addition to tools for building forecasting pipelines, it contains a number of
"canned" forecasters meant to run with little modification as an easy way to get
started forecasting.

It is designed to work well with
[`epiprocess`](https://cmu-delphi.github.io/epiprocess/), a package of
time series and geographic processing tools for use in an epidemiological
context.
Both of the packages are meant to work well with the panel data provided by

The documentation for the stable version is at <https://cmu-delphi.github.io/epipredict>, while the development version is at <https://cmu-delphi.github.io/epipredict/dev>.

## Motivating example
To demonstrate the kind of forecast epipredict can make, say we're predicting COVID deaths per 100k for each state on
```{r fc_date}
forecast_date <- as.Date("2021-08-01")
```
Below the fold, we construct this dataset as an `epiprocess::epi_df` from JHU data.

<details>
<summary> Creating the dataset using `{epidatr}` and `{epiprocess}` </summary>

This dataset can be found in the package as <TODO DOESN'T EXIST>; we demonstrate some of the cleaning operations that are typically needed before forecasting.
First, we pull both JHU-CSSE cases and deaths from the [`{epidatr}` package](https://cmu-delphi.github.io/epidatr/):
```{r case_death}
cases <- pub_covidcast(
  source = "jhu-csse",
  signals = "confirmed_incidence_prop",
  time_type = "day",
  geo_type = "state",
  time_values = epirange(20200601, 20220101),
  geo_values = "*") |>
  select(geo_value, time_value, case_rate = value)

deaths <- pub_covidcast(
  source = "jhu-csse",
  signals = "deaths_incidence_prop",
  time_type = "day",
  geo_type = "state",
  time_values = epirange(20200601, 20220101),
  geo_values = "*") |>
  select(geo_value, time_value, death_rate = value)
cases_deaths <-
  full_join(cases, deaths, by = c("time_value", "geo_value")) |>
As with almost any dataset, some cleaning is needed to make it usable; we'll use some utilities from [`{epiprocess}`](https://cmu-delphi.github.io/epiprocess/) for this.
First, to reduce the noise that comes from daily reporting, we take a 7-day average over a trailing window[^1]:

[^1]: This makes the value on any given day of the new dataset depend only on the previous week, so we avoid leaking future values into a forecast.
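As a minimal illustration of what a trailing 7-day average computes, here is a base-R sketch on a short, made-up series for a single location. It uses `stats::filter()` rather than the `{epiprocess}` utilities applied to the real dataset. Each averaged value combines the current day with the six days before it, so the first six entries are `NA`, and editing a later day leaves every earlier average unchanged.

```r
# Made-up daily death rates for a single location (illustration only)
daily <- c(2, 4, 3, 8, 1, 5, 6, 9, 2, 4)

# Trailing 7-day mean: with sides = 1, stats::filter() weights the current
# value and the 6 values before it, so no later values are used.
# stats:: is spelled out because dplyr::filter() is often attached.
week_weights <- rep(1 / 7, 7)
trailing_avg <- as.numeric(stats::filter(daily, week_weights, sides = 1))

# The first 6 entries are NA because a full week is not yet available
print(trailing_avg)

# Changing the last day leaves every earlier average untouched: no leakage
daily_changed <- daily
daily_changed[10] <- 100
changed_avg <- as.numeric(stats::filter(daily_changed, week_weights, sides = 1))
identical(trailing_avg[1:9], changed_avg[1:9])
```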
To make a forecast, we will use a "canned" simple auto-regressive forecaster to predict the death rate four weeks into the future using past (lagged) deaths and cases.
In this case, we have used a number of different lags for the case rate, while
using lags of zero, one, and two weeks for the death rate (as predictors). `four_week_ahead` is both
a fitted model object, which could be used at any time in the future to create
different forecasts, and a set of predicted values (and prediction
intervals) for each location 28 days after the forecast date.
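The core idea of an autoregressive forecaster with a 28-day horizon can be sketched in base R. This is a hypothetical, simplified illustration rather than the package's implementation: it uses one synthetic location, ordinary least squares via `lm()`, and only death-rate lags of 0, 7, and 14 days, whereas the canned forecaster also uses case-rate lags, covers every location, and adds prediction intervals. The helpers `lag_by()` and `lead_by()` are invented for this example.

```r
set.seed(42)

# Synthetic daily death rates for ONE hypothetical location
time_value <- seq(as.Date("2021-01-01"), as.Date("2021-08-01"), by = "day")
death_rate <- 0.5 + 0.3 * sin(seq_along(time_value) / 20) +
  rnorm(length(time_value), sd = 0.02)

forecast_date <- as.Date("2021-08-01")
ahead <- 28          # horizon in days (four weeks)
lags <- c(0, 7, 14)  # zero, one, and two weekly lags

# Hypothetical helpers: lag_by() looks k days into the past,
# lead_by() looks k days into the future; both pad with NA
lag_by <- function(x, k) c(rep(NA, k), head(x, length(x) - k))
lead_by <- function(x, k) c(tail(x, length(x) - k), rep(NA, k))

# One row per day: predictors from the past, target from 28 days later
predictors <- sapply(lags, function(k) lag_by(death_rate, k))
colnames(predictors) <- paste0("lag_", lags)
train <- data.frame(target = lead_by(death_rate, ahead), predictors)

# Ordinary least squares; lm() drops rows with missing values,
# i.e. the first 14 days (incomplete lags) and the last 28 days (no target)
fit <- lm(target ~ lag_0 + lag_7 + lag_14, data = train)

# Predict from the predictors observed on the forecast date itself
newest <- train[time_value == forecast_date, c("lag_0", "lag_7", "lag_14")]
point_forecast <- predict(fit, newdata = newest)
target_date <- forecast_date + ahead

target_date  # 2021-08-29
point_forecast
```

Each training row pairs the death rate observed on a day (and 7 and 14 days earlier) with the value 28 days later, so a prediction made from the forecast date's predictors targets `forecast_date + 28`.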
Plotting the prediction intervals on our subset above[^2]:

[^2]: Alternatively, you could call `autoplot(four_week_ahead)` to get the full collection of forecasts. This is too busy for the space we have for plotting here.

<details>
<summary> Plot </summary>

This is the same kind of plot as `processed_data_plot` above, but with the past data narrowed somewhat