Commit da839de

pushing only the dev docs
1 parent: 53a3bf9

1 file changed (+90, -99 lines)

README.md

````diff
@@ -1,21 +1,34 @@

 <!-- README.md is generated from README.Rmd. Please edit that file -->

-# epipredict
+# Epipredict

 <!-- badges: start -->

 [![R-CMD-check](https://github.com/cmu-delphi/epipredict/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/cmu-delphi/epipredict/actions/workflows/R-CMD-check.yaml)
 <!-- badges: end -->

-**Note:** This package is currently in development and may not work as
-expected. Please file bug reports as issues in this repo, and we will do
-our best to address them quickly.
+Epipredict is a framework for building transformation and forecasting
+pipelines for epidemiological and other panel time-series datasets. In
+addition to tools for building forecasting pipelines, it contains a
+number of “canned” forecasters meant to run with little modification as
+an easy way to get started forecasting.
+
+It is designed to work well with
+[`epiprocess`](https://cmu-delphi.github.io/epiprocess/), a utility for
+handling various time series and geographic processing tools in an
+epidemiological context. Both of the packages are meant to work well
+with the panel data provided by
+[`epidatr`](https://cmu-delphi.github.io/epidatr/).
+
+If you are looking for more detail beyond the package documentation, see
+our [forecasting
+book](https://cmu-delphi.github.io/delphi-tooling-book/).

 ## Installation

-To install (unless you’re making changes to the package, use the stable
-version):
+To install (unless you’re planning on contributing to package
+development, we suggest using the stable version):

 ``` r
 # Stable version
````
````diff
@@ -25,52 +38,14 @@ pak::pkg_install("cmu-delphi/epipredict@main")
 pak::pkg_install("cmu-delphi/epipredict@dev")
 ```

-## Documentation
-
-You can view documentation for the `main` branch at
-<https://cmu-delphi.github.io/epipredict>.
-
-## Goals for `epipredict`
-
-**We hope to provide:**
-
-1. A set of basic, easy-to-use forecasters that work out of the box.
-You should be able to do a reasonably limited amount of
-customization on them. For the basic forecasters, we currently
-provide:
-- Baseline flatline forecaster
-- Autoregressive forecaster
-- Autoregressive classifier
-- CDC FluSight flatline forecaster
-2. A framework for creating custom forecasters out of modular
-components. There are four types of components:
-- Preprocessor: do things to the data before model training
-- Trainer: train a model on data, resulting in a fitted model object
-- Predictor: make predictions, using a fitted model object
-- Postprocessor: do things to the predictions before returning
+The documentation for the stable version is at
+<https://cmu-delphi.github.io/epipredict>, while the development version
+is at <https://cmu-delphi.github.io/epipredict/dev>.

-**Target audiences:**
+## Motivating example

-- Basic. Has data, calls forecaster with default arguments.
-- Intermediate. Wants to examine changes to the arguments, take
-advantage of built in flexibility.
-- Advanced. Wants to write their own forecasters. Maybe willing to build
-up from some components.
-
-The Advanced user should find their task to be relatively easy. Examples
-of these tasks are illustrated in the [vignettes and
-articles](https://cmu-delphi.github.io/epipredict).
-
-See also the (in progress) [Forecasting
-Book](https://cmu-delphi.github.io/delphi-tooling-book/).
-
-## Intermediate example
-
-The package comes with some built-in historical data for illustration,
-but up-to-date versions of this could be downloaded with the
-[`{epidatr}` package](https://cmu-delphi.github.io/epidatr/) and
-processed using
-[`{epiprocess}`](https://cmu-delphi.github.io/epiprocess/).[^1]
+To demonstrate the kind of forecast epipredict can make, say we’re
+predicting COVID deaths per 100k for each state on

 ``` r
 forecast_date <- as.Date("2021-08-01")
````
````diff
@@ -95,17 +70,19 @@ cases <- pub_covidcast(
 signals = "confirmed_incidence_prop",
 time_type = "day",
 geo_type = "state",
-time_values = epirange(20200601, 20220101),
-geo_values = "*") |>
+time_values = epirange(20200601, 20211231),
+geo_values = "*"
+) |>
 select(geo_value, time_value, case_rate = value)

 deaths <- pub_covidcast(
 source = "jhu-csse",
 signals = "deaths_incidence_prop",
 time_type = "day",
 geo_type = "state",
-time_values = epirange(20200601, 20220101),
-geo_values = "*") |>
+time_values = epirange(20200601, 20211231),
+geo_values = "*"
+) |>
 select(geo_value, time_value, death_rate = value)
 cases_deaths <-
 full_join(cases, deaths, by = c("time_value", "geo_value")) |>
````
````diff
@@ -123,6 +100,7 @@ cases_deaths |>
 ```

 <img src="man/figures/README-case_death-1.png" width="90%" style="display: block; margin: auto;" />
+
 As with basically any dataset, there is some cleaning that we will need
 to do to make it actually usable; we’ll use some utilities from
 [`{epiprocess}`](https://cmu-delphi.github.io/epiprocess/) for this.
````
````diff
@@ -131,7 +109,7 @@ First, to eliminate some of the noise coming from daily reporting, we do

 ``` r
 cases_deaths <-
-cases_deaths |>
+cases_deaths |>
 group_by(geo_value) |>
 epi_slide(
 cases_7dav = mean(case_rate, na.rm = TRUE),
````
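The hunk above cuts off partway through the `epi_slide()` call that builds `cases_7dav`. For intuition about what a trailing 7-day mean does to noisy daily reports, here is a minimal base-R sketch using `stats::filter()`. It is an illustration, not `epiprocess`'s implementation: the helper name `trailing_mean_7()` and the toy values are made up, and this version returns `NA` until a full week is available, whereas `epi_slide()` may treat the start of each series differently.

```r
# Trailing 7-day mean: each output value averages that day and the six days
# before it. Illustration only; this is not how epi_slide() is implemented.
trailing_mean_7 <- function(x) {
  weights <- rep(1 / 7, 7)
  # sides = 1 applies the weights to the current and past values only,
  # so the first six outputs are NA
  as.numeric(stats::filter(x, weights, method = "convolution", sides = 1))
}

daily <- c(10, 0, 0, 35, 5, 12, 8, 9, 11, 10)  # toy, noisy daily values
smoothed <- trailing_mean_7(daily)
round(smoothed, 2)
```

`stats::filter()` is called with its namespace because `dplyr::filter()` masks it whenever dplyr is attached, as it is for the `group_by()`/`mutate()` pipeline in the README.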
````diff
@@ -150,47 +128,54 @@ cases_deaths <-
 cases_deaths |>
 group_by(geo_value) |>
 mutate(
-outlr_death_rate = detect_outlr_rm(time_value, death_rate, detect_negatives = TRUE),
-outlr_case_rate = detect_outlr_rm(time_value, case_rate, detect_negatives = TRUE)
+outlr_death_rate = detect_outlr_rm(
+time_value, death_rate, detect_negatives = TRUE
+),
+outlr_case_rate = detect_outlr_rm(
+time_value, case_rate, detect_negatives = TRUE
+)
 ) |>
 unnest(cols = starts_with("outlr"), names_sep = "_") |>
 ungroup() |>
 mutate(
 death_rate = outlr_death_rate_replacement,
-case_rate = outlr_case_rate_replacement) |>
+case_rate = outlr_case_rate_replacement
+) |>
 select(geo_value, time_value, case_rate, death_rate)
 cases_deaths
-#> An `epi_df` object, 32,480 x 4 with metadata:
+#> An `epi_df` object, 32,424 x 4 with metadata:
 #> * geo_type = state
 #> * time_type = day
-#> * as_of = 2022-05-31 12:08:25.791826
+#> * as_of = 2022-01-01
 #>
-#> # A tibble: 20,496 × 4
-#>    geo_value time_value case_rate death_rate
-#>  * <chr>     <date>         <dbl>      <dbl>
-#>  1 ak        2020-12-31      35.9      0.158
-#>  2 al        2020-12-31      65.1      0.438
-#>  3 ar        2020-12-31      66.0      1.27 
-#>  4 as        2020-12-31       0        0    
-#>  5 az        2020-12-31      76.8      1.10 
-#>  6 ca        2020-12-31      96.0      0.751
-#>  7 co        2020-12-31      35.8      0.649
-#>  8 ct        2020-12-31      52.1      0.819
-#>  9 dc        2020-12-31      31.0      0.601
-#> 10 de        2020-12-31      65.2      0.807
-#> # ℹ 20,486 more rows
+#> # A tibble: 32,424 × 4
+#>    geo_value time_value case_rate death_rate
+#>  * <chr>     <date>         <dbl>      <dbl>
+#>  1 ak        2020-06-01      2.31          0
+#>  2 ak        2020-06-02      1.94          0
+#>  3 ak        2020-06-03      2.63          0
+#>  4 ak        2020-06-04      2.59          0
+#>  5 ak        2020-06-05      2.43          0
+#>  6 ak        2020-06-06      2.35          0
+#> # ℹ 32,418 more rows
 ```

-To create and train a simple auto-regressive forecaster to predict the
-death rate two weeks into the future using past (lagged) deaths and
-cases, we could use the following function.
+</details>
+
+After having downloaded and cleaned the data in `cases_deaths`, we plot
+a subset of the states, noting the actual forecast date:
+
+<details>
+<summary>
+Plot
+</summary>

 ``` r
 forecast_date_label <-
 tibble(
 geo_value = rep(plot_locations, 2),
-source = c(rep("case_rate",4), rep("death_rate", 4)),
-dates = rep(forecast_date - 7*2, 2 * length(plot_locations)),
+source = c(rep("case_rate", 4), rep("death_rate", 4)),
+dates = rep(forecast_date - 7 * 2, 2 * length(plot_locations)),
 heights = c(rep(150, 4), rep(1.0, 4))
 )
 processed_data_plot <-
````
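The printed size of `cases_deaths` drops from 32,480 to 32,424 rows. That difference is consistent with the query window losing exactly one day (2022-01-01) across 56 regions, the count reported in the forecaster printout further down. This check assumes `epirange()` includes both endpoints and that every region has exactly one row per day; the helper `days_inclusive()` is hypothetical.

```r
# Count rows in a complete daily panel: one row per region per day.
# Assumes epirange() includes both endpoints and no region is missing days.
n_regions <- 56

days_inclusive <- function(from, to) {
  as.integer(as.Date(to) - as.Date(from)) + 1L
}

old_rows <- days_inclusive("2020-06-01", "2022-01-01") * n_regions
new_rows <- days_inclusive("2020-06-01", "2021-12-31") * n_regions
c(old = old_rows, new = new_rows)  # 32480 and 32424
old_rows - new_rows                # 56, i.e. one day for each region
```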
````diff
@@ -202,7 +187,10 @@ processed_data_plot <-
 facet_grid(source ~ geo_value, scale = "free") +
 geom_vline(aes(xintercept = forecast_date)) +
 geom_text(
-data = forecast_date_label, aes(x=dates, label = "forecast\ndate", y = heights), size = 3, hjust = "right") +
+data = forecast_date_label,
+aes(x = dates, label = "forecast\ndate", y = heights),
+size = 3, hjust = "right"
+) +
 scale_x_date(date_breaks = "3 months", date_labels = "%Y %b") +
 theme(axis.text.x = element_text(angle = 90, hjust = 1))
 ```
````
````diff
@@ -222,25 +210,26 @@ four_week_ahead <- arx_forecaster(
 predictors = c("case_rate", "death_rate"),
 args_list = arx_args_list(
 lags = list(c(0, 1, 2, 3, 7, 14), c(0, 7, 14)),
-ahead = 14
+ahead = 4 * 7
 )
 )
-two_week_ahead
-#> ══ A basic forecaster of type ARX Forecaster ═══════════════════════════════
+four_week_ahead
+#> ══ A basic forecaster of type ARX Forecaster ═══════════════════════════════
 #>
-#> This forecaster was fit on 2024-11-11 11:38:31.
+#> This forecaster was fit on 2025-01-24 14:47:38.
 #>
 #> Training data was an <epi_df> with:
 #> • Geography: state,
 #> • Time type: day,
-#> • Using data up-to-date as of: 2022-05-31 12:08:25.
+#> • Using data up-to-date as of: 2022-01-01.
+#> • With the last data available on 2021-08-01
 #>
-#> ── Predictions ─────────────────────────────────────────────────────────────
+#> ── Predictions ─────────────────────────────────────────────────────────────
 #>
 #> A total of 56 predictions are available for
 #> • 56 unique geographic regions,
-#> • At forecast date: 2021-12-31,
-#> • For target date: 2022-01-14.
+#> • At forecast date: 2021-08-01,
+#> • For target date: 2021-08-29,
 #>
 ```

````
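For daily data, the target dates in both printouts are consistent with target date = forecast date + `ahead` days. A minimal check with base R `Date` arithmetic, using the dates printed in the hunk above:

```r
# For daily data: target date = forecast date + ahead (in days).
forecast_date <- as.Date("2021-08-01")
ahead <- 4 * 7
target_date <- forecast_date + ahead
target_date  # "2021-08-29", as in the new printout

# The removed run: data through 2021-12-31 with ahead = 14.
old_target <- as.Date("2021-12-31") + 14
old_target   # "2022-01-14", as in the old printout
```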
````diff
@@ -271,15 +260,16 @@ narrow_data_plot <-
 facet_grid(source ~ geo_value, scale = "free") +
 geom_vline(aes(xintercept = forecast_date)) +
 geom_text(
-data = forecast_date_label, aes(x=dates, label = "forecast\ndate", y = heights), size = 3, hjust = "right") +
+data = forecast_date_label,
+aes(x = dates, label = "forecast\ndate", y = heights),
+size = 3, hjust = "right"
+) +
 scale_x_date(date_breaks = "3 months", date_labels = "%Y %b") +
 theme(axis.text.x = element_text(angle = 90, hjust = 1))
 ```

-The fitted model here involved preprocessing the data to appropriately
-generate lagged predictors, estimating a linear model with `stats::lm()`
-and then postprocessing the results to be meaningful for epidemiological
-tasks. We can also examine the predictions.
+Putting that together with a plot of the bands, and a plot of the median
+prediction.

 ``` r
 epiworkflow <- four_week_ahead$epi_workflow
````
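The removed paragraph above describes what the ARX forecaster does conceptually: generate lagged predictors, fit a linear model with `stats::lm()`, then postprocess. The sketch below covers only the first two steps for a single synthetic series. It uses the `death_rate` lags `c(0, 7, 14)` and the 28-day `ahead` from the `arx_forecaster()` call earlier in this diff, assuming the elements of `lags` pair with `predictors` in order. The helpers `shift_back()` and `shift_forward()` are hypothetical, and this is not epipredict's internal code: the real forecaster also works across regions, includes `case_rate` lags, and postprocesses the predictions.

```r
# Hypothetical helpers for illustration; not epipredict's internal code.
# shift_back(x, k): at each time, the value observed k days earlier.
shift_back <- function(x, k) {
  if (k == 0) return(x)
  c(rep(NA_real_, k), head(x, -k))
}
# shift_forward(x, k): at each time, the value observed k days later.
shift_forward <- function(x, k) {
  if (k == 0) return(x)
  c(tail(x, -k), rep(NA_real_, k))
}

day <- seq_len(120)
death_rate <- sin(day / 10) + day / 50 + (day %% 3) / 10  # synthetic toy series

lags <- c(0, 7, 14)  # death_rate lags from the arx_args_list() call
ahead <- 4 * 7

# One row per day: the response is the value `ahead` days later,
# the predictors are the current and lagged values.
design <- data.frame(
  target = shift_forward(death_rate, ahead),
  setNames(lapply(lags, function(k) shift_back(death_rate, k)),
           paste0("lag_", lags))
)

# Linear autoregression; with the default na.action, lm() drops rows with NA.
fit <- lm(target ~ ., data = design)
nobs(fit)
```

Rows whose lags reach back before the series starts, or whose target falls after it ends, contain `NA`, so 120 - 14 - 28 = 78 rows remain for fitting.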
````diff
@@ -293,16 +283,17 @@ forecast_plot <-
 epipredict:::plot_bands(
 restricted_predictions,
 levels = 0.9,
-fill = primary) +
-geom_point(data = restricted_predictions, aes(y = .data$value), color = secondary)
+fill = primary
+) +
+geom_point(data = restricted_predictions,
+aes(y = .data$value),
+color = secondary)
 ```

-The results above show a distributional forecast produced using data
-through the end of 2021 for the 14th of January 2022. A prediction for
-the death rate per 100K inhabitants is available for every state
-(`geo_value`) along with a 90% predictive interval.
+</details>

 <img src="man/figures/README-show-single-forecast-1.png" width="90%" style="display: block; margin: auto;" />
+
 The yellow dot gives the median prediction, while the red interval gives
 the 5-95% inter-quantile range. For this particular day and these
 locations, the forecasts are relatively accurate, with the true data
````
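The call in this hunk passes `levels = 0.9` to `plot_bands()`, and the accompanying text describes the band as the 5-95% range. Mapping a central interval level to its two quantile levels is simple arithmetic. Below is a small sketch with a hypothetical helper `central_probs()` and an evenly spaced synthetic sample standing in for predictive draws; it does not use epipredict's distribution objects.

```r
# A central interval at a given level leaves (1 - level) / 2 in each tail.
central_probs <- function(level) c((1 - level) / 2, (1 + level) / 2)

probs <- central_probs(0.9)
probs  # 0.05 and 0.95, up to floating-point rounding

# Synthetic stand-in for predictive draws, for illustration only.
draws <- seq(0, 2, length.out = 101)
band <- quantile(draws, probs = c(probs[1], 0.5, probs[2]))
band  # lower bound, median, upper bound
```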
