Commit 38b86e8

T,F -> TRUE,FALSE
1 parent e81a326 commit 38b86e8
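
The commit message does not state a motivation. One common reason for this kind of change is that `T` and `F` are ordinary variables in R and can be masked, whereas `TRUE` and `FALSE` are reserved words. A minimal base-R sketch of that difference follows; the function names are hypothetical and not part of the commit.

```r
# Hypothetical illustration; not part of the commit.
uses_shorthand <- function() {
  T <- FALSE   # legal: `T` is an ordinary variable, so this masks base::T
  isTRUE(T)    # silently FALSE from here on
}

uses_reserved <- function() {
  isTRUE(TRUE) # `TRUE` is a reserved word and cannot be masked
}

shorthand_result <- uses_shorthand()
reserved_result <- uses_reserved()

# Assigning to `TRUE` parses but fails when evaluated.
assign_error <- tryCatch(
  eval(parse(text = "TRUE <- FALSE")),
  error = function(e) conditionMessage(e)
)
```

Writing chunk options and arguments as `include=TRUE` or `fixed = TRUE` therefore cannot be affected by a stray assignment to `T` or `F` elsewhere in the session.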

2 files changed: +99 -13 lines changed

vignettes/epipredict.Rmd (+86)
@@ -19,6 +19,92 @@ library(recipes)
 library(epipredict)
 ```

+## Goals for `epipredict` (from README)
+
+**We hope to provide:**
+
+1. A set of basic, easy-to-use forecasters that work out of the box. You should be able to do a reasonably limited amount of customization on them. For the basic forecasters, we currently provide:
+    * Baseline flatline forecaster
+    * Autoregressive forecaster
+    * Autoregressive classifier
+    * CDC FluSight flatline forecaster
+2. A framework for creating custom forecasters out of modular components. There are four types of components:
+    * Preprocessor: do things to the data before model training
+    * Trainer: train a model on data, resulting in a fitted model object
+    * Predictor: make predictions, using a fitted model object
+    * Postprocessor: do things to the predictions before returning
+
+**Target audiences:**
+
+* Basic. Has data, calls forecaster with default arguments.
+* Intermediate. Wants to examine changes to the arguments, take advantage of
+  built-in flexibility.
+* Advanced. Wants to write their own forecasters. Maybe willing to build up
+  from some components.
+
+The Advanced user should find their task to be relatively easy. Examples of
+these tasks are illustrated in the [vignettes and articles](https://cmu-delphi.github.io/epipredict).
+
+See also the (in progress) [Forecasting Book](https://cmu-delphi.github.io/delphi-tooling-book/).
+
+## Intermediate example
+
+The package comes with some built-in historical data for illustration, but
+up-to-date versions of this could be downloaded with the
+[`{epidatr}` package](https://cmu-delphi.github.io/epidatr/)
+and processed using
+[`{epiprocess}`](https://cmu-delphi.github.io/epiprocess/).[^1]
+
+[^1]: Other epidemiological signals for non-Covid related illnesses are also
+available with [`{epidatr}`](https://github.com/cmu-delphi/epidatr) which
+interfaces directly to Delphi's
+[Epidata API](https://cmu-delphi.github.io/delphi-epidata/)
+
+```{r init_stuff, message=FALSE}
+library(epipredict)
+covid_case_death_rates
+```
+
+To create and train a simple auto-regressive forecaster to predict the death rate two weeks into the future using past (lagged) deaths and cases, we could use the following function.
+
+```{r make-forecasts, warning=FALSE}
+two_week_ahead <- arx_forecaster(
+  covid_case_death_rates,
+  outcome = "death_rate",
+  predictors = c("case_rate", "death_rate"),
+  args_list = arx_args_list(
+    lags = list(c(0, 1, 2, 3, 7, 14), c(0, 7, 14)),
+    ahead = 14
+  )
+)
+two_week_ahead
+```
+
+In this case, we have used a number of different lags for the case rate, while
+only using 3 weekly lags for the death rate (as predictors). The result is both
+a fitted model object which could be used any time in the future to create
+different forecasts, as well as a set of predicted values (and prediction
+intervals) for each location 14 days after the last available time value in the
+data.
+
+```{r print-model}
+two_week_ahead$epi_workflow
+```
+
+The fitted model here involved preprocessing the data to appropriately generate
+lagged predictors, estimating a linear model with `stats::lm()` and then
+postprocessing the results to be meaningful for epidemiological tasks. We can
+also examine the predictions.
+
+```{r show-preds}
+two_week_ahead$predictions
+```
+
+The results above show a distributional forecast produced using data through
+the end of 2021 for the 14th of January 2022. A prediction for the death rate
+per 100K inhabitants is available for every state (`geo_value`) along with a
+90% predictive interval.
+

 # Goals for the package
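
The vignette text added in the hunk above describes the fit as generating lagged predictors, estimating a linear model with `stats::lm()`, and predicting 14 days ahead. A rough base-R sketch of that idea follows. It is not how `epipredict` implements it: the toy data, the single location, and the helper names `shift` and `lagged` are all assumptions, and the sketch omits multi-location handling and prediction intervals.

```r
# Hypothetical toy data: one location, 60 daily observations.
set.seed(1)
n <- 60
case_rate <- 20 + cumsum(rnorm(n))
death_rate <- 0.5 + 0.01 * case_rate + rnorm(n, sd = 0.05)

# shift(x, k): the value of x observed k days earlier (k >= 0)
# or |k| days later (k < 0), padded with NA.
shift <- function(x, k) {
  len <- length(x)
  if (k >= 0) {
    c(rep(NA, k), x[seq_len(len - k)])
  } else {
    c(x[seq(1 - k, len)], rep(NA, -k))
  }
}

# One column per lag, named e.g. case_rate_lag7.
lagged <- function(x, lags, prefix) {
  cols <- lapply(lags, function(k) shift(x, k))
  names(cols) <- paste0(prefix, "_lag", lags)
  as.data.frame(cols)
}

ahead <- 14
design <- data.frame(
  ahead_death_rate = shift(death_rate, -ahead),
  lagged(case_rate, c(0, 1, 2, 3, 7, 14), "case_rate"),
  lagged(death_rate, c(0, 7, 14), "death_rate")
)

# lm() drops rows with missing lags or a missing outcome by default.
fit <- lm(ahead_death_rate ~ ., data = design)

# The last row has all lags available, so it yields the forecast
# for 14 days after the final observation.
point_forecast <- predict(fit, newdata = design[n, , drop = FALSE])
```

The same lag sets as in the `arx_forecaster()` call are used here, so the model has nine lagged predictors plus an intercept.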

vignettes/panel-data.Rmd (+13 -13)
@@ -32,7 +32,7 @@ An example of this is the [`covid_case_death_rates`](
 dataset, which contains daily state-wise measures of `case_rate` and
 `death_rate` for COVID-19 in 2021:

-```{r epi-panel-ex, include=T}
+```{r epi-panel-ex, include=TRUE}
 head(covid_case_death_rates, 3)
 ```

@@ -103,7 +103,7 @@ years
 * `edu_qual` (key): one of 32 unique educational qualifications, e.g.,
   "Master's diploma"

-```{r preview-data, include=T}
+```{r preview-data, include=TRUE}
 # Rename for simplicity
 employ <- grad_employ_subset
 sample_n(employ, 6)
@@ -122,7 +122,7 @@ first pre-process by standardizing each numeric column by the total within
 each group of keys. We do this since those raw numeric values will vary greatly
 from province to province since there are large differences in population.

-```{r employ-small, include=T}
+```{r employ-small, include=TRUE}
 employ_small <- employ %>%
   group_by(geo_value, age_group, edu_qual) %>%
   # Select groups where there are complete time series values
@@ -141,10 +141,10 @@ Note that some groups
 do not have any time series information since we filtered out all time series
 with incomplete dates.

-```{r employ-small-graph, include=T, eval=T, fig.width=9, fig.height=6}
+```{r employ-small-graph, include=TRUE, eval=TRUE, fig.width=9, fig.height=6}
 employ_small %>%
   filter(geo_value %in% c("British Columbia", "Ontario")) %>%
-  filter(grepl("degree", edu_qual, fixed = T)) %>%
+  filter(grepl("degree", edu_qual, fixed = TRUE)) %>%
   group_by(geo_value, time_value, edu_qual, age_group) %>%
   summarise(num_graduates_prop = sum(num_graduates_prop), .groups = "drop") %>%
   ggplot(aes(x = time_value, y = num_graduates_prop, color = geo_value)) +
@@ -183,7 +183,7 @@ Also note that
 since we specified our `time_type` to be `year`, our `lag` and `lead`
 values are both in years.

-```{r make-recipe, include=T, eval=T}
+```{r make-recipe, include=TRUE, eval=TRUE}
 r <- epi_recipe(employ_small) %>%
   step_epi_ahead(num_graduates_prop, ahead = 1) %>%
   step_epi_lag(num_graduates_prop, lag = 0:2) %>%
@@ -194,7 +194,7 @@ r
 Let's apply this recipe using `prep` and `bake` to generate and view the `lag`
 and `ahead` columns.

-```{r view-preprocessed, include=T}
+```{r view-preprocessed, include=TRUE}
 # Display a sample of the pre-processed data
 bake_and_show_sample <- function(recipe, data, n = 5) {
   recipe %>%
@@ -227,7 +227,7 @@ that `epi_workflow` is a container and doesn't actually do the fitting. We have
 to pass the workflow into `fit()` to get our estimated model coefficients
 $\widehat{\alpha}_i,\ i=0,...,3$.

-```{r linearreg-wf, include=T}
+```{r linearreg-wf, include=TRUE}
 wf_linreg <- epi_workflow(r, linear_reg()) %>%
   fit(employ_small)
 summary(hardhat::extract_fit_engine(wf_linreg))
@@ -321,7 +321,7 @@ $z_{tijk}$ is the number of graduates (proportion) at time $t$.

 Again, we construct an `epi_recipe` detailing the pre-processing steps.

-```{r custom-arx, include=T}
+```{r custom-arx, include=TRUE}
 rx <- epi_recipe(employ_small) %>%
   step_epi_ahead(med_income_5y_prop, ahead = 1) %>%
   # 5-year median income has current, and two lags c(0, 1, 2)
@@ -351,7 +351,7 @@ rather than standardized proportions. We do this via the frosting layer
 `layer_population_scaling()`.


-```{r custom-arx-post, include=T}
+```{r custom-arx-post, include=TRUE}
 # Create dataframe of the sums we used for standardizing
 # Only have to include med_income_5y since that is our outcome
 totals <- employ_small %>%
@@ -427,7 +427,7 @@ time point. This model is represented algebraically as:
 \[y_{t+1,ijk} = y_{tijk} + \epsilon_{tijk}\]
 where $y_{tijk}$ is the 2-year median income (proportion) at time $t$.

-```{r flatline, include=T, warning=F}
+```{r flatline, include=TRUE, warning=FALSE}
 out_fl <- flatline_forecaster(employ_small, "med_income_2y_prop",
   args_list = flatline_args_list(ahead = 1)
 )
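
The flatline model in the hunk above, $y_{t+1,ijk} = y_{tijk} + \epsilon_{tijk}$, amounts to carrying each group's last observed value forward as the point forecast. A minimal base-R sketch under stated assumptions: the toy panel, the single key `geo_value`, and the output column names are illustrative only and are not taken from `flatline_forecaster()`.

```r
# Hypothetical toy panel: two groups with yearly values.
panel <- data.frame(
  geo_value = rep(c("A", "B"), each = 4),
  time_value = rep(2015:2018, times = 2),
  y = c(0.10, 0.12, 0.11, 0.13, 0.30, 0.28, 0.29, 0.27)
)
ahead <- 1

# Sort by time within each group, then keep the most recent row per group.
panel <- panel[order(panel$geo_value, panel$time_value), ]
last_obs <- do.call(
  rbind,
  lapply(split(panel, panel$geo_value), function(g) g[nrow(g), ])
)

# The flatline point forecast is the last observed value, carried forward.
flat <- data.frame(
  geo_value = last_obs$geo_value,
  forecast_date = last_obs$time_value,
  target_date = last_obs$time_value + ahead,
  .pred = last_obs$y
)
```

With several keys (as in `employ_small`), the same idea applies with the split taken over every key combination rather than `geo_value` alone.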
@@ -447,7 +447,7 @@ is very similar to the model we introduced in the "Autoregressive Linear Model
 with Exogenous Inputs" section of this article, but where all inputs have the
 same number of lags.

-```{r arx-lr, include=T, warning=F}
+```{r arx-lr, include=TRUE, warning=FALSE}
 arx_args <- arx_args_list(lags = c(0L, 1L), ahead = 1L)

 out_arx_lr <- arx_forecaster(employ_small, "med_income_5y_prop",
@@ -461,7 +461,7 @@ out_arx_lr
 Other changes to the direct AR forecaster, like changing the engine, also work
 as expected. Below we use a boosted tree model instead of a linear regression.

-```{r arx-rf, include=T, warning=F}
+```{r arx-rf, include=TRUE, warning=FALSE}
 out_arx_rf <- arx_forecaster(
   employ_small, "med_income_5y_prop",
   c("med_income_5y_prop", "med_income_2y_prop", "num_graduates_prop"),
