T,F -> TRUE,FALSE

dsweber2 · dsweber2 · commit 38b86e8fe359 · 2025-01-23T13:06:34.000-06:00
diff --git a/vignettes/epipredict.Rmd b/vignettes/epipredict.Rmd
@@ -19,6 +19,92 @@ library(recipes)
 library(epipredict)
 ```
 
+## Goals for `epipredict` (from README)
+
+**We hope to provide:**
+
+1. A set of basic, easy-to-use forecasters that work out of the box. You should be able to apply a limited amount of customization to them. For the basic forecasters, we currently provide:
+    * Baseline flatline forecaster
+    * Autoregressive forecaster
+    * Autoregressive classifier
+    * CDC FluSight flatline forecaster
+2. A framework for creating custom forecasters out of modular components. There are four types of components:
+    * Preprocessor: do things to the data before model training
+    * Trainer: train a model on data, resulting in a fitted model object
+    * Predictor: make predictions, using a fitted model object
+    * Postprocessor: do things to the predictions before returning
+
+**Target audiences:**
+
+* Basic. Has data, calls forecaster with default arguments.
+* Intermediate. Wants to examine changes to the arguments and take advantage of
+built-in flexibility.
+* Advanced. Wants to write their own forecasters. May be willing to build them
+up from some components.
+
+The Advanced user should find their task to be relatively easy. Examples of
+these tasks are illustrated in the [vignettes and articles](https://cmu-delphi.github.io/epipredict).
+
+See also the (in progress) [Forecasting Book](https://cmu-delphi.github.io/delphi-tooling-book/).
+
+## Intermediate example
+
+The package comes with some built-in historical data for illustration, but
+up-to-date versions of this data can be downloaded with the
+[`{epidatr}` package](https://cmu-delphi.github.io/epidatr/)
+and processed using
+[`{epiprocess}`](https://cmu-delphi.github.io/epiprocess/).[^1]
+
+[^1]: Other epidemiological signals for illnesses other than COVID-19 are also
+available with [`{epidatr}`](https://github.com/cmu-delphi/epidatr), which
+interfaces directly with Delphi's
+[Epidata API](https://cmu-delphi.github.io/delphi-epidata/).
+
+```{r init_stuff, message=FALSE}
+library(epipredict)
+covid_case_death_rates
+```
+
+To create and train a simple autoregressive forecaster to predict the death rate two weeks into the future using past (lagged) deaths and cases, we could call `arx_forecaster()` as follows.
+
+```{r make-forecasts, warning=FALSE}
+two_week_ahead <- arx_forecaster(
+  covid_case_death_rates,
+  outcome = "death_rate",
+  predictors = c("case_rate", "death_rate"),
+  args_list = arx_args_list(
+    lags = list(c(0, 1, 2, 3, 7, 14), c(0, 7, 14)),
+    ahead = 14
+  )
+)
+two_week_ahead
+```
+
+In this case, we have used six different lags for the case rate, but only
+three weekly lags for the death rate (as predictors). The result contains
+both a fitted model object, which could be used at any time in the future to
+create different forecasts, and a set of predicted values (and prediction
+intervals) for each location 14 days after the last available time value in
+the data.
+
+```{r print-model}
+two_week_ahead$epi_workflow
+```
+
+The fitted model here involved preprocessing the data to appropriately generate
+lagged predictors, estimating a linear model with `stats::lm()`, and then
+postprocessing the results to be meaningful for epidemiological tasks. We can
+also examine the predictions.
+
+```{r show-preds}
+two_week_ahead$predictions
+```
+
+The results above show a distributional forecast for January 14, 2022, produced
+using data through the end of 2021. A prediction for the death rate
+per 100K inhabitants is available for every state (`geo_value`) along with a
+90% prediction interval.
+
 
 # Goals for the package
 
diff --git a/vignettes/panel-data.Rmd b/vignettes/panel-data.Rmd
@@ -32,7 +32,7 @@ An example of this is the [`covid_case_death_rates`](
 dataset, which contains daily state-wise measures of `case_rate` and 
 `death_rate` for COVID-19 in 2021:
 
-```{r epi-panel-ex, include=T}
+```{r epi-panel-ex, include=TRUE}
 head(covid_case_death_rates, 3)
 ```
 
@@ -103,7 +103,7 @@ years
 * `edu_qual` (key): one of 32 unique educational qualifications, e.g.,
 "Master's diploma"
 
-```{r preview-data, include=T}
+```{r preview-data, include=TRUE}
 # Rename for simplicity
 employ <- grad_employ_subset
 sample_n(employ, 6)
@@ -122,7 +122,7 @@ first pre-process by standardizing each numeric column by the total within
 each group of keys. We do this since those raw numeric values will vary greatly
 from province to province since there are large differences in population.
 
-```{r employ-small, include=T}
+```{r employ-small, include=TRUE}
 employ_small <- employ %>%
   group_by(geo_value, age_group, edu_qual) %>%
   # Select groups where there are complete time series values
@@ -141,10 +141,10 @@ Note that some groups
 do not have any time series information since we filtered out all time series
 with incomplete dates.
 
-```{r employ-small-graph, include=T, eval=T, fig.width=9, fig.height=6}
+```{r employ-small-graph, include=TRUE, eval=TRUE, fig.width=9, fig.height=6}
 employ_small %>%
   filter(geo_value %in% c("British Columbia", "Ontario")) %>%
-  filter(grepl("degree", edu_qual, fixed = T)) %>%
+  filter(grepl("degree", edu_qual, fixed = TRUE)) %>%
   group_by(geo_value, time_value, edu_qual, age_group) %>%
   summarise(num_graduates_prop = sum(num_graduates_prop), .groups = "drop") %>%
   ggplot(aes(x = time_value, y = num_graduates_prop, color = geo_value)) +
@@ -183,7 +183,7 @@ Also note that
 since we specified our `time_type` to be `year`, our `lag` and `lead`
 values are both in years.
 
-```{r make-recipe, include=T, eval=T}
+```{r make-recipe, include=TRUE, eval=TRUE}
 r <- epi_recipe(employ_small) %>%
   step_epi_ahead(num_graduates_prop, ahead = 1) %>%
   step_epi_lag(num_graduates_prop, lag = 0:2) %>%
@@ -194,7 +194,7 @@ r
 Let's apply this recipe using `prep` and `bake` to generate and view the `lag`
 and `ahead` columns.
 
-```{r view-preprocessed, include=T}
+```{r view-preprocessed, include=TRUE}
 # Display a sample of the pre-processed data
 bake_and_show_sample <- function(recipe, data, n = 5) {
   recipe %>%
@@ -227,7 +227,7 @@ that `epi_workflow` is a container and doesn't actually do the fitting. We have
 to pass the workflow into `fit()` to get our estimated model coefficients
 $\widehat{\alpha}_i,\ i=0,...,3$.
 
-```{r linearreg-wf, include=T}
+```{r linearreg-wf, include=TRUE}
 wf_linreg <- epi_workflow(r, linear_reg()) %>%
   fit(employ_small)
 summary(hardhat::extract_fit_engine(wf_linreg))
@@ -321,7 +321,7 @@ $z_{tijk}$ is the number of graduates (proportion) at time $t$.
 
 Again, we construct an `epi_recipe` detailing the pre-processing steps.
 
-```{r custom-arx, include=T}
+```{r custom-arx, include=TRUE}
 rx <- epi_recipe(employ_small) %>%
   step_epi_ahead(med_income_5y_prop, ahead = 1) %>%
   # 5-year median income has current, and two lags c(0, 1, 2)
@@ -351,7 +351,7 @@ rather than standardized proportions. We do this via the frosting layer
 `layer_population_scaling()`.
 
 
-```{r custom-arx-post, include=T}
+```{r custom-arx-post, include=TRUE}
 # Create dataframe of the sums we used for standardizing
 # Only have to include med_income_5y since that is our outcome
 totals <- employ_small %>%
@@ -427,7 +427,7 @@ time point. This model is representated algebraically as:
 \[y_{t+1,ijk} = y_{tijk} + \epsilon_{tijk}\]
 where $y_{tijk}$ is the 2-year median income (proportion) at time $t$.
 
-```{r flatline, include=T, warning=F}
+```{r flatline, include=TRUE, warning=FALSE}
 out_fl <- flatline_forecaster(employ_small, "med_income_2y_prop",
   args_list = flatline_args_list(ahead = 1)
 )
@@ -447,7 +447,7 @@ is very similar to the model we introduced in the "Autoregressive Linear Model
 with Exogenous Inputs" section of this article, but where all inputs have the
 same number of lags.
 
-```{r arx-lr, include=T, warning=F}
+```{r arx-lr, include=TRUE, warning=FALSE}
 arx_args <- arx_args_list(lags = c(0L, 1L), ahead = 1L)
 
 out_arx_lr <- arx_forecaster(employ_small, "med_income_5y_prop",
@@ -461,7 +461,7 @@ out_arx_lr
 Other changes to the direct AR forecaster, like changing the engine, also work
 as expected. Below we use a boosted tree model instead of a linear regression.
 
-```{r arx-rf, include=T, warning=F}
+```{r arx-rf, include=TRUE, warning=FALSE}
 out_arx_rf <- arx_forecaster(
   employ_small, "med_income_5y_prop",
   c("med_income_5y_prop", "med_income_2y_prop", "num_graduates_prop"),