Commit 66d5652

getting started page
1 parent 4699dbf commit 66d5652

3 files changed: +178 -5 lines changed

inst/pkgdown-watch.R

@@ -0,0 +1,67 @@
+# Run with: Rscript pkgdown-watch.R
+#
+# Modifying this: https://gist.github.com/gadenbuie/d22e149e65591b91419e41ea5b2e0621
+# - Removed docopts cli interface and various configs/features I didn't need.
+# - Sped up reference building by not running examples.
+#
+# Note that the `pattern` regex is case sensitive, so make sure your Rmd files
+# end in `.Rmd` and not `.rmd`.
+#
+# Also I had issues with `pkgdown::build_reference()` not working, so I just run
+# it manually when I need to.
+
+rlang::check_installed(c("pkgdown", "servr", "devtools", "here", "cli", "fs"))
+library(pkgdown)
+pkg <- pkgdown::as_pkgdown(here::here())
+pkgdown::build_articles(pkg)
+pkgdown::build_site(pkg, lazy = FALSE, examples = FALSE, devel = TRUE, preview = FALSE)
+
+servr::httw(
+  dir = here::here("docs"),
+  watch = here::here(),
+  pattern = "[.](Rm?d|y?ml|s[ac]ss|css|js)$",
+  handler = function(files) {
+    devtools::load_all()
+
+    files_rel <- fs::path_rel(files, start = getwd())
+    cli::cli_inform("{cli::col_yellow('Updated')} {.val {files_rel}}")
+
+    articles <- grep("vignettes.+Rmd$", files, value = TRUE)
+
+    if (length(articles) == 1) {
+      name <- fs::path_ext_remove(fs::path_rel(articles, fs::path(pkg$src_path, "vignettes")))
+      pkgdown::build_article(name, pkg)
+    } else if (length(articles) > 1) {
+      pkgdown::build_articles(pkg, preview = FALSE)
+    }
+
+    refs <- grep("man.+R(m?d)?$", files, value = TRUE)
+    if (length(refs)) {
+      # Doesn't work for me, so I run it manually.
+      # pkgdown::build_reference(pkg, preview = FALSE, examples = FALSE, lazy = FALSE) # nolint: commented_code_linter
+    }
+
+    pkgdown <- grep("pkgdown", files, value = TRUE)
+    if (length(pkgdown) && any(!pkgdown %in% c(articles, refs))) {
+      pkgdown::init_site(pkg)
+    }
+
+    pkgdown_index <- grep("index[.]Rmd$", files_rel, value = TRUE)
+    if (length(pkgdown_index)) {
+      devtools::build_rmd(pkgdown_index)
+      pkgdown::build_home(pkg)
+    }
+
+    readme <- grep("README[.]rmd$", files, value = TRUE, ignore.case = TRUE)
+    if (length(readme)) {
+      devtools::build_readme()
+      pkgdown::build_home()
+      pkgdown::build_site(pkg, lazy = TRUE, examples = FALSE, devel = TRUE, preview = FALSE)
+
+      devtools::build_readme(".")
+      pkgdown::build_home(pkg)
+    }
+
+    cli::cli_alert("Site rebuild done!")
+  }
+)
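
The header comment in this script warns that the `pattern` regex is case sensitive. A minimal base-R sketch of that behaviour follows. The file names are made up for illustration, and it assumes the pattern is applied the way `grepl()` applies it by default, which may not be exactly how `servr::httw()` matches files internally.

```r
# Same pattern string as in the watch script.
pattern <- "[.](Rm?d|y?ml|s[ac]ss|css|js)$"

# Hypothetical file names, for illustration only.
files <- c(
  "vignettes/epipredict.Rmd",
  "vignettes/epipredict.rmd",
  "_pkgdown.yml",
  "pkgdown/extra.scss",
  "R/arx_forecaster.R"
)

# grepl() is case sensitive by default, so the lowercase `.rmd` file is skipped.
# Plain `.R` files are also skipped, since the pattern requires `Rd` or `Rmd`.
matched <- grepl(pattern, files)
names(matched) <- files
print(matched)
```

Under these assumptions, renaming a vignette from `.rmd` to `.Rmd` is enough for it to be picked up by the watcher.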

vignettes/epipredict.Rmd

@@ -11,14 +11,120 @@ vignette: >
 source("_common.R")
 ```

-```{r setup, message=FALSE}
+```{r setup, message=FALSE, include = FALSE}
 library(dplyr)
 library(parsnip)
 library(workflows)
 library(recipes)
 library(epipredict)
 ```

+At a high level, our goal with `{epipredict}` is to make running simple machine
+learning / statistical forecasters for epidemiology easy.
+To do this, we have extended several [tidymodels](https://www.tidymodels.org/)
+packages to handle the case of panel time-series data.
+Because it builds on tidymodels, the package is highly extensible.
+We hope that users with epi training and some statistics can easily fit
+baseline models, while those with more nuanced statistical understanding can
+create complicated specializations using the same framework.
+Towards that end, `{epipredict}` provides two main classes of tools:
+
+1. A set of basic, easy-to-use "canned" forecasters that work out of the box.
+    They support a limited amount of customization.
+    For the basic forecasters, we currently provide:
+    * Baseline flatline forecaster: predicts a median equal to the previous
+      value, with increasingly wide quantiles.
+    * Autoregressive forecaster: fits a model (typically linear regression) on
+      lagged data to predict quantiles for continuous values.
+    * Autoregressive classifier: fits a model (typically logistic regression) on
+      lagged data to predict probabilities for discrete values.
+    * CDC FluSight flatline forecaster: a variant of the flatline forecaster as
+      used by the CDC in FluSight.
+2. A framework for creating custom forecasters out of modular components.
+    There are three types of components:
+    * Preprocessor: transforms the data before model training, such as converting
+      counts to rates, creating smoothed columns, or applying [any of the recipes
+      steps](https://recipes.tidymodels.org/reference/index.html).
+    * Trainer: trains a model on data, resulting in a fitted model object.
+      Examples include linear regression, quantile regression, or [any parsnip
+      engine](https://parsnip.tidymodels.org/).
+    * Postprocessor: transforms the predictions after the model has been fit,
+      such as generating quantiles from purely point-prediction models, undoing
+      operations done in the steps (such as converting rates back to counts),
+      and generally adapting the format of the prediction to its eventual use.
+
+The rest of this vignette will focus on using and modifying the canned forecasters.
+If you need a more complicated model, check out the [Guts
+vignette](preprocessing-and-models) for examples of using the forecaster
+framework.
+
+
+Advanced users should find their tasks relatively easy. Examples of
+these tasks are illustrated in the [vignettes and articles](https://cmu-delphi.github.io/epipredict).
+
+See also the (in progress) [Forecasting Book](https://cmu-delphi.github.io/delphi-tooling-book/).
+
+# Panel forecasting basics
+## Example data
+
+## Intermediate example
+
+The package comes with some built-in historical data for illustration, but
+up-to-date versions of this data can be downloaded with the
+[`{epidatr}` package](https://cmu-delphi.github.io/epidatr/)
+and processed using
+[`{epiprocess}`](https://cmu-delphi.github.io/epiprocess/).[^1]
+
+[^1]: Other epidemiological signals for non-Covid related illnesses are also
+available with [`{epidatr}`](https://github.com/cmu-delphi/epidatr), which
+interfaces directly with Delphi's
+[Epidata API](https://cmu-delphi.github.io/delphi-epidata/).
+
+```{r init_stuff, message=FALSE}
+library(epipredict)
+covid_case_death_rates
+```
+
+To create and train a simple auto-regressive forecaster to predict the death rate two weeks into the future using past (lagged) deaths and cases, we could use the following function.
+
+```{r make-forecasts, warning=FALSE}
+two_week_ahead <- arx_forecaster(
+  covid_case_death_rates,
+  outcome = "death_rate",
+  predictors = c("case_rate", "death_rate"),
+  args_list = arx_args_list(
+    lags = list(c(0, 1, 2, 3, 7, 14), c(0, 7, 14)),
+    ahead = 14
+  )
+)
+two_week_ahead
+```
+
+In this case, we have used a number of different lags for the case rate, while
+only using 3 weekly lags for the death rate (as predictors). The result is both
+a fitted model object which could be used any time in the future to create
+different forecasts, and a set of predicted values (with prediction
+intervals) for each location 14 days after the last available time value in the
+data.
+
+```{r print-model}
+two_week_ahead$epi_workflow
+```
+
+The fitted model here involved preprocessing the data to appropriately generate
+lagged predictors, estimating a linear model with `stats::lm()`, and then
+postprocessing the results to be meaningful for epidemiological tasks. We can
+also examine the predictions.
+
+```{r show-preds}
+two_week_ahead$predictions
+```
+
+The results above show a distributional forecast produced using data through
+the end of 2021 for the 14th of January 2022. A prediction for the death rate
+per 100K inhabitants is available for every state (`geo_value`) along with a
+90% predictive interval.
+

 # Goals for the package
@@ -195,9 +301,9 @@ Here, we've used different lags on the `case_rate` and are now predicting 2
 weeks ahead. This example also illustrates a major difficulty with the
 "iterative" versions of AR models. This model doesn't produce forecasts for
 `case_rate`, and so, would not have data to "plug in" for the necessary
-lags.[^1]
+lags.[^3]

-[^1]: An obvious fix is to instead use a VAR and predict both, but this would
+[^3]: An obvious fix is to instead use a VAR and predict both, but this would
 likely increase the variance of the model, and therefore, may lead to less
 accurate forecasts for the variable of interest.
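
The added vignette text describes a direct forecaster built from lagged predictors and an outcome shifted 14 days ahead. The base-R sketch below illustrates that idea for a single location. It is not how `{epipredict}` is implemented. The data are simulated, the helper `shift_back()` and the column names are hypothetical, and multiple locations, quantiles, and postprocessing are left out.

```r
set.seed(1)
n <- 120

# Simulated daily series for one location (not real data).
case_rate <- 20 + cumsum(rnorm(n))
death_rate <- 0.02 * case_rate + rnorm(n, sd = 0.05)

# Hypothetical helper: the value of x observed k days earlier (NA if unavailable).
shift_back <- function(x, k) c(rep(NA_real_, k), head(x, length(x) - k))

ahead <- 14
case_lags <- c(0, 1, 2, 3, 7, 14)
death_lags <- c(0, 7, 14)

# One column per lagged predictor, mirroring the lags in the vignette call.
X <- as.data.frame(cbind(
  sapply(case_lags, function(k) shift_back(case_rate, k)),
  sapply(death_lags, function(k) shift_back(death_rate, k))
))
names(X) <- c(
  paste0("lag_", case_lags, "_case_rate"),
  paste0("lag_", death_lags, "_death_rate")
)

# Outcome: the death rate `ahead` days after each row's date.
X$ahead_14_death_rate <- c(tail(death_rate, n - ahead), rep(NA_real_, ahead))

# lm() drops rows with a missing lag or a missing outcome by default.
fit <- lm(ahead_14_death_rate ~ ., data = X)

# Point forecast made from the most recent day, whose outcome is not yet observed.
latest <- X[n, setdiff(names(X), "ahead_14_death_rate")]
pred <- predict(fit, newdata = latest)
print(pred)
```

Because the outcome is shifted directly, the model needs no forecast of `case_rate` to predict 14 days ahead, which is the difficulty the surrounding text attributes to "iterative" AR models.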

vignettes/panel-data.Rmd

@@ -234,9 +234,9 @@ summary(extract_fit_engine(wf_linreg))
 This output tells us the coefficients of the fitted model; for instance,
 the estimated intercept is $\widehat{\alpha}_0 =$
-`r round(coef(extract_fit_engine(wf_linreg))[1], 3)` and the coefficient for
+`r round(coef(hardhat::extract_fit_engine(wf_linreg))[1], 3)` and the coefficient for
 $y_{tijk}$ is
-$\widehat\alpha_1 =$ `r round(coef(extract_fit_engine(wf_linreg))[2], 3)`.
+$\widehat\alpha_1 =$ `r round(coef(hardhat::extract_fit_engine(wf_linreg))[2], 3)`.
 The summary also tells us that all estimated coefficients are significantly
 different from zero. Extracting the 95% confidence intervals for the
 coefficients also leads us to
