Commit ce1adfe

nmdefries authored and dsweber2 committed
landing page again but in Rmd
1 parent 524066a commit ce1adfe

File tree: 4 files changed, +65 −47 lines
README.Rmd

+61 −43
@@ -77,26 +77,24 @@ scale_colour_delphi <- scale_color_delphi
 [![R-CMD-check](https://github.com/cmu-delphi/epipredict/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/cmu-delphi/epipredict/actions/workflows/R-CMD-check.yaml)
 <!-- badges: end -->
 
-Epipredict is a framework for building transformation and forecasting pipelines
-for epidemiological and other panel time-series datasets.
-In addition to tools for building forecasting pipelines, it contains a number of
-"canned" forecasters meant to run with little modification as an easy way to get
-started forecasting.
+`{epipredict}` is a framework for building transformation and forecasting pipelines for epidemiological and other panel time-series datasets.
+In addition to tools for building forecasting pipelines, it contains a number of “canned” forecasters meant to run with little modification as an easy way to get started forecasting.
 
 It is designed to work well with
-[`epiprocess`](https://cmu-delphi.github.io/epiprocess/), a utility for handling
-various time series and geographic processing tools in an epidemiological
-context.
+[`{epiprocess}`](https://cmu-delphi.github.io/epiprocess/), a utility for time series handling and geographic processing in an epidemiological context.
 Both of the packages are meant to work well with the panel data provided by
-[`epidatr`](https://cmu-delphi.github.io/epidatr/).
+[`{epidatr}`](https://cmu-delphi.github.io/epidatr/).
+Pre-compiled example datasets are also available in
+[`{epidatasets}`](https://cmu-delphi.github.io/epidatasets/).
 
-If you are looking for more detail beyond the package documentation, see our
+If you are looking for detail beyond the package documentation, see our
 [forecasting book](https://cmu-delphi.github.io/delphi-tooling-book/).
 
+
 ## Installation
 
-To install (unless you're planning on contributing to package development, we
-suggest using the stable version):
+Unless you’re planning on contributing to package development, we suggest using the stable version.
+To install, run:
 
 ```r
 # Stable version
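The hunk cuts off before the install command itself. As a hedged sketch only — the cmu-delphi R-universe repository and the `pak` call are assumptions, not shown in this commit — a stable install typically looks like:

```r
# Hedged sketch: repository URL and pak call are assumptions, not from this commit.
# Stable version from the cmu-delphi R-universe:
install.packages(
  "epipredict",
  repos = c("https://cmu-delphi.r-universe.dev", getOption("repos"))
)

# Development version from GitHub (only if you plan to contribute):
# pak::pkg_install("cmu-delphi/epipredict@dev")
```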
@@ -113,23 +111,43 @@ The documentation for the stable version is at
 
 ## Motivating example
 
-To demonstrate the kind of forecast epipredict can make, say we're predicting
-COVID deaths per 100k for each state on
+To demonstrate using `{epipredict}` for forecasting, say we want to
+predict COVID-19 deaths per 100k people for each of a subset of states
+
+```{r subset_geos}
+used_locations <- c("ca", "ma", "ny", "tx")
+```
+
+on
 
 ```{r fc_date}
 forecast_date <- as.Date("2021-08-01")
 ```
 
-Below the fold, we construct this dataset as an `epiprocess::epi_df` from JHU
-data.
+<details>
+<summary> Required packages </summary>
+
+```{r install, run = FALSE}
+library(epipredict)
+library(epidatr)
+library(epiprocess)
+library(dplyr)
+library(ggplot2)
+```
+</details>
+
+
+Below the fold, we construct this dataset as an `epiprocess::epi_df` from
+[Johns Hopkins Center for Systems Science and Engineering deaths data](https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/jhu-csse.html).
 
 <details>
 <summary> Creating the dataset using `{epidatr}` and `{epiprocess}` </summary>
 
-This dataset can be found in the package as `covid_case_death_rates`; we
-demonstrate some of the typically ubiquitous cleaning operations needed to be
-able to forecast.
-First we pull both jhu-csse cases and deaths from
+This section is intended to demonstrate some of the ubiquitous cleaning operations needed to be able to forecast.
+The dataset prepared here is also included ready-to-go in `{epipredict}` as `covid_case_death_rates`.
+
+First we pull both `jhu-csse` cases and deaths data from the
+[Delphi API](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html) using the
 [`{epidatr}`](https://cmu-delphi.github.io/epidatr/) package:
 
 ```{r case_death, warning = FALSE}
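The body of the `case_death` chunk falls outside this hunk. For orientation, here is a hedged sketch of what such a pull with `epidatr::pub_covidcast()` generally looks like — the signal names, date range, and column renaming are assumptions, not the commit's actual chunk:

```r
# Hedged sketch of an epidatr pull; signal names and dates are assumptions.
library(epidatr)
library(dplyr)

cases <- pub_covidcast(
  source = "jhu-csse",
  signals = "confirmed_incidence_prop", # assumed signal: cases per 100k
  geo_type = "state",
  time_type = "day",
  geo_values = "*",
  time_values = epirange(20200601, 20211231)
) |>
  select(geo_value, time_value, cases = value)

deaths <- pub_covidcast(
  source = "jhu-csse",
  signals = "deaths_incidence_prop", # assumed signal: deaths per 100k
  geo_type = "state",
  time_type = "day",
  geo_values = "*",
  time_values = epirange(20200601, 20211231)
) |>
  select(geo_value, time_value, death_rate = value)
```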
@@ -155,9 +173,9 @@ deaths <- pub_covidcast(
 ```
 
 Since visualizing the results on every geography is somewhat overwhelming,
-we'll only train on a subset of 5.
+we’ll only train on a subset of locations.
+
 ```{r date, warning = FALSE}
-used_locations <- c("ca", "ma", "ny", "tx")
 cases_deaths <-
   full_join(cases, deaths, by = c("time_value", "geo_value")) |>
   filter(geo_value %in% used_locations) |>
@@ -178,12 +196,12 @@ cases_deaths |>
   theme(axis.text.x = element_text(angle = 90, hjust = 1))
 ```
 
-As with basically any dataset, there is some cleaning that we will need to do to
-make it actually usable; we'll use some utilities from
+As with the typical dataset, we will need to do some cleaning to
+make it actually usable; we’ll use some utilities from
 [`{epiprocess}`](https://cmu-delphi.github.io/epiprocess/) for this.
 
-First, to eliminate some of the noise coming from daily reporting, we do 7 day
-averaging over a trailing window[^1]:
+First, to reduce noise from daily reporting, we will compute a 7 day
+average over a trailing window[^1]:
 
 [^1]: This makes it so that any given day of the processed time-series only
   depends on the previous week, which means that we avoid leaking future
@@ -203,7 +221,7 @@ cases_deaths <-
   rename(case_rate = cases_7dav, death_rate = death_rate_7dav)
 ```
 
-Then trimming outliers, most especially negative values:
+Then we'll trim outliers, especially negative values:
 
 ```{r outlier}
 cases_deaths <-
@@ -229,11 +247,12 @@ cases_deaths <-
 ```
 </details>
 
-After having downloaded and cleaned the data in `cases_deaths`, we plot a subset
-of the states, noting the actual forecast date:
+After downloading and cleaning the cases and deaths data, we can plot
+a subset of the states, marking the desired forecast date:
 
 <details>
 <summary> Plot </summary>
+
 ```{r plot_locs}
 forecast_date_label <-
   tibble(
@@ -269,15 +288,15 @@ processed_data_plot <-
 processed_data_plot
 ```
 
-To make a forecast, we will use a "canned" simple auto-regressive forecaster to
+To make a forecast, we will use a simple “canned” auto-regressive forecaster to
 predict the death rate four weeks into the future using lagged[^3] deaths and
-cases
+cases.
 
 [^3]: lagged by 3 in this context meaning using the value from 3 days ago.
 
 ```{r make-forecasts, warning=FALSE}
 four_week_ahead <- arx_forecaster(
-  cases_deaths |> filter(time_value <= forecast_date),
+  covid_case_death_rates |> filter(time_value <= forecast_date),
   outcome = "death_rate",
   predictors = c("case_rate", "death_rate"),
   args_list = arx_args_list(
@@ -289,14 +308,14 @@ four_week_ahead <- arx_forecaster(
 four_week_ahead
 ```
 
-In this case, we have used 0-3 days, a week, and two week lags for the case
-rate, while using only zero, one and two weekly lags for the death rate (as
-predictors).
+In our model setup, we are defining as our predictors case rate lagged 0-3
+days, one week, and two weeks, and death rate lagged 0-2 weeks.
 The result `four_week_ahead` is both a fitted model object which could be used
-any time in the future to create different forecasts, as well as a set of
-predicted values (and prediction intervals) for each location 28 days after the
-forecast date.
-Plotting the prediction intervals on our subset above[^2]:
+any time in the future to create different forecasts, and a set of predicted
+values (and prediction intervals) for each location 28 days after the forecast
+date.
+
+Plotting the prediction intervals on the true values for our location subset[^2]:
 
 [^2]: Alternatively, you could call `autoplot(four_week_ahead)` to get the full
   collection of forecasts. This is too busy for the space we have for plotting
@@ -350,11 +369,10 @@ A couple of things to note:
 
 ## Getting Help
 If you encounter a bug or have a feature request, feel free to file an [issue on
-our github page](https://github.com/cmu-delphi/epipredict/issues).
+our GitHub page](https://github.com/cmu-delphi/epipredict/issues).
 For other questions, feel free to reach out to the authors, either via this
-[contact
-form](https://docs.google.com/forms/d/e/1FAIpQLScqgT1fKZr5VWBfsaSp-DNaN03aV6EoZU4YljIzHJ1Wl_zmtg/viewform),
-email, or the Insightnet slack.
+[contact form](https://docs.google.com/forms/d/e/1FAIpQLScqgT1fKZr5VWBfsaSp-DNaN03aV6EoZU4YljIzHJ1Wl_zmtg/viewform),
+email, or the InsightNet Slack.
 
 [^4]: Note that these are not the same quantiles that we fit when creating
   `four_week_ahead`. They are extrapolated from those quantiles using `extrapolate_quantiles()` (which assumes an exponential decay in the tails).
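Pulling together the pieces that appear across the hunks above, a hedged sketch of the full `make-forecasts` call — the exact lag vectors and the 28-day horizon are inferred from the surrounding prose, not copied from the file:

```r
# Hedged sketch; lags and horizon are inferred from the README prose above.
library(epipredict)
library(dplyr)

forecast_date <- as.Date("2021-08-01")

four_week_ahead <- arx_forecaster(
  covid_case_death_rates |> filter(time_value <= forecast_date),
  outcome = "death_rate",
  predictors = c("case_rate", "death_rate"),
  args_list = arx_args_list(
    lags = list(c(0:3, 7, 14), c(0, 7, 14)), # case-rate lags, then death-rate lags
    ahead = 4 * 7                            # 28 days after the forecast date
  )
)

# The result bundles a fitted workflow plus a tibble of quantile predictions.
four_week_ahead$predictions
```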

man/figures/README-date-1.png

Binary image file changed (32.9 KB → 56.1 KB).

vignettes/epipredict.Rmd

+4 −4
@@ -1,8 +1,8 @@
 ---
-title: "Get started with epipredict"
+title: "Get started with `epipredict`"
 output: rmarkdown::html_vignette
 vignette: >
-  %\VignetteIndexEntry{Get started with epipredict}
+  %\VignetteIndexEntry{Get started with `epipredict`}
   %\VignetteEngine{knitr::rmarkdown}
   %\VignetteEncoding{UTF-8}
 ---
@@ -54,13 +54,13 @@ Towards that end, epipredict provides two main classes of tools:
   Examples include linear regression, quantile regression, or [any parsnip
   engine](https://parsnip.tidymodels.org/).
 * Postprocessor: unique to this package, and used to do things to the
-  predictions after the model has been fit, such as 
+  predictions after the model has been fit, such as
   - generate quantiles from purely point-prediction models,
   - undo operations done in the steps, such as convert back to counts from
     rates
   - generally adapt the format of the prediction to it's eventual use.
 
-The rest of the getting started will focus on using and modifying the canned forecasters. 
+The rest of the getting started will focus on using and modifying the canned forecasters.
 If you need a more complicated model, check out the [Guts
 vignette](preprocessing-and-models) for examples of using the forecaster
 framework.
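The preprocessor / trainer / postprocessor split described in this hunk composes through `epi_workflow()`. A hedged sketch of that composition — the specific step and layer names follow `{epipredict}`'s documented API rather than anything in this diff:

```r
# Hedged sketch of composing a preprocessor, trainer, and postprocessor.
library(epipredict)
library(dplyr)

jhu <- covid_case_death_rates |>
  filter(time_value >= as.Date("2021-06-01"))

# Preprocessor: lag the predictors, lead the outcome, drop the resulting NAs.
r <- epi_recipe(jhu) |>
  step_epi_lag(case_rate, death_rate, lag = c(0, 7, 14)) |>
  step_epi_ahead(death_rate, ahead = 14) |>
  step_epi_naomit()

# Postprocessor ("frosting"): attach quantiles and dates to the point predictions.
f <- frosting() |>
  layer_predict() |>
  layer_residual_quantiles() |>
  layer_add_forecast_date() |>
  layer_add_target_date()

# Trainer: any parsnip engine, here ordinary linear regression.
wf <- epi_workflow(r, parsnip::linear_reg(), f) |>
  fit(jhu)

predict(wf, new_data = get_test_data(r, jhu))
```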

0 commit comments
