Commit 9c11756

getting started first draft
1 parent d35363e commit 9c11756

12 files changed: +657 -483 lines changed

DESCRIPTION

+1-1
@@ -1,6 +1,6 @@
 Package: epipredict
 Title: Basic epidemiology forecasting methods
-Version: 0.1.6
+Version: 0.1.7
 Authors@R: c(
 person("Daniel J.", "McDonald", , "[email protected]", role = c("aut", "cre")),
 person("Ryan", "Tibshirani", , "[email protected]", role = "aut"),

R/flatline_forecaster.R

+1-1
@@ -16,7 +16,7 @@
 #'
 #' @param epi_data An [epiprocess::epi_df][epiprocess::as_epi_df]
 #' @param outcome A scalar character for the column name we wish to predict.
-#' @param args_list A list of dditional arguments as created by the
+#' @param args_list A list of additional arguments as created by the
 #' [flatline_args_list()] constructor function.
 #'
 #' @return A data frame of point (and optionally interval) forecasts at a single

README.Rmd

+30-10
@@ -169,7 +169,11 @@ cases_deaths |>
 death_rate,
 .color_by = "none"
 ) +
-facet_grid(.response_name ~ geo_value, scale = "free") +
+facet_grid(
+rows = vars(.response_name),
+cols = vars(geo_value),
+scale = "free"
+) +
 scale_x_date(date_breaks = "3 months", date_labels = "%Y %b") +
 theme(axis.text.x = element_text(angle = 90, hjust = 1))
 ```
@@ -181,7 +185,7 @@ make it actually usable; we'll use some utilities from
 First, to eliminate some of the noise coming from daily reporting, we do 7 day
 averaging over a trailing window[^1]:
 
-[^1]: This makes it so that any given day of the processed timeseries only
+[^1]: This makes it so that any given day of the processed time-series only
 depends on the previous week, which means that we avoid leaking future
 values when making a forecast.
 
@@ -236,16 +240,21 @@ forecast_date_label <-
 geo_value = rep(used_locations, 2),
 .response_name = c(rep("case_rate", 4), rep("death_rate", 4)),
 dates = rep(forecast_date - 7 * 2, 2 * length(used_locations)),
-heights = c(rep(150, 4), rep(1.0, 4))
+heights = c(rep(150, 4), rep(0.75, 4))
 )
 processed_data_plot <-
-cases_deaths |>
+covid_case_death_rates |>
+filter(geo_value %in% used_locations) |>
 autoplot(
 case_rate,
 death_rate,
 .color_by = "none"
 ) +
-facet_grid(.response_name ~ geo_value, scale = "free") +
+facet_grid(
+rows = vars(.response_name),
+cols = vars(geo_value),
+scale = "free"
+) +
 geom_vline(aes(xintercept = forecast_date)) +
 geom_text(
 data = forecast_date_label,
@@ -273,7 +282,8 @@ four_week_ahead <- arx_forecaster(
 predictors = c("case_rate", "death_rate"),
 args_list = arx_args_list(
 lags = list(c(0, 1, 2, 3, 7, 14), c(0, 7, 14)),
-ahead = 4 * 7
+ahead = 4 * 7,
+quantile_levels = c(0.1, 0.25, 0.5, 0.75, 0.9)
 )
 )
 four_week_ahead
@@ -288,7 +298,7 @@ predicted values (and prediction intervals) for each location 28 days after the
 forecast date.
 Plotting the prediction intervals on our subset above[^2]:
 
-[^2]: Alternatively, you could call `auto_plot(four_week_ahead)` to get the full
+[^2]: Alternatively, you could call `autoplot(four_week_ahead)` to get the full
 collection of forecasts. This is too busy for the space we have for plotting
 here.
 
@@ -318,8 +328,15 @@ forecast_plot <-
 forecast_plot
 ```
 
+And as a tibble of quantile level-value pairs:
+```{r pivot_wider}
+four_week_ahead$predictions |>
+select(-.pred) |>
+pivot_quantiles_longer(.pred_distn)
+```
+
 The black dot gives the median prediction, while the blue intervals give the
-25-75%, the 10-90%, and 2.5-97.5% inter-quantile ranges.
+25-75%, the 10-90%, and 2.5-97.5% inter-quantile ranges[^4].
 For this particular day and these locations, the forecasts are relatively
 accurate, with the true data being at least within the 10-90% interval.
 A couple of things to note:
@@ -334,7 +351,10 @@ A couple of things to note:
 ## Getting Help
 If you encounter a bug or have a feature request, feel free to file an [issue on
 our github page](https://github.com/cmu-delphi/epipredict/issues).
-For other
-questions, feel free to reach out to the authors, either via this [contact
+For other questions, feel free to reach out to the authors, either via this
+[contact
 form](https://docs.google.com/forms/d/e/1FAIpQLScqgT1fKZr5VWBfsaSp-DNaN03aV6EoZU4YljIzHJ1Wl_zmtg/viewform),
 email, or the Insightnet slack.
+
+[^4]: Note that these are not the same quantiles that we fit when creating
+`four_week_ahead`. They are extrapolated from those quantiles using `extrapolate_quantiles()` (which assumes an exponential decay in the tails).

README.md

+40-10
@@ -102,7 +102,10 @@ cases_deaths |>
 death_rate,
 .color_by = "none"
 ) +
-facet_grid(.response_name ~ geo_value, scale = "free") +
+facet_grid(
+rows = vars(.response_name),
+cols = vars(geo_value),
+scale = "free") +
 scale_x_date(date_breaks = "3 months", date_labels = "%Y %b") +
 theme(axis.text.x = element_text(angle = 90, hjust = 1))
 ```
@@ -171,16 +174,19 @@ forecast_date_label <-
 geo_value = rep(used_locations, 2),
 .response_name = c(rep("case_rate", 4), rep("death_rate", 4)),
 dates = rep(forecast_date - 7 * 2, 2 * length(used_locations)),
-heights = c(rep(150, 4), rep(1.0, 4))
+heights = c(rep(150, 4), rep(0.75, 4))
 )
 processed_data_plot <-
-cases_deaths |>
+covid_case_death_rates |> filter(geo_value %in% used_locations) |>
 autoplot(
 case_rate,
 death_rate,
 .color_by = "none"
 ) +
-facet_grid(.response_name ~ geo_value, scale = "free") +
+facet_grid(
+rows = vars(.response_name),
+cols = vars(geo_value),
+scale = "free") +
 geom_vline(aes(xintercept = forecast_date)) +
 geom_text(
 data = forecast_date_label,
@@ -206,13 +212,14 @@ four_week_ahead <- arx_forecaster(
 predictors = c("case_rate", "death_rate"),
 args_list = arx_args_list(
 lags = list(c(0, 1, 2, 3, 7, 14), c(0, 7, 14)),
-ahead = 4 * 7
+ahead = 4 * 7,
+quantile_levels = c(0.1, 0.25, 0.5, 0.75, 0.9)
 )
 )
 four_week_ahead
 #> ══ A basic forecaster of type ARX Forecaster ════════════════════════════════
 #>
-#> This forecaster was fit on 2025-01-27 16:36:10.
+#> This forecaster was fit on 2025-01-31 10:46:32.
 #>
 #> Training data was an <epi_df> with:
 #> • Geography: state,
@@ -265,9 +272,27 @@ forecast_plot <-
 
 <img src="man/figures/README-show-single-forecast-1.png" width="90%" style="display: block; margin: auto;" />
 
+And as a tibble of quantile level-value pairs:
+
+``` r
+four_week_ahead$predictions |>
+select(-.pred) |>
+pivot_quantiles_longer(.pred_distn)
+#> # A tibble: 20 × 5
+#> geo_value values quantile_levels forecast_date target_date
+#> <chr> <dbl> <dbl> <date> <date>
+#> 1 ca 0.199 0.1 2021-08-01 2021-08-29
+#> 2 ca 0.285 0.25 2021-08-01 2021-08-29
+#> 3 ca 0.345 0.5 2021-08-01 2021-08-29
+#> 4 ca 0.405 0.75 2021-08-01 2021-08-29
+#> 5 ca 0.491 0.9 2021-08-01 2021-08-29
+#> 6 ma 0.0285 0.1 2021-08-01 2021-08-29
+#> # ℹ 14 more rows
+```
+
 The black dot gives the median prediction, while the blue intervals give
-the 25-75%, the 10-90%, and 2.5-97.5% inter-quantile ranges. For this
-particular day and these locations, the forecasts are relatively
+the 25-75%, the 10-90%, and 2.5-97.5% inter-quantile ranges[^4]. For
+this particular day and these locations, the forecasts are relatively
 accurate, with the true data being at least within the 10-90% interval.
 A couple of things to note:
 
@@ -289,13 +314,18 @@ questions, feel free to reach out to the authors, either via this
 form](https://docs.google.com/forms/d/e/1FAIpQLScqgT1fKZr5VWBfsaSp-DNaN03aV6EoZU4YljIzHJ1Wl_zmtg/viewform),
 email, or the Insightnet slack.
 
-[^1]: This makes it so that any given day of the processed timeseries
+[^1]: This makes it so that any given day of the processed time-series
 only depends on the previous week, which means that we avoid leaking
 future values when making a forecast.
 
 [^2]: lagged by 3 in this context meaning using the value from 3 days
 ago.
 
-[^3]: Alternatively, you could call `auto_plot(four_week_ahead)` to get
+[^3]: Alternatively, you could call `autoplot(four_week_ahead)` to get
 the full collection of forecasts. This is too busy for the space we
 have for plotting here.
+
+[^4]: Note that these are not the same quantiles that we fit when
+creating `four_week_ahead`. They are extrapolated from those
+quantiles using `extrapolate_quantiles()` (which assumes an
+exponential decay in the tails).
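
Note on footnote [^1]: the claim that a trailing window avoids leaking future values can be checked with a small base-R sketch. This is not the README's own code (which relies on package utilities not shown in this diff); `trailing_mean()` and the `daily` vector are hypothetical names and made-up numbers used only for illustration.

```r
# Trailing 7-day mean in base R: each smoothed value depends only on the
# current day and the six days before it, never on later days.
# `trailing_mean` is a hypothetical helper, not part of epipredict.
trailing_mean <- function(x, window = 7) {
  # sides = 1 makes stats::filter() use present and past values only;
  # the first (window - 1) entries are NA because the window is incomplete.
  as.numeric(stats::filter(x, rep(1 / window, window), sides = 1))
}

daily <- c(10, 12, 9, 15, 20, 18, 14, 30, 25, 22) # made-up daily counts
smoothed <- trailing_mean(daily)

# Altering the final (future) observation leaves every earlier smoothed
# value unchanged, which is the no-leakage property the footnote describes.
altered <- daily
altered[10] <- 1000
no_leak <- identical(trailing_mean(altered)[1:9], smoothed[1:9])
print(no_leak)
```

A centered window (`sides = 2`) would fail this check, because each smoothed value would then also depend on the three days that follow it.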

_pkgdown.yml

+2
@@ -15,6 +15,7 @@ articles:
 - backtesting
 - arx-classifier
 - update
+- guts
 - title: Advanced methods
 contents:
 - articles/smooth-qr
@@ -47,6 +48,7 @@ reference:
 contents:
 - contains("args_list")
 - contains("_epi_workflow")
+
 - title: Helper functions for Hub submission
 contents:
 - flusight_hub_formatter

inst/pkgdown-watch.R

+1-1
@@ -13,6 +13,7 @@
 rlang::check_installed(c("pkgdown", "servr", "devtools", "here", "cli", "fs"))
 library(pkgdown)
 pkg <- pkgdown::as_pkgdown(here::here())
+devtools::build_readme()
 pkgdown::build_articles(pkg)
 pkgdown::build_site(pkg, lazy = FALSE, examples = FALSE, devel = TRUE, preview = FALSE)
 
@@ -55,7 +56,6 @@ servr::httw(
 readme <- grep("README[.]rmd$", files, value = TRUE, ignore.case = TRUE)
 if (length(readme)) {
 devtools::build_readme()
-pkgdown::build_home()
 pkgdown::build_site(pkg, lazy = TRUE, examples = FALSE, devel = TRUE, preview = FALSE)
 }
 
man/figures/README-date-1.png

260 KB
834 Bytes
-190 Bytes

tests/testthat/test-step_adjust_latency.R

+1-1
@@ -1,6 +1,6 @@
 library(dplyr)
 # Test ideas that were dropped:
-# - "epi_adjust_latency works correctly when there's gaps in the timeseries"
+# - "epi_adjust_latency works correctly when there's gaps in the time-series"
 # - "epi_adjust_latency extend_ahead uses the same adjustment when predicting on new data after being baked"
 # - "`step_adjust_latency` only allows one instance of itself"
 # - "data with epi_df shorn off works"
