Edited existing chapters #10

Open · wants to merge 5 commits into main

4 changes: 2 additions & 2 deletions _freeze/epipredict/execute-results/html.json

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions _freeze/flatline-forecaster/execute-results/html.json

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions _freeze/forecast-framework/execute-results/html.json

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions _freeze/tidymodels-intro/execute-results/html.json

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions _freeze/tidymodels-regression/execute-results/html.json

Large diffs are not rendered by default.

626 changes: 318 additions & 308 deletions _freeze/tidymodels-regression/figure-html/unnamed-chunk-21-1.svg
214 changes: 107 additions & 107 deletions _freeze/tidymodels-regression/figure-html/unnamed-chunk-24-1.svg
4 changes: 2 additions & 2 deletions epipredict.qmd
@@ -13,7 +13,7 @@ Serving both populations is the main motivation for our efforts, but at the same
## Baseline models

We provide a set of basic, easy-to-use forecasters that work out of the box.
You should be able to do a reasonably limited amount of customization on them. Any serious customization happens with the framework discussed below).
You should be able to do a limited amount of customization on them. Any serious customization happens with the framework discussed below.

For the basic forecasters, we provide:

@@ -214,7 +214,7 @@ out_gb <- arx_forecaster(jhu, "death_rate", c("case_rate", "death_rate"),
Or quantile regression, using our custom forecasting engine `quantile_reg()`:

```{r quantreg, warning = FALSE}
out_gb <- arx_forecaster(jhu, "death_rate", c("case_rate", "death_rate"),
out_qr <- arx_forecaster(jhu, "death_rate", c("case_rate", "death_rate"),
quantile_reg())
```
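
Either engine produces the same kind of output; other aspects of the forecast, such as the horizon, are controlled through the forecaster's argument list. Below is a minimal sketch of changing the horizon to 14 days; the `args_list` argument and the `arx_args_list()` helper are assumptions here, by analogy with the `flatline_args_list()` pattern used for the flatline forecaster.

```{r arx-two-weeks, eval = FALSE}
# A sketch of forecasting 14 days ahead instead of the default horizon.
# `args_list` and `arx_args_list()` are assumed names, mirroring
# flatline_args_list(ahead = ...) used elsewhere in this book.
out_two_weeks <- arx_forecaster(jhu, "death_rate", c("case_rate", "death_rate"),
  quantile_reg(),
  args_list = arx_args_list(ahead = 14L)
)
```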

57 changes: 32 additions & 25 deletions flatline-forecaster.qmd
@@ -1,6 +1,6 @@
# Introducing the flatline forecaster

The flatline forecaster is a very simple forecasting model intended for `epi_df` data, where the most recent observation is used as the forecast for any future date. In other words, the last observation is propagated forward. Hence, a flat line phenomenon is observed for the point predictions. The predictive intervals are produced from the quantiles of the residuals of such a forecast over all of the training data. By default, these intervals will be obtained separately for each combination of keys (`geo_value` and any additional keys) in the `epi_df`. Thus, the output is a data frame of point (and optionally interval) forecasts at a single unique horizon (`ahead`) for each unique combination of key variables. This forecaster is comparable to the baseline used by the [COVID Forecast Hub](https://covid19forecasthub.org).
The flatline forecaster is a very simple forecasting model intended for `epi_df` data, where the most recent observation is used as the forecast for any future date. In other words, the last observation is propagated forward. Hence, a flat line phenomenon is observed for the point predictions. The prediction intervals are produced from the quantiles of the residuals of such a forecast over all of the training data. By default, these intervals will be obtained separately for each combination of keys (`geo_value` and any additional keys) in the `epi_df`. Thus, the output is a data frame of point (and optionally interval) forecasts at a single unique horizon (`ahead`) for each unique combination of key variables. This forecaster is comparable to the baseline used by the [COVID Forecast Hub](https://covid19forecasthub.org).

## Example of using the flatline forecaster

@@ -27,19 +27,26 @@ jhu

### The basic mechanics of the flatline forecaster

The simplest way to create and train a flatline forecaster to predict the d
eath rate one week into the future, is to input the `epi_df` and the name of
the column from it that we want to predict in the `flatline_forecaster` function.
Suppose that our goal is to predict death rates one week ahead of the last date available for each state. Mathematically, on day $t$, we want to predict new deaths $y$ that are $h$ days ahead at many locations $j$. So for each location, we'll predict
$$
\hat{y}_{j, {t+h}} = y_{j, t}
$$
where $t$ is 2021-12-31, $h$ is 7 days, and $j$ is the state in our example.
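
To make the formula concrete, here is a by-hand sketch of the same computation using only `{dplyr}`: the point forecast for each state is its last observed death rate, and an interval can be formed from quantiles of the historical $h$-step differences $y_{j,t} - y_{j,t-h}$. This is only an illustration of the idea, not how `flatline_forecaster()` is implemented.

```{r}
library(dplyr)

h <- 7
by_hand <- jhu %>%
  arrange(geo_value, time_value) %>%
  group_by(geo_value) %>%
  mutate(resid = death_rate - lag(death_rate, h)) %>%
  summarise(
    forecast_date = max(time_value),
    .pred = last(death_rate),                            # y_{j,t}
    lower = .pred + quantile(resid, 0.05, na.rm = TRUE), # interval from the
    upper = .pred + quantile(resid, 0.95, na.rm = TRUE), # residual quantiles
    .groups = "drop"
  ) %>%
  mutate(target_date = forecast_date + h)
by_hand
```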

Now, the simplest way to create and train a flatline forecaster to predict the death rate one week into the future is to input the `epi_df` and the name of the column from it that we want to predict in the `flatline_forecaster` function.

```{r}
one_week_ahead <- flatline_forecaster(jhu, outcome = "death_rate")
one_week_ahead <- flatline_forecaster(jhu,
outcome = "death_rate")
one_week_ahead
```
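
Before unpacking the output, here is a small sketch of how to look at its pieces; the element names shown (`predictions` and `epi_workflow`) are the ones used later in this chapter.

```{r}
names(one_week_ahead)        # the components bundled in the returned object
one_week_ahead$predictions   # tibble of point (and interval) forecasts
one_week_ahead$epi_workflow  # fitted workflow, reusable for later forecasts
```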

The result is both a fitted model object which could be used any time in the
future to create different forecasts, as well as a set of predicted values and
prediction intervals for each location 7 days after the last available time
value in the data, which is Dec 31, 2021. Note that 7 days is the default
The result is an S3 object, which contains three components: metadata, a tibble of predictions,
and an S3 object of class `epi_workflow`. There are a few important things to note here.
First, the fitted model object contained in the `epi_workflow`
object could be used any time in the future to create different forecasts.
Next, the tibble of predicted values and prediction intervals is for each location 7 days
after the last available time value in the data, which is Dec 31, 2021. Note that 7 days is the default
number of time steps ahead of the forecast date in which forecasts should be
produced. To change this, you must change the value of the `ahead` parameter
in the list of additional arguments `flatline_args_list()`. Let's change this
@@ -55,15 +62,16 @@ five_days_ahead <- flatline_forecaster(
five_days_ahead
```

We could also specify that we want a 80% predictive interval by changing the
levels. The default 0.05 and 0.95 levels/quantiles give us 90% predictive
We could also specify that we want an 80% prediction interval by changing the
levels. The default 0.05 and 0.95 levels/quantiles give us a 90% prediction
interval.

```{r}
five_days_ahead <- flatline_forecaster(
jhu,
outcome = "death_rate",
flatline_args_list(ahead = 5L, levels = c(0.1, 0.9))
flatline_args_list(ahead = 5L,
levels = c(0.1, 0.9))
)

five_days_ahead
@@ -77,8 +85,10 @@ five_days_ahead$epi_workflow

The fitted model here was based on minimal pre-processing of the data,
estimating a flatline model, and then post-processing the results to be
meaningful for epidemiological tasks. To look deeper into the pre-processing,
model and processing parts individually, you may use the `$` operator after `epi_workflow`. For example, let's examine the pre-processing part in more detail.
meaningful for epidemiological tasks. To look deeper into the pre-processing
or post-processing parts individually, you can use the `extract_preprocessor()`
or `extract_frosting()` functions on the `epi_workflow`, respectively.
For example, let’s examine the pre-processing part in more detail.

```{r}
#| results: false
@@ -94,12 +104,10 @@ extract_preprocessor(five_days_ahead$epi_workflow)
extract_preprocessor(five_days_ahead$epi_workflow)
```


Under Operations, we can see that the pre-processing operations were to lead the
death rate by 5 days (`step_epi_ahead()`) and that the \# of recent observations
used in the training window was not limited (in `step_training_window()` as
`n_training = Inf` in `flatline_args_list()`). You should also see the
molded/pre-processed training data.
`n_training = Inf` in `flatline_args_list()`).
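
If you do want to restrict the training window, the same `n_training` argument can be set to a finite value. A minimal sketch, using only arguments already mentioned above:

```{r}
ninety_day_window <- flatline_forecaster(
  jhu,
  outcome = "death_rate",
  flatline_args_list(ahead = 5L, n_training = 90L)
)
```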

For symmetry, let's have a look at the post-processing.

@@ -116,18 +124,17 @@ extract_frosting(five_days_ahead$epi_workflow)
extract_frosting(five_days_ahead$epi_workflow)
```


The post-processing operations in the order the that were performed were to create the predictions and the predictive intervals, add the forecast and target dates and bound the predictions at zero.
The post-processing operations, in the order they were performed, were to create the predictions and the prediction intervals, add the forecast and target dates, and bound the predictions at zero.
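
As a rough sketch, a post-processor performing those operations might be assembled from frosting layers like the ones below. The `layer_*()` function names here are assumptions rather than something extracted from the output above, so treat this as an illustration of the structure only.

```{r}
#| eval: false
f <- frosting() %>%
  layer_predict() %>%               # create the point predictions
  layer_residual_quantiles() %>%    # add prediction intervals from residuals
  layer_add_forecast_date() %>%     # stamp the forecast date
  layer_add_target_date() %>%       # stamp the target date
  layer_threshold(.pred, lower = 0) # bound the predictions at zero
```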

We can also easily examine the predictions themselves.

```{r}
five_days_ahead$predictions
```

The results above show a distributional forecast produced using data through the end of 2021 for the January 5, 2022. A prediction for the death rate per 100K inhabitants along with a 95% predictive interval is available for every state (`geo_value`).
The results above show a distributional forecast produced using data through the end of 2021 for January 5, 2022. A prediction for the death rate per 100K inhabitants along with an 80% prediction interval is available for every state (`geo_value`).

The figure below displays the prediction and prediction interval for three sample states: Arizona, New York, and Florida.
The figure below displays the prediction and prediction interval for three sample states: Arizona, Florida, and New York.
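
If you want the underlying numbers for just those states, here is a quick sketch, assuming the usual two-letter abbreviations in `geo_value`:

```{r}
five_days_ahead$predictions %>%
  filter(geo_value %in% c("az", "fl", "ny"))
```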

```{r}
#| fig-height: 5
@@ -158,7 +165,7 @@ ggplot(hist, aes(color = geo_value)) +

The vertical black line is the forecast date. Here the forecast seems pretty reasonable based on the past observations shown. In cases where the recent past is highly predictive of the near future, a simple flatline forecast may be respectable, but in more complex situations where there is more uncertainty of what's to come, the flatline forecaster may be best relegated to being a baseline model and nothing more.

Take for example what happens when we consider a wider range of target dates. That is, we will now predict for several different horizons or `ahead` values - in our case, 5 to 25 days ahead, inclusive. Since the flatline forecaster function forecasts at a single unique `ahead` value, we can use the `map()` function from `purrr` to apply the forecaster to each ahead value we want to use. Then, we row bind the list of results.
Take for example what happens when we consider a wider range of target dates. That is, we will now predict for several different horizons or `ahead` values - in our case, 1 to 28 days ahead, inclusive. Since the flatline forecaster function forecasts at a single unique `ahead` value, we can use the `map()` function from `purrr` to apply the forecaster to each ahead value we want to use. And then we row bind the list of results.

```{r}
out_df <- map(1:28, ~ flatline_forecaster(
@@ -168,7 +175,7 @@ out_df <- map(1:28, ~ flatline_forecaster(
list_rbind()
```

Then, we proceed as we did before. The only difference from before is that we're using `out_df` where we had `five_days_ahead$predictions`.
Then we proceed as we did before. The only difference from before is that we're using `out_df` where we had `five_days_ahead$predictions`.

```{r}
#| fig-height: 5
@@ -195,9 +202,9 @@ ggplot(hist) +
theme(legend.position = "none")
```

Now, you can really see the flat line trend in the predictions. And you may also observe that as we get further away from the forecast date, the more unnerving using a flatline prediction becomes. It feels increasingly unnatural.
Now you can really see the flat line trend in the predictions. And you may also observe that the further we get from the forecast date, the more unnerving a flatline prediction becomes. It feels increasingly unnatural.

So naturally the choice of forecaster relates to the time frame being considered. In general, using a flatline forecaster makes more sense for short-term forecasts than for long-term forecasts and for periods of great stability than in less stable times. Realistically, periods of great stability are rare. Moreover, in our model of choice we want to take into account more information about the past than just what happened at the most recent time point. So simple forecasters like the flatline forecaster don't cut it as actual contenders in many real-life situations. However, they are not useless, just used for a different purpose. A simple model is often used to compare a more complex model to, which is why you may have seen such a model used as a baseline in the [COVID Forecast Hub](https://covid19forecasthub.org). The following [blog post](https://delphi.cmu.edu/blog/2021/09/30/on-the-predictability-of-covid-19/#ensemble-forecast-performance) from Delphi explores the Hub's ensemble accuracy relative to such a baseline model.
So naturally the choice of forecaster relates to the time frame being considered. In general, using a flatline forecaster makes more sense for short-term forecasts than for long-term forecasts and for periods of great stability than in less stable times. Realistically, periods of great stability are rare. And moreover, in our model of choice we want to take into account more information about the past than just what happened at the most recent time point. So simple forecasters like the flatline forecaster don't cut it as actual contenders in many real-life situations. However, they are not useless, just used for a different purpose. A simple model is often used to compare a more complex model to, which is why you may have seen such a model used as a baseline in the [COVID Forecast Hub](https://covid19forecasthub.org). The following [blog post](https://delphi.cmu.edu/blog/2021/09/30/on-the-predictability-of-covid-19/#ensemble-forecast-performance) from Delphi explores the Hub's ensemble accuracy relative to such a baseline model.

## What we've learned in a nutshell

4 changes: 2 additions & 2 deletions forecast-framework.qmd
@@ -6,7 +6,7 @@ source("_common.R")
```

Underneath the hood, the `arx_forecaster()` (and all our canned
forecasters) creates (and returns) an `epi_workflow`.
forecasters) creates and returns an `epi_workflow`.
Essentially, this is a big S3 object that wraps up the 4 modular steps
(preprocessing - postprocessing) described in the last chapter.
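
As a rough sketch of what that wrapping means in practice, the pieces can be pulled back out of a fitted canned forecaster. Here `out` is a hypothetical result from, say, `arx_forecaster()`; `extract_preprocessor()` and `extract_frosting()` appear in the flatline chapter, while `extract_fit_parsnip()` is assumed to work as it does for ordinary `{workflows}` objects.

```{r extract-pieces, eval = FALSE}
wf <- out$epi_workflow     # hypothetical: `out` from any canned forecaster
extract_preprocessor(wf)   # the recipe holding the preprocessing steps
extract_fit_parsnip(wf)    # the fitted parsnip model
extract_frosting(wf)       # the post-processing (frosting) layers
```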

@@ -17,7 +17,7 @@

Let's investigate how these interact with `{tidymodels}` and why it's important
to think of forecasting this way. To have something to play with, we'll continue
to examine the data and an estimated canned corecaster.
to examine the data and an estimated canned forecaster.


```{r demo-workflow}
Expand Down
Loading