Edited existing chapters #10

Open · wants to merge 5 commits into main

4 changes: 2 additions & 2 deletions _freeze/epipredict/execute-results/html.json

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions _freeze/flatline-forecaster/execute-results/html.json

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions _freeze/forecast-framework/execute-results/html.json

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions _freeze/tidymodels-intro/execute-results/html.json

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions _freeze/tidymodels-regression/execute-results/html.json

Large diffs are not rendered by default.

626 changes: 318 additions & 308 deletions _freeze/tidymodels-regression/figure-html/unnamed-chunk-21-1.svg
214 changes: 107 additions & 107 deletions _freeze/tidymodels-regression/figure-html/unnamed-chunk-24-1.svg
4 changes: 2 additions & 2 deletions epipredict.qmd
@@ -13,7 +13,7 @@ Serving both populations is the main motivation for our efforts, but at the same
## Baseline models

We provide a set of basic, easy-to-use forecasters that work out of the box.
You should be able to do a reasonably limited amount of customization on them. Any serious customization happens with the framework discussed below).
You should be able to do a limited amount of customization on them. Any serious customization happens with the framework discussed below.

For the basic forecasters, we provide:

@@ -214,7 +214,7 @@ out_gb <- arx_forecaster(jhu, "death_rate", c("case_rate", "death_rate"),
Or quantile regression, using our custom forecasting engine `quantile_reg()`:

```{r quantreg, warning = FALSE}
out_gb <- arx_forecaster(jhu, "death_rate", c("case_rate", "death_rate"),
out_qr <- arx_forecaster(jhu, "death_rate", c("case_rate", "death_rate"),
quantile_reg())
```
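
Either engine produces the same kind of output; other aspects of the forecast, such as the horizon, are controlled through the forecaster's argument list. Below is a minimal sketch of changing the horizon to 14 days; the `args_list` argument and the `arx_args_list()` helper are assumptions here, by analogy with the `flatline_args_list()` pattern used for the flatline forecaster.

```{r arx-two-weeks, eval = FALSE}
# A sketch of forecasting 14 days ahead instead of the default horizon.
# `args_list` and `arx_args_list()` are assumed names, mirroring
# flatline_args_list(ahead = ...) used elsewhere in this book.
out_two_weeks <- arx_forecaster(jhu, "death_rate", c("case_rate", "death_rate"),
  quantile_reg(),
  args_list = arx_args_list(ahead = 14L)
)
```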

57 changes: 32 additions & 25 deletions flatline-forecaster.qmd
@@ -1,6 +1,6 @@
# Introducing the flatline forecaster

The flatline forecaster is a very simple forecasting model intended for `epi_df` data, where the most recent observation is used as the forecast for any future date. In other words, the last observation is propagated forward. Hence, a flat line phenomenon is observed for the point predictions. The predictive intervals are produced from the quantiles of the residuals of such a forecast over all of the training data. By default, these intervals will be obtained separately for each combination of keys (`geo_value` and any additional keys) in the `epi_df`. Thus, the output is a data frame of point (and optionally interval) forecasts at a single unique horizon (`ahead`) for each unique combination of key variables. This forecaster is comparable to the baseline used by the [COVID Forecast Hub](https://covid19forecasthub.org).
The flatline forecaster is a very simple forecasting model intended for `epi_df` data, where the most recent observation is used as the forecast for any future date. In other words, the last observation is propagated forward. Hence, a flat line phenomenon is observed for the point predictions. The prediction intervals are produced from the quantiles of the residuals of such a forecast over all of the training data. By default, these intervals will be obtained separately for each combination of keys (`geo_value` and any additional keys) in the `epi_df`. Thus, the output is a data frame of point (and optionally interval) forecasts at a single unique horizon (`ahead`) for each unique combination of key variables. This forecaster is comparable to the baseline used by the [COVID Forecast Hub](https://covid19forecasthub.org).

## Example of using the flatline forecaster

@@ -27,19 +27,26 @@ jhu

### The basic mechanics of the flatline forecaster

The simplest way to create and train a flatline forecaster to predict the d
eath rate one week into the future, is to input the `epi_df` and the name of
the column from it that we want to predict in the `flatline_forecaster` function.
Suppose that our goal is to predict death rates one week ahead of the last date available for each state. Mathematically, on day $t$, we want to predict new deaths $y$ that are $h$ days ahead at many locations $j$. So for each location, we'll predict
$$
\hat{y}_{j, {t+h}} = y_{j, t}
$$
where $t$ is 2021-12-31, $h$ is 7 days, and $j$ is the state in our example.
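
To make the formula concrete, here is a by-hand sketch of the same computation using only `{dplyr}`: the point forecast for each state is its last observed death rate, and an interval can be formed from quantiles of the historical $h$-step differences $y_{j,t} - y_{j,t-h}$. This is only an illustration of the idea, not how `flatline_forecaster()` is implemented.

```{r}
library(dplyr)

h <- 7
by_hand <- jhu %>%
  arrange(geo_value, time_value) %>%
  group_by(geo_value) %>%
  mutate(resid = death_rate - lag(death_rate, h)) %>%
  summarise(
    forecast_date = max(time_value),
    .pred = last(death_rate),                            # y_{j,t}
    lower = .pred + quantile(resid, 0.05, na.rm = TRUE), # interval from the
    upper = .pred + quantile(resid, 0.95, na.rm = TRUE), # residual quantiles
    .groups = "drop"
  ) %>%
  mutate(target_date = forecast_date + h)
by_hand
```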

Now, the simplest way to create and train a flatline forecaster to predict the death rate one week into the future is to input the `epi_df` and the name of the column from it that we want to predict in the `flatline_forecaster` function.

```{r}
one_week_ahead <- flatline_forecaster(jhu, outcome = "death_rate")
one_week_ahead <- flatline_forecaster(jhu,
outcome = "death_rate")
one_week_ahead
```
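
Before unpacking the output, here is a small sketch of how to look at its pieces; the element names shown (`predictions` and `epi_workflow`) are the ones used later in this chapter.

```{r}
names(one_week_ahead)        # the components bundled in the returned object
one_week_ahead$predictions   # tibble of point (and interval) forecasts
one_week_ahead$epi_workflow  # fitted workflow, reusable for later forecasts
```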

The result is both a fitted model object which could be used any time in the
future to create different forecasts, as well as a set of predicted values and
prediction intervals for each location 7 days after the last available time
value in the data, which is Dec 31, 2021. Note that 7 days is the default
The result is an S3 object, which contains three components: metadata, a tibble of predictions,
and an S3 object of class `epi_workflow`. There are a few important things to note here.
First, the fitted model object contained in the `epi_workflow`
object could be used any time in the future to create different forecasts.
Next, the tibble of predicted values and prediction intervals is for each location 7 days
after the last available time value in the data, which is Dec 31, 2021. Note that 7 days is the default
number of time steps ahead of the forecast date in which forecasts should be
produced. To change this, you must change the value of the `ahead` parameter
in the list of additional arguments `flatline_args_list()`. Let's change this
@@ -55,15 +62,16 @@ five_days_ahead <- flatline_forecaster(
five_days_ahead
```

We could also specify that we want a 80% predictive interval by changing the
levels. The default 0.05 and 0.95 levels/quantiles give us 90% predictive
We could also specify that we want an 80% prediction interval by changing the
levels. The default 0.05 and 0.95 levels/quantiles give us a 90% prediction
interval.

```{r}
five_days_ahead <- flatline_forecaster(
jhu,
outcome = "death_rate",
flatline_args_list(ahead = 5L, levels = c(0.1, 0.9))
flatline_args_list(ahead = 5L,
levels = c(0.1, 0.9))
)

five_days_ahead
@@ -77,8 +85,10 @@ five_days_ahead$epi_workflow

The fitted model here was based on minimal pre-processing of the data,
estimating a flatline model, and then post-processing the results to be
meaningful for epidemiological tasks. To look deeper into the pre-processing,
model and processing parts individually, you may use the `$` operator after `epi_workflow`. For example, let's examine the pre-processing part in more detail.
meaningful for epidemiological tasks. To look deeper into the pre-processing
or post-processing parts individually, you can use the `extract_preprocessor()`
or `extract_frosting()` functions on the `epi_workflow`, respectively.
For example, let’s examine the pre-processing part in more detail.

```{r}
#| results: false
@@ -94,12 +104,10 @@ extract_preprocessor(five_days_ahead$epi_workflow)
extract_preprocessor(five_days_ahead$epi_workflow)
```


Under Operations, we can see that the pre-processing operations were to lead the
death rate by 5 days (`step_epi_ahead()`) and that the \# of recent observations
used in the training window was not limited (in `step_training_window()` as
`n_training = Inf` in `flatline_args_list()`). You should also see the
molded/pre-processed training data.
`n_training = Inf` in `flatline_args_list()`).
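
If you do want to restrict the training window, the same `n_training` argument can be set to a finite value. A minimal sketch, using only arguments already mentioned above:

```{r}
ninety_day_window <- flatline_forecaster(
  jhu,
  outcome = "death_rate",
  flatline_args_list(ahead = 5L, n_training = 90L)
)
```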

For symmetry, let's have a look at the post-processing.

@@ -116,18 +124,17 @@ extract_frosting(five_days_ahead$epi_workflow)
extract_frosting(five_days_ahead$epi_workflow)
```


The post-processing operations in the order the that were performed were to create the predictions and the predictive intervals, add the forecast and target dates and bound the predictions at zero.
The post-processing operations, in the order they were performed, were to create the predictions and the prediction intervals, add the forecast and target dates, and bound the predictions at zero.
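
As a rough sketch, a post-processor performing those operations might be assembled from frosting layers like the ones below. The `layer_*()` function names here are assumptions rather than something extracted from the output above, so treat this as an illustration of the structure only.

```{r}
#| eval: false
f <- frosting() %>%
  layer_predict() %>%               # create the point predictions
  layer_residual_quantiles() %>%    # add prediction intervals from residuals
  layer_add_forecast_date() %>%     # stamp the forecast date
  layer_add_target_date() %>%       # stamp the target date
  layer_threshold(.pred, lower = 0) # bound the predictions at zero
```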

We can also easily examine the predictions themselves.

```{r}
five_days_ahead$predictions
```

The results above show a distributional forecast produced using data through the end of 2021 for the January 5, 2022. A prediction for the death rate per 100K inhabitants along with a 95% predictive interval is available for every state (`geo_value`).
The results above show a distributional forecast produced using data through the end of 2021 for January 5, 2022. A prediction for the death rate per 100K inhabitants along with an 80% prediction interval is available for every state (`geo_value`).

The figure below displays the prediction and prediction interval for three sample states: Arizona, New York, and Florida.
The figure below displays the prediction and prediction interval for three sample states: Arizona, Florida, and New York.
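
If you want the underlying numbers for just those states, here is a quick sketch, assuming the usual two-letter abbreviations in `geo_value`:

```{r}
five_days_ahead$predictions %>%
  filter(geo_value %in% c("az", "fl", "ny"))
```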

```{r}
#| fig-height: 5
@@ -158,7 +165,7 @@ ggplot(hist, aes(color = geo_value)) +

The vertical black line is the forecast date. Here the forecast seems pretty reasonable based on the past observations shown. In cases where the recent past is highly predictive of the near future, a simple flatline forecast may be respectable, but in more complex situations where there is more uncertainty of what's to come, the flatline forecaster may be best relegated to being a baseline model and nothing more.

Take for example what happens when we consider a wider range of target dates. That is, we will now predict for several different horizons or `ahead` values - in our case, 5 to 25 days ahead, inclusive. Since the flatline forecaster function forecasts at a single unique `ahead` value, we can use the `map()` function from `purrr` to apply the forecaster to each ahead value we want to use. Then, we row bind the list of results.
Take for example what happens when we consider a wider range of target dates. That is, we will now predict for several different horizons or `ahead` values - in our case, 1 to 28 days ahead, inclusive. Since the flatline forecaster function forecasts at a single unique `ahead` value, we can use the `map()` function from `purrr` to apply the forecaster to each ahead value we want to use. And then we row bind the list of results.

```{r}
out_df <- map(1:28, ~ flatline_forecaster(
@@ -168,7 +175,7 @@ out_df <- map(1:28, ~ flatline_forecaster(
list_rbind()
```

Then, we proceed as we did before. The only difference from before is that we're using `out_df` where we had `five_days_ahead$predictions`.
Then we proceed as we did before. The only difference from before is that we're using `out_df` where we had `five_days_ahead$predictions`.

```{r}
#| fig-height: 5
@@ -195,9 +202,9 @@ ggplot(hist) +
theme(legend.position = "none")
```

Now, you can really see the flat line trend in the predictions. And you may also observe that as we get further away from the forecast date, the more unnerving using a flatline prediction becomes. It feels increasingly unnatural.
Now you can really see the flat line trend in the predictions. And you may also observe that the further we get from the forecast date, the more unnerving a flatline prediction becomes. It feels increasingly unnatural.

So naturally the choice of forecaster relates to the time frame being considered. In general, using a flatline forecaster makes more sense for short-term forecasts than for long-term forecasts and for periods of great stability than in less stable times. Realistically, periods of great stability are rare. Moreover, in our model of choice we want to take into account more information about the past than just what happened at the most recent time point. So simple forecasters like the flatline forecaster don't cut it as actual contenders in many real-life situations. However, they are not useless, just used for a different purpose. A simple model is often used to compare a more complex model to, which is why you may have seen such a model used as a baseline in the [COVID Forecast Hub](https://covid19forecasthub.org). The following [blog post](https://delphi.cmu.edu/blog/2021/09/30/on-the-predictability-of-covid-19/#ensemble-forecast-performance) from Delphi explores the Hub's ensemble accuracy relative to such a baseline model.
So naturally the choice of forecaster relates to the time frame being considered. In general, using a flatline forecaster makes more sense for short-term forecasts than for long-term forecasts and for periods of great stability than in less stable times. Realistically, periods of great stability are rare. And moreover, in our model of choice we want to take into account more information about the past than just what happened at the most recent time point. So simple forecasters like the flatline forecaster don't cut it as actual contenders in many real-life situations. However, they are not useless, just used for a different purpose. A simple model is often used to compare a more complex model to, which is why you may have seen such a model used as a baseline in the [COVID Forecast Hub](https://covid19forecasthub.org). The following [blog post](https://delphi.cmu.edu/blog/2021/09/30/on-the-predictability-of-covid-19/#ensemble-forecast-performance) from Delphi explores the Hub's ensemble accuracy relative to such a baseline model.

## What we've learned in a nutshell

4 changes: 2 additions & 2 deletions forecast-framework.qmd
@@ -6,7 +6,7 @@ source("_common.R")
```

Underneath the hood, the `arx_forecaster()` (and all our canned
forecasters) creates (and returns) an `epi_workflow`.
forecasters) creates and returns an `epi_workflow`.
Essentially, this is a big S3 object that wraps up the 4 modular steps
(preprocessing - postprocessing) described in the last chapter.
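
As a rough sketch of what that wrapping means in practice, the pieces can be pulled back out of a fitted canned forecaster. Here `out` is a hypothetical result from, say, `arx_forecaster()`; `extract_preprocessor()` and `extract_frosting()` appear in the flatline chapter, while `extract_fit_parsnip()` is assumed to work as it does for ordinary `{workflows}` objects.

```{r extract-pieces, eval = FALSE}
wf <- out$epi_workflow     # hypothetical: `out` from any canned forecaster
extract_preprocessor(wf)   # the recipe holding the preprocessing steps
extract_fit_parsnip(wf)    # the fitted parsnip model
extract_frosting(wf)       # the post-processing (frosting) layers
```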

@@ -17,7 +17,7 @@

Let's investigate how these interact with `{tidymodels}` and why it's important
to think of forecasting this way. To have something to play with, we'll continue
to examine the data and an estimated canned corecaster.
to examine the data and an estimated canned forecaster.


```{r demo-workflow}
Expand Down
Loading