@@ -301,8 +301,8 @@ growth_rate_recipe <- epi_recipe(
  step_epi_lag(case_rate, lag = c(0, 1, 2, 3, 7, 14)) |>
  step_epi_lag(death_rate, lag = c(0, 7, 14)) |>
  step_epi_ahead(death_rate, ahead = 4 * 7) |>
-   step_growth_rate(death_rate) |>
  step_epi_naomit() |>
+   step_growth_rate(death_rate) |>
  step_training_window()
```
@@ -317,7 +317,9 @@ growth_rate_recipe |>
death_rate, gr_7_rel_change_death_rate
)
```
+
And the role:
+
``` {r growth_rate_roles}
prepped <- growth_rate_recipe |>
  prep(training_data)
@@ -329,7 +331,9 @@ To demonstrate the changes in the layers that come along with it, we will use
``` {r layer_and_fit}
growth_rate_layers <- frosting() |>
  layer_predict() |>
-   layer_quantile_distn(quantile_levels = c(0.1, 0.25, 0.5, 0.75, 0.9)) |>
+   layer_quantile_distn(
+     quantile_levels = c(0.1, 0.25, 0.5, 0.75, 0.9)
+   ) |>
  layer_point_from_distn() |>
  layer_add_forecast_date() |>
  layer_add_target_date() |>
@@ -429,7 +433,121 @@ which are 2-3 orders of magnitude larger than the corresponding rates above.
while `rate_rescaling` gives the denominator of the rate (our fit values were
per 100,000).

- [^1]: Think of baking a cake, where adding the frosting is the last step in the process of actually baking.
+ # Custom classifier workflow
+
+ As a more complex example of the kind of pipeline you can build with this
+ framework, here is a hotspot prediction model, which predicts whether case
+ rates are increasing (`up`), decreasing (`down`) or flat (`flat`).
+ This comes from a paper by McDonald, Bien, Green, Hu et al.[^3], and roughly
+ serves as an extension of `arx_classifier()`.
+
+ First, we need to add a factor version of the `geo_value`, so that it can
+ actually be used as a feature.
+
+ ``` {r training_factor}
+ training_data <-
+   training_data %>%
+   mutate(geo_value_factor = as.factor(geo_value))
+ ```
+
+ Then we put together the recipe, using a combination of base `{recipes}`
+ functions such as `add_role()` and `step_dummy()`, and `{epipredict}` functions
+ such as `step_growth_rate()`.
+
+ ``` {r class_recipe}
+ classifier_recipe <- epi_recipe(training_data) %>%
+   add_role(time_value, new_role = "predictor") %>%
+   step_dummy(geo_value_factor) %>%
+   step_growth_rate(case_rate, role = "none", prefix = "gr_") %>%
+   step_epi_lag(starts_with("gr_"), lag = c(0, 7, 14)) %>%
+   step_epi_ahead(starts_with("gr_"), ahead = 7, role = "none") %>%
+   # recipes::step_cut() has a bug, otherwise we could use it here
+   step_mutate(
+     response = cut(
+       ahead_7_gr_7_rel_change_case_rate,
+       # thresholds are weekly; dividing by 7 puts them on the daily scale
+       breaks = c(-Inf, -0.2, 0.25, Inf) / 7,
+       labels = c("down", "flat", "up")
+     ),
+     role = "outcome"
+   ) %>%
+   step_rm(has_role("none"), has_role("raw")) %>%
+   step_epi_naomit()
+ ```
+
+
+ Roughly, this adds the following predictors:
+
+ 1. the time value (via `add_role()`)
+ 2. the `geo_value` (via `step_dummy()` and the `as.factor()` above)
+ 3. the growth rate, both at prediction time and lagged by one and two weeks
+
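To make the second predictor more concrete, here is a small base R sketch of dummy (indicator) encoding for a location factor. It uses `model.matrix()` rather than `step_dummy()`, and the toy locations are made up for illustration. With its default settings, `step_dummy()` produces a comparable set of indicator columns, one fewer than the number of factor levels.

```r
# Hypothetical toy data: three locations, two rows each
toy <- data.frame(geo_value = c("ca", "ca", "ny", "ny", "tx", "tx"))
toy$geo_value_factor <- as.factor(toy$geo_value)

# Treatment-coded indicators; the intercept column is dropped,
# leaving one column per non-reference level ("ny" and "tx")
dummies <- model.matrix(~geo_value_factor, data = toy)[, -1, drop = FALSE]
colnames(dummies)
dummies
```

The first level (`ca` here) acts as the reference category, so its rows are all zeros.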
+ The outcome is created by composing several steps together: `step_epi_ahead()`
+ creates a column with the growth rate one week into the future, while
+ `step_mutate()` creates a factor with the 3 values:
+
+ $$
+ Z_{\ell, t}=
+ \begin{cases}
+ \text{up}, & \text{if}\ Y^{\Delta}_{\ell, t} > 0.25 \\
+ \text{down}, & \text{if}\ Y^{\Delta}_{\ell, t} < -0.20\\
+ \text{flat}, & \text{otherwise}
+ \end{cases}
+ $$
+
+ where $Y^{\Delta}_{\ell, t}$ is the growth rate at location $\ell$ and time $t$.
+ `up` means that the `case_rate` has increased by more than 25%, while `down`
+ means it has decreased by more than 20%.
+
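The thresholding can be checked in isolation with base R. The sketch below applies the same `cut()` call as the recipe to a few made-up daily growth rates (the values are hypothetical, not taken from the data) and shows the weekly equivalents alongside the resulting labels.

```r
# Weekly thresholds from the recipe, converted to the daily scale
breaks <- c(-Inf, -0.2, 0.25, Inf) / 7

# Hypothetical daily growth rates, chosen to land in each category
daily_gr <- c(-0.05, 0, 0.02, 0.05)

response <- cut(daily_gr, breaks = breaks, labels = c("down", "flat", "up"))
data.frame(daily_gr, weekly_gr = daily_gr * 7, response)
```

Note that `cut()` uses right-closed intervals by default, so a growth rate exactly equal to a threshold is assigned to the lower category.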
+ Note that both `step_growth_rate()` and `step_epi_ahead()` assign the role
+ `none` explicitly; this is because they're used as intermediate steps to create
+ both predictors and the outcome.
+ `step_rm()` drops them once they are no longer needed, along with the original
+ `raw` columns `death_rate` and `case_rate` (both `geo_value` and `time_value`
+ are retained because their roles have been reassigned).
+
+
+ To fit a classification model like this, we will need to use a parsnip model
+ whose mode is `classification`; the simplest example is `multinom_reg()`.
+ We don't actually need to apply any layers, so we can skip adding any to the `epi_workflow()`:
+
+ ``` {r, warning=FALSE}
+ wf <- epi_workflow(
+   classifier_recipe,
+   multinom_reg()
+ ) %>%
+   fit(training_data)
+
+ forecast(wf) %>% filter(!is.na(.pred_class))
+ ```
+
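For readers who have not used parsnip classification models before, the following standalone sketch fits `multinom_reg()` outside of any `epi_workflow()`. It assumes the `parsnip` and `nnet` packages are installed, and it uses the built-in `iris` data purely as a stand-in three-class problem; it is not the hotspot model above.

```r
library(parsnip)

# multinom_reg() is a classification model; "nnet" is its default engine
spec <- multinom_reg() |>
  set_engine("nnet")

# iris is only a stand-in dataset with a three-level outcome
iris_fit <- fit(spec, Species ~ ., data = iris)

# Predictions come back as a tibble with a `.pred_class` factor column
preds <- predict(iris_fit, new_data = iris)
head(preds)
```

The `.pred_class` column is the same one that the `filter(!is.na(.pred_class))` call operates on in the forecast.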
+ We can compare the forecast with the actual growth rates at that point:
+ ``` {r growth_rate_results}
+ growth_rates <- covid_case_death_rates |>
+   filter(geo_value %in% used_locations) |>
+   group_by(geo_value) |>
+   mutate(
+     # multiply by 7 to get to weekly equivalents
+     case_gr = growth_rate(x = time_value, y = case_rate) * 7
+   ) |>
+   ungroup()
+
+ growth_rates |> filter(time_value == "2021-08-01")
+ ```
+
+ All of these locations are growing well above the threshold of 25% per week
+ (36%-62%), which matches the classification.
+
+
+ See the [tooling book](https://cmu-delphi.github.io/delphi-tooling-book/preprocessing-and-models.html) for a more in-depth discussion of this example.
+
+
+ [^1]: Think of baking a cake, where adding the frosting is the last step in the
+ process of actually baking.

[^2]: Note that the frosting doesn't require any information about the training
data, since the output of the model only depends on the model used.
+
+ [^3]: McDonald, Bien, Green, Hu, et al. “Can auxiliary indicators improve
+ COVID-19 forecasting and hotspot prediction?” Proceedings of the National
+ Academy of Sciences 118.51 (2021): e2111453118. doi:10.1073/pnas.2111453118