diff --git a/DESCRIPTION b/DESCRIPTION index 81f1871e..e77f331a 100755 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -4,7 +4,7 @@ Title: Tools for basic signal processing in epidemiology Version: 0.8.5 Authors@R: c( person("Jacob", "Bien", role = "ctb"), - person("Logan", "Brooks", email = "lcbrooks@andrew.cmu.edu", role = c("aut", "cre")), + person("Logan", "Brooks", , "lcbrooks@andrew.cmu.edu", role = c("aut", "cre")), person("Rafael", "Catoia", role = "ctb"), person("Nat", "DeFries", role = "ctb"), person("Daniel", "McDonald", role = "aut"), @@ -15,16 +15,22 @@ Authors@R: c( person("Evan", "Ray", role = "aut"), person("Dmitry", "Shemetov", role = "ctb"), person("Ryan", "Tibshirani", role = "aut"), - person("Lionel", "Henry", role = "ctb", comment = "Author of included rlang fragments"), - person("Hadley", "Wickham", role = "ctb", comment = "Author of included rlang fragments"), - person("Posit", role = "cph", comment = "Copyright holder of included rlang fragments") + person("Lionel", "Henry", role = "ctb", + comment = "Author of included rlang fragments"), + person("Hadley", "Wickham", role = "ctb", + comment = "Author of included rlang fragments"), + person("Posit", role = "cph", + comment = "Copyright holder of included rlang fragments") ) -Description: This package introduces a common data structure for epidemiological - data reported by location and time, provides another data structure to - work with revisions to these data sets over time, and offers associated - utilities to perform basic signal processing tasks. +Description: This package introduces a common data structure for + epidemiological data reported by location and time, provides another + data structure to work with revisions to these data sets over time, + and offers associated utilities to perform basic signal processing + tasks. License: MIT + file LICENSE -Copyright: file inst/COPYRIGHTS +URL: https://cmu-delphi.github.io/epiprocess/ +Depends: + R (>= 3.6) Imports: checkmate, cli, @@ -58,18 +64,16 @@ VignetteBuilder: knitr Remotes: cmu-delphi/epidatr, - reconverse/outbreaks, - glmgen/genlasso + glmgen/genlasso, + reconverse/outbreaks Config/testthat/edition: 3 Config/testthat/parallel: true +Copyright: file inst/COPYRIGHTS Encoding: UTF-8 LazyData: true Roxygen: list(markdown = TRUE) RoxygenNote: 7.3.2 -Depends: - R (>= 2.10) -URL: https://cmu-delphi.github.io/epiprocess/ -Collate: +Collate: 'archive.R' 'autoplot.R' 'correlation.R' diff --git a/NEWS.md b/NEWS.md index 5b231126..8bdb8c47 100644 --- a/NEWS.md +++ b/NEWS.md @@ -5,7 +5,8 @@ Pre-1.0.0 numbering scheme: 0.x will indicate releases, while 0.x.y will indicat # epiprocess 0.9 ## Breaking changes -- In `epi[x]_slide`: + +- In `epi[x]_slide` - `names_sep` is deprecated, and if you return data frames from your computations, they will no longer be unpacked into separate columns with name prefixes; instead: @@ -15,12 +16,18 @@ Pre-1.0.0 numbering scheme: 0.x will indicate releases, while 0.x.y will indicat packed data.frame-class column (see `tidyr::pack`). - `as_list_col` is deprecated; you can now directly return a list from your slide computations instead. +- `additional_metadata` is no longer accepted in `as_epi_df()` or + `as_epi_archive()`. Use the new `other_keys` arg to specify additional key + columns, such as age group columns or other demographic breakdowns. + Miscellaneous metadata are no longer handled by `epiprocess`, but you can use + R's built-in `attr<-` instead for a similar feature. ## Improvements - Added `complete.epi_df`, which fills in missing values in an `epi_df` with `NA`s. Uses `tidyr::complete` underneath and preserves `epi_df` metadata. -- Inclusion of the function `revision_summary` to provide basic revision information for `epi_archive`s out of the box. (#492) +- Inclusion of the function `revision_summary` to provide basic revision + information for `epi_archive`s out of the box. (#492) ## Bug fixes @@ -87,8 +94,8 @@ Pre-1.0.0 numbering scheme: 0.x will indicate releases, while 0.x.y will indicat - Multiple "data-masking" tidy evaluation expressions can be passed in via `...`, rather than just one. - Additional tidy evaluation features from `dplyr::mutate` are supported: `!! - name_var := value`, unnamed expressions evaluating to data frames, and `= - NULL`; see `?epi_slide` for more details. +name_var := value`, unnamed expressions evaluating to data frames, and `= +NULL`; see `?epi_slide` for more details. ## Cleanup diff --git a/R/archive.R b/R/archive.R index eb66e364..fbcc3c36 100644 --- a/R/archive.R +++ b/R/archive.R @@ -179,7 +179,8 @@ NULL #' #' * `geo_type`: the type for the geo values. #' * `time_type`: the type for the time values. -#' * `additional_metadata`: list of additional metadata for the data archive. +#' * `other_keys`: any additional keys as a character vector. +#' Typical examples are "age" or sub-geographies. #' #' While this metadata is not protected, it is generally recommended to treat it #' as read-only, and to use the `epi_archive` methods to interact with the data @@ -209,10 +210,8 @@ NULL #' if the time type is not recognized. #' @param other_keys Character vector specifying the names of variables in `x` #' that should be considered key variables (in the language of `data.table`) -#' apart from "geo_value", "time_value", and "version". -#' @param additional_metadata List of additional metadata to attach to the -#' `epi_archive` object. The metadata will have the `geo_type` field; named -#' entries from the passed list or will be included as well. +#' apart from "geo_value", "time_value", and "version". Typical examples +#' are "age" or more granular geographies. #' @param compactify Optional; Boolean. `TRUE` will remove some #' redundant rows, `FALSE` will not, and missing or `NULL` will remove #' redundant rows, but issue a warning. See more information at `compactify`. @@ -293,7 +292,6 @@ new_epi_archive <- function( geo_type, time_type, other_keys, - additional_metadata, compactify, clobberable_versions_start, versions_end, @@ -350,7 +348,7 @@ new_epi_archive <- function( DT = compactified, geo_type = geo_type, time_type = time_type, - additional_metadata = additional_metadata, + other_keys = other_keys, clobberable_versions_start = clobberable_versions_start, versions_end = versions_end ), @@ -423,7 +421,6 @@ is_locf <- function(vec, tolerance) { # nolint: object_usage_linter validate_epi_archive <- function( x, other_keys, - additional_metadata, compactify, clobberable_versions_start, versions_end) { @@ -434,9 +431,6 @@ validate_epi_archive <- function( if (any(c("geo_value", "time_value", "version") %in% other_keys)) { cli_abort("`other_keys` cannot contain \"geo_value\", \"time_value\", or \"version\".") } - if (any(names(additional_metadata) %in% c("geo_type", "time_type"))) { - cli_warn("`additional_metadata` names overlap with existing metadata fields \"geo_type\" or \"time_type\".") - } # Conduct checks and apply defaults for `compactify` assert_logical(compactify, len = 1, any.missing = FALSE, null.ok = TRUE) @@ -485,8 +479,7 @@ as_epi_archive <- function( x, geo_type = deprecated(), time_type = deprecated(), - other_keys = character(0L), - additional_metadata = list(), + other_keys = character(), compactify = NULL, clobberable_versions_start = NA, .versions_end = max_version_with_row_in(x), ..., @@ -518,11 +511,10 @@ as_epi_archive <- function( time_type <- guess_time_type(x$time_value) validate_epi_archive( - x, other_keys, additional_metadata, - compactify, clobberable_versions_start, versions_end + x, other_keys, compactify, clobberable_versions_start, versions_end ) new_epi_archive( - x, geo_type, time_type, other_keys, additional_metadata, + x, geo_type, time_type, other_keys, compactify, clobberable_versions_start, versions_end ) } @@ -551,7 +543,7 @@ print.epi_archive <- function(x, ..., class = TRUE, methods = TRUE) { c( ">" = if (class) "An `epi_archive` object, with metadata:", "i" = if (length(setdiff(key(x$DT), c("geo_value", "time_value", "version"))) > 0) { - "Non-standard DT keys: {setdiff(key(x$DT), c('geo_value', 'time_value', 'version'))}" + "Other DT keys: {setdiff(key(x$DT), c('geo_value', 'time_value', 'version'))}" }, "i" = if (nrow(x$DT) != 0L) { "Min/max time values: {min(x$DT$time_value)} / {max(x$DT$time_value)}" @@ -687,7 +679,8 @@ print.epi_archive <- function(x, ..., class = TRUE, methods = TRUE) { #' @export #' #' @aliases grouped_epi_archive -group_by.epi_archive <- function(.data, ..., .add = FALSE, .drop = dplyr::group_by_drop_default(.data)) { +group_by.epi_archive <- function(.data, ..., .add = FALSE, + .drop = dplyr::group_by_drop_default(.data)) { # `add` makes no difference; this is an ungrouped `epi_archive`. detailed_mutate <- epix_detailed_restricted_mutate(.data, ...) assert_logical(.drop) diff --git a/R/epi_df.R b/R/epi_df.R index 56424bf0..fedcff55 100644 --- a/R/epi_df.R +++ b/R/epi_df.R @@ -127,7 +127,7 @@ #' dplyr::rename(geo_value = state, time_value = reported_date) %>% #' as_epi_df( #' as_of = "2020-06-03", -#' additional_metadata = list(other_keys = "pol") +#' other_keys = "pol" #' ) #' #' attr(ex2, "metadata") @@ -146,47 +146,46 @@ #' state = rep("MA", 6), #' pol = rep(c("blue", "swing", "swing"), each = 2) #' ) %>% -#' # the 2 extra keys we added have to be specified in the other_keys -#' # component of additional_metadata. -#' as_epi_df(additional_metadata = list(other_keys = c("state", "pol"))) +#' as_epi_df(other_keys = c("state", "pol")) #' #' attr(ex3, "metadata") NULL -#' Create an `epi_df` object -#' -#' @rdname epi_df -#' @param geo_type DEPRECATED Has no effect. Geo value type is inferred from the -#' location column and set to "custom" if not recognized. -#' @param time_type DEPRECATED Has no effect. Time value type inferred from the time -#' column and set to "custom" if not recognized. Unpredictable behavior may result -#' if the time type is not recognized. +#' @describeIn epi_df Lower-level constructor for `epi_df` object +#' @order 2 +#' @param geo_type `r lifecycle::badge("deprecated")` in `as_epi_df()`, has no +#' effect; the geo value type is inferred from the location column and set to +#' "custom" if not recognized. In `new_epi_df()`, should be set to the same +#' value that would be inferred. +#' @param time_type `r lifecycle::badge("deprecated")` in `as_epi_df()`, has no +#' effect: the time value type inferred from the time column and set to +#' "custom" if not recognized. Unpredictable behavior may result if the time +#' type is not recognized. In `new_epi_df()`, should be set to the same value +#' that would be inferred. #' @param as_of Time value representing the time at which the given data were #' available. For example, if `as_of` is January 31, 2022, then the `epi_df` #' object that is created would represent the most up-to-date version of the #' data available as of January 31, 2022. If the `as_of` argument is missing, #' then the current day-time will be used. -#' @param additional_metadata List of additional metadata to attach to the -#' `epi_df` object. The metadata will have `geo_type`, `time_type`, and -#' `as_of` fields; named entries from the passed list will be included as -#' well. If your tibble has additional keys, be sure to specify them as a -#' character vector in the `other_keys` component of `additional_metadata`. +#' @param other_keys If your tibble has additional keys, be sure to specify them +#' as a character vector here (typical examples are "age" or sub-geographies). #' @param ... Additional arguments passed to methods. #' @return An `epi_df` object. #' #' @export -new_epi_df <- function(x = tibble::tibble(), geo_type, time_type, as_of, - additional_metadata = list()) { +new_epi_df <- function(x = tibble::tibble(geo_value = character(), time_value = as.Date(integer())), + geo_type, time_type, as_of, + other_keys = character(), ...) { # Define metadata fields metadata <- list() metadata$geo_type <- geo_type metadata$time_type <- time_type metadata$as_of <- as_of - metadata <- c(metadata, additional_metadata) + metadata$other_keys <- other_keys # Reorder columns (geo_value, time_value, ...) if (sum(dim(x)) != 0) { - cols_to_put_first <- c("geo_value", "time_value") + cols_to_put_first <- c("geo_value", "time_value", other_keys) x <- x[, c( cols_to_put_first, # All other columns @@ -200,7 +199,8 @@ new_epi_df <- function(x = tibble::tibble(), geo_type, time_type, as_of, return(x) } -#' @rdname epi_df +#' @describeIn epi_df The preferred way of constructing `epi_df`s +#' @order 1 #' @param x An `epi_df`, `data.frame`, [tibble::tibble], or [tsibble::tsibble] #' to be converted #' @param ... used for specifying column names, as in [`dplyr::rename`]. For @@ -211,6 +211,7 @@ as_epi_df <- function(x, ...) { } #' @rdname epi_df +#' @order 1 #' @method as_epi_df epi_df #' @export as_epi_df.epi_df <- function(x, ...) { @@ -218,17 +219,18 @@ as_epi_df.epi_df <- function(x, ...) { } #' @rdname epi_df -#' @method as_epi_df tbl_df +#' @order 1 #' @importFrom rlang .data #' @importFrom tidyselect any_of #' @importFrom cli cli_inform +#' @method as_epi_df tbl_df #' @export as_epi_df.tbl_df <- function( x, geo_type = deprecated(), time_type = deprecated(), as_of, - additional_metadata = list(), + other_keys = character(), ...) { # possible standard substitutions for time_value x <- rename(x, ...) @@ -274,29 +276,28 @@ as_epi_df.tbl_df <- function( } # Use the current day-time } - assert_list(additional_metadata) - additional_metadata[["other_keys"]] <- additional_metadata[["other_keys"]] %||% character(0L) - new_epi_df(x, geo_type, time_type, as_of, additional_metadata) + assert_character(other_keys) + new_epi_df(x, geo_type, time_type, as_of, other_keys) } -#' @method as_epi_df data.frame #' @rdname epi_df +#' @order 1 +#' @method as_epi_df data.frame #' @export -as_epi_df.data.frame <- function(x, as_of, additional_metadata = list(), ...) { - as_epi_df.tbl_df(x = tibble::as_tibble(x), as_of = as_of, additional_metadata = additional_metadata, ...) +as_epi_df.data.frame <- function(x, as_of, other_keys = character(), ...) { + as_epi_df.tbl_df(x = tibble::as_tibble(x), as_of = as_of, other_keys = other_keys, ...) } -#' @method as_epi_df tbl_ts #' @rdname epi_df +#' @order 1 +#' @method as_epi_df tbl_ts #' @export -as_epi_df.tbl_ts <- function(x, as_of, additional_metadata = list(), ...) { +as_epi_df.tbl_ts <- function(x, as_of, other_keys = character(), ...) { tsibble_other_keys <- setdiff(tsibble::key_vars(x), "geo_value") - if (length(tsibble_other_keys) != 0) { - additional_metadata$other_keys <- unique( - c(additional_metadata$other_keys, tsibble_other_keys) - ) + if (length(tsibble_other_keys) > 0) { + other_keys <- unique(c(other_keys, tsibble_other_keys)) } - as_epi_df.tbl_df(x = tibble::as_tibble(x), as_of = as_of, additional_metadata = additional_metadata, ...) + as_epi_df.tbl_df(x = tibble::as_tibble(x), as_of = as_of, other_keys = other_keys, ...) } #' Test for `epi_df` format diff --git a/R/methods-epi_archive.R b/R/methods-epi_archive.R index 96d14d9b..dae7b243 100644 --- a/R/methods-epi_archive.R +++ b/R/methods-epi_archive.R @@ -65,7 +65,6 @@ epix_as_of <- function(x, max_version, min_time_value = -Inf, all_versions = FAL key(x$DT), c("geo_value", "time_value", "version") ) - if (length(other_keys) == 0) other_keys <- NULL # Check a few things on max_version if (!identical(class(max_version), class(x$DT$version))) { @@ -112,10 +111,7 @@ epix_as_of <- function(x, max_version, min_time_value = -Inf, all_versions = FAL dplyr::select(-"version") %>% as_epi_df( as_of = max_version, - additional_metadata = c( - x$additional_metadata, - list(other_keys = other_keys) - ) + other_keys = other_keys ) return(as_of_epi_df) @@ -240,9 +236,8 @@ epix_fill_through_version <- function(x, fill_versions_end, #' Default here is `TRUE`. #' @return the resulting `epi_archive` #' -#' @details In all cases, `additional_metadata` will be an empty list, and -#' `clobberable_versions_start` will be set to the earliest version that could -#' be clobbered in either input archive. +#' @details In all cases, `clobberable_versions_start` will be set to the +#' earliest version that could be clobbered in either input archive. #' #' @examples #' # Example 1 @@ -331,18 +326,6 @@ epix_merge <- function(x, y, cli_abort("`x` and `y` must share data type on their `time_value` column.") } - if (length(x$additional_metadata) != 0L) { - cli_warn("x$additional_metadata won't appear in merge result", - class = "epiprocess__epix_merge_ignores_additional_metadata" - ) - } - if (length(y$additional_metadata) != 0L) { - cli_warn("y$additional_metadata won't appear in merge result", - class = "epiprocess__epix_merge_ignores_additional_metadata" - ) - } - result_additional_metadata <- list() - result_clobberable_versions_start <- if (all(is.na(c(x$clobberable_versions_start, y$clobberable_versions_start)))) { NA # (any type of NA is fine here) @@ -508,7 +491,6 @@ epix_merge <- function(x, y, return(as_epi_archive( result_dt[], # clear data.table internal invisibility flag if set other_keys = setdiff(key(result_dt), c("geo_value", "time_value", "version")), - additional_metadata = result_additional_metadata, # It'd probably be better to pre-compactify before the merge, and might be # guaranteed not to be necessary to compactify the merge result if the # inputs are already compactified, but at time of writing we don't have diff --git a/R/methods-epi_df.R b/R/methods-epi_df.R index 1876ab46..4e74fd1c 100644 --- a/R/methods-epi_df.R +++ b/R/methods-epi_df.R @@ -63,6 +63,10 @@ print.epi_df <- function(x, ...) { ) cat(sprintf("* %-9s = %s\n", "geo_type", attributes(x)$metadata$geo_type)) cat(sprintf("* %-9s = %s\n", "time_type", attributes(x)$metadata$time_type)) + ok <- attributes(x)$metadata$other_keys + if (length(ok) > 0) { + cat(sprintf("* %-9s = %s\n", "other_keys", paste(ok, collapse = ", "))) + } cat(sprintf("* %-9s = %s\n", "as_of", attributes(x)$metadata$as_of)) # Conditional output (silent if attribute is NULL): cat(sprintf("* %-9s = %s\n", "decay_to_tibble", attr(x, "decay_to_tibble"))) @@ -86,6 +90,10 @@ print.epi_df <- function(x, ...) { summary.epi_df <- function(object, ...) { cat("An `epi_df` x, with metadata:\n") cat(sprintf("* %-9s = %s\n", "geo_type", attributes(object)$metadata$geo_type)) + ok <- attributes(object)$metadata$other_keys + if (length(ok) > 0) { + cat(sprintf("* %-9s = %s\n", "other_keys", paste(ok, collapse = ", "))) + } cat(sprintf("* %-9s = %s\n", "as_of", attributes(object)$metadata$as_of)) cat("----------\n") cat(sprintf("* %-27s = %s\n", "min time value", min(object$time_value))) @@ -206,12 +214,13 @@ dplyr_row_slice.epi_df <- function(data, i, ...) { `names<-.epi_df` <- function(x, value) { old_names <- names(x) old_metadata <- attr(x, "metadata") - old_other_keys <- old_metadata[["other_keys"]] - new_other_keys <- value[match(old_other_keys, old_names)] new_metadata <- old_metadata - new_metadata[["other_keys"]] <- new_other_keys + old_other_keys <- old_metadata[["other_keys"]] + if (!is.null(old_other_keys)) { + new_other_keys <- value[match(old_other_keys, old_names)] + new_metadata[["other_keys"]] <- new_other_keys + } result <- reclass(NextMethod(), new_metadata) - # decay to non-`epi_df` if needed: dplyr::dplyr_reconstruct(result, result) } diff --git a/R/slide.R b/R/slide.R index 91cebd2b..1bbd04b7 100644 --- a/R/slide.R +++ b/R/slide.R @@ -419,7 +419,7 @@ epi_slide <- function( #' ungroup() epi_slide_opt <- function( .x, .col_names, .f, ..., - .window_size = 0, .align = c("right", "center", "left"), + .window_size = 1, .align = c("right", "center", "left"), .ref_time_values = NULL, .all_rows = FALSE) { assert_class(.x, "epi_df") @@ -745,7 +745,7 @@ epi_slide_opt <- function( #' ungroup() epi_slide_mean <- function( .x, .col_names, ..., - .window_size = 0, .align = c("right", "center", "left"), + .window_size = 1, .align = c("right", "center", "left"), .ref_time_values = NULL, .all_rows = FALSE) { # Argument deprecation handling provided_args <- rlang::call_args_names(rlang::call_match()) @@ -828,7 +828,7 @@ epi_slide_mean <- function( #' ungroup() epi_slide_sum <- function( .x, .col_names, ..., - .window_size = 0, .align = c("right", "center", "left"), + .window_size = 1, .align = c("right", "center", "left"), .ref_time_values = NULL, .all_rows = FALSE) { # Argument deprecation handling provided_args <- rlang::call_args_names(rlang::call_match()) diff --git a/R/sysdata.rda b/R/sysdata.rda index d100711d..8e8dc5ff 100644 Binary files a/R/sysdata.rda and b/R/sysdata.rda differ diff --git a/data/incidence_num_outlier_example.rda b/data/incidence_num_outlier_example.rda index e898b5ea..96288982 100644 Binary files a/data/incidence_num_outlier_example.rda and b/data/incidence_num_outlier_example.rda differ diff --git a/data/jhu_csse_county_level_subset.rda b/data/jhu_csse_county_level_subset.rda index aca0983d..bc31b493 100644 Binary files a/data/jhu_csse_county_level_subset.rda and b/data/jhu_csse_county_level_subset.rda differ diff --git a/data/jhu_csse_daily_subset.rda b/data/jhu_csse_daily_subset.rda index 12fd5f15..e4dbdc9f 100644 Binary files a/data/jhu_csse_daily_subset.rda and b/data/jhu_csse_daily_subset.rda differ diff --git a/man/epi_archive.Rd b/man/epi_archive.Rd index dee4cbaf..a5055f4e 100644 --- a/man/epi_archive.Rd +++ b/man/epi_archive.Rd @@ -12,7 +12,6 @@ new_epi_archive( geo_type, time_type, other_keys, - additional_metadata, compactify, clobberable_versions_start, versions_end, @@ -22,7 +21,6 @@ new_epi_archive( validate_epi_archive( x, other_keys, - additional_metadata, compactify, clobberable_versions_start, versions_end @@ -32,8 +30,7 @@ as_epi_archive( x, geo_type = deprecated(), time_type = deprecated(), - other_keys = character(0L), - additional_metadata = list(), + other_keys = character(), compactify = NULL, clobberable_versions_start = NA, .versions_end = max_version_with_row_in(x), @@ -54,11 +51,8 @@ if the time type is not recognized.} \item{other_keys}{Character vector specifying the names of variables in \code{x} that should be considered key variables (in the language of \code{data.table}) -apart from "geo_value", "time_value", and "version".} - -\item{additional_metadata}{List of additional metadata to attach to the -\code{epi_archive} object. The metadata will have the \code{geo_type} field; named -entries from the passed list or will be included as well.} +apart from "geo_value", "time_value", and "version". Typical examples +are "age" or more granular geographies.} \item{compactify}{Optional; Boolean. \code{TRUE} will remove some redundant rows, \code{FALSE} will not, and missing or \code{NULL} will remove @@ -136,7 +130,8 @@ object: \itemize{ \item \code{geo_type}: the type for the geo values. \item \code{time_type}: the type for the time values. -\item \code{additional_metadata}: list of additional metadata for the data archive. +\item \code{other_keys}: any additional keys as a character vector. +Typical examples are "age" or sub-geographies. } While this metadata is not protected, it is generally recommended to treat it diff --git a/man/epi_df.Rd b/man/epi_df.Rd index dcda0872..38f923c5 100644 --- a/man/epi_df.Rd +++ b/man/epi_df.Rd @@ -1,23 +1,15 @@ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/epi_df.R -\name{epi_df} -\alias{epi_df} -\alias{new_epi_df} +\name{as_epi_df} \alias{as_epi_df} \alias{as_epi_df.epi_df} \alias{as_epi_df.tbl_df} \alias{as_epi_df.data.frame} \alias{as_epi_df.tbl_ts} +\alias{new_epi_df} +\alias{epi_df} \title{\code{epi_df} object} \usage{ -new_epi_df( - x = tibble::tibble(), - geo_type, - time_type, - as_of, - additional_metadata = list() -) - as_epi_df(x, ...) \method{as_epi_df}{epi_df}(x, ...) @@ -27,24 +19,39 @@ as_epi_df(x, ...) geo_type = deprecated(), time_type = deprecated(), as_of, - additional_metadata = list(), + other_keys = character(), ... ) -\method{as_epi_df}{data.frame}(x, as_of, additional_metadata = list(), ...) +\method{as_epi_df}{data.frame}(x, as_of, other_keys = character(), ...) + +\method{as_epi_df}{tbl_ts}(x, as_of, other_keys = character(), ...) -\method{as_epi_df}{tbl_ts}(x, as_of, additional_metadata = list(), ...) +new_epi_df( + x = tibble::tibble(geo_value = character(), time_value = as.Date(integer())), + geo_type, + time_type, + as_of, + other_keys = character(), + ... +) } \arguments{ \item{x}{An \code{epi_df}, \code{data.frame}, \link[tibble:tibble]{tibble::tibble}, or \link[tsibble:tsibble]{tsibble::tsibble} to be converted} -\item{geo_type}{DEPRECATED Has no effect. Geo value type is inferred from the -location column and set to "custom" if not recognized.} +\item{...}{Additional arguments passed to methods.} -\item{time_type}{DEPRECATED Has no effect. Time value type inferred from the time -column and set to "custom" if not recognized. Unpredictable behavior may result -if the time type is not recognized.} +\item{geo_type}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}} in \code{as_epi_df()}, has no +effect; the geo value type is inferred from the location column and set to +"custom" if not recognized. In \code{new_epi_df()}, should be set to the same +value that would be inferred.} + +\item{time_type}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}} in \code{as_epi_df()}, has no +effect: the time value type inferred from the time column and set to +"custom" if not recognized. Unpredictable behavior may result if the time +type is not recognized. In \code{new_epi_df()}, should be set to the same value +that would be inferred.} \item{as_of}{Time value representing the time at which the given data were available. For example, if \code{as_of} is January 31, 2022, then the \code{epi_df} @@ -52,14 +59,8 @@ object that is created would represent the most up-to-date version of the data available as of January 31, 2022. If the \code{as_of} argument is missing, then the current day-time will be used.} -\item{additional_metadata}{List of additional metadata to attach to the -\code{epi_df} object. The metadata will have \code{geo_type}, \code{time_type}, and -\code{as_of} fields; named entries from the passed list will be included as -well. If your tibble has additional keys, be sure to specify them as a -character vector in the \code{other_keys} component of \code{additional_metadata}.} - -\item{...}{used for specifying column names, as in \code{\link[dplyr:rename]{dplyr::rename}}. For -example, \verb{geo_value = STATEFP, time_value = end_date}.} +\item{other_keys}{If your tibble has additional keys, be sure to specify them +as a character vector here (typical examples are "age" or sub-geographies).} } \value{ An \code{epi_df} object. @@ -117,6 +118,13 @@ data versioning works in the \code{epiprocess} package (including how to generate \code{epi_df} objects, as data snapshots, from an \code{epi_archive} object). } +\section{Functions}{ +\itemize{ +\item \code{as_epi_df()}: The preferred way of constructing \code{epi_df}s + +\item \code{new_epi_df()}: Lower-level constructor for \code{epi_df} object + +}} \section{Geo Types}{ The following geo types are recognized in an \code{epi_df}. @@ -200,7 +208,7 @@ ex2 <- ex2_input \%>\% dplyr::rename(geo_value = state, time_value = reported_date) \%>\% as_epi_df( as_of = "2020-06-03", - additional_metadata = list(other_keys = "pol") + other_keys = "pol" ) attr(ex2, "metadata") @@ -219,9 +227,7 @@ ex3 <- ex3_input \%>\% state = rep("MA", 6), pol = rep(c("blue", "swing", "swing"), each = 2) ) \%>\% - # the 2 extra keys we added have to be specified in the other_keys - # component of additional_metadata. - as_epi_df(additional_metadata = list(other_keys = c("state", "pol"))) + as_epi_df(other_keys = c("state", "pol")) attr(ex3, "metadata") } diff --git a/man/epi_slide_mean.Rd b/man/epi_slide_mean.Rd index 820292ad..09faefb6 100644 --- a/man/epi_slide_mean.Rd +++ b/man/epi_slide_mean.Rd @@ -8,7 +8,7 @@ epi_slide_mean( .x, .col_names, ..., - .window_size = 0, + .window_size = 1, .align = c("right", "center", "left"), .ref_time_values = NULL, .all_rows = FALSE diff --git a/man/epi_slide_opt.Rd b/man/epi_slide_opt.Rd index 7fc54b7e..dcaab3f8 100644 --- a/man/epi_slide_opt.Rd +++ b/man/epi_slide_opt.Rd @@ -10,7 +10,7 @@ epi_slide_opt( .col_names, .f, ..., - .window_size = 0, + .window_size = 1, .align = c("right", "center", "left"), .ref_time_values = NULL, .all_rows = FALSE diff --git a/man/epi_slide_sum.Rd b/man/epi_slide_sum.Rd index 3c7baedc..0c83c432 100644 --- a/man/epi_slide_sum.Rd +++ b/man/epi_slide_sum.Rd @@ -8,7 +8,7 @@ epi_slide_sum( .x, .col_names, ..., - .window_size = 0, + .window_size = 1, .align = c("right", "center", "left"), .ref_time_values = NULL, .all_rows = FALSE diff --git a/man/epix_merge.Rd b/man/epix_merge.Rd index 43f53c33..564a1fdc 100644 --- a/man/epix_merge.Rd +++ b/man/epix_merge.Rd @@ -46,9 +46,8 @@ clobberable versions). If the \code{versions_end} values differ, the \code{sync} parameter controls what is done. } \details{ -In all cases, \code{additional_metadata} will be an empty list, and -\code{clobberable_versions_start} will be set to the earliest version that could -be clobbered in either input archive. +In all cases, \code{clobberable_versions_start} will be set to the +earliest version that could be clobbered in either input archive. } \examples{ # Example 1 diff --git a/tests/testthat/test-archive.R b/tests/testthat/test-archive.R index 7f20ddeb..1791d870 100644 --- a/tests/testthat/test-archive.R +++ b/tests/testthat/test-archive.R @@ -77,14 +77,6 @@ test_that("other_keys cannot contain names geo_value, time_value or version", { ) }) -test_that("Warning thrown when other_metadata contains overlapping names with geo_type field", { - expect_warning(as_epi_archive(archive_data, additional_metadata = list(geo_type = 1), compactify = FALSE), - regexp = "`additional_metadata` names overlap with existing metadata fields" - ) - expect_warning(as_epi_archive(archive_data, additional_metadata = list(time_type = 1), compactify = FALSE), - regexp = "`additional_metadata` names overlap with existing metadata fields" - ) -}) test_that("epi_archives are correctly instantiated with a variety of data types", { d <- as.Date("2020-01-01") @@ -98,22 +90,22 @@ test_that("epi_archives are correctly instantiated with a variety of data types" ea1 <- as_epi_archive(df, compactify = FALSE) expect_equal(key(ea1$DT), c("geo_value", "time_value", "version")) - expect_equal(ea1$additional_metadata, list()) + expect_null(ea1$additional_metadata) - ea2 <- as_epi_archive(df, other_keys = "value", additional_metadata = list(value = df$value), compactify = FALSE) + ea2 <- as_epi_archive(df, other_keys = "value", compactify = FALSE) expect_equal(key(ea2$DT), c("geo_value", "time_value", "value", "version")) - expect_equal(ea2$additional_metadata, list(value = df$value)) + expect_null(ea2$additional_metadata) # Tibble tib <- tibble::tibble(df, code = "x") ea3 <- as_epi_archive(tib, compactify = FALSE) expect_equal(key(ea3$DT), c("geo_value", "time_value", "version")) - expect_equal(ea3$additional_metadata, list()) + expect_null(ea3$additional_metadata) - ea4 <- as_epi_archive(tib, other_keys = "code", additional_metadata = list(value = df$value), compactify = FALSE) + ea4 <- as_epi_archive(tib, other_keys = "code", compactify = FALSE) expect_equal(key(ea4$DT), c("geo_value", "time_value", "code", "version")) - expect_equal(ea4$additional_metadata, list(value = df$value)) + expect_null(ea4$additional_metadata) # Keyed data.table kdt <- data.table::data.table( @@ -128,12 +120,12 @@ test_that("epi_archives are correctly instantiated with a variety of data types" ea5 <- as_epi_archive(kdt, compactify = FALSE) # Key from data.table isn't absorbed when as_epi_archive is used expect_equal(key(ea5$DT), c("geo_value", "time_value", "version")) - expect_equal(ea5$additional_metadata, list()) + expect_null(ea5$additional_metadata) - ea6 <- as_epi_archive(kdt, other_keys = "value", additional_metadata = list(value = df$value), compactify = FALSE) + ea6 <- as_epi_archive(kdt, other_keys = "value", compactify = FALSE) # Mismatched keys, but the one from as_epi_archive overrides expect_equal(key(ea6$DT), c("geo_value", "time_value", "value", "version")) - expect_equal(ea6$additional_metadata, list(value = df$value)) + expect_null(ea6$additional_metadata) # Unkeyed data.table udt <- data.table::data.table( @@ -146,11 +138,11 @@ test_that("epi_archives are correctly instantiated with a variety of data types" ea7 <- as_epi_archive(udt, compactify = FALSE) expect_equal(key(ea7$DT), c("geo_value", "time_value", "version")) - expect_equal(ea7$additional_metadata, list()) + expect_null(ea7$additional_metadata) - ea8 <- as_epi_archive(udt, other_keys = "code", additional_metadata = list(value = df$value), compactify = FALSE) + ea8 <- as_epi_archive(udt, other_keys = "code", compactify = FALSE) expect_equal(key(ea8$DT), c("geo_value", "time_value", "code", "version")) - expect_equal(ea8$additional_metadata, list(value = df$value)) + expect_null(ea8$additional_metadata) # epi_df edf1 <- jhu_csse_daily_subset %>% @@ -159,11 +151,11 @@ test_that("epi_archives are correctly instantiated with a variety of data types" ea9 <- as_epi_archive(edf1, compactify = FALSE) expect_equal(key(ea9$DT), c("geo_value", "time_value", "version")) - expect_equal(ea9$additional_metadata, list()) + expect_null(ea9$additional_metadata) - ea10 <- as_epi_archive(edf1, other_keys = "code", additional_metadata = list(value = df$value), compactify = FALSE) + ea10 <- as_epi_archive(edf1, other_keys = "code", compactify = FALSE) expect_equal(key(ea10$DT), c("geo_value", "time_value", "code", "version")) - expect_equal(ea10$additional_metadata, list(value = df$value)) + expect_null(ea10$additional_metadata) # Keyed epi_df edf2 <- data.frame( @@ -176,15 +168,15 @@ test_that("epi_archives are correctly instantiated with a variety of data types" cases = 1:20, misc = "USA" ) %>% - as_epi_df(additional_metadata = list(other_keys = "misc")) + as_epi_df(other_keys = "misc") ea11 <- as_epi_archive(edf2, compactify = FALSE) expect_equal(key(ea11$DT), c("geo_value", "time_value", "version")) - expect_equal(ea11$additional_metadata, list()) + expect_null(ea11$additional_metadata) - ea12 <- as_epi_archive(edf2, other_keys = "misc", additional_metadata = list(value = df$misc), compactify = FALSE) + ea12 <- as_epi_archive(edf2, other_keys = "misc", compactify = FALSE) expect_equal(key(ea12$DT), c("geo_value", "time_value", "misc", "version")) - expect_equal(ea12$additional_metadata, list(value = df$misc)) + expect_null(ea12$additional_metadata) }) test_that("`epi_archive` rejects nonunique keys", { diff --git a/tests/testthat/test-arrange-canonical.R b/tests/testthat/test-arrange-canonical.R index ec42feac..939d2f32 100644 --- a/tests/testthat/test-arrange-canonical.R +++ b/tests/testthat/test-arrange-canonical.R @@ -7,10 +7,12 @@ test_that("canonical arrangement works", { ) expect_error(arrange_canonical(tib)) - tib <- tib %>% as_epi_df(additional_metadata = list(other_keys = "demo_grp")) - expect_equal(names(tib), c("geo_value", "time_value", "x", "demo_grp")) + tib <- tib %>% as_epi_df(other_keys = "demo_grp") + expect_equal(names(tib), c("geo_value", "time_value", "demo_grp", "x")) - tib_sorted <- arrange_canonical(tib) + tib_cols_shuffled <- tib %>% select(geo_value, time_value, x, demo_grp) + + tib_sorted <- arrange_canonical(tib_cols_shuffled) expect_equal(names(tib_sorted), c("geo_value", "time_value", "demo_grp", "x")) expect_equal(tib_sorted$geo_value, rep(c("ca", "ga"), each = 4)) expect_equal(tib_sorted$time_value, c(1, 1, 2, 2, 1, 1, 2, 2)) diff --git a/tests/testthat/test-epi_df.R b/tests/testthat/test-epi_df.R index a49855aa..2444a87a 100644 --- a/tests/testthat/test-epi_df.R +++ b/tests/testthat/test-epi_df.R @@ -23,8 +23,7 @@ test_that("new_epi_df works as intended", { expect_true(lubridate::is.POSIXt(attributes(epi_tib)$metadata$as_of)) }) -test_that("as_epi_df errors when additional_metadata is not a list", { - # This is the 3rd example from as_epi_df +test_that("as_epi_df errors for non-character other_keys", { ex_input <- jhu_csse_county_level_subset %>% dplyr::filter(time_value > "2021-12-01", state_name == "Massachusetts") %>% dplyr::slice_tail(n = 6) %>% @@ -35,9 +34,10 @@ test_that("as_epi_df errors when additional_metadata is not a list", { ) expect_error( - as_epi_df(ex_input, additional_metadata = c(other_keys = "state", "pol")), - "Must be of type 'list', not 'character'." + as_epi_df(ex_input, other_keys = list()), + "Must be of type 'character'" ) + expect_silent(as_epi_df(ex_input, other_keys = c("state", "pol"))) }) test_that("as_epi_df works for nonstandard input", { @@ -81,7 +81,7 @@ tib <- tibble::tibble( time_value = rep(seq(as.Date("2020-01-01"), by = 1, length.out = 5), times = 2), geo_value = rep(c("ca", "hi"), each = 5) ) -epi_tib <- epiprocess::as_epi_df(tib) +epi_tib <- as_epi_df(tib) test_that("grouped epi_df maintains type for select", { grouped_epi <- epi_tib %>% group_by(geo_value) selected_df <- grouped_epi %>% select(-y) @@ -108,9 +108,7 @@ test_that("grouped epi_df handles extra keys correctly", { geo_value = rep(c("ca", "hi"), each = 5), extra_key = rep(seq(as.Date("2020-01-01"), by = 1, length.out = 5), times = 2) ) - epi_tib <- epiprocess::as_epi_df(tib, - additional_metadata = list(other_keys = "extra_key") - ) + epi_tib <- as_epi_df(tib, other_keys = "extra_key") grouped_epi <- epi_tib %>% group_by(geo_value) selected_df <- grouped_epi %>% select(-extra_key) expect_true(inherits(selected_df, "epi_df")) diff --git a/tests/testthat/test-epix_merge.R b/tests/testthat/test-epix_merge.R index c285ad39..5b3de284 100644 --- a/tests/testthat/test-epix_merge.R +++ b/tests/testthat/test-epix_merge.R @@ -177,26 +177,6 @@ test_that("epix_merge forbids and warns on metadata and naming issues", { ), regexp = "overlapping.*names" ) - expect_warning( - epix_merge( - as_epi_archive(tibble::tibble(geo_value = "ak", time_value = test_date, version = test_date + 1L, x_value = 1L), - additional_metadata = list("updates_fetched" = lubridate::ymd_hms("2022-05-01 16:00:00", tz = "UTC")) - ), - as_epi_archive(tibble::tibble(geo_value = "ak", time_value = test_date, version = test_date + 1L, y_value = 2L)) - ), - regexp = "x\\$additional_metadata", - class = "epiprocess__epix_merge_ignores_additional_metadata" - ) - expect_warning( - epix_merge( - as_epi_archive(tibble::tibble(geo_value = "ak", time_value = test_date, version = test_date + 1L, x_value = 1L)), - as_epi_archive(tibble::tibble(geo_value = "ak", time_value = test_date, version = test_date + 1L, y_value = 2L), - additional_metadata = list("updates_fetched" = lubridate::ymd_hms("2022-05-01 16:00:00", tz = "UTC")) - ) - ), - regexp = "y\\$additional_metadata", - class = "epiprocess__epix_merge_ignores_additional_metadata" - ) }) # use `local` to prevent accidentally using the x, y, xy bindings here diff --git a/tests/testthat/test-grouped_epi_archive.R b/tests/testthat/test-grouped_epi_archive.R index 4d1c1468..6ae009ca 100644 --- a/tests/testthat/test-grouped_epi_archive.R +++ b/tests/testthat/test-grouped_epi_archive.R @@ -80,7 +80,7 @@ test_that("Grouping, regrouping, and ungrouping archives works as intended", { time_value = as.Date(time_value) ) %>% # as_epi_df(as_of = as.Date("2000-01-03"), - # additional_metadata = list(other_keys = "age_group")) %>% + # other_keys = "age_group") %>% # # put back in expected order; see issue #166: # select(geo_value, age_group, time_value, s) %>% group_by(geo_value, age_group, .drop = FALSE) diff --git a/tests/testthat/test-methods-epi_df.R b/tests/testthat/test-methods-epi_df.R index e28e23de..bef7f680 100644 --- a/tests/testthat/test-methods-epi_df.R +++ b/tests/testthat/test-methods-epi_df.R @@ -10,8 +10,7 @@ toy_epi_df <- tibble::tibble( indic_var1 = as.factor(rep(1:2, times = 5)), indic_var2 = as.factor(rep(letters[1:5], times = 2)) ) %>% as_epi_df( - additional_metadata = - list(other_keys = c("indic_var1", "indic_var2")) + other_keys = c("indic_var1", "indic_var2") ) att_toy <- attr(toy_epi_df, "metadata") @@ -79,12 +78,12 @@ test_that("Subsetting drops & does not drop the epi_df class appropriately", { expect_identical(att_row_col_subset2$geo_type, att_toy$geo_type) expect_identical(att_row_col_subset2$time_type, att_toy$time_type) expect_identical(att_row_col_subset2$as_of, att_toy$as_of) - expect_identical(att_row_col_subset2$other_keys, character(0)) + expect_identical(att_row_col_subset2$other_keys, att_toy$other_keys[1]) }) test_that("When duplicate cols in subset should abort", { expect_error(toy_epi_df[, c(2, 2:3, 4, 4, 4)], - "Duplicated column names: time_value, y", + "Duplicated column names: time_value, indic_var2", fixed = TRUE ) expect_error(toy_epi_df[1:4, c(1, 2:4, 1)], @@ -95,7 +94,7 @@ test_that("When duplicate cols in subset should abort", { test_that("Correct metadata when subset includes some of other_keys", { # Only include other_var of indic_var1 - only_indic_var1 <- toy_epi_df[, 1:5] + only_indic_var1 <- toy_epi_df[, c(1:3, 5:6)] att_only_indic_var1 <- attr(only_indic_var1, "metadata") expect_true(is_epi_df(only_indic_var1)) @@ -107,7 +106,7 @@ test_that("Correct metadata when subset includes some of other_keys", { expect_identical(att_only_indic_var1$other_keys, att_toy$other_keys[-2]) # Only include other_var of indic_var2 - only_indic_var2 <- toy_epi_df[, c(1:4, 6)] + only_indic_var2 <- toy_epi_df[, c(1:2, 4:6)] att_only_indic_var2 <- attr(only_indic_var2, "metadata") expect_true(is_epi_df(only_indic_var2)) @@ -142,7 +141,7 @@ test_that("Grouping are dropped by `as_tibble`", { test_that("Renaming columns gives appropriate colnames and metadata", { edf <- tibble::tibble(geo_value = "ak", time_value = as.Date("2020-01-01"), age = 1, value = 1) %>% - as_epi_df(additional_metadata = list(other_keys = "age")) + as_epi_df(other_keys = "age") # renaming using base R renamed_edf1 <- edf %>% `[`(c("geo_value", "time_value", "age", "value")) %>% @@ -151,14 +150,14 @@ test_that("Renaming columns gives appropriate colnames and metadata", { expect_identical(attr(renamed_edf1, "metadata")$other_keys, c("age_group")) # renaming using select renamed_edf2 <- edf %>% - as_epi_df(additional_metadata = list(other_keys = "age")) %>% + as_epi_df(other_keys = "age") %>% select(geo_value, time_value, age_group = age, value) expect_identical(renamed_edf1, renamed_edf2) }) test_that("Renaming columns while grouped gives appropriate colnames and metadata", { gedf <- tibble::tibble(geo_value = "ak", time_value = as.Date("2020-01-01"), age = 1, value = 1) %>% - as_epi_df(additional_metadata = list(other_keys = "age")) %>% + as_epi_df(other_keys = "age") %>% group_by(geo_value) # renaming using base R renamed_gedf1 <- gedf %>% @@ -178,7 +177,7 @@ test_that("Renaming columns while grouped gives appropriate colnames and metadat test_that("Additional `select` on `epi_df` tests", { edf <- tibble::tibble(geo_value = "ak", time_value = as.Date("2020-01-01"), age = 1, value = 1) %>% - as_epi_df(additional_metadata = list(other_keys = "age")) + as_epi_df(other_keys = "age") # Dropping a non-geo_value epikey column doesn't decay, though maybe it # should, since you'd expect that to possibly result in multiple rows per diff --git a/tests/testthat/test-utils.R b/tests/testthat/test-utils.R index 12e7a3f7..d18f9f48 100644 --- a/tests/testthat/test-utils.R +++ b/tests/testthat/test-utils.R @@ -251,8 +251,8 @@ test_that("guess_period works", { weekly_posixcts ) # On POSIXlts: - daily_posixlts <- as.POSIXlt(daily_dates, tz = "US/Aleutian") + 3600 - weekly_posixlts <- as.POSIXlt(weekly_dates, tz = "US/Aleutian") + 3600 + daily_posixlts <- as.POSIXlt(daily_dates, tz = "UTC") + 3600 + weekly_posixlts <- as.POSIXlt(weekly_dates, tz = "UTC") + 3600 expect_identical( daily_posixlts[[1L]] + guess_period(daily_posixlts) * (seq_along(daily_posixlts) - 1L), daily_posixlts diff --git a/vignettes/archive.Rmd b/vignettes/archive.Rmd index 3d616d92..c5cc154b 100644 --- a/vignettes/archive.Rmd +++ b/vignettes/archive.Rmd @@ -119,7 +119,6 @@ The following pieces of metadata are included as fields in an `epi_archive` object: * `geo_type`: the type for the geo values. -* `additional_metadata`: list of additional metadata for the data archive. Metadata for an `epi_archive` object `x` can be accessed (and altered) directly, as in `x$geo_type`, etc. Just like `as_epi_df()`, the function diff --git a/vignettes/epiprocess.Rmd b/vignettes/epiprocess.Rmd index 24a98505..e6c78aba 100644 --- a/vignettes/epiprocess.Rmd +++ b/vignettes/epiprocess.Rmd @@ -234,7 +234,7 @@ ex2 <- ex2 %>% rename(geo_value = state, time_value = reported_date) %>% as_epi_df( as_of = "2020-06-03", - additional_metadata = list(other_keys = "pol") + other_keys = "pol" ) attr(ex2, "metadata") @@ -264,12 +264,12 @@ ex3 <- ex3 %>% state = rep(tolower("MA"), 6), pol = rep(c("blue", "swing", "swing"), each = 2) ) %>% - as_epi_df(additional_metadata = list(other_keys = c("state", "pol")), as_of = as.Date("2024-03-20")) + as_epi_df(other_keys = c("state", "pol"), as_of = as.Date("2024-03-20")) attr(ex3, "metadata") ``` -Note that the two additional keys we added, `state` and `pol`, are specified as a character vector in the `other_keys` component of the `additional_metadata` list. They must be specified in this manner so that downstream actions on the `epi_df`, like model fitting and prediction, can recognize and use these keys. +Note that the two additional keys we added, `state` and `pol`, are specified as a character vector in the `other_keys` argument. They must be specified in this manner so that downstream actions on the `epi_df`, like model fitting and prediction, can recognize and use these keys. Currently `other_keys` metadata in `epi_df` doesn't impact `epi_slide()`, contrary to `other_keys` in `as_epi_archive` which affects how the update data is interpreted.