Commit 57cda93

dajmcdon, dshemetov, and brookslogan authored
feat: other_keys as arg in epi_df, epi_archive (#512)
* change epi_df and epi_archive constructors
* remove additional_metadata from both epi_df and epi_archive
* adjust printing for both methods
* fix unknown TZ warning
* fix vignettes, docs
* clean up DESCRIPTION
* no need to collate without R6
* fix incomplete merge
* chore: regenerate data

Co-authored-by: Dmitry Shemetov <[email protected]>
Co-authored-by: dajmcdon <[email protected]>
Co-authored-by: brookslogan <[email protected]>
1 parent 9c4c49d commit 57cda93

26 files changed, +190 -224 lines changed

DESCRIPTION

+19 -15
@@ -4,7 +4,7 @@ Title: Tools for basic signal processing in epidemiology
 Version: 0.8.5
 Authors@R: c(
     person("Jacob", "Bien", role = "ctb"),
-    person("Logan", "Brooks", email = "[email protected]", role = c("aut", "cre")),
+    person("Logan", "Brooks", , "[email protected]", role = c("aut", "cre")),
     person("Rafael", "Catoia", role = "ctb"),
     person("Nat", "DeFries", role = "ctb"),
     person("Daniel", "McDonald", role = "aut"),
@@ -15,16 +15,22 @@ Authors@R: c(
     person("Evan", "Ray", role = "aut"),
     person("Dmitry", "Shemetov", role = "ctb"),
     person("Ryan", "Tibshirani", role = "aut"),
-    person("Lionel", "Henry", role = "ctb", comment = "Author of included rlang fragments"),
-    person("Hadley", "Wickham", role = "ctb", comment = "Author of included rlang fragments"),
-    person("Posit", role = "cph", comment = "Copyright holder of included rlang fragments")
+    person("Lionel", "Henry", role = "ctb",
+           comment = "Author of included rlang fragments"),
+    person("Hadley", "Wickham", role = "ctb",
+           comment = "Author of included rlang fragments"),
+    person("Posit", role = "cph",
+           comment = "Copyright holder of included rlang fragments")
   )
-Description: This package introduces a common data structure for epidemiological
-    data reported by location and time, provides another data structure to
-    work with revisions to these data sets over time, and offers associated
-    utilities to perform basic signal processing tasks.
+Description: This package introduces a common data structure for
+    epidemiological data reported by location and time, provides another
+    data structure to work with revisions to these data sets over time,
+    and offers associated utilities to perform basic signal processing
+    tasks.
 License: MIT + file LICENSE
-Copyright: file inst/COPYRIGHTS
+URL: https://cmu-delphi.github.io/epiprocess/
+Depends:
+    R (>= 3.6)
 Imports:
     checkmate,
     cli,
@@ -58,18 +64,16 @@ VignetteBuilder:
     knitr
 Remotes:
     cmu-delphi/epidatr,
-    reconverse/outbreaks,
-    glmgen/genlasso
+    glmgen/genlasso,
+    reconverse/outbreaks
 Config/testthat/edition: 3
 Config/testthat/parallel: true
+Copyright: file inst/COPYRIGHTS
 Encoding: UTF-8
 LazyData: true
 Roxygen: list(markdown = TRUE)
 RoxygenNote: 7.3.2
-Depends:
-    R (>= 2.10)
-URL: https://cmu-delphi.github.io/epiprocess/
-Collate:
+Collate:
     'archive.R'
     'autoplot.R'
     'correlation.R'

NEWS.md

+11 -4
@@ -5,7 +5,8 @@ Pre-1.0.0 numbering scheme: 0.x will indicate releases, while 0.x.y will indicat
 # epiprocess 0.9

 ## Breaking changes
-- In `epi[x]_slide`:
+
+- In `epi[x]_slide`
   - `names_sep` is deprecated, and if you return data frames from your
     computations, they will no longer be unpacked into separate columns with
     name prefixes; instead:
@@ -15,12 +16,18 @@ Pre-1.0.0 numbering scheme: 0.x will indicate releases, while 0.x.y will indicat
     packed data.frame-class column (see `tidyr::pack`).
   - `as_list_col` is deprecated; you can now directly return a list from your
     slide computations instead.
+- `additional_metadata` is no longer accepted in `as_epi_df()` or
+  `as_epi_archive()`. Use the new `other_keys` arg to specify additional key
+  columns, such as age group columns or other demographic breakdowns.
+  Miscellaneous metadata are no longer handled by `epiprocess`, but you can use
+  R's built-in `attr<-` instead for a similar feature.

 ## Improvements

 - Added `complete.epi_df`, which fills in missing values in an `epi_df` with
   `NA`s. Uses `tidyr::complete` underneath and preserves `epi_df` metadata.
-- Inclusion of the function `revision_summary` to provide basic revision information for `epi_archive`s out of the box. (#492)
+- Inclusion of the function `revision_summary` to provide basic revision
+  information for `epi_archive`s out of the box. (#492)

 ## Bug fixes

@@ -87,8 +94,8 @@ Pre-1.0.0 numbering scheme: 0.x will indicate releases, while 0.x.y will indicat
 - Multiple "data-masking" tidy evaluation expressions can be passed in via
 `...`, rather than just one.
 - Additional tidy evaluation features from `dplyr::mutate` are supported: `!!
-name_var := value`, unnamed expressions evaluating to data frames, and `=
-NULL`; see `?epi_slide` for more details.
+name_var := value`, unnamed expressions evaluating to data frames, and `=
+NULL`; see `?epi_slide` for more details.

 ## Cleanup

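Reviewer note: the NEWS entry above points users who relied on `additional_metadata` for free-form information to base R's `attr<-`. A minimal sketch of that route follows, using base R only; the data frame and the attribute name `source_note` are hypothetical and not part of `epiprocess`.

```r
# Hypothetical data and attribute name, for illustration only.
cases <- data.frame(
  geo_value = c("ca", "ca"),
  time_value = as.Date(c("2020-06-01", "2020-06-02")),
  cases = c(10, 12)
)

# Attach free-form metadata with base R's replacement function `attr<-`.
attr(cases, "source_note") <- list(provider = "example feed", units = "count")

# Read the metadata back later.
units_used <- attr(cases, "source_note")$units
print(units_used)
```

Custom attributes set this way are not managed by the package, so they may not survive every transformation; re-attaching them after heavy reshaping is the safer habit.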
R/archive.R

+11 -18
@@ -179,7 +179,8 @@ NULL
 #'
 #' * `geo_type`: the type for the geo values.
 #' * `time_type`: the type for the time values.
-#' * `additional_metadata`: list of additional metadata for the data archive.
+#' * `other_keys`: any additional keys as a character vector.
+#' Typical examples are "age" or sub-geographies.
 #'
 #' While this metadata is not protected, it is generally recommended to treat it
 #' as read-only, and to use the `epi_archive` methods to interact with the data
@@ -209,10 +210,8 @@ NULL
 #' if the time type is not recognized.
 #' @param other_keys Character vector specifying the names of variables in `x`
 #' that should be considered key variables (in the language of `data.table`)
-#' apart from "geo_value", "time_value", and "version".
-#' @param additional_metadata List of additional metadata to attach to the
-#' `epi_archive` object. The metadata will have the `geo_type` field; named
-#' entries from the passed list or will be included as well.
+#' apart from "geo_value", "time_value", and "version". Typical examples
+#' are "age" or more granular geographies.
 #' @param compactify Optional; Boolean. `TRUE` will remove some
 #' redundant rows, `FALSE` will not, and missing or `NULL` will remove
 #' redundant rows, but issue a warning. See more information at `compactify`.
@@ -293,7 +292,6 @@ new_epi_archive <- function(
     geo_type,
     time_type,
     other_keys,
-    additional_metadata,
     compactify,
     clobberable_versions_start,
     versions_end,
@@ -350,7 +348,7 @@ new_epi_archive <- function(
       DT = compactified,
       geo_type = geo_type,
       time_type = time_type,
-      additional_metadata = additional_metadata,
+      other_keys = other_keys,
       clobberable_versions_start = clobberable_versions_start,
       versions_end = versions_end
     ),
@@ -423,7 +421,6 @@ is_locf <- function(vec, tolerance) { # nolint: object_usage_linter
 validate_epi_archive <- function(
     x,
     other_keys,
-    additional_metadata,
     compactify,
     clobberable_versions_start,
     versions_end) {
@@ -434,9 +431,6 @@ validate_epi_archive <- function(
   if (any(c("geo_value", "time_value", "version") %in% other_keys)) {
     cli_abort("`other_keys` cannot contain \"geo_value\", \"time_value\", or \"version\".")
   }
-  if (any(names(additional_metadata) %in% c("geo_type", "time_type"))) {
-    cli_warn("`additional_metadata` names overlap with existing metadata fields \"geo_type\" or \"time_type\".")
-  }

   # Conduct checks and apply defaults for `compactify`
   assert_logical(compactify, len = 1, any.missing = FALSE, null.ok = TRUE)
@@ -485,8 +479,7 @@ as_epi_archive <- function(
     x,
     geo_type = deprecated(),
     time_type = deprecated(),
-    other_keys = character(0L),
-    additional_metadata = list(),
+    other_keys = character(),
     compactify = NULL,
     clobberable_versions_start = NA,
     .versions_end = max_version_with_row_in(x), ...,
@@ -518,11 +511,10 @@ as_epi_archive <- function(
   time_type <- guess_time_type(x$time_value)

   validate_epi_archive(
-    x, other_keys, additional_metadata,
-    compactify, clobberable_versions_start, versions_end
+    x, other_keys, compactify, clobberable_versions_start, versions_end
   )
   new_epi_archive(
-    x, geo_type, time_type, other_keys, additional_metadata,
+    x, geo_type, time_type, other_keys,
     compactify, clobberable_versions_start, versions_end
   )
 }
@@ -551,7 +543,7 @@ print.epi_archive <- function(x, ..., class = TRUE, methods = TRUE) {
     c(
       ">" = if (class) "An `epi_archive` object, with metadata:",
       "i" = if (length(setdiff(key(x$DT), c("geo_value", "time_value", "version"))) > 0) {
-        "Non-standard DT keys: {setdiff(key(x$DT), c('geo_value', 'time_value', 'version'))}"
+        "Other DT keys: {setdiff(key(x$DT), c('geo_value', 'time_value', 'version'))}"
       },
       "i" = if (nrow(x$DT) != 0L) {
         "Min/max time values: {min(x$DT$time_value)} / {max(x$DT$time_value)}"
@@ -687,7 +679,8 @@ print.epi_archive <- function(x, ..., class = TRUE, methods = TRUE) {
 #' @export
 #'
 #' @aliases grouped_epi_archive
-group_by.epi_archive <- function(.data, ..., .add = FALSE, .drop = dplyr::group_by_drop_default(.data)) {
+group_by.epi_archive <- function(.data, ..., .add = FALSE,
+                                 .drop = dplyr::group_by_drop_default(.data)) {
   # `add` makes no difference; this is an ungrouped `epi_archive`.
   detailed_mutate <- epix_detailed_restricted_mutate(.data, ...)
   assert_logical(.drop)
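
Reviewer note: with `additional_metadata` gone, the only remaining check on `other_keys` in `validate_epi_archive()` is that it does not name one of the reserved key columns. The sketch below mirrors that rule in base R so it can be tried without the package. `check_other_keys()` is a hypothetical helper, and it uses `stop()` where the package uses `cli_abort()`.

```r
# Hypothetical helper mirroring the reserved-name check; not part of epiprocess.
check_other_keys <- function(other_keys = character()) {
  stopifnot(is.character(other_keys))
  reserved <- c("geo_value", "time_value", "version")
  if (any(reserved %in% other_keys)) {
    stop("`other_keys` cannot contain \"geo_value\", \"time_value\", or \"version\".")
  }
  invisible(other_keys)
}

# A demographic breakdown column is a valid extra key.
ok <- check_other_keys("age_group")

# Naming a reserved column is rejected; capture the message instead of failing.
msg <- tryCatch(
  check_other_keys(c("age_group", "version")),
  error = function(e) conditionMessage(e)
)
print(msg)
```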

R/epi_df.R

+38 -37
@@ -127,7 +127,7 @@
 #' dplyr::rename(geo_value = state, time_value = reported_date) %>%
 #' as_epi_df(
 #' as_of = "2020-06-03",
-#' additional_metadata = list(other_keys = "pol")
+#' other_keys = "pol"
 #' )
 #'
 #' attr(ex2, "metadata")
@@ -146,47 +146,46 @@
 #' state = rep("MA", 6),
 #' pol = rep(c("blue", "swing", "swing"), each = 2)
 #' ) %>%
-#' # the 2 extra keys we added have to be specified in the other_keys
-#' # component of additional_metadata.
-#' as_epi_df(additional_metadata = list(other_keys = c("state", "pol")))
+#' as_epi_df(other_keys = c("state", "pol"))
 #'
 #' attr(ex3, "metadata")
 NULL

-#' Create an `epi_df` object
-#'
-#' @rdname epi_df
-#' @param geo_type DEPRECATED Has no effect. Geo value type is inferred from the
-#' location column and set to "custom" if not recognized.
-#' @param time_type DEPRECATED Has no effect. Time value type inferred from the time
-#' column and set to "custom" if not recognized. Unpredictable behavior may result
-#' if the time type is not recognized.
+#' @describeIn epi_df Lower-level constructor for `epi_df` object
+#' @order 2
+#' @param geo_type `r lifecycle::badge("deprecated")` in `as_epi_df()`, has no
+#' effect; the geo value type is inferred from the location column and set to
+#' "custom" if not recognized. In `new_epi_df()`, should be set to the same
+#' value that would be inferred.
+#' @param time_type `r lifecycle::badge("deprecated")` in `as_epi_df()`, has no
+#' effect: the time value type inferred from the time column and set to
+#' "custom" if not recognized. Unpredictable behavior may result if the time
+#' type is not recognized. In `new_epi_df()`, should be set to the same value
+#' that would be inferred.
 #' @param as_of Time value representing the time at which the given data were
 #' available. For example, if `as_of` is January 31, 2022, then the `epi_df`
 #' object that is created would represent the most up-to-date version of the
 #' data available as of January 31, 2022. If the `as_of` argument is missing,
 #' then the current day-time will be used.
-#' @param additional_metadata List of additional metadata to attach to the
-#' `epi_df` object. The metadata will have `geo_type`, `time_type`, and
-#' `as_of` fields; named entries from the passed list will be included as
-#' well. If your tibble has additional keys, be sure to specify them as a
-#' character vector in the `other_keys` component of `additional_metadata`.
+#' @param other_keys If your tibble has additional keys, be sure to specify them
+#' as a character vector here (typical examples are "age" or sub-geographies).
 #' @param ... Additional arguments passed to methods.
 #' @return An `epi_df` object.
 #'
 #' @export
-new_epi_df <- function(x = tibble::tibble(), geo_type, time_type, as_of,
-                       additional_metadata = list()) {
+new_epi_df <- function(x = tibble::tibble(geo_value = character(), time_value = as.Date(integer())),
+                       geo_type, time_type, as_of,
+                       other_keys = character(), ...) {
   # Define metadata fields
   metadata <- list()
   metadata$geo_type <- geo_type
   metadata$time_type <- time_type
   metadata$as_of <- as_of
-  metadata <- c(metadata, additional_metadata)
+  metadata$other_keys <- other_keys

   # Reorder columns (geo_value, time_value, ...)
   if (sum(dim(x)) != 0) {
-    cols_to_put_first <- c("geo_value", "time_value")
+    cols_to_put_first <- c("geo_value", "time_value", other_keys)
     x <- x[, c(
       cols_to_put_first,
       # All other columns
@@ -200,7 +199,8 @@ new_epi_df <- function(x = tibble::tibble(), geo_type, time_type, as_of,
   return(x)
 }

-#' @rdname epi_df
+#' @describeIn epi_df The preferred way of constructing `epi_df`s
+#' @order 1
 #' @param x An `epi_df`, `data.frame`, [tibble::tibble], or [tsibble::tsibble]
 #' to be converted
 #' @param ... used for specifying column names, as in [`dplyr::rename`]. For
@@ -211,24 +211,26 @@ as_epi_df <- function(x, ...) {
 }

 #' @rdname epi_df
+#' @order 1
 #' @method as_epi_df epi_df
 #' @export
 as_epi_df.epi_df <- function(x, ...) {
   return(x)
 }

 #' @rdname epi_df
-#' @method as_epi_df tbl_df
+#' @order 1
 #' @importFrom rlang .data
 #' @importFrom tidyselect any_of
 #' @importFrom cli cli_inform
+#' @method as_epi_df tbl_df
 #' @export
 as_epi_df.tbl_df <- function(
     x,
     geo_type = deprecated(),
     time_type = deprecated(),
     as_of,
-    additional_metadata = list(),
+    other_keys = character(),
     ...) {
   # possible standard substitutions for time_value
   x <- rename(x, ...)
@@ -274,29 +276,28 @@ as_epi_df.tbl_df <- function(
     } # Use the current day-time
   }

-  assert_list(additional_metadata)
-  additional_metadata[["other_keys"]] <- additional_metadata[["other_keys"]] %||% character(0L)
-  new_epi_df(x, geo_type, time_type, as_of, additional_metadata)
+  assert_character(other_keys)
+  new_epi_df(x, geo_type, time_type, as_of, other_keys)
 }

-#' @method as_epi_df data.frame
 #' @rdname epi_df
+#' @order 1
+#' @method as_epi_df data.frame
 #' @export
-as_epi_df.data.frame <- function(x, as_of, additional_metadata = list(), ...) {
-  as_epi_df.tbl_df(x = tibble::as_tibble(x), as_of = as_of, additional_metadata = additional_metadata, ...)
+as_epi_df.data.frame <- function(x, as_of, other_keys = character(), ...) {
+  as_epi_df.tbl_df(x = tibble::as_tibble(x), as_of = as_of, other_keys = other_keys, ...)
 }

-#' @method as_epi_df tbl_ts
 #' @rdname epi_df
+#' @order 1
+#' @method as_epi_df tbl_ts
 #' @export
-as_epi_df.tbl_ts <- function(x, as_of, additional_metadata = list(), ...) {
+as_epi_df.tbl_ts <- function(x, as_of, other_keys = character(), ...) {
   tsibble_other_keys <- setdiff(tsibble::key_vars(x), "geo_value")
-  if (length(tsibble_other_keys) != 0) {
-    additional_metadata$other_keys <- unique(
-      c(additional_metadata$other_keys, tsibble_other_keys)
-    )
+  if (length(tsibble_other_keys) > 0) {
+    other_keys <- unique(c(other_keys, tsibble_other_keys))
   }
-  as_epi_df.tbl_df(x = tibble::as_tibble(x), as_of = as_of, other_keys = other_keys, ...)
+  as_epi_df.tbl_df(x = tibble::as_tibble(x), as_of = as_of, other_keys = other_keys, ...)
 }

 #' Test for `epi_df` format
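
Reviewer note: two behaviours in this file change with `other_keys`. `new_epi_df()` now moves the extra key columns to the front alongside `geo_value` and `time_value`, and `as_epi_df.tbl_ts()` merges user-supplied keys with the tsibble's own keys via `unique()`. The base R sketch below imitates both steps on a plain data frame. `reorder_key_columns()` is a hypothetical helper, and because the diff elides the expression after `# All other columns`, the `setdiff()` call is an assumption.

```r
# Hypothetical helper imitating the column reordering; not part of epiprocess.
reorder_key_columns <- function(x, other_keys = character()) {
  cols_to_put_first <- c("geo_value", "time_value", other_keys)
  # Assumption: the remaining columns keep their original relative order.
  x[, c(cols_to_put_first, setdiff(names(x), cols_to_put_first)), drop = FALSE]
}

toy <- data.frame(
  cases = c(1, 2),
  age_group = c("0-17", "18+"),
  time_value = as.Date(c("2020-06-01", "2020-06-01")),
  geo_value = c("ma", "ma")
)

reordered <- reorder_key_columns(toy, other_keys = "age_group")
print(names(reordered))

# Key merging as in the tbl_ts method: duplicates collapse, order is preserved.
user_keys <- "age_group"
tsibble_keys <- c("age_group", "pol") # stand-in for tsibble::key_vars(x) minus "geo_value"
merged_keys <- unique(c(user_keys, tsibble_keys))
print(merged_keys)
```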
