Skip to content

Pass ref_time_value to epix_slide for functions and formulas #313

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Jun 9, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
20 commits
Select commit Hold shift + click to select a range
9bda33c
pass ref_time_value to function f in epix_slide
nmdefries Apr 26, 2023
976155e
extend rlang::as_function so can reference ref_time_value in formulas
nmdefries Apr 28, 2023
1aad359
remove slide func arg name requirement
nmdefries May 4, 2023
5966f00
test comp func conversion and epix_slide takes reftimeval
nmdefries May 5, 2023
d2823b4
epx_slide fs in tests to take 3 args
nmdefries May 5, 2023
3a299c5
rename f args in epix_slide tests and examples
nmdefries May 5, 2023
43ce902
Merge branch 'dev' into ndefries/epix-slide-pass-reftimevalue-without…
nmdefries May 19, 2023
65a49da
documentation improvements and clarifications
nmdefries May 30, 2023
f93ef93
rename helper text and fn class in `as_slide_computation`
nmdefries Jun 1, 2023
964e46d
remove quosure conversion in `as_slide_computation`
nmdefries Jun 1, 2023
e709ea2
use Abort in errors
nmdefries Jun 1, 2023
d44b41f
test errors from as_slide_computation
nmdefries Jun 1, 2023
a682557
note min arg requirement and ref_time_value access in NEWS
nmdefries Jun 2, 2023
920fbd7
Merge branch 'dev' into ndefries/epix-slide-pass-reftimevalue-without…
nmdefries Jun 2, 2023
093f95a
don't export as_slide_computation
nmdefries Jun 2, 2023
0283d6a
suppress manual page generation
nmdefries Jun 2, 2023
dcd0515
pass third arg to func interface tests
nmdefries Jun 2, 2023
eb0c27e
some tests expect a tibble
nmdefries Jun 2, 2023
143081a
document that ref_time_val not always a date
nmdefries Jun 8, 2023
91d270a
use stronger attribution
nmdefries Jun 8, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -92,10 +92,22 @@ importFrom(rlang,"!!")
importFrom(rlang,.data)
importFrom(rlang,.env)
importFrom(rlang,arg_match)
importFrom(rlang,caller_arg)
importFrom(rlang,caller_env)
importFrom(rlang,check_dots_empty0)
importFrom(rlang,enquo)
importFrom(rlang,enquos)
importFrom(rlang,f_env)
importFrom(rlang,f_rhs)
importFrom(rlang,global_env)
importFrom(rlang,is_environment)
importFrom(rlang,is_formula)
importFrom(rlang,is_function)
importFrom(rlang,is_missing)
importFrom(rlang,is_quosure)
importFrom(rlang,is_string)
importFrom(rlang,missing_arg)
importFrom(rlang,new_function)
importFrom(rlang,quo_is_missing)
importFrom(rlang,sym)
importFrom(rlang,syms)
Expand Down
15 changes: 15 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,16 @@ inter-release development versions will include an additional ".9999" suffix.

## Breaking changes:

* Changes to `epix_slide`:
* The `f` computation is now required to take at least three arguments. `f`
must take an `epi_df` with the same column names as the archive's `DT`,
minus the `version` column; followed by a one-row tibble containing the
values of the grouping variables for the associated group; followed by a
reference time value, usually as a `Date` object; followed by any number
of named arguments.

## New features:

* `epix_slide` has been made more like `dplyr::group_modify`. It will no longer
perform element/row recycling for size stability, accepts slide computation
outputs containing any number of rows, and no longer supports `all_rows`.
Expand All @@ -19,6 +29,11 @@ inter-release development versions will include an additional ".9999" suffix.
more closely whether/when/how to output an `epi_df`.
* To keep the old behavior, convert the output of `epix_slide()` to `epi_df`
when desired and set the metadata appropriately.
* `epix_slide` `f` computations passed as functions or formulas now have
access to the reference time value. If `f` is a function, it is passed a
Date containing the reference time value as the third argument. If a
formula, `f` can access the reference time value via `.z` or
`.ref_time_value`.

## Improvements:

Expand Down
6 changes: 3 additions & 3 deletions R/grouped_epi_archive.R
Original file line number Diff line number Diff line change
Expand Up @@ -232,7 +232,7 @@ grouped_epi_archive =

# Check that `f` takes enough args
if (!missing(f) && is.function(f)) {
assert_sufficient_f_args(f, ...)
assert_sufficient_f_args(f, ..., n_mandatory_f_args = 3L)
}

# Validate and pre-process `before`:
Expand Down Expand Up @@ -272,7 +272,7 @@ grouped_epi_archive =
ref_time_value,
new_col) {
# Carry out the specified computation
comp_value = f(.data_group, .group_key, ...)
comp_value = f(.data_group, .group_key, ref_time_value, ...)

if (all_versions) {
# Extract data from archive so we can do length checks below. When
Expand All @@ -298,7 +298,7 @@ grouped_epi_archive =

# If f is not missing, then just go ahead, slide by group
if (!missing(f)) {
if (rlang::is_formula(f)) f = rlang::as_function(f)
if (rlang::is_formula(f)) f = as_slide_computation(f)
x = purrr::map_dfr(ref_time_values, function(ref_time_value) {
# Ungrouped as-of data; `epi_df` if `all_versions` is `FALSE`,
# `epi_archive` if `all_versions` is `TRUE`:
Expand Down
23 changes: 13 additions & 10 deletions R/methods-epi_archive.R
Original file line number Diff line number Diff line change
Expand Up @@ -665,14 +665,17 @@ group_by.epi_archive = function(.data, ..., .add=FALSE, .drop=dplyr::group_by_dr
#' sliding (a.k.a. "rolling") time window for each data group. The window is
#' determined by the `before` parameter described below. One time step is
#' typically one day or one week; see [`epi_slide`] details for more
#' explanation. If a function, `f` must take `x`, an `epi_df` with the same
#' column names as the archive's `DT`, minus the `version` column; followed by
#' `g`, a one-row tibble containing the values of the grouping variables for
#' the associated group; followed by any number of named arguments. If a
#' formula, `f` can operate directly on columns accessed via `.x$var`, as in
#' `~ mean(.x$var)` to compute a mean of a column `var` for each
#' `ref_time_value`-group combination. If `f` is missing, then `...` will
#' specify the computation.
#' explanation. If a function, `f` must take an `epi_df` with the same
#' column names as the archive's `DT`, minus the `version` column; followed
#' by a one-row tibble containing the values of the grouping variables for
#' the associated group; followed by a reference time value, usually as a
#' `Date` object; followed by any number of named arguments. If a formula,
#' `f` can operate directly on columns accessed via `.x$var` or `.$var`, as
#' in `~ mean (.x$var)` to compute a mean of a column `var` for each
#' group-`ref_time_value` combination. The group key can be accessed via
#' `.y` or `.group_key`, and the reference time value can be accessed via
#' `.z` or `.ref_time_value`. If `f` is missing, then `...` will specify the
#' computation.
#' @param ... Additional arguments to pass to the function or formula specified
#' via `f`. Alternatively, if `f` is missing, then `...` is interpreted as an
#' expression for tidy evaluation. See details of [`epi_slide`].
Expand Down Expand Up @@ -827,7 +830,7 @@ group_by.epi_archive = function(.data, ..., .add=FALSE, .drop=dplyr::group_by_dr
#' archive_cases_dv_subset %>%
#' group_by(geo_value) %>%
#' epix_slide(
#' function(x, g) {
#' function(x, gk, rtv) {
#' tibble(
#' time_range = if(nrow(x) == 0L) {
#' "0 `time_value`s"
Expand Down Expand Up @@ -855,7 +858,7 @@ group_by.epi_archive = function(.data, ..., .add=FALSE, .drop=dplyr::group_by_dr
#' archive_cases_dv_subset %>%
#' group_by(geo_value) %>%
#' epix_slide(
#' function(x, g) {
#' function(x, gk, rtv) {
#' tibble(
#' versions_start = if (nrow(x$DT) == 0L) {
#' "NA (0 rows)"
Expand Down
12 changes: 6 additions & 6 deletions R/slide.R
Original file line number Diff line number Diff line change
Expand Up @@ -12,15 +12,15 @@
#' sliding (a.k.a. "rolling") time window for each data group. The window is
#' determined by the `before` and `after` parameters described below. One time
#' step is typically one day or one week; see details for more explanation. If
#' a function, `f` must take `x`, a data frame with the same column names as
#' a function, `f` must take a data frame with the same column names as
#' the original object, minus any grouping variables, containing the time
#' window data for one `ref_time_value`-group combination; followed by `g`, a
#' window data for one group-`ref_time_value` combination; followed by a
#' one-row tibble containing the values of the grouping variables for the
#' associated group; followed by any number of named arguments. If a formula,
#' `f` can operate directly on columns accessed via `.x$var`, as in `~
#' mean(.x$var)` to compute a mean of a column `var` for each
#' `ref_time_value`-group combination. If `f` is missing, then `...` will
#' specify the computation.
#' `f` can operate directly on columns accessed via `.x$var` or `.$var`, as
#' in `~mean(.x$var)` to compute a mean of a column `var` for each
#' `ref_time_value`-group combination. The group key can be accessed via `.y`.
#' If `f` is missing, then `...` will specify the computation.
#' @param ... Additional arguments to pass to the function or formula specified
#' via `f`. Alternatively, if `f` is missing, then the `...` is interpreted as
#' an expression for tidy evaluation. See details.
Expand Down
111 changes: 108 additions & 3 deletions R/utils.R
Original file line number Diff line number Diff line change
Expand Up @@ -103,17 +103,19 @@ Warn = function(msg, ...) rlang::warn(break_str(msg, init = "Warning: "), ...)
#' Assert that a sliding computation function takes enough args
#'
#' @param f Function; specifies a computation to slide over an `epi_df` or
#' `epi_archive` in `epi_slide` or `epix_slide`.
#' `epi_archive` in `epi_slide` or `epix_slide`.
#' @param ... Dots that will be forwarded to `f` from the dots of `epi_slide` or
#' `epix_slide`.
#' @param n_mandatory_f_args Integer; specifies the number of arguments `f`
#' is required to take before any `...` arg. Defaults to 2.
#'
#' @importFrom rlang is_missing
#' @importFrom purrr map_lgl
#' @importFrom utils tail
#'
#' @noRd
assert_sufficient_f_args <- function(f, ...) {
mandatory_f_args_labels <- c("window data", "group key")
assert_sufficient_f_args <- function(f, ..., n_mandatory_f_args = 2L) {
mandatory_f_args_labels <- c("window data", "group key", "reference time value")[seq(n_mandatory_f_args)]
n_mandatory_f_args <- length(mandatory_f_args_labels)
args = formals(args(f))
args_names = names(args)
Expand Down Expand Up @@ -181,6 +183,109 @@ assert_sufficient_f_args <- function(f, ...) {
}
}

#' Convert to function
#'
#' @description
#' `as_slide_computation()` transforms a one-sided formula into a function.
#' This powers the lambda syntax in packages like purrr.
#'
#' This code and documentation borrows heavily from [`rlang::as_function`]
#' (https://github.com/r-lib/rlang/blob/c55f6027928d3104ed449e591e8a225fcaf55e13/R/fn.R#L343-L427).
#'
#' This code extends `rlang::as_function` to create functions that take three
#' arguments. The arguments can be accessed via the idiomatic `.x`, `.y`,
#' etc, positional references (`..1`, `..2`, etc), and also by `epi
#' [x]_slide`-specific names.
#'
#' @source https://github.com/r-lib/rlang/blob/c55f6027928d3104ed449e591e8a225fcaf55e13/R/fn.R#L343-L427
#'
#' @param x A function or formula.
#'
#' If a **function**, it is used as is.
#'
#' If a **formula**, e.g. `~ mean(.x$cases)`, it is converted to a function with up
#' to three arguments: `.x` (single argument), or `.x` and `.y`
#' (two arguments), or `.x`, `.y`, and `.z` (three arguments). The `.`
#' placeholder can be used instead of `.x`, `.group_key` can be used in
#' place of `.y`, and `.ref_time_value` can be used in place of `.z`. This
#' allows you to create very compact anonymous functions (lambdas) with up
#' to three inputs. Functions created from formulas have a special class. Use
#' `rlang::is_lambda()` to test for it.
#'
#' If a **string**, the function is looked up in `env`. Note that
#' this interface is strictly for user convenience because of the
#' scoping issues involved. Package developers should avoid
#' supplying functions by name and instead supply them by value.
#'
#' @param env Environment in which to fetch the function in case `x`
#' is a string.
#' @inheritParams rlang::args_dots_empty
#' @inheritParams rlang::args_error_context
#' @examples
#' f <- as_slide_computation(~ .x + 1)
#' f(10)
#'
#' g <- as_slide_computation(~ -1 * .)
#' g(4)
#'
#' h <- as_slide_computation(~ .x - .group_key)
#' h(6, 3)
#'
#' @importFrom rlang check_dots_empty0 is_function new_function f_env
#' is_environment missing_arg f_rhs is_string is_formula caller_arg
#' caller_env global_env
#'
#' @noRd
as_slide_computation <- function(x,
env = global_env(),
...,
arg = caller_arg(x),
call = caller_env()) {
check_dots_empty0(...)

if (is_function(x)) {
return(x)
}

if (is_formula(x)) {
if (length(x) > 2) {
Abort(sprintf("%s must be a one-sided formula", arg),
class = "epiprocess__as_slide_computation__formula_is_twosided",
epiprocess__x = x,
call = call)
}

env <- f_env(x)
if (!is_environment(env)) {
Abort("Formula must carry an environment.",
class = "epiprocess__as_slide_computation__formula_has_no_env",
epiprocess__x = x,
epiprocess__x_env = env,
arg = arg, call = call)
}

args <- list(
... = missing_arg(),
.x = quote(..1), .y = quote(..2), .z = quote(..3),
. = quote(..1), .group_key = quote(..2), .ref_time_value = quote(..3)
)
fn <- new_function(args, f_rhs(x), env)
fn <- structure(fn, class = c("epiprocess_slide_computation", "function"))
return(fn)
}

if (is_string(x)) {
return(get(x, envir = env, mode = "function"))
}

Abort(sprintf("Can't convert a %s to a slide computation", class(x)),
class = "epiprocess__as_slide_computation__cant_convert_catchall",
epiprocess__x = x,
epiprocess__x_class = class(x),
arg = arg,
call = call)
}

##########

in_range = function(x, rng) pmin(pmax(x, rng[1]), rng[2])
Expand Down
11 changes: 6 additions & 5 deletions man/epi_slide.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

23 changes: 13 additions & 10 deletions man/epix_slide.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading