Draft and discuss naming schemes for epix_slide parameters, output #163

Open
brookslogan opened this issue Jul 22, 2022 · 4 comments
Labels
op-semantics Operational semantics; many potentially breaking changes here P3 very low priority

Comments

@brookslogan

See #146 (comment); time_value vs version bullet point.

@brookslogan brookslogan added P1 medium priority op-semantics Operational semantics; many potentially breaking changes here labels Jul 22, 2022
@brookslogan

brookslogan commented Jul 22, 2022

"Discuss" will eventually mean running sketches of use by some potential users / other developers (e.g., Evan, Jacob).

@brookslogan brookslogan added P3 very low priority and removed P1 medium priority labels Jul 24, 2022
@brookslogan

As discussion on #170 and #171 has brought up, another naming option for the relevant output column(s), besides time_value, version, and both, is ref_time_value.

@brookslogan

brookslogan commented Aug 24, 2022

Some remaining discussion points from #146:

  • Separate discussion: should we rename the ref_time_values parameter and the time_value output column to something involving version, or keep the former and turn the latter into two duplicate columns, one under each name? Should we output an epi_archive?

[or should we call the time/version output column ref_time_value?]

  • Separate discussion: should we rename max_version parameter of epix_slide to version?

[Since we're moving to more consistently use an "implicit versioning" scheme, where last-version-of-each-observation-carried-forward is assumed everywhere in archives, this may make sense. However, we might then need to think about the naming or discussion of the $DT$version column.]
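
To make the "last-version-of-each-observation-carried-forward" assumption concrete, here is a minimal base R sketch. It does not use the package's own code: the toy `updates` table and the helper `as_of_locf` are hypothetical, and only the column names (geo_value, time_value, version) and the max_version argument name come from the discussion above.

```r
# Toy update table: one row per (geo_value, time_value, version) at which the
# reported value changed. The data and the helper below are hypothetical.
updates <- data.frame(
  geo_value  = c("ca", "ca", "ca", "ca"),
  time_value = as.Date(c("2022-01-01", "2022-01-01", "2022-01-02", "2022-01-02")),
  version    = as.Date(c("2022-01-02", "2022-01-04", "2022-01-03", "2022-01-05")),
  value      = c(10, 12, 20, 25)
)

as_of_locf <- function(updates, max_version) {
  # Keep only updates that had been published by `max_version`.
  seen <- updates[updates$version <= max_version, , drop = FALSE]
  # Sort so that the latest version of each observation comes last ...
  seen <- seen[order(seen$geo_value, seen$time_value, seen$version), , drop = FALSE]
  # ... then keep that last row per (geo_value, time_value).
  key <- paste(seen$geo_value, seen$time_value)
  out <- seen[!duplicated(key, fromLast = TRUE),
              c("geo_value", "time_value", "value"), drop = FALSE]
  rownames(out) <- NULL
  out
}

# Version 2022-01-03 brought no update for time_value 2022-01-01, so the value
# 10 from version 2022-01-02 is carried forward into that snapshot.
snapshot_0103 <- as_of_locf(updates, as.Date("2022-01-03"))
snapshot_0104 <- as_of_locf(updates, as.Date("2022-01-04"))
print(snapshot_0103)
print(snapshot_0104)
```

Under this scheme any version value can be queried, whether or not an update was recorded at it, which is why a single argument simply called version would read naturally.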

Some other existing mismatches between slide operations that we might want to think about:

  • mutate-like vs summarize-like
    • epi_slide is like mutate: it keeps existing columns and, given scalar output from f, broadcasts each slide result to the associated input locations to maintain size stability
    • epix_slide is like summarize: it only produces the grouping columns + f results, and broadcasts differently
  • Once n -> before, after / before hits:
    • before=k, after missing in epi_slide means a trailing/right-aligned window that will actually have data at the right side of the window (as we ensure ref_time_values %in% unique(x$time_value)), unless there are some variable-time-values-by-group things to think about
    • before=k, after missing/not-accepted-as-an-arg in epix_slide means a window extending infinitely far into the future, but one that, in typical surveillance data cases, will only contain data up to some time before the associated ref_time_value; calling it trailing/right-aligned doesn't seem precise either way.
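
The two mismatches above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the toy `updates` table (with a one-day reporting lag and no revisions), the helper `mean_between`, and the output column name `slide_value` are all hypothetical.

```r
# Hypothetical update table with a one-day reporting lag: the value for day t
# first appears in version t + 1 and is never revised.
days <- as.Date("2022-01-01") + 0:4
updates <- data.frame(
  geo_value  = "ca",
  time_value = days,
  version    = days + 1,
  value      = c(1, 2, 3, 4, 5)
)
before <- 1

# Mean of `value` over rows with time_value in [lo, hi].
mean_between <- function(df, lo, hi) {
  mean(df$value[df$time_value >= lo & df$time_value <= hi])
}

# mutate-like (epi_slide-style), on the latest snapshot: keep every input row
# and attach the result for the trailing window [t - before, t] ending at that
# row's own time_value, so the output has as many rows as the input.
latest <- updates[c("geo_value", "time_value", "value")]
mutate_like <- latest
mutate_like$slide_value <- vapply(
  seq_len(nrow(latest)),
  function(i) {
    t <- latest$time_value[i]
    mean_between(latest, t - before, t)
  },
  numeric(1)
)

# summarize-like (epix_slide-style): for each reference time, take the data as
# it stood at that version, keep time_value >= ref - before with no upper
# bound, and return one row per reference time holding only the key columns
# and the result.
ref_time_values <- as.Date(c("2022-01-03", "2022-01-05"))
summarize_like <- data.frame(
  geo_value   = "ca",
  time_value  = ref_time_values,
  slide_value = vapply(
    seq_along(ref_time_values),
    function(i) {
      ref <- ref_time_values[i]
      as_of_ref <- updates[updates$version <= ref, , drop = FALSE]
      mean(as_of_ref$value[as_of_ref$time_value >= ref - before])
    },
    numeric(1)
  )
)

print(mutate_like)
print(summarize_like)
```

With the reporting lag in this toy data, the versioned window for reference time 2022-01-03 only contains the 2022-01-02 observation, even though it has no upper bound, while the mutate-like window ending on 2022-01-03 contains both 2022-01-02 and 2022-01-03.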

Advanced usage:

  • Making a forecast ahead of time; e.g., using the target set for forecast date d (target dates >= d) but using data as of some other version v < d (regular surveillance will only be available for time values < v).
  • datetime versions. epix_slide over datetime ref_time_values corresponding to a forecast pipeline schedule.

Compactify compatibility

Alternative to implicit versioning interface: explicit versioning interface

  • Implementation approaches:
    • something like implicit internals + an explicit observed/additionally-observed-but-no-updates version list.
    • inserting NA-value-filled versions like in epix_fill_through_version
  • Changes that would go along:
    • archive constructor might change to take in explicit version data by default & separate fn for the update-based approach, rather than vice versa or constructing only using the update approach
    • as_of would raise error or give NAs in between observed versions
    • epix_slide would require ref_time_values to be among the observed versions
    • compactify would likely have fewer issues/details/additions needed to keep the same behavior between compactified and uncompactified data
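
As a rough sketch of the first implementation approach (implicit internals plus an explicit list of observed versions), assuming the "raise error" option for as_of: the helper `as_of_explicit`, the `observed_versions` vector, and the toy data are hypothetical and only meant to show the behavioral difference from the implicit scheme.

```r
# Hypothetical explicit-versioning helper: `observed_versions` lists every
# version at which the source was actually checked, including versions that
# brought no updates.
as_of_explicit <- function(updates, observed_versions, version) {
  if (!version %in% observed_versions) {
    stop("version ", format(version), " was not observed")
  }
  # Same last-version-carried-forward lookup as in the implicit scheme.
  seen <- updates[updates$version <= version, , drop = FALSE]
  seen <- seen[order(seen$time_value, seen$version), , drop = FALSE]
  is_latest <- !duplicated(seen$time_value, fromLast = TRUE)
  seen[is_latest, c("time_value", "value"), drop = FALSE]
}

updates <- data.frame(
  time_value = as.Date(c("2022-01-01", "2022-01-01")),
  version    = as.Date(c("2022-01-02", "2022-01-05")),
  value      = c(10, 12)
)
observed_versions <- as.Date(c("2022-01-02", "2022-01-03", "2022-01-05"))

# 2022-01-03 was observed but brought no updates: the earlier value is reused.
ok <- as_of_explicit(updates, observed_versions, as.Date("2022-01-03"))

# 2022-01-04 falls between observed versions: the lookup is refused.
err <- tryCatch(
  as_of_explicit(updates, observed_versions, as.Date("2022-01-04")),
  error = function(e) conditionMessage(e)
)
print(ok)
print(err)
```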

@brookslogan

Another idea to consider here: guess what label to use for the ref_time_values based on the user output: if they provide a(n epi_)df with a time_value column, then use version; else use time_value.
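
A minimal sketch of that guessing rule, in base R. The helper name `guess_ref_time_label` is hypothetical; it checks for a plain data frame, which an epi_df would also satisfy if it inherits from data.frame (an assumption here).

```r
# Hypothetical helper: decide what to call the column holding the
# ref_time_values, based on what the user's computation returned.
guess_ref_time_label <- function(slide_result) {
  if (is.data.frame(slide_result) && "time_value" %in% names(slide_result)) {
    # The user's output already uses time_value, so avoid a name clash.
    "version"
  } else {
    "time_value"
  }
}

label_df     <- guess_ref_time_label(data.frame(time_value = as.Date("2022-01-01"), value = 1))
label_scalar <- guess_ref_time_label(42)
print(c(label_df, label_scalar))
```

One trade-off of this design is that the output column name then depends on what f returns, so downstream code cannot rely on a fixed name.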
