Skip to content

Conversation

dshemetov
Copy link
Contributor

@dshemetov dshemetov commented Jun 21, 2024

Checklist

Please:

  • Make sure this PR is against "dev", not "main" (unless this is a release
    PR).
  • Request a review from one of the current main reviewers:
    brookslogan, nmdefries.
  • Makes sure to bump the version number in DESCRIPTION. Always increment
    the patch version number (the third number), unless you are making a
    release PR from dev to main, in which case increment the minor version
    number (the second number).
  • Describe changes made in NEWS.md, making sure breaking changes
    (backwards-incompatible changes to the documented interface) are noted.
    Collect the changes under the next release number (e.g. if you are on
    1.7.2, then write your changes under the 1.8 heading).
  • See DEVELOPMENT.md for more information on the development
    process.

Change explanations for reviewer

  • guess time_type of "day", "week", "yearmonth", "integer" from time column and warn if not compatible at epi_df and epi_archive construction time
  • restrict before and after to types compatible with time column
  • deprecate geo_type and time_type in epi_df and epi_archive constructor arguments, infer only
  • improve documentation on geo and time types
  • enforce time_value and version being same type in epi_archive
  • update vignettes
  • move arg validation from new_epi_df to as_epi_df.tbl_df to match epi_archive
  • allow before = Inf in epi_slide and epix_slide and make it default in epix_slide
  • remove time_step argument in slides

New time type behaviors

  • we discover "day" and "weekly" types if the time value column is Date and based on the cadence; if the cadence is slower than a month, then we give an error and suggest converting to yearmonth or year
  • we discover "yearmonth" and "integer" based on type
  • otherwise time_type is "custom" and it gives a warning that unexpected behavior may result
  • time value and version columns are expected to have the same type in epi_archives
  • time_type and geo_type arguments to as_epi_df and as_epi_archive are now deprecated and they say so in documentation and on use
  • before/after arguments allow integers, day-difftimes for "day", allow week-difftimes for "week", allow integer for "yearmonth" and "integer"
  • slide functions (including opt functions) error with a clear error when they're given an epi_df with a "custom" time type
    previously allowed types that are now labeled custom: POSIXct, yearquarter, yearweek

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

@dshemetov dshemetov requested a review from brookslogan June 21, 2024 01:48
Copy link
Contributor

@brookslogan brookslogan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments; very minor stuff + a couple potential tasks that could be spun off into separate issues. If you tackle the latter in this PR, might be worth a re-review from me or Nat.

@dshemetov dshemetov force-pushed the ds/before-default branch from 2d6c937 to 272b320 Compare July 2, 2024 00:22
@dshemetov dshemetov changed the title refactor(epix_slide): before defaults to Inf, remove time_step refactor(epi_slide, epix_slide): remove time_step, remove time_type, unify time handling, require difftimes in before and after Jul 2, 2024
Copy link
Contributor

@brookslogan brookslogan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall approach seems good. Design-wise I had some thoughts on what column we base before validation on in epix_slide(). Plus maybe found a bug.

@brookslogan
Copy link
Contributor

issue: local tests fail, with various complaints like

Error in `validate_version_bound(fill_versions_end, x$DT, na_ok = FALSE)`: fill_versions_end must have the same `class` vector as x$version, which has a `class` of "Date"

as well as

Error in `epix_merge(as_epi_archive(tibble::tibble(geo_value = "pa", time_value = d, 
    version = d + 1L, x_value = 1L)), as_epi_archive(tibble::tibble(geo_value = "pa", 
    time_value = tsibble::make_yearweek(2020, 1), version = d + 
        5L, y_value = 2L)))`: `x` and `y` were not equally up to date version-wise: `x$versions_end` was not identical to `y$versions_end`; either ensure that `x` and `y` are equally up to date before merging, or specify how to deal with this using `sync`

... and it looks like we're not running checks in CI somehow??

@brookslogan
Copy link
Contributor

@nmdefries I would also appreciate if you could take a pass. I feel like I must have glossed over something given the check results, and was pausing until tomorrow but accidentally hit approve instead of comment.

@brookslogan
Copy link
Contributor

brookslogan commented Jul 17, 2024

Sorry, I believe I said I'd take a look at the failing checks while @nmdefries finishes looking over the rest.

  • There's a number of failing tests regarding epi_df grouping; this is a breaking change due to an upstream package update; rebase onto dev and you will get test grouping only before tsibble 1.1.5 #480 to skip these tests for now.
  • Some vignettes are failing to build with errors like
   Both `before` and `after` are required scalar values.

   Expected before to be a difftime with units in days.

   `...` must be empty.
   ✖ Problematic argument:
   • time_type = "day"
  • Some of these might be addressed by loosening the restrictions as noted in some of the discussion above; others might require vignette updates to conform with the new interface.

In particular, along the loosening-restrictions route, I think we discussed the following but maybe only verbally; sorry if these are repeats:

  • suggest: allow integerish time windows when time_type is "day".
  • suggest: in guess_time_type, don't infer "day" if length(unique_time_gaps) != 0L and it doesn't contain any values that are equal to 1. The intent is to generate errors on monthly and yearly data. You could also maybe require something like 10% rounded up of unique_time_gaps to be equal to 1 or something along these lines.

@dshemetov dshemetov force-pushed the ds/before-default branch 2 times, most recently from 20fe4ff to 0d61a98 Compare July 17, 2024 20:46
Copy link
Contributor

@nmdefries nmdefries left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

epi_slide_opt and related look good aside from the factor issue Logan brought up. I did not look at other changes.

praise: This is quite the complex PR! Thanks for robustifying this, and all the associated research into time-handling.

@dshemetov
Copy link
Contributor Author

suggest: in guess_time_type, don't infer "day" if length(unique_time_gaps) != 0L and it doesn't contain any values that are equal to 1. The intent is to generate errors on monthly and yearly data. You could also maybe require something like 10% rounded up of unique_time_gaps to be equal to 1 or something along these lines.

@lcbrooks the >10% equal to 1 thing seems too fuzzy to turn into an error. I think checking if all(unique_time_gaps) >= 28 and then giving an error is better.

@brookslogan
Copy link
Contributor

I think checking if all(unique_time_gaps) >= 28 and then giving an error is better.

Sounds good, but check parens here (all(unique_time_gaps >= 28)).

(In "normal" data I'd expect this to be similar to a length(unique_time_gaps) == 0L || any(unique_time_gaps == 1) approach. For non-"normal" data.... not sure, not a priority.)

@dshemetov dshemetov force-pushed the ds/before-default branch 9 times, most recently from df4fd05 to c095c1f Compare July 19, 2024 22:21
* guess time_type of "day", "week", "yearmonth", "integer" from time column and warn if not compatible
* restrict `before` and `after` to types compatible with time column
* deprecate geo_type and time_type constructor arguments, infer only
* improve documentation on geo and time types
* enforce time_value and version being same type in epi_archive
* update vignettes
* move arg validation from new_epi_df to as_epi_df.tbl_df to match epi_archive

a
@dshemetov dshemetov force-pushed the ds/before-default branch from c095c1f to 7a50d9d Compare July 20, 2024 00:25
@dshemetov dshemetov merged commit 4f44223 into dev Jul 20, 2024
@dshemetov dshemetov deleted the ds/before-default branch July 20, 2024 00:43
@dshemetov dshemetov changed the title refactor(epi_slide, epix_slide): remove time_step, remove time_type, unify time handling, require difftimes in before and after refactor(epi_slide, epix_slide): time types refactor Jul 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

3 participants