Skip to content

revision analysis first draft #492

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Aug 13, 2024
Merged

revision analysis first draft #492

merged 21 commits into from
Aug 13, 2024

Conversation

dsweber2
Copy link
Contributor

@dsweber2 dsweber2 commented Jul 24, 2024

Checklist

Please:

  • Make sure this PR is against "dev", not "main" (unless this is a release
    PR).
  • Request a review from one of the current main reviewers:
    brookslogan, nmdefries.
  • Makes sure to bump the version number in DESCRIPTION. Always increment
    the patch version number (the third number), unless you are making a
    release PR from dev to main, in which case increment the minor version
    number (the second number).
  • Describe changes made in NEWS.md, making sure breaking changes
    (backwards-incompatible changes to the documented interface) are noted.
    Collect the changes under the next release number (e.g. if you are on
    1.7.2, then write your changes under the 1.8 heading).
  • See DEVELOPMENT.md for more information on the development
    process.

Change explanations for reviewer

This adds a function that computes some per-epi-key (so normally geo_value-time_value pairs) stats about revisions. Under normal operation it drops all NA values so they are treated as missing. It returns a tibble that has 1 row for each epi-key, with columns:

  1. min_lag: the minimum time to any value (if drop_nas=FALSE, this includes NA's)
  2. max_lag: the amount of time until the final (new) version (same caveat for drop_nas=FALSE, though it is far less likely to matter)
  3. max_change: the difference between the smallest and largest values (this always excludes NA values)
  4. max_rel_change: max_change divided by the largest value (so it will always be less than 1). Note that this need not be the final value. It will be NA whenever max_change is 0.
  5. time_to_x_final: where x is the percent_final_value (default). This gives the lag when the value is within 1-x of the value at the final time. For example, consider the series (0,20, 99, 150, 102, 100); then time_to_x_final is the 5th index, since even though 99 is within 20%, it is outside the window afterwards at 150.

The function also (optionally) prints out an overall summary of this summary.

This PR factors out compactification into a standalone function (since I needed it to drop NAs effectively.

Still to do:

  • basic tests on a simple example
  • make sure the docs make sense and include reasonable examples

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

@dsweber2 dsweber2 requested a review from brookslogan July 24, 2024 17:03
@dsweber2 dsweber2 self-assigned this Jul 24, 2024
@dsweber2
Copy link
Contributor Author

dsweber2 commented Jul 24, 2024

❯ checking dependencies in R code ... WARNING
  '::' or ':::' import not declared from: ‘glue’

makes no sense, the only usage of glue is via glue::glue. thanks Logan, I was looking at the wrong description

@dsweber2 dsweber2 marked this pull request as ready for review July 25, 2024 21:39
Copy link
Contributor

@brookslogan brookslogan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I got caught up on a few style/convention/technicalities points. Main "substantive" things are that I think the names of some quantities need some brainstorming.

Need to do another pass about correctness (think we flagged some things on Slack).

@brookslogan brookslogan self-requested a review July 26, 2024 19:45
Copy link
Contributor

@brookslogan brookslogan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Complaining/brainstorming more about naming.
  • I'm getting some surprising results regarding min and max lags. Think there might be a bug here and maybe some pre-existing issues. Hoping you can figure out what's up before we push the button on this. user error

cli_inform("Change by more than {abs_change_threshold} in actual value (when revised):")
cli_li(num_percent(abs_change, n_real_revised))
}
return(revision_behavior)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return(revision_behavior)
return(invisible(revision_behavior))

idea: we might also want to make this an S3-classed thing with a print that will print the summary + a pointer to use as_tibble() to get the underlying data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dunno, I personally find printing the full thing under if you haven't assigned it to a variable to be somewhat useful, if maybe a bit cluttered.

Copy link
Contributor

@brookslogan brookslogan Jul 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Might consider then:

  • [Adding some sort of explanatory blurb to explain what this tibble is... it'd be convenient to just tack this on to the end of the summary you print, but might be a little hard to word]
  • Alternatively, if printing the summary, also explicitly print revision_behavior, but with a little explanatory blurb beforehand, and returning invisibly if the summary printing was requested (to not double-print it) [--- maybe this is too much clutter..]

(Any way, always returning invisibly isn't the right thing. Should definitely return visibly if the summary wasn't printed.)

@brookslogan brookslogan self-requested a review July 29, 2024 19:33
Copy link
Contributor

@brookslogan brookslogan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realized my previous main concern wasn't actually an issue.

@dsweber2 dsweber2 force-pushed the revisionSummary branch 2 times, most recently from 62e496d to 8617ea7 Compare July 30, 2024 15:33
@dsweber2 dsweber2 merged commit 1f44295 into dev Aug 13, 2024
5 checks passed
@dshemetov dshemetov deleted the revisionSummary branch August 14, 2024 16:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

basic revision analysis stats
2 participants