Commit 6e3f554

Merge branch 'main' into lcb/grouped_epi_archive
2 parents f6549a8 + 94d5ca5 commit 6e3f554

5 files changed

+374
-15
lines changed

DESCRIPTION

+1-1
@@ -1,7 +1,7 @@
 Type: Package
 Package: epiprocess
 Title: Tools for basic signal processing in epidemiology
-Version: 1.0.0
+Version: 0.5.0.9999
 Authors@R: c(
     person("Jacob", "Bien", role = "ctb"),
     person("Logan", "Brooks", role = "aut"),
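
The version change in this hunk (from `1.0.0` back to `0.5.0.9999`) can be sanity-checked with base R's `package_version()`, which compares version strings component by component. This is only a sketch of how a ".9999" development suffix orders relative to releases; the `0.6.0` value is a hypothetical next release used for illustration, not something stated in the commit.

```r
# Compare the version strings from the DESCRIPTION diff using base R.
dev      <- package_version("0.5.0.9999")  # development version set by this commit
released <- package_version("0.5.0")       # most recent release listed in NEWS.md
previous <- package_version("1.0.0")       # value replaced in DESCRIPTION
next_rel <- package_version("0.6.0")       # hypothetical next release (assumption)

dev > released   # TRUE: the extra component sorts after the bare release
dev < next_rel   # TRUE: but still before any later release
dev < previous   # TRUE: the old DESCRIPTION value was ahead of the changelog
```

The last comparison shows why the old `1.0.0` value looked out of step with a changelog whose latest release is 0.5.0.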

NEWS.md

+249
@@ -0,0 +1,249 @@
+# epiprocess 0.5.0.9999 (development version)
+
+Note that `epiprocess` uses the [Semantic Versioning
+("semver")](https://semver.org/) scheme for all release versions, but not for
+development versions. A ".9999" suffix indicates a development version.
+
+## Cleanup:
+
+* Added a `NEWS.md` file to track changes to the package.
+
+# epiprocess 0.5.0:
+
+## Potentially-breaking changes:
+
+* `epix_slide`, `<epi_archive>$slide` now feed `f` an `epi_df` rather than
+  converting to a tibble/`tbl_df` first, allowing use of `epi_df` methods and
+  metadata, and often yielding `epi_df`s out of the slide as a result. To obtain
+  the old behavior, convert to a tibble within `f`.
+
+## Improvements:
+
+* Fixed `epix_merge`, `<epi_archive>$merge` always raising an error on `sync="truncate"`.
+
+## Cleanup:
+
+* Added `Remotes:` entry for `genlasso`, which was removed from CRAN.
+* Added `as_epi_archive` tests.
+* Added missing `epix_merge` test for `sync="truncate"`.
+
+# epiprocess 0.4.0:
+
+## Potentially-breaking changes:
+
+* Fixed `[.epi_df` to not reorder columns, which was incompatible with
+  downstream packages.
+* Changed `[.epi_df` decay-to-tibble logic to be more coherent with `epi_df`'s
+  current tolerance of nonunique keys: stopped decaying to a tibble in some
+  cases where a unique key wouldn't have been preserved, since we don't
+  enforce a unique key elsewhere.
+* Fixed `[.epi_df` to adjust `"other_keys"` metadata when corresponding
+  columns are selected out.
+* Fixed `[.epi_df` to raise an error if resulting column names would be
+  nonunique.
+* Fixed `[.epi_df` to drop metadata if decaying to a tibble (due to removal
+  of essential columns).
+
+## Improvements:
+
+* Added check that `epi_df` `additional_metadata` is a list.
+* Fixed some incorrect `as_epi_df` examples.
+
+## Cleanup:
+
+* Applied rename of upstream package in examples: `delphi.epidata` ->
+  `epidatr`.
+* Rounded out `[.epi_df` tests.
+
+# epiprocess 0.3.0:
+
+## Breaking changes:
+
+* `as_epi_archive`, `epi_archive$new`:
+  * Compactification (see below) by default may change results if working
+    directly with the `epi_archive`'s `DT` field; to disable, pass in
+    `compactify=FALSE`.
+* `epi_archive`'s wrappers and R6 methods have been updated to follow these
+  rules regarding reference semantics:
+  * `epix_<method>` will not mutate input `epi_archive`s, but may alias them
+    or alias their fields (which should not be a worry if a user sticks to
+    these `epix_*` functions and "regular" R functions with
+    copy-on-write-like behavior, avoiding mutating functions such as `[.data.table`).
+  * `x$<method>` may mutate `x`; if it mutates `x`, it will return `x`
+    invisibly (where this makes sense), and, for each of its fields, may
+    either mutate the object to which it refers or reseat the reference (but
+    not both); if `x$<method>` does not mutate `x`, its result may contain
+    aliases to `x` or its fields.
+* `epix_merge`, `<epi_archive>$merge`:
+  * Removed `...`, `locf`, and `nan` parameters.
+  * Changed the default behavior, which now corresponds to using
+    `by=key(x$DT)` (but demanding that this is the same set of column names as
+    `key(y$DT)`), `all=TRUE`, `locf=TRUE`, `nan=NaN` (but with the
+    post-filling step fixed to only apply to gaps, and no longer fill over
+    `NA`s originating from `x$DT` and `y$DT`).
+  * `x` and `y` are no longer allowed to share names of non-`by` columns.
+  * `epix_merge` no longer mutates its `x` argument (but `$merge` continues
+    to do so).
+  * Removed (undocumented) capability of passing a `data.table` as `y`.
+* `epix_slide`:
+  * Removed inappropriate/misleading `n=7` default argument (due to
+    reporting latency, `n=7` will *not* yield 7 days of data in a typical
+    daily-reporting surveillance data source, as one might have assumed).
+
+## New features:
+
+* `as_epi_archive`, `epi_archive$new`:
+  * New `compactify` parameter allows removal of rows that are redundant for the
+    purposes of `epi_archive`'s methods, which use the last version of each
+    observation carried forward.
+  * New `clobberable_versions_start` field allows marking a range of versions
+    that could be "clobbered" (rewritten without assigning new version
+    tags); previously, this was hard-coded as `max(<epi_archive>$DT$version)`.
+  * New `versions_end` field allows marking a range of versions beyond
+    `max(<epi_archive>$DT$version)` that were observed, but contained no
+    changes.
+* `epix_merge`, `$merge`:
+  * New `sync` parameter controls what to do if `x` and `y` aren't equally
+    up to date (i.e., if `x$versions_end` and `y$versions_end` are
+    different).
+* New function `epix_fill_through_version`, method
+  `<epi_archive>$fill_through_version`: non-mutating & mutating ways to
+  ensure that an archive contains versions at least through some
+  `fill_versions_end`, extrapolating according to `how` if necessary.
+* Example archive data object is now constructed on demand from its
+  underlying data, so it will be based on the user's version of
+  `epi_archive` rather than an outdated R6 implementation from whenever the
+  data object was generated.
+
+# epiprocess 0.2.0:
+
+## Breaking changes:
+
+* Removed default `n=7` argument to `epix_slide`.
+
+## Improvements:
+
+* Ignore `NA`s when printing `time_value` range for an `epi_archive`.
+* Fixed misleading column naming in `epix_slide` example.
+* Trimmed down `epi_slide` examples.
+* Synced out-of-date docs.
+
+## Cleanup:
+
+* Removed dependency of some `epi_archive` tests on an example archive
+  object, and made them more understandable by reading without running.
+* Fixed `epi_df` tests relying on an S3 method for `epi_df` implemented
+  externally to `epiprocess`.
+* Added tests for `epi_archive` methods and wrapper functions.
+* Removed some dead code.
+* Made `.{Rbuild,git}ignore` files more comprehensive.
+
+# epiprocess 0.1.2:
+
+## New features:
+
+* New `new_epi_df` function is similar to `as_epi_df`, but (i) recalculates,
+  overwrites, and/or drops most metadata of `x` if it has any, (ii) may
+  still reorder the columns of `x` even if it's already an `epi_df`, and
+  (iii) treats `x` as optional, constructing an empty `epi_df` by default.
+
+## Improvements:
+
+* Fixed `geo_type` guessing on alphabetical strings with more than 2
+  characters to yield `"custom"`, not US `"nation"`.
+* Fixed `time_type` guessing to actually detect `Date`-class `time_value`s
+  regularly spaced 7 days apart as `"week"`-type as intended.
+* Improved printing of `epi_df`s, `epi_archive`s.
+* Fixed `as_of` to not cut off any (forecast-like) data with `time_value >
+  max_version`.
+* Expanded `epi_df` docs to include conversion from `tsibble`/`tbl_ts` objects,
+  usage of `other_keys`, and pre-processing objects not following the
+  `geo_value`, `time_value` naming scheme.
+* Expanded `epi_slide` examples to show how to use an `f` argument with
+  named parameters.
+* Updated examples to print relevant columns given a common 80-column
+  terminal width.
+* Added growth rate examples.
+* Improved `as_epi_archive` and `epi_archive$new`/`$initialize`
+  documentation, including constructing a toy archive.
+
+## Cleanup:
+
+* Added tests for `epi_slide`, `epi_cor`, and internal utility functions.
+* Fixed currently-unused internal utility functions `MiddleL`, `MiddleR` to
+  yield correct results on odd-length vectors.
+
+# epiprocess 0.1.1:
+
+## New features:
+
+* New example data objects allow one to quickly experiment with `epi_df`s
+  and `epi_archive`s without relying/waiting on an API to fetch data.
+
+## Improvements:
+
+* Improved `epi_slide` error messaging.
+* Fixed description of the appropriate parameters for an `f` argument to
+  `epi_slide`; previous description would give incorrect behavior if `f` had
+  named parameters that did not receive values from `epi_slide`'s `...`.
+* Added some examples throughout the package.
+* Using example data objects in vignettes also speeds up vignette compilation.
+
+## Cleanup:
+
+* Set up gh-actions CI.
+* Added tests for `epi_df`s.
+
+# epiprocess 0.1.0
+
+## Implemented core functionality, vignettes:
+
+Classes:
+* `epi_df`: specialized `tbl_df` for geotemporal epidemiological time
+  series data, with optional metadata recording other key columns (e.g.,
+  demographic breakdowns) and `as_of` what time/version this data was
+  current/published. Associated functions:
+  * `as_epi_df` converts to an `epi_df`, guessing the `geo_type`,
+    `time_type`, `other_keys`, and `as_of` if not specified.
+  * `as_epi_df.tbl_ts` and `as_tsibble.epi_df` automatically set
+    `other_keys` and `key`&`index`, respectively.
+  * `epi_slide` applies a user-supplied computation to a sliding/rolling
+    time window and user-specified groups, adding the results as new
+    columns, and recycling/broadcasting results to keep the result size
+    stable. Allows computation to be provided as a function, `purrr`-style
+    formula, or tidyeval dots. Uses `slider` underneath for efficiency.
+  * `epi_cor` calculates Pearson, Kendall, or Spearman correlations
+    between two (optionally time-shifted) variables in an `epi_df` within
+    user-specified groups.
+  * Convenience function: `is_epi_df`
+* `epi_archive`: R6 class for version (patch) data for geotemporal
+  epidemiological time series data sets. Comes with S3 methods and regular
+  functions that wrap around this functionality for those unfamiliar with R6
+  methods. Associated functions:
+  * `as_epi_archive`: prepares an `epi_archive` object from a data frame
+    containing snapshots and/or patch data for every available version of
+    the data set.
+  * `as_of`: extracts a snapshot of the data set as of some requested
+    version, in `epi_df` format
+  * `epix_slide`, `<epi_archive>$slide`: similar to `epi_slide`, but for
+    `epi_archive`s; for each requested `ref_time_value` and group, applies
+    a time window and user-specified computation to a snapshot of the data
+    as of `ref_time_value`.
+  * `epix_merge`, `<epi_archive>$merge`: like `merge` for `epi_archive`s,
+    but allowing for the last version of each observation to be carried
+    forward to fill in gaps in `x` or `y`.
+  * Convenience function: `is_epi_archive`
+
+Additional functions:
+* `growth_rate`: estimates growth rate of a time series using one of a few
+  built-in `method`s based on relative change, linear regression,
+  smoothing splines, or trend filtering.
+* `detect_outlr`: applies one or more outlier detection methods to a given
+  signal variable, and optionally aggregates the outputs to create a
+  consensus result
+* `detect_outlr_rm`: outlier detection function based on a
+  rolling-median-based outlier detection function; one of the methods
+  included in `detect_outlr`.
+* `detect_outlr_stl`: outlier detection function based on a seasonal-trend
+  decomposition using LOESS (STL); one of the methods included in
+  `detect_outlr`.
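
The `compactify` entry under 0.3.0 in the NEWS.md hunk describes dropping rows that are redundant when the last version of each observation is carried forward. The base R sketch below illustrates that idea on a toy data frame. `compactify_sketch`, the `cases` column, and the toy values are hypothetical stand-ins, not the `epiprocess` implementation, and the sketch assumes there are no missing values.

```r
# Toy version history: one observation (geo_value, time_value) reported in
# four versions; the value is unchanged between the second and third version.
updates <- data.frame(
  geo_value  = "ca",
  time_value = as.Date("2020-06-01"),
  version    = as.Date("2020-06-02") + 0:3,
  cases      = c(10, 12, 12, 15)
)

# Hypothetical helper: keep a row only if it is the first version of its
# observation or its value differs from the previous version's value.
compactify_sketch <- function(df, value_col) {
  df <- df[order(df$geo_value, df$time_value, df$version), ]
  obs_id <- paste(df$geo_value, df$time_value)
  n <- nrow(df)
  same_obs_as_prev <- c(FALSE, obs_id[-1] == obs_id[-n])
  same_val_as_prev <- c(FALSE, df[[value_col]][-1] == df[[value_col]][-n])
  df[!(same_obs_as_prev & same_val_as_prev), ]
}

compact <- compactify_sketch(updates, "cases")
compact$cases  # 10 12 15
```

Looking up the latest row at or before any version gives the same value before and after compaction, which is why the dropped row counts as redundant.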

R/methods-epi_archive.R

+28-10
@@ -152,7 +152,7 @@ epix_fill_through_version = function(x, fill_versions_end,
 #' # vs. mutating x to hold the merge result:
 #' x$merge(y)
 #'
-#' @importFrom data.table key set
+#' @importFrom data.table key set setkeyv
 #' @export
 epix_merge = function(x, y,
                       sync = c("forbid","na","locf","truncate"),
@@ -215,18 +215,36 @@ epix_merge = function(x, y,
     y_DT = epix_fill_through_version(y, new_versions_end, sync)$DT
   } else if (sync == "truncate") {
     new_versions_end = min(x$versions_end, y$versions_end)
-    x_DT = x$DT[x[["DT"]][["version"]] <= new_versions_end, with=FALSE]
-    y_DT = y$DT[y[["DT"]][["version"]] <= new_versions_end, with=FALSE]
+    x_DT = x$DT[x[["DT"]][["version"]] <= new_versions_end, names(x$DT), with=FALSE]
+    y_DT = y$DT[y[["DT"]][["version"]] <= new_versions_end, names(y$DT), with=FALSE]
   } else Abort("unimplemented")
 
-  if (!identical(key(x$DT), key(x_DT)) || !identical(key(y$DT), key(y_DT))) {
-    Abort("preprocessing of data tables in merge changed the key unexpectedly",
-          internal=TRUE)
+  # key(x_DT) should be the same as key(x$DT) and key(y_DT) should be the same
+  # as key(y$DT). Below, we only use {x,y}_DT in the code (making it easier to
+  # split the code into separate functions if we wish), but still refer to
+  # {x,y}$DT in the error messages (further relying on this assumption).
+  #
+  # Check&ensure that the above assumption holds; if it didn't already hold, we
+  # likely have a bug in the preprocessing, a weird/invalid archive as input,
+  # and/or a data.table version with different semantics (which may break other
+  # parts of our code).
+  x_DT_key_as_expected = identical(key(x$DT), key(x_DT))
+  y_DT_key_as_expected = identical(key(y$DT), key(y_DT))
+  if (!x_DT_key_as_expected || !y_DT_key_as_expected) {
+    Warn("
+      `epiprocess` internal warning (please report): pre-processing for
+      epix_merge unexpectedly resulted in an intermediate data table (or
+      tables) with a different key than the corresponding input archive.
+      Manually setting intermediate data table keys to the expected values.
+    ", internal=TRUE)
+    setkeyv(x_DT, key(x$DT))
+    setkeyv(y_DT, key(y$DT))
   }
-  ## key(x_DT) should be the same as key(x$DT) and key(y_DT) should be the same
-  ## as key(y$DT). If we want to break this function into parts it makes sense
-  ## to use {x,y}_DT below, but this makes the error checks and messages look a
-  ## little weird and rely on the key-matching assumption above.
+  # Without some sort of annotations of what various columns represent, we can't
+  # do something that makes sense when merging archives with mismatched keys.
+  # E.g., even if we assume extra keys represent demographic breakdowns, a
+  # sensible default treatment of count-type and rate-type value columns would
+  # differ.
   if (!identical(sort(key(x_DT)), sort(key(y_DT)))) {
     Abort("
       The archives must have the same set of key column names; if the
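
The `sync = "truncate"` fix and the key check in the hunk above can be tried outside the package. The sketch below assumes the `data.table` package is installed. It uses two toy tables (`x_DT_full`, `y_DT_full`) and hand-picked `versions_end` values in place of real `epi_archive` objects, so those names and values are assumptions for illustration. It mirrors the row filter with an explicit column list and the `setkeyv` fallback, but is not the package's code.

```r
library(data.table)

# Toy stand-ins for x$DT and y$DT, keyed like an epi_archive's DT.
x_DT_full <- data.table(
  geo_value = "ca", time_value = as.Date("2020-06-01"),
  version = as.Date("2020-06-02") + 0:3, x_value = 1:4,
  key = c("geo_value", "time_value", "version")
)
y_DT_full <- data.table(
  geo_value = "ca", time_value = as.Date("2020-06-01"),
  version = as.Date("2020-06-02") + 0:1, y_value = c(5, 6),
  key = c("geo_value", "time_value", "version")
)

# Assumed versions_end values (hypothetical; a real archive stores these).
x_versions_end <- as.Date("2020-06-05")
y_versions_end <- as.Date("2020-06-03")
new_versions_end <- min(x_versions_end, y_versions_end)

# Row filter in i, explicit column names in j, as in the fixed lines.
x_DT <- x_DT_full[x_DT_full[["version"]] <= new_versions_end,
                  names(x_DT_full), with = FALSE]
y_DT <- y_DT_full[y_DT_full[["version"]] <= new_versions_end,
                  names(y_DT_full), with = FALSE]

# Defensive fallback: restore the expected key if the subset did not keep it.
if (!identical(key(x_DT), key(x_DT_full))) setkeyv(x_DT, key(x_DT_full))
if (!identical(key(y_DT), key(y_DT_full))) setkeyv(y_DT, key(y_DT_full))

all(x_DT$version <= new_versions_end)
```

Resetting the key instead of aborting keeps the merge usable even if a subset drops the key, which matches the commit's switch from `Abort` to `Warn` plus `setkeyv`.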

pkgdown/extra.scss

+70
@@ -0,0 +1,70 @@
+/* The news/changelog in pkgdown 2.0.6 is squashed relative to 1.6.1, and
+   secondary headings are too prominent when using ## (but we can't change to
+   ### without impacting side navbar). Just trying a couple of bootswatches
+   didn't seem to help, and nice template packages might be restricted for use
+   by particular groups (e.g., tidytemplate has such a restriction).
+*/
+
+/* Current approach: add some spacing with CSS, and have h3 extend h4 so that
+   ##'s (which use h3) will render with a bit smaller fonts, while still being
+   recognized/included by the page navigation / TOC feature.
+*/
+
+/* General structure: div.template-news wraps everything of interest regarding
+   the rendered NEWS.md. Within that, div.level2's wrap each package version
+   + the changes for that version. (Within those,) h2.pkg-version's label the
+   versions.
+*/
+
+
+
+/* Matches the first-listed version's section. (This is written as a general
+   rule, but the adjacent sibling rule will override it for non-first versions'
+   sections. Using :first-child probably wouldn't work as a sibling
+   div.page-header precedes the first div.level2.) */
+div.template-news div.level2 {
+  margin-top: 1.5em;
+}
+
+/* Matches subsequent versions' sections. Places more vspace between these
+   sections than before the first section.
+*/
+div.template-news div.level2 + div.level2 {
+  margin-top: 2.5em;
+}
+
+/* Place some additional vspace after each version number heading; currently,
+   the immediately following content is always a secondary heading, which looks
+   weird with the default spacing.
+*/
+div.template-news h2.pkg-version {
+  margin-bottom: 0.5em;
+}
+
+/* Use `h4` styling for `h3`s (the ## headings); this is the only thing we need
+   .scss for, and we could really just copy-paste in the appropriate value if
+   needed: */
+div.template-news h3 {
+  @extend h4;
+}
+
+
+/* Original approach, to be removed at some later time: try adding hrules before
+   and after primary headings (version numbers). The initial "hrule" (actually a
+   border) after the "Source:" pointer has a different color from natural
+   hrules, so we need some custom CSS styling to get these colors to match and
+   look okay:
+*/
+
+/* .template-news .page-header { */
+/*   /\* 1px solid to match original *\/ */
+/*   /\* (original color was something like --bs-default which seemed to be set to */
+/*   --bs-gray-300) *\/ */
+/*   border-bottom: 1px solid var(--bs-secondary); */
+/* } */
+
+/* .template-news hr { */
+/*   height: 1px; /\* defensive *\/ */
+/*   background-color: var(--bs-secondary); */
+/*   opacity: 1; /\* counteracts a 0.25 setting somewhere *\/ */
+/* } */
