Skip to content

Commit

Permalink
Merge pull request #9 from r-causal/ess
Browse files Browse the repository at this point in the history
Add `ess()`
  • Loading branch information
malcolmbarrett authored Jan 9, 2025
2 parents c093f22 + 64f7722 commit 3d0c933
Show file tree
Hide file tree
Showing 9 changed files with 225 additions and 22 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -28,4 +28,4 @@ Config/testthat/edition: 3
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
RoxygenNote: 7.3.1
1 change: 1 addition & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
export(bind_matches)
export(contains)
export(ends_with)
export(ess)
export(everything)
export(geom_ecdf)
export(geom_love)
Expand Down
48 changes: 48 additions & 0 deletions R/ess.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
#' Calculate the Effective Sample Size (ESS)
#'
#' This function computes the effective sample size (ESS) given a vector of
#' weights, using the classical \eqn{(\sum w)^2 / \sum(w^2)} formula (sometimes
#' referred to as "Kish's effective sample size").
#'
#' @param wts A numeric vector of weights (e.g., from survey or
#' inverse-probability weighting).
#'
#' @return A single numeric value representing the effective sample size.
#'
#' @details The effective sample size (ESS) reflects how many observations you
#' would have if all were equally weighted. If the weights vary substantially,
#' the ESS can be much smaller than the actual number of observations.
#' Formally:
#'
#' \deqn{
#' \mathrm{ESS} = \frac{\left(\sum_i w_i\right)^2}{\sum_i w_i^2}.
#' }
#'
#' **Diagnostic Value**:
#' * **Indicator of Weight Concentration**: A large discrepancy between ESS
#' and the actual sample size indicates that a few observations carry
#' disproportionately large weights, effectively reducing the usable
#' information in the dataset.
#' * **Variance Inflation**: A small ESS signals that weighted estimates are
#' more sensitive to a handful of observations, inflating the variance and
#' standard errors.
#' * **Practical Guidance**: If ESS is much lower than the total sample
#' size, it is advisable to investigate why some weights are extremely large
#' or small. Techniques like weight trimming or stabilized weights might be
#' employed to mitigate the issue
#'
#' @examples
#' # Suppose we have five observations with equal weights
#' wts1 <- rep(1.2, 5)
#' # returns 5, because all weights are equal
#' ess(wts1)
#'
#' # If weights vary more, smaller than 5
#' wts2 <- c(0.5, 2, 2, 0.1, 0.8)
#' ess(wts2)
#'
#' @export
ess <- function(wts) {
sum(wts)^2 / sum(wts^2)
}

56 changes: 56 additions & 0 deletions man/ess.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

62 changes: 51 additions & 11 deletions man/geom_ecdf.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

47 changes: 39 additions & 8 deletions man/geom_mirror_histogram.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Binary file modified tests/testthat/Rplots.pdf
Binary file not shown.
4 changes: 2 additions & 2 deletions tests/testthat/_snaps/geom_mirrored_histogram.md
Original file line number Diff line number Diff line change
@@ -1,12 +1,12 @@
# geom_mirrored_histogram errors/warns correctly

Computation failed in `stat_mirror_count()`
Computation failed in `stat_mirror_count()`.
Caused by error in `abort()`:
! Groups of three or greater not supported in `geom_mirror_histogram()`

---

Computation failed in `stat_mirror_count()`
Computation failed in `stat_mirror_count()`.
Caused by error in `abort()`:
! No group detected.
* Do you need to use `aes(group = ...)` with your grouping variable?
Expand Down
27 changes: 27 additions & 0 deletions tests/testthat/test-ess.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
test_that("ess returns correct result for equal weights", {
# 5 observations, each weight = 2
wts_equal <- rep(2, 5)
# ESS should be 5
expect_equal(ess(wts_equal), 5)
})

test_that("ess returns correct result for varied weights", {
# 5 observations, each weight varies
wts_equal <- runif(5, max = 5)
# ESS should always be less than 5
expect_lt(ess(wts_equal), 5)
})

test_that("ess handles one large weight", {
# 5 observations, 1 large weight
wts_big <- c(1000, rep(0, 4))
# The sum is 1000, sum of squares is 1,000^2 = 1e6
# ESS = (1000^2) / 1,000^2 = 1
expect_equal(ess(wts_big), 1)
})

test_that("ess gives `NaN` if all weights are 0", {
wts_zero <- rep(0, 5)
# sum(wts) = 0, sum(wts^2) = 0 -> 0/0 is NaN
expect_true(is.nan(ess(wts_zero)))
})

0 comments on commit 3d0c933

Please sign in to comment.