extrapolate_quantiles and thresholding #434

Closed
dsweber2 opened this issue Feb 7, 2025 · 11 comments · Fixed by #443
Labels
enhancement New feature or request

Comments


dsweber2 commented Feb 7, 2025

Discovered in #431: if you try to extrapolate quantiles for predictions that have been thresholded, you can get quantiles outside of those thresholds. For example, if the quantile at level 0.1 has value 0, then the extrapolated 0.05 quantile will be negative, even though the thresholding was meant to rule out negative values (a toy sketch after the list below illustrates this).

Ways I can think to deal with this:

  1. Include the support in the distribution, probably as an interval, e.g. c(0, Inf), and use that in quantile_extrapolate. Unsure whether the types will play nicely with this one.
  2. Add limit options that get passed to quantile_extrapolate. This would require the user to know that they have thresholds, whereas the first option would build it into the result as part of layer_threshold.
  3. Just let it be and let the user figure it out by mentally thresholding. Seems not ideal.
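A toy base-R sketch of the problem (made-up numbers, simple linear tail extrapolation; not the package implementation):

tau <- c(0.1, 0.5, 0.9) # known quantile levels
q   <- c(0.0, 2.0, 5.0) # quantile values after thresholding at 0
# extend the lower tail linearly from the two lowest known quantiles
slope <- (q[2] - q[1]) / (tau[2] - tau[1])
q[1] + slope * (0.05 - tau[1])
[1] -0.25

The extrapolated 0.05 quantile falls below the threshold of 0, which is the behaviour described above.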
dsweber2 added the bug (Something isn't working) label Feb 7, 2025
dsweber2 mentioned this issue Feb 7, 2025
@dajmcdon

@dsweber2 Is this still relevant or effectively closed by #435?

@dsweber2

The bug is still live; it's just not on the default path. #435 just decreased the urgency.

@dajmcdon

Can you give an example of where it happens? I'm OK with option 2.

@dajmcdon

[It's pretty undesirable to threshold and then extrapolate. So, we shouldn't be doing this anywhere. And ideally, the user wouldn't either. No matter what the "fix" is, it will be demonstrably worse than doing it in the other order.]
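As a toy illustration of the ordering point (made-up numbers, linear tail extrapolation only; not the package implementation):

tau   <- c(0.1, 0.5, 0.9)
q_raw <- c(-0.5, 2.0, 5.0)                 # quantiles before any thresholding
q_thr <- pmax(q_raw, 0)                    # the same quantiles thresholded at 0
# linear extrapolation of the 0.05 quantile from the two lowest known quantiles
lower_tail <- function(q) q[1] + (q[2] - q[1]) / (tau[2] - tau[1]) * (0.05 - tau[1])
pmax(lower_tail(q_raw), 0)                 # extrapolate, then threshold: 0
lower_tail(q_thr)                          # threshold, then extrapolate: -0.25

Extrapolating first and thresholding last keeps the result inside the support; doing it in the other order does not.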


dsweber2 commented Feb 25, 2025

If you change the default quantile_levels in the original example, you can reproduce the original behaviour:

forecast_date <- as.Date("2021-08-01")
two_week_ahead <- arx_forecaster(
  covid_case_death_rates |>
    filter(time_value <= forecast_date),
  outcome = "death_rate",
  trainer = quantile_reg(),
  predictors = c("death_rate"),
  args_list = arx_args_list(
    lags = list(c(0, 7, 14)),
    ahead = 32,
    quantile_levels = c(0.2, 0.3, 0.5, 0.7, 0.8)
  )
)
two_week_ahead$predictions$.pred_distn[[1]] %>%
  quantile(0.05)
[1] -0.01970189

@dajmcdon

Now I see. In that case, I'm leaning toward either:

  1. Document the extrapolation behaviour and note that a user who decides to do something they shouldn't may get results they don't want.
  2. Remove the ability to extrapolate completely: allow imputation in the middle, but refuse to handle the tails (sketched below).
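A minimal sketch of option 2 using base R's approx() (made-up quantile values): with rule = 1, values between the known levels are interpolated, but anything outside them comes back as NA.

tau <- c(0.2, 0.3, 0.5, 0.7, 0.8)
q   <- c(0.00, 0.01, 0.03, 0.05, 0.06)
approx(tau, q, xout = c(0.05, 0.4, 0.95), rule = 1)$y
[1]   NA 0.02   NA

The interior level is imputed; the tails are refused.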

@dsweber2

How does this interact with #442, by the way?

@dajmcdon

It doesn't. The behaviour will be the same.

@dajmcdon

To be clear, I wouldn't call this issue a bug. Consider the following:

  1. Let X be generated by rnorm(n, -qnorm(.25)).
  2. Let Y be generated by pmax(rnorm(n, -qnorm(.25)), 0).

Suppose I summarize both vectors X and Y with the set of quantiles (.25, .5, .75). I give you this summary and ask you to guess the 0.05 quantile. How would you do it? You would at least see that the .25 quantile is 0 in both cases, but that gives you no information about the tails. I could make these examples more complicated to create all sorts of corner cases. If you want to change the assumptions, that's fine; then we should decide what assumptions reasonably apply to a wide variety of cases. Or we can expect the user to interpret their results correctly.
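A quick simulation of the two vectors above (with mean -qnorm(.25), the .25 quantile of X is exactly 0; the seed and sample size are arbitrary):

set.seed(1)
n <- 1e6
x <- rnorm(n, -qnorm(.25))
y <- pmax(rnorm(n, -qnorm(.25)), 0)
round(quantile(x, c(.25, .5, .75)), 2)   # roughly 0.00 0.67 1.35
round(quantile(y, c(.25, .5, .75)), 2)   # roughly the same summary
quantile(x, .05)                         # about -0.97: the tail of X really is negative
quantile(y, .05)                         # exactly 0: the tail of Y is truncated

The two summaries are indistinguishable even though the tails differ.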

The current implementation assumes that the underlying distribution is continuous and that the tail has certain properties. These assumptions could be documented (they're not currently). Or they could be changed. The only reasonable change (in my mind) is to just refuse to extrapolate. Though I can create examples where extrapolation is desirable, functional, and reasonable (and I suspect that's why the functionality exists in the first place).

My suggestion is:

  1. Add documentation for the assumptions.
  2. Possibly export the snap() function (and reference it and document it) to allow the user to adjust as they see fit.

Your example above illustrates undesirable user behaviour, but it's not clear that we can or should guard against users doing things they sometimes shouldn't do. It's not good to compute quantiles, threshold them, then try to extrapolate beyond their range in this example. If you want them

@dsweber2

Y actually has a distribution whose support is restricted, so assuming we know that about Y, I would extrapolate as in the case of X, and then truncate to the known interval [0, ∞).

We/our users know in context whether or not the distribution is truncated. Adding truncation as a feature of extrapolation, and as a thing that autoplot picks up on, seems not terribly complicated (sketched below).
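A sketch of what that could look like (truncate_quantiles is a hypothetical helper, not the package API): do the usual extrapolation first, then snap the result back into the known support.

# hypothetical helper: clamp extrapolated quantiles to a known support
truncate_quantiles <- function(values, lower = 0, upper = Inf) {
  pmin(pmax(values, lower), upper)
}
truncate_quantiles(-0.01970189)   # the 5% quantile from the example above, snapped to 0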

dajmcdon added the enhancement (New feature or request) label and removed the bug (Something isn't working) label Feb 26, 2025
dajmcdon self-assigned this Feb 26, 2025
dajmcdon added a commit that referenced this issue Feb 26, 2025
dajmcdon linked a pull request Feb 26, 2025 that will close this issue
@dajmcdon

Done.

See 3e3fa31 and 17614ab.
