-
Notifications
You must be signed in to change notification settings - Fork 10
extrapolate_quantiles
and thresholding
#434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
The bug is still live, it's just not on the default path. #435 just decreased the urgency |
Can you give an example of where it happens? I'm OK with option 2. |
[It's pretty undesirable to threshold and then extrapolate. So, we shouldn't be doing this anywhere. And ideally, the user wouldn't either. No matter what the "fix" it will be demonstrably worse than doing it in the other order.] |
If they change the default forecast_date <- as.Date("2021-08-01")
two_week_ahead <- arx_forecaster(
covid_case_death_rates |>
filter(time_value <= forecast_date),
outcome = "death_rate",
trainer = quantile_reg(),
predictors = c("death_rate"),
args_list = arx_args_list(
lags = list(c(0, 7, 14)),
ahead = 32,
quantile_levels = c(0.2, 0.3, 0.5, 0.7, 0.8)
)
)
two_week_ahead$predictions$.pred_distn[[1]] %>%
quantile(0.05)
[1] -0.01970189 |
Now I see. In that case, I'm leaning toward either:
|
how does this interact with #442 by the way? |
it doesn't. Behaviour will be the same. |
To be clear, I wouldn't call this issue a bug. Consider the following:
Suppose I summarize both vectors X and Y with the set of quantiles (.25, .5, .75). I give you this summary and ask you to guess the 0.05 quantile. How would you do it? I would at least see that the .25 quantile is 0 in both cases but that gives me no information about the tails. I could make these examples more complicated to create all sorts of corner cases. If you want to change the assumptions, that's fine. Then we should decide what assumptions reasonably apply to a wide variety of cases. Or we can expect the user to interpret their results correctly. The current implementation assumes that the underlying distribution is continuous and that the tail has certain properties. These assumptions could be documented (they're not currently). Or they could be changed. The only reasonable change (in my mind) is to just refuse to extrapolate. Though I can create examples where extrapolation is desirable, functional, and reasonable (and I suspect that's why the functionality exists in the first place). My suggestion is:
Your example above illustrates undesirable user behaviour, but it's not clear that we can or should guard against users doing things they sometimes shouldn't do. It's not good to compute quantiles, threshold them, then try to extrapolate beyond their range in this example. If you want them |
Y is actually a distribution whose support is restricted, so assuming we actually know that about Y, I would extrapolate as with the case for X, and then truncate to the known interval [0,\inf). We/our users know in context whether or not our distribution is truncated. Adding truncation as a feature of extrapolation, and as a thing that auto-plot picks up on seem not that terribly complicated. |
Discovered in #431, if you try to extrapolate quantiles for predictions which have been thresholded, you can get quantiles outside of those thresholds. For example, if you have a quantile level of
.1
which has value0
, then.05
will be negative, even if the quantile is negative.Ways I can think to deal with this:
c(0,Inf)
, and use that inquantile_extrapolate
. Unsure if the types will play nicely with this one.quantile_extrapolate
. Seems like it will require the user to know that they have thresholds, whereas the first option will build it into the result as part oflayer_threshold
.The text was updated successfully, but these errors were encountered: