Issue 118: First pass at missing data support #107
@SamuelBrand1 any thoughts here? The issue I had was in trying to offer both a vectorised and a non-vectorised likelihood option, but perhaps just accepting that it can't be vectorised is the way forward for now? It all seems surprisingly clunky to me as well, so perhaps I have just got the wrong end of the stick. |
Yeah, the underlying maths here is that vectorised == multivariate distribution and non-vectorised == conditionally independent. Having a vector with element type So I'd suggest just trying out a non-vectorised approach and checking whether it really does have a big performance hit. The AD systems evolve independently of the PPL, so what might have been true when they wrote their performance tips might not be true now. If it doesn't, then a loop is more flexible: missing elements will just work as expected. |
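The conditional-independence point above can be written out as a short worked equation. This is a sketch under stated assumptions: the symbols $y_t$, $\theta$ and the observed index set $\mathcal{O}$ are introduced here for illustration and do not appear in the thread, and the missing entries are assumed to be ignorable (missing at random).

```latex
% Conditionally independent observations: the joint density factorises,
p(y_{1:T} \mid \theta) = \prod_{t=1}^{T} p(y_t \mid \theta).

% Integrating out each missing y_t contributes a factor of 1, so the
% observed-data log-likelihood is a sum over the observed indices only:
\log p(y_{\mathcal{O}} \mid \theta) = \sum_{t \in \mathcal{O}} \log p(y_t \mid \theta),
\qquad \mathcal{O} = \{\, t : y_t \text{ is observed} \,\}.
```

A loop over $t$ can skip or separately handle any $t \notin \mathcal{O}$, whereas a single multivariate statement needs a vector with no missing entries.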
Another option is to make the concept of Then the usage would be |
I really don't love this and it feels very clunky but I see it would work and enable vectorisation. |
Note I have had to update the getting started examples to use |
Do you have any sense that this is noticeably bad for inference time? |
The only real evidence I have for this is from running the getting started example. Comparing CI runs, there is perhaps a small slowdown (https://github.com/CDCgov/Rt-without-renewal/actions), but it doesn't seem massive (though it is hard to know how that would scale, given that so little of that CI check is inference). For interest in |
Very nice. We can do that with BenchmarkTools I think |
We could have the not missing (or missing indices) as input-able data? |
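As a rough illustration of the suggestion above, the sketch below compares a loop that skips missing entries with a version that takes the observed indices as input data. It is a language-neutral sketch in Python, not the package's Julia code: the function names, the `None` encoding of missing values, and the Poisson likelihood (chosen for brevity in place of the negative binomial) are all assumptions made for this example.

```python
import math


def poisson_logpmf(y, lam):
    """Log-probability of count y under a Poisson with mean lam."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def loglik_loop(y, lam):
    """Loop form: visit every time point and skip missing (None) entries."""
    total = 0.0
    for y_t, lam_t in zip(y, lam):
        if y_t is None:
            continue
        total += poisson_logpmf(y_t, lam_t)
    return total


def loglik_indexed(y, lam, observed_idx):
    """Index form: the caller supplies the observed positions as data."""
    return sum(poisson_logpmf(y[t], lam[t]) for t in observed_idx)


# Hypothetical data: None marks a missing observation.
y = [3, None, 5, 2, None]
lam = [2.5, 3.0, 4.0, 2.0, 1.5]

# Computed once, outside the model, and passed in as data.
observed_idx = [t for t, y_t in enumerate(y) if y_t is not None]

print(loglik_loop(y, lam))
print(loglik_indexed(y, lam, observed_idx))
```

Both forms sum the same terms, so they agree on the log-likelihood. The index form keeps the observed subset dense, which is the shape a vectorised likelihood would need.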
That is why I was using it for benchmarking, as it should show the difference in speed for complete data (which is what we care about for the benchmark). I tested the generated quantities in tests, but yes, ideally we would have a small example that had partially complete data at some point (an issue and not a priority for now, I think). |
Where did you try this? Shouldn't it be here:
|
Ooops I hadn't committed my change |
I added a small example here as part of the getting started example: e28cc8c |
Nice one. LGTM now. |
Very nice.
This PR adds support for missing data in the observation model by moving from a vectorised approach to a loop approach. It also renames the negative binomial observation model to match the distribution naming scheme.