From 68b6fcf38bb727f43aedec3ad591b29eea91ae36 Mon Sep 17 00:00:00 2001 From: Bob Carpenter Date: Wed, 14 Feb 2024 15:29:22 -0500 Subject: [PATCH 1/4] mixtures proposal --- designs/0034-mixtures.md | 209 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 209 insertions(+) create mode 100644 designs/0034-mixtures.md diff --git a/designs/0034-mixtures.md b/designs/0034-mixtures.md new file mode 100644 index 0000000..9a3db27 --- /dev/null +++ b/designs/0034-mixtures.md @@ -0,0 +1,209 @@ +- Feature Name: 0034-mixtures +- Start Date: 2024-02-14 +- RFC PR: +- Stan Issue: + +# Summary +[summary]: #summary + +The mixtures feature is designed to allow users to specify mixture +models using a consistent density-based notation without having to +worry about marginalization and working on the log scale. + + + +# Motivation +[motivation]: #motivation + +Users writing their own mixture models has been error prone. A +specific failure mode is vectorization and mixing over the whole +population versus mixing over individuals. The latter is almost +always intended, but the former is often coded. + + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +The syntax follows that of a distribution. For example, a +two-component normal mixture model could be written in canonical form +as follows. + +```stan +y ~ mixture(p, + normal(m[1], s[1]), + normal(m[2], s[2])); +``` + +Here, `p` is a probability in (0, 1) and `normal(m[1], s[1])` acts +like a lambda binding of the first argument (more on that below). +This notation would be syntactic sugar for the following expression + +```stan +target += log_sum_exp(log(p) + normal_lpdf(y | m[1], s[1]), + log1m(p) + normal_lpdf(y | m[2], s[2])); +``` + +Because this follows distribution syntax, for consistency we would +define +```stan +mixture_lpdf(y | p, lp1, lp2) +=def= +log_sum_exp(log(p) + lp1(y), log1m(p) + lp2(y)); +``` + +where we have explicitly used application syntax for `lp1` and `lp2`. + +If Stan had C++ style lambdas, a distribution without an outcome +variable would be defined as follows. + +```stan +normal_lpdf(m[1], s[1]) +=def= +[y].normal_lpdf(y | m[1], s[1]) +``` + +Even though a distribution without an outcome behaves like a lambda in +the context of mixtures, this will *not* be generally available in +Stan, so that it would not be legal to write the following. + +```stan +target += normal(m[1], s[1])(y); // ILLEGAL +``` + +Unary functions will be allowed as function arguments to `mixture`, so the +following would be legal. + +```stan +functions { + real normal1(real y) { + real m1 = ...; + real s1 = ...; + return normal_lpdf(y | m1, s1); + } + real normal2(real y) { + real m2 = ...; + real s2 = ...; + return normal_lpdf(y | m2, s2); + } +} +... +real p; +... +model { + y ~ mixture(p, normal1, normal2); +} +``` + +The alternative to allowing function arguments would be to force the +user introduce new distribution functions, such as the following, +which defines a zero-inflated Poisson as a mixture model. + +```stan +functions { + real zero_delta_lpmf(int y) { + return y == 0 ? 0 : -infinity(); + } +} +... +real p; +... +model { + y ~ mixture(p, zero_delta, poisson(lam)); +``` + +The idea is to allow arbitrary numbers of components in the mixture, +with the first argument being a simplex matching in size. For +example, the following would be a legal way to define a 3-component mixture. + +```stan +simplex[3] p; +... +y ~ mixture(p, normal(m[1], s[1]), + normal(m[2], s[2]), + student_t(4.5, m[3], s[3])); +``` + +Run-time error checking is largely a matter of making sure the simplex +argument is both a simplex and the right size for the number of +components, or if it's a probability, checking that there are two +components and that the value is in (0, 1). + +This functionality is more general than the existing `log_mix` +functionality, which requires actual log density values as arguments. +In this case, the above mixture would be written as + +```stan +simplex[3] p; +... +target += log_mix(p, normal_lpdf(y | m[1], s[1]), + normal_lpdf(y | m[2], s[2]), + student_t_lpdf(y | 4.5, m[3], s[3])); +``` + +The `log_mix` function is slightly different than the `mixture` +distribution and while it could be deprecated in favor of `mixture`, +it would also make sense to keep it. + +In terms of compilation, this is another variadic function with type +checking. + + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +The implementation requires `mixture_lpmf` and `mixture_lpdf` to be +coded in C++. These will be variadic functions that accept one simplex +argument and a parameter pack of mixture components that must in practice be +the same size as the simplex (only discoverable at runtime). + +Derivatives can either be supplied by autodiff, or they can be done +analytically, where the adjoint rule is just going to mix the adjoints +of the components according to the simplex. + +There are no interactions with other features. + +The boundary conditions are simple. If zero mixture components are +supplied, it should throw an exception. This is better than returning +-infinity and letting the user try to debug the rejection. +If only a single mixture component is supplied, it should provide a +warning that the use can be simplified. + +# Drawbacks +[drawbacks]: #drawbacks + +It's yet one more kind of ad hoc parsing that doesn't fit into the +type system until we have functions. + +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +This design is close to optimal as it matches the way a statistician +might write a mixture in mathematical notation. + +The alternative is to just require users to keep using the log_mix function. + +The impact of inaction here is just that it's harder for users to +write readable and robust mixture models. + +# Prior art +[prior-art]: #prior-art + +Prior art is basically just `log_mix` and some examples of how to code +by hand in the mixtures chapter of the *User's Guide*. + +I am not aware of other probabilistic programming languages that +include this feature. This is way too small of a feature for a paper. + +# Unresolved questions +[unresolved-questions]: #unresolved-questions + +1. Should we allow function arguments or force user to write `_lpdf` +or `_lpmf` functions? Proposal says YES. + +2. Should we allow one-component mixtures as the boundary condition? + Proposal says YES. + +3. Should `normal(m, s)` be interpreted as a function elsewhere in +Stan. Proposal says NO. + +This proposal is not intended to address infinite mixtures, like a gamma-Poisson. From 11b1ece9db195bc7e2aa09e760ebc65f744b07dc Mon Sep 17 00:00:00 2001 From: Bob Carpenter Date: Thu, 4 Apr 2024 15:10:34 -0400 Subject: [PATCH 2/4] revised w.r.t. comments --- designs/0034-mixtures.md | 153 +++++++++++++++++++++++---------------- 1 file changed, 90 insertions(+), 63 deletions(-) diff --git a/designs/0034-mixtures.md b/designs/0034-mixtures.md index 9a3db27..79b5c50 100644 --- a/designs/0034-mixtures.md +++ b/designs/0034-mixtures.md @@ -4,29 +4,33 @@ - Stan Issue: # Summary -[summary]: #summary The mixtures feature is designed to allow users to specify mixture models using a consistent density-based notation without having to -worry about marginalization and working on the log scale. - +worry about marginalizing the discrete parameter manually or about +working on the log scale. # Motivation -[motivation]: #motivation Users writing their own mixture models has been error prone. A specific failure mode is vectorization and mixing over the whole population versus mixing over individuals. The latter is almost -always intended, but the former is often coded. +always intended, but the former is what a naive Stan program will +do. # Guide-level explanation -[guide-level-explanation]: #guide-level-explanation -The syntax follows that of a distribution. For example, a -two-component normal mixture model could be written in canonical form -as follows. +In both the two-component and multi-component mixture syntax, a new +distribution function will be introduced, which is described in this +section. + + +## Two-component mixtures. + +A two-component normal mixture model will be written in canonical form +as follows. ```stan y ~ mixture(p, @@ -44,14 +48,16 @@ target += log_sum_exp(log(p) + normal_lpdf(y | m[1], s[1]), ``` Because this follows distribution syntax, for consistency we would -define +also define the probability functions `mixture_lpdf` and `mixture_lpmf` ```stan mixture_lpdf(y | p, lp1, lp2) =def= log_sum_exp(log(p) + lp1(y), log1m(p) + lp2(y)); ``` -where we have explicitly used application syntax for `lp1` and `lp2`. +Here we have have explicitly used application syntax for `lp1` and +`lp2`, even though it doesn't exist in Stan. For discrete +distributions, we will also need a `mixture_lpmf`. If Stan had C++ style lambdas, a distribution without an outcome variable would be defined as follows. @@ -70,33 +76,19 @@ Stan, so that it would not be legal to write the following. target += normal(m[1], s[1])(y); // ILLEGAL ``` -Unary functions will be allowed as function arguments to `mixture`, so the -following would be legal. +## Require probability function arguments -```stan -functions { - real normal1(real y) { - real m1 = ...; - real s1 = ...; - return normal_lpdf(y | m1, s1); - } - real normal2(real y) { - real m2 = ...; - real s2 = ...; - return normal_lpdf(y | m2, s2); - } -} -... -real p; -... -model { - y ~ mixture(p, normal1, normal2); -} -``` +Although it would be possible to allow arbitrary functions to +participate in mixtures, in the first release we propose to not allow +arbitrary functions and restrict the arguments after the first to be +probability functions (lpdfs or lpmfs). The workaround in this +case is to take the function and wrap it in an lpdf or lpmf +definition. We may add arbitrary function arguments at a later date +after we introduce lambdas into Stan. -The alternative to allowing function arguments would be to force the -user introduce new distribution functions, such as the following, -which defines a zero-inflated Poisson as a mixture model. +For example, to define a zero-inflated Poisson as a mixture model, +the lpmf that puts all of its mass on 0 can be defined and then used +as one of the mixture components. ```stan functions { @@ -111,8 +103,13 @@ model { y ~ mixture(p, zero_delta, poisson(lam)); ``` -The idea is to allow arbitrary numbers of components in the mixture, -with the first argument being a simplex matching in size. For + +## Mixtures with more than two components + +Up until now, we have assumed two mixture components and a probability +argument. In general, we want to allow more than two mixture +components. The proposal is to have the first argument be a syntax +whose size determines the numnber of remaining arguments. For example, the following would be a legal way to define a 3-component mixture. ```stan @@ -128,6 +125,11 @@ argument is both a simplex and the right size for the number of components, or if it's a probability, checking that there are two components and that the value is in (0, 1). +In terms of compilation, `mixture` is another variadic function with type +checking. + +## Comparison to existing `log_mix` function + This functionality is more general than the existing `log_mix` functionality, which requires actual log density values as arguments. In this case, the above mixture would be written as @@ -144,49 +146,76 @@ The `log_mix` function is slightly different than the `mixture` distribution and while it could be deprecated in favor of `mixture`, it would also make sense to keep it. -In terms of compilation, this is another variadic function with type -checking. # Reference-level explanation -[reference-level-explanation]: #reference-level-explanation The implementation requires `mixture_lpmf` and `mixture_lpdf` to be -coded in C++. These will be variadic functions that accept one simplex -argument and a parameter pack of mixture components that must in practice be -the same size as the simplex (only discoverable at runtime). +coded in C++. These will be variadic functions that accept either (a) +one probability argument and two probability functions without +variates, or (b) one simplex, and a number of probability functions +matching its size. + +Derivatives can be handled by autodiff, or an "analytic" derivative +for `mixture_lpmf` and `mixture_lpdf` can be supplied. -Derivatives can either be supplied by autodiff, or they can be done -analytically, where the adjoint rule is just going to mix the adjoints -of the components according to the simplex. +## Interactions with other features There are no interactions with other features. +## Boundary conditions and exceptions + The boundary conditions are simple. If zero mixture components are supplied, it should throw an exception. This is better than returning --infinity and letting the user try to debug the rejection. -If only a single mixture component is supplied, it should provide a -warning that the use can be simplified. +`-infinity` and letting the user try to debug the rejection. +If only a single mixture component is supplied with a size 1 simplex, +a warning should be emitted suggesting simplified usage. + +## Include normalizing constants + +The distributions have to be compiled with `propto=False` and this +needs to be made clear in the documentation. Statistically, +`propto=False` is required to get the correct result when the mixture +components don't have equal normalizing constants. + +## Catching out of bounds exceptions in variates + +Stan's probability functions are designed to throw +`std::invalid_argument` exceptions when they encounter illegal +arguments. In this case, we want to catch any exceptions that come +from the variate being mixed being out of bounds. That is, if +we use `exponential(2)` as one of our mixture components, then +we want a negative argument `exponential_lpdf(-1 | 2)` to return +`-infinity` rather than throw an exception. + +To handle that, we propose to change the exception thrown in the math +library to a subclass of `std::invalid_argument` which we will call +`stan::math::invalid_variate`. This is a simple change in the math +lib for type of exception thrown and will not break backward +compatiblity. # Drawbacks -[drawbacks]: #drawbacks It's yet one more kind of ad hoc parsing that doesn't fit into the type system until we have functions. -# Rationale and alternatives -[rationale-and-alternatives]: #rationale-and-alternatives +## Rationale and alternatives This design is close to optimal as it matches the way a statistician might write a mixture in mathematical notation. -The alternative is to just require users to keep using the log_mix function. +The alternative is to just require users to keep using the `log_mix` +function or writing their own custom code. The impact of inaction here is just that it's harder for users to write readable and robust mixture models. -# Prior art -[prior-art]: #prior-art + +## Limitations + +This proposal is not intended to address infinite mixtures, like a gamma-Poisson. + +## Prior art Prior art is basically just `log_mix` and some examples of how to code by hand in the mixtures chapter of the *User's Guide*. @@ -194,16 +223,14 @@ by hand in the mixtures chapter of the *User's Guide*. I am not aware of other probabilistic programming languages that include this feature. This is way too small of a feature for a paper. -# Unresolved questions -[unresolved-questions]: #unresolved-questions + +# Resolved questions 1. Should we allow function arguments or force user to write `_lpdf` -or `_lpmf` functions? Proposal says YES. +or `_lpmf` functions? No. -2. Should we allow one-component mixtures as the boundary condition? - Proposal says YES. +2. Should we allow one-component mixtures as the boundary condition? Yes. 3. Should `normal(m, s)` be interpreted as a function elsewhere in -Stan. Proposal says NO. +Stan. No. -This proposal is not intended to address infinite mixtures, like a gamma-Poisson. From 708f819e022b2f97ed7ce2e6a6dc105e993d97f1 Mon Sep 17 00:00:00 2001 From: Bob Carpenter Date: Tue, 22 Apr 2025 16:29:58 -0400 Subject: [PATCH 3/4] addressing review comments --- designs/0034-mixtures.md | 135 +++++++++++++++++++++++++++++++++++++-- 1 file changed, 131 insertions(+), 4 deletions(-) diff --git a/designs/0034-mixtures.md b/designs/0034-mixtures.md index 79b5c50..bf5ad6a 100644 --- a/designs/0034-mixtures.md +++ b/designs/0034-mixtures.md @@ -104,12 +104,100 @@ model { ``` +## Mixing truncated distributions + +When two truncated distributions are mixed, it is important to include +the normalization constants. + + +### Proposal for native truncation syntax +in the situation where `mu`, `sigma`, and `lambda` are parameters. + +```stan +real alpha; + +alpha ~ mixture(p, + normal(mu, sigma) T[0, ], + exponential(lambda)); +``` + +Using truncation in this way will probably require extra work on +the parser side in order to make `normal(mu, sigma) T[0, ]` a node in +the syntax tree. Stan does *not* currently support statements of the +form + +```stan +target += normal_lupdf(alpha | mu, sigma) T[0, ]; +``` + +Instead, you have to do the following. + +```stan +target += normal_lupdf(alpha | mu, sigma) + - normal_ccdf(0 | mu, sigma); +``` + +### Alternative without new truncation syntax + +We can avoid having to have the `T[0, ]` syntax by instead requiring +the user to define their own function, + +```stan +real normal_lb_lpdf(real mu, real sigma, real lb) { + return normal_lupdf(alpha | mu, sigma) + - normal_ccdf(lb | mu, sigma); +} +``` + +With this, the above would be coded as + +```stan +alpha ~ mixture(p, + normal_lb(mu, sigma, 0), + exponential(lambda)); +``` + +We will probably start without implementing the new truncation +syntax. + + +## Mixing continuous and discrete distributions + +It is incoherent to directly mix continuous and discrete +distributions. That is, we do *not* want to do something like the +following. + +``` +int y; + +y ~ mixture(p, + poisson(lambda1), + exponential(lambda2)); +``` + +The problem here is that there is no type to assign the mixture. In +these cases of mismatched types, we want to raise a compiler error. + +Technically, the continuity can be eliminated by defining a new +discrete distribution that delegates to the exponential by promotion. + +```stan +real exponential_int_lpdf(int y, real lambda) { + return exponential_lpdf(y | lambda); +} +``` + +To make this coherent, the sum of densities of valid `y` needs to be +finite. In this case, the requirement is $\sum_{n \in \mathbb{N}} +\textrm{exponential}(n | \lambda) < \infty.$ This is not something we +can enforce through Stan, but is something we should be documenting. + ## Mixtures with more than two components Up until now, we have assumed two mixture components and a probability argument. In general, we want to allow more than two mixture components. The proposal is to have the first argument be a syntax -whose size determines the numnber of remaining arguments. For +whose size determines the number of remaining arguments. For example, the following would be a legal way to define a 3-component mixture. ```stan @@ -192,7 +280,7 @@ To handle that, we propose to change the exception thrown in the math library to a subclass of `std::invalid_argument` which we will call `stan::math::invalid_variate`. This is a simple change in the math lib for type of exception thrown and will not break backward -compatiblity. +compatibility. # Drawbacks @@ -220,9 +308,48 @@ This proposal is not intended to address infinite mixtures, like a gamma-Poisson Prior art is basically just `log_mix` and some examples of how to code by hand in the mixtures chapter of the *User's Guide*. -I am not aware of other probabilistic programming languages that -include this feature. This is way too small of a feature for a paper. +#### PyMC + +PyMC provides a distribution class +[`pymc.Mixture`](https://www.pymc.io/projects/docs/en/stable/api/distributions/generated/pymc.Mixture.html) + Here's the first example from their documentation. + +```python +with pm.Model() as model: + w = pm.Dirichlet("w", a=np.array([1, 1])) # 2 mixture weights + + lam1 = pm.Exponential("lam1", lam=1) + lam2 = pm.Exponential("lam2", lam=1) + + # As we just need the logp, rather than add a RV to the model, we need to call `.dist()` + # These two forms are equivalent, but the second benefits from vectorization + components = [ + pm.Poisson.dist(mu=lam1), + pm.Poisson.dist(mu=lam2), + ] + # `shape=(2,)` indicates 2 mixture components + components = pm.Poisson.dist(mu=pm.math.stack([lam1, lam2]), shape=(2,)) + + like = pm.Mixture("like", w=w, comp_dists=components, + observed=data) + ``` + +This creates a two-component mixture of Poisson distributions. + +#### Turing.jl + +`Turing.jl` provides a class +[`MixtureModel`](https://turinglang.org/docs/tutorials/gaussian-mixture-models/). +Here's an example of its use. + +```julia +w = [0.5, 0.5] +mu = [-3.5, 0.5] +mixturemodel = MixtureModel([MvNormal(Fill(mu_k, 2), I) for mu_k in mu], w) +``` +This creates a two-component mixture of two-dimensional isotropic +Gaussians (`I` is the identity matrix imported from `LinearAlgebra`). # Resolved questions From 2062b7a11e7180b6d8754d0de0044c1d836d8233 Mon Sep 17 00:00:00 2001 From: Bob Carpenter Date: Tue, 22 Apr 2025 16:36:54 -0400 Subject: [PATCH 4/4] PyMC, Turing examples --- designs/0034-mixtures.md | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/designs/0034-mixtures.md b/designs/0034-mixtures.md index bf5ad6a..898d705 100644 --- a/designs/0034-mixtures.md +++ b/designs/0034-mixtures.md @@ -143,7 +143,7 @@ We can avoid having to have the `T[0, ]` syntax by instead requiring the user to define their own function, ```stan -real normal_lb_lpdf(real mu, real sigma, real lb) { +real normal_lb_lpdf(real alpha, real mu, real sigma, real lb) { return normal_lupdf(alpha | mu, sigma) - normal_ccdf(lb | mu, sigma); } @@ -167,7 +167,7 @@ It is incoherent to directly mix continuous and discrete distributions. That is, we do *not* want to do something like the following. -``` +```stan int y; y ~ mixture(p, @@ -189,8 +189,8 @@ real exponential_int_lpdf(int y, real lambda) { To make this coherent, the sum of densities of valid `y` needs to be finite. In this case, the requirement is $\sum_{n \in \mathbb{N}} -\textrm{exponential}(n | \lambda) < \infty.$ This is not something we -can enforce through Stan, but is something we should be documenting. +\textrm{exponential}(n | \lambda) < \infty.$ This is not something +that can be enforced through Stan, but it should be documented. ## Mixtures with more than two components @@ -234,6 +234,10 @@ The `log_mix` function is slightly different than the `mixture` distribution and while it could be deprecated in favor of `mixture`, it would also make sense to keep it. +Like this proposal, the `log_mix` function requires truncated +distributions or mixtures of discrete and continuous distributions to +be finessed through writing a user-defined function of the appropriate +type ending in `_lpdf` or `_lpmf`. # Reference-level explanation @@ -332,7 +336,7 @@ with pm.Model() as model: like = pm.Mixture("like", w=w, comp_dists=components, observed=data) - ``` +``` This creates a two-component mixture of Poisson distributions.