From 68b6fcf38bb727f43aedec3ad591b29eea91ae36 Mon Sep 17 00:00:00 2001
From: Bob Carpenter <bcarpenter@flatironinstitute.org>
Date: Wed, 14 Feb 2024 15:29:22 -0500
Subject: [PATCH 1/4] mixtures proposal

---
 designs/0034-mixtures.md | 209 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 209 insertions(+)
 create mode 100644 designs/0034-mixtures.md

diff --git a/designs/0034-mixtures.md b/designs/0034-mixtures.md
new file mode 100644
index 0000000..9a3db27
--- /dev/null
+++ b/designs/0034-mixtures.md
@@ -0,0 +1,209 @@
+- Feature Name: 0034-mixtures
+- Start Date: 2024-02-14
+- RFC PR:
+- Stan Issue:
+
+# Summary
+[summary]: #summary
+
+The mixtures feature is designed to allow users to specify mixture
+models using a consistent density-based notation without having to
+worry about marginalization and working on the log scale.
+
+
+
+# Motivation
+[motivation]: #motivation
+
+Users writing their own mixture models has been error prone.  A
+specific failure mode is vectorization and mixing over the whole
+population versus mixing over individuals.  The latter is almost
+always intended, but the former is often coded.
+
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+The syntax follows that of a distribution.  For example, a
+two-component normal mixture model could be written in canonical form
+as follows.  
+
+```stan
+y ~ mixture(p,
+            normal(m[1], s[1]),
+            normal(m[2], s[2]));
+```
+
+Here, `p` is a probability in (0, 1) and `normal(m[1], s[1])` acts
+like a lambda binding of the first argument (more on that below).
+This notation would be syntactic sugar for the following expression
+
+```stan
+target += log_sum_exp(log(p) + normal_lpdf(y | m[1], s[1]),
+                      log1m(p) + normal_lpdf(y | m[2], s[2]));
+```
+
+Because this follows distribution syntax, for consistency we would
+define
+```stan
+mixture_lpdf(y | p, lp1, lp2)
+=def=
+log_sum_exp(log(p) + lp1(y), log1m(p) + lp2(y));
+```
+
+where we have explicitly used application syntax for `lp1` and `lp2`.
+
+If Stan had C++ style lambdas, a distribution without an outcome
+variable would be defined as follows.
+
+```stan
+normal_lpdf(m[1], s[1])
+=def=
+[y].normal_lpdf(y | m[1], s[1])
+```
+
+Even though a distribution without an outcome behaves like a lambda in
+the context of mixtures, this will *not* be generally available in
+Stan, so that it would not be legal to write the following.
+
+```stan
+target += normal(m[1], s[1])(y);  // ILLEGAL
+```
+
+Unary functions will be allowed as function arguments to `mixture`, so the
+following would be legal.
+
+```stan
+functions {
+  real normal1(real y) {
+    real m1 = ...; 
+    real s1 = ...; 
+    return normal_lpdf(y | m1, s1); 
+  }
+  real normal2(real y) {
+    real m2 = ...; 
+    real s2 = ...; 
+    return normal_lpdf(y | m2, s2); 
+  }
+}
+... 
+real<lower=0, upper=1> p; 
+...
+model {
+  y ~ mixture(p, normal1, normal2);
+}
+```
+
+The alternative to allowing function arguments would be to force the
+user introduce new distribution functions, such as the following,
+which defines a zero-inflated Poisson as a mixture model.
+
+```stan
+functions {
+  real zero_delta_lpmf(int y) {
+    return y == 0 ? 0 : -infinity();
+  }
+}
+...
+real<lower=0, upper=1> p;
+...
+model {
+  y ~ mixture(p, zero_delta, poisson(lam));
+```
+
+The idea is to allow arbitrary numbers of components in the mixture,
+with the first argument being a simplex matching in size.  For
+example, the following would be a legal way to define a 3-component mixture.
+
+```stan
+simplex[3] p;
+...
+y ~ mixture(p, normal(m[1], s[1]),
+               normal(m[2], s[2]),
+               student_t(4.5, m[3], s[3]));
+```
+
+Run-time error checking is largely a matter of making sure the simplex
+argument is both a simplex and the right size for the number of
+components, or if it's a probability, checking that there are two
+components and that the value is in (0, 1).
+
+This functionality is more general than the existing `log_mix`
+functionality, which requires actual log density values as arguments.
+In this case, the above mixture would be written as
+
+```stan
+simplex[3] p;
+...
+target += log_mix(p, normal_lpdf(y | m[1], s[1]),
+                     normal_lpdf(y | m[2], s[2]), 
+                     student_t_lpdf(y | 4.5, m[3], s[3]));
+```
+
+The `log_mix` function is slightly different than the `mixture`
+distribution and while it could be deprecated in favor of `mixture`,
+it would also make sense to keep it.
+
+In terms of compilation, this is another variadic function with type
+checking. 
+
+
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+The implementation requires `mixture_lpmf` and `mixture_lpdf` to be
+coded in C++.  These will be variadic functions that accept one simplex
+argument and a parameter pack of mixture components that must in practice be
+the same size as the simplex (only discoverable at runtime).
+
+Derivatives can either be supplied by autodiff, or they can be done
+analytically, where the adjoint rule is just going to mix the adjoints
+of the components according to the simplex.
+
+There are no interactions with other features.
+
+The boundary conditions are simple.  If zero mixture components are
+supplied, it should throw an exception.  This is better than returning
+-infinity and letting the user try to debug the rejection.  
+If only a single mixture component is supplied, it should provide a
+warning that the use can be simplified.
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+It's yet one more kind of ad hoc parsing that doesn't fit into the
+type system until we have functions.
+
+# Rationale and alternatives
+[rationale-and-alternatives]: #rationale-and-alternatives
+
+This design is close to optimal as it matches the way a statistician
+might write a mixture in mathematical notation.
+
+The alternative is to just require users to keep using the log_mix function.
+
+The impact of inaction here is just that it's harder for users to
+write readable and robust mixture models.
+
+# Prior art
+[prior-art]: #prior-art
+
+Prior art is basically just `log_mix` and some examples of how to code
+by hand in the mixtures chapter of the *User's Guide*.
+
+I am not aware of other probabilistic programming languages that
+include this feature.  This is way too small of a feature for a paper.
+
+# Unresolved questions
+[unresolved-questions]: #unresolved-questions
+
+1.  Should we allow function arguments or force user to write `_lpdf`
+or `_lpmf` functions?  Proposal says YES.
+
+2.  Should we allow one-component mixtures as the boundary condition?
+    Proposal says YES.
+
+3.  Should `normal(m, s)` be interpreted as a function elsewhere in
+Stan.  Proposal says NO.
+
+This proposal is not intended to address infinite mixtures, like a gamma-Poisson.

From 11b1ece9db195bc7e2aa09e760ebc65f744b07dc Mon Sep 17 00:00:00 2001
From: Bob Carpenter <bcarpenter@flatironinstitute.org>
Date: Thu, 4 Apr 2024 15:10:34 -0400
Subject: [PATCH 2/4] revised w.r.t. comments

---
 designs/0034-mixtures.md | 153 +++++++++++++++++++++++----------------
 1 file changed, 90 insertions(+), 63 deletions(-)

diff --git a/designs/0034-mixtures.md b/designs/0034-mixtures.md
index 9a3db27..79b5c50 100644
--- a/designs/0034-mixtures.md
+++ b/designs/0034-mixtures.md
@@ -4,29 +4,33 @@
 - Stan Issue:
 
 # Summary
-[summary]: #summary
 
 The mixtures feature is designed to allow users to specify mixture
 models using a consistent density-based notation without having to
-worry about marginalization and working on the log scale.
-
+worry about marginalizing the discrete parameter manually or about
+working on the log scale.
 
 
 # Motivation
-[motivation]: #motivation
 
 Users writing their own mixture models has been error prone.  A
 specific failure mode is vectorization and mixing over the whole
 population versus mixing over individuals.  The latter is almost
-always intended, but the former is often coded.
+always intended, but the former is what a naive Stan program will
+do.
 
 
 # Guide-level explanation
-[guide-level-explanation]: #guide-level-explanation
 
-The syntax follows that of a distribution.  For example, a
-two-component normal mixture model could be written in canonical form
-as follows.  
+In both the two-component and multi-component mixture syntax, a new
+distribution function will be introduced, which is described in this
+section.  
+
+
+## Two-component mixtures.
+
+A two-component normal mixture model will be written in canonical form
+as follows.
 
 ```stan
 y ~ mixture(p,
@@ -44,14 +48,16 @@ target += log_sum_exp(log(p) + normal_lpdf(y | m[1], s[1]),
 ```
 
 Because this follows distribution syntax, for consistency we would
-define
+also define the probability functions `mixture_lpdf` and `mixture_lpmf`
 ```stan
 mixture_lpdf(y | p, lp1, lp2)
 =def=
 log_sum_exp(log(p) + lp1(y), log1m(p) + lp2(y));
 ```
 
-where we have explicitly used application syntax for `lp1` and `lp2`.
+Here we have have explicitly used application syntax for `lp1` and
+`lp2`, even though it doesn't exist in Stan.  For discrete
+distributions, we will also need a `mixture_lpmf`.
 
 If Stan had C++ style lambdas, a distribution without an outcome
 variable would be defined as follows.
@@ -70,33 +76,19 @@ Stan, so that it would not be legal to write the following.
 target += normal(m[1], s[1])(y);  // ILLEGAL
 ```
 
-Unary functions will be allowed as function arguments to `mixture`, so the
-following would be legal.
+## Require probability function arguments
 
-```stan
-functions {
-  real normal1(real y) {
-    real m1 = ...; 
-    real s1 = ...; 
-    return normal_lpdf(y | m1, s1); 
-  }
-  real normal2(real y) {
-    real m2 = ...; 
-    real s2 = ...; 
-    return normal_lpdf(y | m2, s2); 
-  }
-}
-... 
-real<lower=0, upper=1> p; 
-...
-model {
-  y ~ mixture(p, normal1, normal2);
-}
-```
+Although it would be possible to allow arbitrary functions to
+participate in mixtures, in the first release we propose to not allow
+arbitrary functions and restrict the arguments after the first to be
+probability functions (lpdfs or lpmfs).  The workaround in this
+case is to take the function and wrap it in an lpdf or lpmf
+definition.  We may add arbitrary function arguments at a later date
+after we introduce lambdas into Stan.
 
-The alternative to allowing function arguments would be to force the
-user introduce new distribution functions, such as the following,
-which defines a zero-inflated Poisson as a mixture model.
+For example, to define a zero-inflated Poisson as a mixture model,
+the lpmf that puts all of its mass on 0 can be defined and then used
+as one of the mixture components.
 
 ```stan
 functions {
@@ -111,8 +103,13 @@ model {
   y ~ mixture(p, zero_delta, poisson(lam));
 ```
 
-The idea is to allow arbitrary numbers of components in the mixture,
-with the first argument being a simplex matching in size.  For
+
+## Mixtures with more than two components
+
+Up until now, we have assumed two mixture components and a probability
+argument.  In general, we want to allow more than two mixture
+components.  The proposal is to have the first argument be a syntax
+whose  size determines the numnber of remaining arguments.  For
 example, the following would be a legal way to define a 3-component mixture.
 
 ```stan
@@ -128,6 +125,11 @@ argument is both a simplex and the right size for the number of
 components, or if it's a probability, checking that there are two
 components and that the value is in (0, 1).
 
+In terms of compilation, `mixture` is another variadic function with type
+checking. 
+
+## Comparison to existing `log_mix` function
+
 This functionality is more general than the existing `log_mix`
 functionality, which requires actual log density values as arguments.
 In this case, the above mixture would be written as
@@ -144,49 +146,76 @@ The `log_mix` function is slightly different than the `mixture`
 distribution and while it could be deprecated in favor of `mixture`,
 it would also make sense to keep it.
 
-In terms of compilation, this is another variadic function with type
-checking. 
 
 
 # Reference-level explanation
-[reference-level-explanation]: #reference-level-explanation
 
 The implementation requires `mixture_lpmf` and `mixture_lpdf` to be
-coded in C++.  These will be variadic functions that accept one simplex
-argument and a parameter pack of mixture components that must in practice be
-the same size as the simplex (only discoverable at runtime).
+coded in C++.  These will be variadic functions that accept either (a)
+one probability argument and two probability functions without
+variates, or (b) one simplex, and a number of probability functions
+matching its size.
+
+Derivatives can be handled by autodiff, or an "analytic" derivative
+for `mixture_lpmf` and `mixture_lpdf` can be supplied.
 
-Derivatives can either be supplied by autodiff, or they can be done
-analytically, where the adjoint rule is just going to mix the adjoints
-of the components according to the simplex.
+## Interactions with other features
 
 There are no interactions with other features.
 
+## Boundary conditions and exceptions
+
 The boundary conditions are simple.  If zero mixture components are
 supplied, it should throw an exception.  This is better than returning
--infinity and letting the user try to debug the rejection.  
-If only a single mixture component is supplied, it should provide a
-warning that the use can be simplified.
+`-infinity` and letting the user try to debug the rejection. 
+If only a single mixture component is supplied with a size 1 simplex,
+a warning should be emitted suggesting simplified usage.
+
+## Include normalizing constants
+
+The distributions have to be compiled with `propto=False` and this
+needs to be made clear in the documentation.   Statistically,
+`propto=False` is required to get the correct result when the mixture
+components don't have equal normalizing constants.
+
+## Catching out of bounds exceptions in variates
+
+Stan's probability functions are designed to throw
+`std::invalid_argument` exceptions when they encounter illegal
+arguments.  In this case, we want to catch any exceptions that come
+from the variate being mixed being out of bounds.  That is, if
+we use `exponential(2)` as one of our mixture components, then
+we want a negative argument `exponential_lpdf(-1 | 2)` to return
+`-infinity` rather than throw an exception.
+
+To handle that, we propose to change the exception thrown in the math
+library to a subclass of `std::invalid_argument` which we will call
+`stan::math::invalid_variate`.  This is a simple change in the math
+lib for type of exception thrown and will not break backward
+compatiblity. 
 
 # Drawbacks
-[drawbacks]: #drawbacks
 
 It's yet one more kind of ad hoc parsing that doesn't fit into the
 type system until we have functions.
 
-# Rationale and alternatives
-[rationale-and-alternatives]: #rationale-and-alternatives
+## Rationale and alternatives
 
 This design is close to optimal as it matches the way a statistician
 might write a mixture in mathematical notation.
 
-The alternative is to just require users to keep using the log_mix function.
+The alternative is to just require users to keep using the `log_mix`
+function or writing their own custom code.
 
 The impact of inaction here is just that it's harder for users to
 write readable and robust mixture models.
 
-# Prior art
-[prior-art]: #prior-art
+
+## Limitations
+
+This proposal is not intended to address infinite mixtures, like a gamma-Poisson.
+
+## Prior art
 
 Prior art is basically just `log_mix` and some examples of how to code
 by hand in the mixtures chapter of the *User's Guide*.
@@ -194,16 +223,14 @@ by hand in the mixtures chapter of the *User's Guide*.
 I am not aware of other probabilistic programming languages that
 include this feature.  This is way too small of a feature for a paper.
 
-# Unresolved questions
-[unresolved-questions]: #unresolved-questions
+
+# Resolved questions
 
 1.  Should we allow function arguments or force user to write `_lpdf`
-or `_lpmf` functions?  Proposal says YES.
+or `_lpmf` functions?  No.
 
-2.  Should we allow one-component mixtures as the boundary condition?
-    Proposal says YES.
+2.  Should we allow one-component mixtures as the boundary condition?  Yes.
 
 3.  Should `normal(m, s)` be interpreted as a function elsewhere in
-Stan.  Proposal says NO.
+Stan.  No.
 
-This proposal is not intended to address infinite mixtures, like a gamma-Poisson.

From 708f819e022b2f97ed7ce2e6a6dc105e993d97f1 Mon Sep 17 00:00:00 2001
From: Bob Carpenter <bcarpenter@flatironinstitute.org>
Date: Tue, 22 Apr 2025 16:29:58 -0400
Subject: [PATCH 3/4] addressing review comments

---
 designs/0034-mixtures.md | 135 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 131 insertions(+), 4 deletions(-)

diff --git a/designs/0034-mixtures.md b/designs/0034-mixtures.md
index 79b5c50..bf5ad6a 100644
--- a/designs/0034-mixtures.md
+++ b/designs/0034-mixtures.md
@@ -104,12 +104,100 @@ model {
 ```
 
 
+## Mixing truncated distributions
+
+When two truncated distributions are mixed, it is important to include
+the normalization constants.
+
+
+### Proposal for native truncation syntax
+in the situation where `mu`, `sigma`, and `lambda` are parameters.
+
+```stan
+real<lower=0> alpha;
+
+alpha ~ mixture(p,
+                normal(mu, sigma) T[0, ],
+                exponential(lambda));
+```
+
+Using truncation in this way will probably require extra work on
+the parser side in order to make `normal(mu, sigma) T[0, ]` a node in
+the syntax tree.  Stan does *not* currently support statements of the
+form
+
+```stan 
+target += normal_lupdf(alpha | mu, sigma) T[0, ]; 
+```
+
+Instead, you have to do the following.
+
+```stan 
+target += normal_lupdf(alpha | mu, sigma) 
+          - normal_ccdf(0 | mu, sigma);
+```
+
+### Alternative without new truncation syntax
+
+We can avoid having to have the `T[0, ]` syntax by instead requiring
+the user to define their own function,
+
+```stan
+real normal_lb_lpdf(real mu, real sigma, real lb) {
+  return normal_lupdf(alpha | mu, sigma) 
+          - normal_ccdf(lb | mu, sigma);
+}
+```
+
+With this, the above would be coded as
+
+```stan
+alpha ~ mixture(p,
+                normal_lb(mu, sigma, 0),
+                exponential(lambda));
+```
+
+We will probably start without implementing the new truncation
+syntax. 
+
+
+## Mixing continuous and discrete distributions
+
+It is incoherent to directly mix continuous and discrete
+distributions.  That is, we do *not* want to do something like the
+following.
+
+```
+int<lower=0> y;
+
+y ~ mixture(p,
+            poisson(lambda1),
+            exponential(lambda2));
+```
+
+The problem here is that there is no type to assign the mixture.  In
+these cases of mismatched types, we want to raise a compiler error.
+
+Technically, the continuity can be eliminated by defining a new
+discrete distribution that delegates to the exponential by promotion.
+
+```stan
+real exponential_int_lpdf(int y, real lambda) {
+  return exponential_lpdf(y | lambda);
+}
+```
+
+To make this coherent, the sum of densities of valid `y` needs to be
+finite. In this case, the requirement is $\sum_{n \in \mathbb{N}}
+\textrm{exponential}(n | \lambda) < \infty.$  This is not something we
+can enforce through Stan, but is something we should be documenting.
+
 ## Mixtures with more than two components
 
 Up until now, we have assumed two mixture components and a probability
 argument.  In general, we want to allow more than two mixture
 components.  The proposal is to have the first argument be a syntax
-whose  size determines the numnber of remaining arguments.  For
+whose  size determines the number of remaining arguments.  For
 example, the following would be a legal way to define a 3-component mixture.
 
 ```stan
@@ -192,7 +280,7 @@ To handle that, we propose to change the exception thrown in the math
 library to a subclass of `std::invalid_argument` which we will call
 `stan::math::invalid_variate`.  This is a simple change in the math
 lib for type of exception thrown and will not break backward
-compatiblity. 
+compatibility. 
 
 # Drawbacks
 
@@ -220,9 +308,48 @@ This proposal is not intended to address infinite mixtures, like a gamma-Poisson
 Prior art is basically just `log_mix` and some examples of how to code
 by hand in the mixtures chapter of the *User's Guide*.
 
-I am not aware of other probabilistic programming languages that
-include this feature.  This is way too small of a feature for a paper.
+#### PyMC
+
+PyMC provides a distribution class
+[`pymc.Mixture`](https://www.pymc.io/projects/docs/en/stable/api/distributions/generated/pymc.Mixture.html)
+ Here's the first example from their documentation.
+
+```python
+with pm.Model() as model:
+    w = pm.Dirichlet("w", a=np.array([1, 1]))  # 2 mixture weights
+
+    lam1 = pm.Exponential("lam1", lam=1)
+    lam2 = pm.Exponential("lam2", lam=1)
+
+    # As we just need the logp, rather than add a RV to the model, we need to call `.dist()`
+    # These two forms are equivalent, but the second benefits from vectorization
+    components = [
+        pm.Poisson.dist(mu=lam1),
+        pm.Poisson.dist(mu=lam2),
+    ]
+    # `shape=(2,)` indicates 2 mixture components
+    components = pm.Poisson.dist(mu=pm.math.stack([lam1, lam2]), shape=(2,))
+
+    like = pm.Mixture("like", w=w, comp_dists=components,
+    observed=data)
+	```
+
+This creates a two-component mixture of Poisson distributions.
+
+#### Turing.jl
+
+`Turing.jl` provides a class
+[`MixtureModel`](https://turinglang.org/docs/tutorials/gaussian-mixture-models/).
+Here's an example of its use.
+
+```julia
+w = [0.5, 0.5]
+mu = [-3.5, 0.5]
+mixturemodel = MixtureModel([MvNormal(Fill(mu_k, 2), I) for mu_k in mu], w)
+```
 
+This creates a two-component mixture of two-dimensional isotropic
+Gaussians (`I` is the identity matrix imported from `LinearAlgebra`).
 
 # Resolved questions
 

From 2062b7a11e7180b6d8754d0de0044c1d836d8233 Mon Sep 17 00:00:00 2001
From: Bob Carpenter <bcarpenter@flatironinstitute.org>
Date: Tue, 22 Apr 2025 16:36:54 -0400
Subject: [PATCH 4/4] PyMC, Turing examples

---
 designs/0034-mixtures.md | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/designs/0034-mixtures.md b/designs/0034-mixtures.md
index bf5ad6a..898d705 100644
--- a/designs/0034-mixtures.md
+++ b/designs/0034-mixtures.md
@@ -143,7 +143,7 @@ We can avoid having to have the `T[0, ]` syntax by instead requiring
 the user to define their own function,
 
 ```stan
-real normal_lb_lpdf(real mu, real sigma, real lb) {
+real normal_lb_lpdf(real alpha, real mu, real sigma, real lb) {
   return normal_lupdf(alpha | mu, sigma) 
           - normal_ccdf(lb | mu, sigma);
 }
@@ -167,7 +167,7 @@ It is incoherent to directly mix continuous and discrete
 distributions.  That is, we do *not* want to do something like the
 following.
 
-```
+```stan
 int<lower=0> y;
 
 y ~ mixture(p,
@@ -189,8 +189,8 @@ real exponential_int_lpdf(int y, real lambda) {
 
 To make this coherent, the sum of densities of valid `y` needs to be
 finite. In this case, the requirement is $\sum_{n \in \mathbb{N}}
-\textrm{exponential}(n | \lambda) < \infty.$  This is not something we
-can enforce through Stan, but is something we should be documenting.
+\textrm{exponential}(n | \lambda) < \infty.$  This is not something
+that can be enforced through Stan, but it should be documented.
 
 ## Mixtures with more than two components
 
@@ -234,6 +234,10 @@ The `log_mix` function is slightly different than the `mixture`
 distribution and while it could be deprecated in favor of `mixture`,
 it would also make sense to keep it.
 
+Like this proposal, the `log_mix` function requires truncated
+distributions or mixtures of discrete and continuous distributions to
+be finessed through writing a user-defined function of the appropriate
+type ending in `_lpdf` or `_lpmf`.
 
 
 # Reference-level explanation
@@ -332,7 +336,7 @@ with pm.Model() as model:
 
     like = pm.Mixture("like", w=w, comp_dists=components,
     observed=data)
-	```
+```
 
 This creates a two-component mixture of Poisson distributions.