Mixture random cleanup #3364

Merged (16 commits, Feb 16, 2019)

Conversation

@lucianopaz (Contributor):

In PR #3293, I had tried to fix the Mixture.random method (issue #3270). I had mistakenly not added WIP to the PR and it was merged with the tests passing but with some rough edges. This PR addresses one important thing that was poorly implemented in PR #3293: vectorization of comp_dists.random.

The most important changes are:

  1. Compact comp_dists shape inference method.
  2. The _comp_samples method was rewritten to present a uniform interface to distributions.distribution.generate_samples.
  3. size, dist_shape and broadcast_shape are handled more clearly.
  4. I fixed a defect in Multinomial.random, which may be the cause of #3343 (sample_posterior_predictive returns the wrong dimension).

Points that I would like to discuss:

  1. I changed a particular test that was failing, test_sampling.py::TestSamplePPC::test_normal_scalar. The strange thing was that it was also failing for me on master. I made a slight edit to it and would like your opinion.
  2. To give comp_dist.random the raw size, instead of the size that generate_samples combines with dist_shape and broadcast_shape before passing it to the generators, I wrapped it in a generator decorator function, which forced me to ignore a pylint warning. Could you suggest an alternative that avoids this, or something less hackish regarding the size parameter?
  3. Should we add a sample_prior and sample_posterior_predictive test for NormalMixture too?

This PR does not address another point that was lacking from #3293, which is that the LKJCholeskyCorr.random function is not tested. I will work on it and submit a separate PR.

@@ -267,7 +267,7 @@ def random(self, point=None, size=None):
else:
std_norm_shape = mu.shape
standard_normal = np.random.standard_normal(std_norm_shape)
-return mu + np.tensordot(standard_normal, chol, axes=[[-1], [-1]])
+return mu + np.einsum('...ij,...j->...i', chol, standard_normal)
Member:

nice!

Contributor Author:

Thanks! I had not understood tensordot's broadcasting rules well, noticed it started giving weirdly shaped outputs, and found einsum to be the most intuitive way to get the right broadcasting.
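For illustration, a minimal numpy sketch of the shape difference (the batch size and dimensionality here are made up, not taken from the PR):

import numpy as np

# Hypothetical batch of 4 draws from a 3D MvNormal: one Cholesky factor
# per draw and one vector of standard normals per draw.
chol = np.tile(np.eye(3), (4, 1, 1))                 # shape (4, 3, 3)
standard_normal = np.random.standard_normal((4, 3))  # shape (4, 3)

# tensordot contracts the requested axes but concatenates (rather than
# broadcasts) the remaining ones, so the batch axis appears twice.
out_td = np.tensordot(standard_normal, chol, axes=[[-1], [-1]])
print(out_td.shape)  # (4, 4, 3)

# einsum broadcasts the leading '...' axes, pairing each draw with its
# own Cholesky factor.
out_es = np.einsum('...ij,...j->...i', chol, standard_normal)
print(out_es.shape)  # (4, 3)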

self._broadcast_shape = self._comp_dist_shapes
self.is_multidim_comp = True
elif isinstance(comp_dists, Iterable):
if not all((isinstance(comp_dist, Distribution)
Member:

I think this check should be moved to __init__; maybe it could also save a few lines in the if/else here.

Contributor Author:

Ok, I can do that. I thought that it would be a good check to make whenever comp_dists' value was set. I know that we only really change comp_dists during __init__, but I thought that maybe some users could potentially change its values in derived classes or things like that.

samples = []
for dist_shape, comp_dist in zip(comp_dist_shapes,
self.comp_dists):
def generator(*args, **kwargs):
Member:

You are not joking that this is a horrible hack... I think it might be better to overwrite the comp_dist.random method (it's at least more explicit)

Contributor Author:

You mean just changing the comp_dists[i].random method somewhere else? Maybe I could add a _generators attribute during comp_dists.setter, with a wrapper of comp_dists[i].random that takes a _raw_shape parameter too. I'll try it out.
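A rough sketch of that idea (the _generators attribute and the raw_size_ keyword are just the hypothetical names floated above, not existing pymc3 API):

def _wrap_random(comp_dist):
    # Wrapper around comp_dist.random that accepts the raw size requested by
    # Mixture.random, instead of the size that generate_samples combines with
    # dist_shape and broadcast_shape before calling the generator.
    def wrapped(*args, raw_size_=None, **kwargs):
        kwargs['size'] = raw_size_
        return comp_dist.random(*args, **kwargs)
    return wrapped

# Hypothetical use inside the comp_dists setter:
# self._generators = [_wrap_random(d) for d in comp_dists]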

if samples.shape[-1] == 1:
return samples[..., 0]
else:
return samples

def infer_comp_dist_shapes(self, point=None):
Member:

Maybe it is better to call this once in __init__ and save the outputs as properties for the mixture class? What do you think?

Contributor Author:

The problem is that the comp_dists shapes may depend on the values passed in the point dictionary. That's the case for sample_posterior_predictive, at least.

Member:

I see, good point.

…thing less hackish. At least we don't ignore lint anymore.
twiecki added the WIP label Feb 4, 2019
)
# In the logp we assume the last axis holds the mixture components
# so we move the axis to the last dimension
samples = np.moveaxis(samples, 0, -1)
if samples.shape[-1] == 1:
Contributor Author:

This if statement is legacy code that I don't really understand. Why should we do this test? Wouldn't it already be handled by Mixture.random?

Member:

I don't remember either, but at some point there was an error with shape = (100, 1) vs shape = (100,); I'm not even sure we have a test for that.
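For reference, a minimal numpy sketch of the kind of (100, 1) vs (100,) mismatch being described (shapes are illustrative only):

import numpy as np

samples = np.random.standard_normal((100, 1))  # a component drew shape (100, 1)
mixture_shape = (100,)                          # but the Mixture was declared with shape (100,)

# Broadcasting (100, 1) against (100,) silently produces (100, 100)
# instead of the intended (100,).
print((samples + np.zeros(mixture_shape)).shape)  # (100, 100)

# The legacy check drops the trailing length-1 axis to avoid that.
if samples.shape[-1] == 1:
    samples = samples[..., 0]
print(samples.shape)  # (100,)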

Contributor Author:

So the error happens when we try to call random from the mixture and comp_dists has shape (100, 1) but the Mixture has shape (100,)? I'm not sure how the error arises. Did it happen when comp_dists was a multidimensional distribution or when it was a list of distributions? I'd appreciate any thoughts on extra tests to write.

…and median to stack on the last axis (which should allow Mixture to work with multidimensional distributions). Changed docstring a bit. Minor change in _comp_samples to enforce dtype.
@lucianopaz (Contributor Author):

I had forgotten to mention another change this PR implements that turns out to be big: the Mixture's default dtype now depends on theano.config.floatX. This means that if the Mixture is continuous the default dtype will be floatX, and if it is discrete it will be intX (float64 -> int32, float32 -> int16, float16 -> int8). This made some initial tests fail.
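A minimal sketch of the mapping described above (default_mixture_dtype is a hypothetical helper for illustration, not the actual pymc3 code):

import theano

# Continuous mixtures default to theano.config.floatX; discrete mixtures
# use the matching integer width described above.
_INTX_FOR_FLOATX = {'float64': 'int32', 'float32': 'int16', 'float16': 'int8'}

def default_mixture_dtype(is_discrete):
    floatX = theano.config.floatX
    return _INTX_FOR_FLOATX[floatX] if is_discrete else floatX

print(default_mixture_dtype(is_discrete=False))  # e.g. 'float64'
print(default_mixture_dtype(is_discrete=True))   # e.g. 'int32'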

I now pushed a small set of changes that should allow Mixture to handle comp_dists to be a list of multidimensional distributions. I'll try to push some tests for this functionality soon.

@lucianopaz (Contributor Author):

I went down the road of adding support for multidimensional mixtures. I've run some local tests but still need to write a more systematic test suite for this functionality.

@@ -75,27 +87,98 @@ def test_normal_mixture(self):
np.sort(self.norm_mu),
rtol=0.1, atol=0.1)

def test_normal_mixture_nd(self):
nd, ncomp = 3, 5
@pytest.mark.parametrize('nd,ncomp',
Member:

Oh this is nice!

size=None)
comp_dist_shapes = test_sample.shape
broadcast_shape = comp_dist_shapes
includes_mixture = True
Member:

I am not completely sure what the function of includes_mixture is here - I imagine it has something to do with making sure the broadcasting of w is correct. Could you add a small docstring to this function?

Contributor Author:

You're right. Yes, I'll add a docstring. includes_mixture maybe was not the best choice of name, and it may be redundant. It's a bool that indicates whether comp_dists' last axis holds the mixture components or not, but that's the same as testing whether comp_dists is a distribution or a list. Hmm, I'll see how to avoid depending on it.

if isinstance(comp_dists, Distribution):
self._comp_dist_shapes = to_tuple(comp_dists.shape)
self._broadcast_shape = self._comp_dist_shapes
self.is_multidim_comp = True
Member:

The name is a bit confusing - this is only to distinguish between comp_dists being a list or a distribution, right?

Contributor Author:

You're totally right. I'll change this.

# Happens in __init__ when computing self.logp(comp_modes)
self_shape = None
comp_shape = tuple(comp_dists.shape)
if (
Member:

There is no else clause here? I think you should raise an exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops! You're right! Thanks for catching that. I'll add an else clause, or restructure the conditions a bit.

@lucianopaz (Contributor Author) commented Feb 13, 2019:

While working on the multidimensional distributions support I came across something that I think is wrong in Mixture.random and wanted to ask your opinion on it, @junpenglao.

Consider the following example:

import numpy as np
import pymc3 as pm


npop = 5
nd = 3
with pm.Model():
    mus = np.tile(np.arange(npop)[None, :], (nd, 1))
    m1 = pm.NormalMixture('m1',
                          w=np.ones(npop) / npop,
                          mu=mus,
                          sigma=1e-5,
                          comp_shape=(nd, npop),
                          shape=nd)
    m2 = pm.NormalMixture('m2',
                          w=np.ones((nd, npop)) / npop,
                          mu=mus,
                          sigma=1e-5,
                          comp_shape=(nd, npop),
                          shape=nd)

m1_val = m1.random(size=100)
m2_val = m2.random(size=100)

print('m1 rows come from the same component? {}'.
      format(all(np.all(np.diff(m1_val) < 1e-3, axis=-1))))
print('m2 rows come from the same component? {}'.
      format(all(np.all(np.diff(m2_val) < 1e-3, axis=-1))))

Both m1_val and m2_val have the correct shape, (100, nd). My concern is: should we expect each of the elements of m1_val to come from a different mixture component, or should all the elements in a row come from the same component? For m2_val I would expect each element to be sampled from an independent mixture component, because the weights w are specified independently for each of the nd elements in the mixture. However, for m1_val I think that this is not correct, and all the elements in each row should come from the same mixture component.

I think that m1 and m2 should match the following models with the latent component variables z1 and z2:

import numpy as np
from theano import tensor as tt
import pymc3 as pm


npop = 5
nd = 3
with pm.Model():
    mus = tt.as_tensor_variable(np.tile(np.arange(npop)[None, :], (nd, 1)))
    z1 = pm.Categorical('z1', p=np.ones(npop) / npop)
    m1 = pm.Normal('m1', mu=mus[..., z1], sigma=1e-5, shape=nd)
    z2 = pm.Categorical('z2', p=np.ones(npop) / npop, shape=nd)
    mu2 = tt.as_tensor_variable([mus[i, z2[i]] for i in range(nd)])
    m2 = pm.Normal('m2', mu=mu2, sigma=1e-5, shape=nd)

m1_val = m1.random(size=100)
m2_val = m2.random(size=100)

print('m1 rows come from the same component? {}'.
      format(all(np.all(np.diff(m1_val) < 1e-3, axis=-1))))
print('m2 rows come from the same component? {}'.
      format(all(np.all(np.diff(m2_val) < 1e-3, axis=-1))))

@junpenglao (Member) commented Feb 13, 2019:

I see what you mean, but each component (regardless of the dimension) should always slice through all the dimensions, which means z2 = pm.Categorical('z2', p=np.ones(npop) / npop, shape=nd) does not make sense, because you want to select the component while keeping each dimension (i.e. not sampling over the dimensions).

For example:

pm.Categorical.dist(p=np.ones(npop) / npop, shape=nd).random()
# ==> array([0, 2, 4])

means that in one realization you select component 1's dimension 1 as dimension 1, component 3's dimension 2 as dimension 2, and component 5's dimension 3 as dimension 3. That is a different model from what I would expect of a mixture model, where each slice of the data (a 3-dimensional vector in this case) is randomly selected from one component of the mixture.
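A small numpy sketch of the distinction, reusing the mus from the example above (illustrative only):

import numpy as np

npop, nd = 5, 3
mus = np.tile(np.arange(npop)[None, :], (nd, 1))   # shape (nd, npop)

# A single scalar label selects one whole nd-dimensional component,
# so every dimension of the draw comes from the same component.
z = np.random.randint(npop)
whole_component = mus[:, z]                         # shape (nd,)

# A vector of labels (shape (nd,)) instead picks each dimension from a
# possibly different component, which is the behaviour questioned above.
z_vec = np.random.randint(npop, size=nd)            # e.g. array([0, 2, 4])
mixed_dimensions = mus[np.arange(nd), z_vec]        # shape (nd,)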

@lucianopaz (Contributor Author):

So it is safe to assume that w should always be a vector with npop elements, right? If w were a matrix, it would mean that the mixture component weights could change across each dimension of the distribution, so the mixture component wouldn't slice through the other dimensions.

@lucianopaz (Contributor Author):

@junpenglao, test_mixture_random_shape actually supplies a 2D array as the Mixture's w. I'm starting to think again that it does make sense for each dimension of the mixture to come from a different component in these cases, because the mixture weights could be different for each of the mixture's dimension elements.

@lucianopaz (Contributor Author):

With the latest commits, the code snippets below produce matching behavior for the mixture and the latent-label models:

import numpy as np
from theano import tensor as tt
import pymc3 as pm


npop = 5
nd = 3
with pm.Model():
    mus = np.tile(np.arange(npop)[None, :], (nd, 1))
    m1 = pm.NormalMixture('m1',
                          w=np.ones(npop) / npop,
                          mu=mus,
                          sigma=1e-5,
                          comp_shape=(nd, npop),
                          shape=nd)
    m2 = pm.NormalMixture('m2',
                          w=np.ones((nd, npop)) / npop,
                          mu=mus,
                          sigma=1e-5,
                          comp_shape=(nd, npop),
                          shape=nd)

m1_val = m1.random(size=100)
m2_val = m2.random(size=100)

print('m1 rows come from the same component? {}'.
      format(all(np.all(np.diff(m1_val) < 1e-3, axis=-1))))  # True
print('m2 rows come from the same component? {}'.
      format(all(np.all(np.diff(m2_val) < 1e-3, axis=-1))))  # False

with pm.Model():
    mus = tt.as_tensor_variable(np.tile(np.arange(npop)[None, :], (nd, 1)))
    z1 = pm.Categorical('z1', p=np.ones(npop) / npop)
    m1 = pm.Normal('m1', mu=mus[..., z1], sigma=1e-5, shape=nd)
    z2 = pm.Categorical('z2', p=np.ones(npop) / npop, shape=nd)
    mu2 = tt.as_tensor_variable([mus[i, z2[i]] for i in range(nd)])
    m2 = pm.Normal('m2', mu=mu2, sigma=1e-5, shape=nd)

m1_val = m1.random(size=100)
m2_val = m2.random(size=100)

print('m1 rows come from the same component? {}'.
      format(all(np.all(np.diff(m1_val) < 1e-3, axis=-1))))  # True
print('m2 rows come from the same component? {}'.
      format(all(np.all(np.diff(m2_val) < 1e-3, axis=-1))))  # False

@junpenglao (Member):

> @junpenglao, test_mixture_random_shape actually supplies a 2D array as the Mixture's w. I'm starting to think again that it does make sense for each dimension of the mixture to come from a different component in these cases, because the mixture weights could be different for each of the mixture's dimension elements.

That's for a different use case, and it only makes sense for repeated measurements. So it would be cases like: 2 different 3-component mixtures stacked together. I think these cases are subtle and potentially easy to misuse; I'm not sure what the best way to handle them would be (oh god, I so much wish we had the concepts of batch shape and sample shape so we could distinguish this explicitly).

So to reason about the 2D w, think of this use case:
One mixture distribution with 3 components, each a 5D Gaussian. When we generate from this mixture we first generate labels with shape (n, 1) or (n,), which gives us something like array([0, 1, 1, 2, ...]); we then index into one 5D Gaussian and generate a vector with shape (5,). The expected shapes would be: mixture components (5, 3), weights (3,), observations (n, 5).

Now we have another mixture, also with 3 components, each a 5D Gaussian, and we stack the two together. The expected shapes would be: mixture components (5, 3, 2), weights (3, 2), observations (n, 5, 2).
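A hedged numpy sketch of that shape bookkeeping (the label-then-index sampling scheme is only illustrative, not the Mixture implementation):

import numpy as np

n, ndim, ncomp, nbatch = 10, 5, 3, 2
comps = np.random.standard_normal((ndim, ncomp, nbatch))  # components: (5, 3, 2)
w = np.ones((ncomp, nbatch)) / ncomp                       # weights: (3, 2)

# One label per draw and per stacked mixture, drawn from that mixture's weights.
labels = np.stack([np.random.choice(ncomp, size=n, p=w[:, b])
                   for b in range(nbatch)], axis=-1)        # (n, 2)

# Each label selects a whole 5D component from its own stacked mixture.
obs = np.stack([comps[:, labels[:, b], b] for b in range(nbatch)],
               axis=-1)                                     # (5, n, 2)
obs = np.moveaxis(obs, 1, 0)                                # observations: (n, 5, 2)
print(obs.shape)  # (10, 5, 2)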

@junpenglao (Member):

And don't get me started on mixtures of mixtures...

@lucianopaz (Contributor Author):

> That's for a different use case, and it only makes sense for repeated measurements. So it would be cases like: 2 different 3-component mixtures stacked together. I think these cases are subtle and potentially easy to misuse; I'm not sure what the best way to handle them would be (oh god, I so much wish we had the concepts of batch shape and sample shape so we could distinguish this explicitly).

But isn't this exactly the same as the z2 categorical case? It is the same as stacking nd independent Mixtures, just like a TFP batch_shape. I agree that if we had that concept in pymc3 it would make this much easier, but I think that, as it stands, if w is a vector, then the Mixture's shape can be interpreted as an event_shape: all the dimensions of the Mixture will come from the same mixture component. If w is an nd-array, the part of the Mixture's shape that matches w.shape[:-1] should be considered a batch_shape, because it should be treated the same as stacking independent mixtures together. What do you think?

@junpenglao (Member):

As long as the components are sampled as a whole, it is fine. In other words, the latent label does not break up the components. I still think that in your example above it doesn't make sense, as each dimension comes from a different component.

-------
List of broadcasted sample arrays

Examples
Member:

This is an amazing doc-string.

@lucianopaz (Contributor Author):

Let me rephrase the model with the latent categories like this:

import numpy as np
from theano import tensor as tt
import pymc3 as pm


npop = 5
nd = 3

with pm.Model():
    mus = tt.as_tensor_variable(np.tile(np.arange(npop)[None, :], (nd, 1)))
    z1 = pm.Categorical('z1', p=np.ones(npop) / npop)
    m1 = pm.Normal('m1', mu=mus[..., z1], sigma=1e-5, shape=nd)
    z2s = [pm.Categorical('z2_{}'.format(i), p=np.ones(npop) / npop)
           for i in range(nd)]
    m2s = [pm.Normal('m2_{}'.format(i), mu=mus[i, z2s[i]], sigma=1e-5)
           for i in range(nd)]
    z2 = pm.Deterministic('z2', tt.stack(z2s))
    m2 = pm.Deterministic('m2', tt.stack(m2s))
    vals = pm.sample_prior_predictive(100, vars=['m1', 'm2'])
    m1_val = vals['m1']
    m2_val = vals['m2']

print('m1 rows come from the same component? {}'.
      format(all(np.all(np.diff(m1_val) < 1e-3, axis=-1))))  # True
print('m2 rows come from the same component? {}'.
      format(all(np.all(np.diff(m2_val) < 1e-3, axis=-1))))  # False

Here I explicitly use the fact that m2 is a stack (or batch, in TFP terms) of independent mixtures. I think this is a valid interpretation of how the Mixture class now handles multidimensional mixture weights w. What is left for this PR to be ready for its final review is some more tests to verify that the random values drawn from the Mixture match what we would expect from the latent categorical model.

@junpenglao (Member):

So it is 3 mixtures, each with 1D components... Well, I guess this could also be a valid use case, but my concern is that it would be too easy to misuse. I will have another look at the docstring.

lucianopaz changed the title from "[WIP] Mixture random cleanup" to "Mixture random cleanup" Feb 15, 2019
@lucianopaz (Contributor Author):

I pushed a few more tests and added some examples to Mixture's docstring. I think that this PR is now ready for a final review.

lucianopaz removed the WIP label Feb 15, 2019
@junpenglao (Member):

LGTM. Great job @lucianopaz.

junpenglao merged commit e3fd56b into pymc-devs:master Feb 16, 2019
@twiecki (Member) commented Feb 16, 2019:

woohoo!

lucianopaz deleted the mixture_random branch February 16, 2019 10:19