Mixture random cleanup #3364
Conversation
…g. Still need to work around size issues.
…ky covariance matrix).
@@ -267,7 +267,7 @@ def random(self, point=None, size=None):
    else:
        std_norm_shape = mu.shape
    standard_normal = np.random.standard_normal(std_norm_shape)
-   return mu + np.tensordot(standard_normal, chol, axes=[[-1], [-1]])
+   return mu + np.einsum('...ij,...j->...i', chol, standard_normal)
nice!
Thanks! I had not understood tensordot's broadcasting rules well and I noticed it started giving weird shaped outputs, and einsum was the most intuitive way to get the right broadcasting.
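A small numpy sketch (not from the PR) of the broadcasting difference being discussed; the batched identity matrices here just stand in for Cholesky factors:

import numpy as np

chol = np.tile(np.eye(3), (5, 1, 1))    # batch of 5 stand-in 3x3 Cholesky factors
z = np.random.standard_normal((5, 3))   # matching batch of standard normal draws

out_td = np.tensordot(z, chol, axes=[[-1], [-1]])
out_es = np.einsum('...ij,...j->...i', chol, z)
print(out_td.shape)  # (5, 5, 3): tensordot crosses the batch axes instead of broadcasting
print(out_es.shape)  # (5, 3): einsum broadcasts the batch axes as intended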
pymc3/distributions/mixture.py
    self._broadcast_shape = self._comp_dist_shapes
    self.is_multidim_comp = True
elif isinstance(comp_dists, Iterable):
    if not all((isinstance(comp_dist, Distribution)
I think this check should be moved to the __init__, maybe it could also save a few lines in the if/else here.
Ok, I can do that. I thought that it would be a good check to make whenever comp_dists' value was set. I know that we only really change comp_dists during __init__, but I thought that maybe some users could potentially change its values in derived classes or things like that.
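A minimal sketch of the idea discussed above (validating comp_dists whenever it is assigned, via a property setter); _MixtureLike is a hypothetical stand-in, not the PR's code:

from collections.abc import Iterable

from pymc3.distributions.distribution import Distribution


class _MixtureLike:
    @property
    def comp_dists(self):
        return self._comp_dists

    @comp_dists.setter
    def comp_dists(self, comp_dists):
        # Accept a single (possibly multidimensional) Distribution or an
        # iterable of Distributions; reject anything else at assignment time.
        if not (isinstance(comp_dists, Distribution) or
                (isinstance(comp_dists, Iterable) and
                 all(isinstance(c, Distribution) for c in comp_dists))):
            raise TypeError('comp_dists must be a Distribution or an '
                            'iterable of Distributions')
        self._comp_dists = comp_dists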
pymc3/distributions/mixture.py
samples = []
for dist_shape, comp_dist in zip(comp_dist_shapes,
                                 self.comp_dists):
    def generator(*args, **kwargs):
You are not joking that this is a horrible hack... I think it might be better to overwrite the comp_dist.random method (it's at least more explicit)
You mean just changing the comp_dists[i].random method somewhere else? Maybe I could add a _generators attribute during comp_dists.setter, with a wrapper of comp_dists[i].random that takes a _raw_shape parameter too. I'll try it out.
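A rough sketch of the wrapper idea mentioned above; the names make_generator and raw_size are hypothetical, not the PR's actual implementation:

def make_generator(comp_dist):
    # Wrap a component's random method so that Mixture.random can forward the
    # raw size it received, instead of the size that generate_samples builds by
    # combining size with dist_shape and broadcast_shape.
    def generator(*args, raw_size=None, **kwargs):
        kwargs.pop('size', None)  # discard the combined size
        return comp_dist.random(*args, size=raw_size, **kwargs)
    return generator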
if samples.shape[-1] == 1:
    return samples[..., 0]
else:
    return samples

def infer_comp_dist_shapes(self, point=None):
Maybe it is better to call this once in __init__ and save the outputs as properties for the mixture class? What do you think?
The problem is that the comp_dist shapes may depend on the values passed with the point dictionary. It's the case for sample_posterior_predictive at least.
I see, good point.
…thing less hackish. At least we don't ignore lint anymore.
pymc3/distributions/mixture.py
)
# In the logp we assume the last axis holds the mixture components
# so we move the axis to the last dimension
samples = np.moveaxis(samples, 0, -1)
if samples.shape[-1] == 1:
This if statement is legacy code that I don't really understand. Why should we do this test, wouldn't it be done by Mixture.random?
I don't remember either, but at some point there was an error with shape = (100, 1) and shape = (100,) - not even sure we have a test for that.
So the error happens when we try to call random from the mixture and comp_dists has shape (100, 1) but Mixture has shape (100,)? I'm not sure how the error goes. Did it happen when comp_dists was a multidimensional distribution or when it was a list of distributions? I'd appreciate any thoughts on extra tests to write.
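Not from the PR, just a toy numpy illustration of the (100, 1) versus (100,) mismatch being discussed:

import numpy as np

samples = np.zeros((100, 1))  # e.g. component draws with a trailing length-1 axis
target = np.zeros(100)        # the shape the Mixture variable declares

print((samples + target).shape)  # (100, 100): silent broadcasting to the wrong shape
if samples.shape[-1] == 1:
    samples = samples[..., 0]    # the squeeze in the snippet above
print((samples + target).shape)  # (100,): shapes now agree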
…and median to stack on the last axis (which should allow Mixture to work with multidimensional distributions). Changed docstring a bit. Minor change in _comp_samples to enforce dtype.
I had forgotten to mention another change this PR implements, that turns out to be big. I now pushed a small set of changes that should allow Mixture to handle multidimensional component distributions.
I went down the road of adding support for multidimensional mixtures. I've run some local tests but still need to write a more systematic test suite for this functionality.
@@ -75,27 +87,98 @@ def test_normal_mixture(self):
                            np.sort(self.norm_mu),
                            rtol=0.1, atol=0.1)

-   def test_normal_mixture_nd(self):
-       nd, ncomp = 3, 5
+   @pytest.mark.parametrize('nd,ncomp',
Oh this is nice!
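A minimal sketch of the pytest parametrization pattern used in the diff above, shown here with a toy shape computation rather than the real mixture test:

import numpy as np
import pytest


@pytest.mark.parametrize('nd,ncomp', [(1, 5), (3, 5), (3, 3)])
def test_component_stack_shape(nd, ncomp):
    comps = np.random.standard_normal((nd, ncomp))  # stand-in component draws
    w = np.ones(ncomp) / ncomp                      # uniform mixture weights
    mixed = comps @ w                               # collapse the component axis
    assert mixed.shape == (nd,)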
pymc3/distributions/mixture.py
                              size=None)
comp_dist_shapes = test_sample.shape
broadcast_shape = comp_dist_shapes
includes_mixture = True
I am not completely sure what the function of includes_mixture is here - I imagine it is something to do with making sure the broadcasting of w is correct. Could you add some small docstrings to this function?
You're right. Yes, I'll add a docstring. includes_mixture was maybe not the best choice of name, and maybe it is redundant. It's a bool that indicates whether the comp_dists last axis has the mixture components or not, but that's the same as testing if comp_dists is a distribution or a list. Hmm, I'll see how not to depend on it.
pymc3/distributions/mixture.py
if isinstance(comp_dists, Distribution):
    self._comp_dist_shapes = to_tuple(comp_dists.shape)
    self._broadcast_shape = self._comp_dist_shapes
    self.is_multidim_comp = True
The name is a bit confusing - this is only to distinguish between comp_dists being a list or a distribution, right?
You're totally right. I'll change this
pymc3/distributions/mixture.py
Outdated
# Happens in __init__ when computing self.logp(comp_modes) | ||
self_shape = None | ||
comp_shape = tuple(comp_dists.shape) | ||
if ( |
There is no else clause here? I think you should raise an exception.
Oops! You're right! Thanks for catching that. I'll add an else clause, or restructure the conditions a bit.
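A hypothetical sketch of the suggested else clause; the function and variable names are placeholders, not the PR's actual shape-handling code:

def reconcile_shapes(self_shape, comp_shape):
    # Toy version of the idea: every unmatched case should raise explicitly
    # instead of silently falling through without setting anything.
    if self_shape is None:
        return comp_shape[:-1]
    elif comp_shape[:-1] == self_shape:
        return self_shape
    else:
        raise ValueError('Incompatible shapes: comp_dists has shape {} but the '
                         'mixture was declared with shape {}'
                         .format(comp_shape, self_shape))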
While working on the multidimensional distributions support I came across something that I think is wrong. Consider the following example:

import numpy as np
import pymc3 as pm

npop = 5
nd = 3
with pm.Model():
    mus = np.tile(np.arange(npop)[None, :], (nd, 1))
    m1 = pm.NormalMixture('m1',
                          w=np.ones(npop) / npop,
                          mu=mus,
                          sigma=1e-5,
                          comp_shape=(nd, npop),
                          shape=nd)
    m2 = pm.NormalMixture('m2',
                          w=np.ones((nd, npop)) / npop,
                          mu=mus,
                          sigma=1e-5,
                          comp_shape=(nd, npop),
                          shape=nd)
m1_val = m1.random(size=100)
m2_val = m2.random(size=100)
print('m1 rows come from the same component? {}'.
      format(all(np.all(np.diff(m1_val) < 1e-3, axis=-1))))
print('m2 rows come from the same component? {}'.
      format(all(np.all(np.diff(m2_val) < 1e-3, axis=-1))))

Both m1 and m2 behave the same way here. I think that the intended behavior is that of the following latent label models:

import numpy as np
from theano import tensor as tt
import pymc3 as pm

npop = 5
nd = 3
with pm.Model():
    mus = tt.as_tensor_variable(np.tile(np.arange(npop)[None, :], (nd, 1)))
    z1 = pm.Categorical('z1', p=np.ones(npop) / npop)
    m1 = pm.Normal('m1', mu=mus[..., z1], sigma=1e-5, shape=nd)
    z2 = pm.Categorical('z2', p=np.ones(npop) / npop, shape=nd)
    mu2 = tt.as_tensor_variable([mus[i, z2[i]] for i in range(nd)])
    m2 = pm.Normal('m2', mu=mu2, sigma=1e-5, shape=nd)
m1_val = m1.random(size=100)
m2_val = m2.random(size=100)
print('m1 rows come from the same component? {}'.
      format(all(np.all(np.diff(m1_val) < 1e-3, axis=-1))))
print('m2 rows come from the same component? {}'.
      format(all(np.all(np.diff(m2_val) < 1e-3, axis=-1))))
I see what you mean, but each component (regardless of its dimension) should always be sliced through all of its dimensions, which means the latent category index should be a single value shared across dimensions. For example:

pm.Categorical.dist(p=np.ones(npop) / npop, shape=nd).random()
# ==> array([0, 2, 4])

means that in one realization you select component 1's dimension 1 as dimension 1, component 3's dimension 2 as dimension 2, and component 5's dimension 3 as dimension 3. That is a different model from what I would expect of a mixture model, where each slice of the data (a 3-dimensional vector in this case) is randomly selected from one component of the mixture.
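A toy numpy illustration (not from the comment) of the two readings being contrasted, using the same mus as in the examples above:

import numpy as np

npop, nd = 5, 3
mus = np.tile(np.arange(npop)[None, :], (nd, 1))  # shape (nd, npop)

z_vector = np.array([0, 2, 4])  # one component index per dimension
z_scalar = 2                    # a single component for the whole draw

per_dim = mus[np.arange(nd), z_vector]  # mixes dimensions of different components
whole = mus[:, z_scalar]                # one component sliced through all dimensions
print(per_dim)  # [0 2 4]
print(whole)    # [2 2 2]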
So it is safe to assume that …
@junpenglao, …
With the latest commits, the code snippets below match the behavior of the mixture and latent label models:

import numpy as np
from theano import tensor as tt
import pymc3 as pm

npop = 5
nd = 3
with pm.Model():
    mus = np.tile(np.arange(npop)[None, :], (nd, 1))
    m1 = pm.NormalMixture('m1',
                          w=np.ones(npop) / npop,
                          mu=mus,
                          sigma=1e-5,
                          comp_shape=(nd, npop),
                          shape=nd)
    m2 = pm.NormalMixture('m2',
                          w=np.ones((nd, npop)) / npop,
                          mu=mus,
                          sigma=1e-5,
                          comp_shape=(nd, npop),
                          shape=nd)
m1_val = m1.random(size=100)
m2_val = m2.random(size=100)
print('m1 rows come from the same component? {}'.
      format(all(np.all(np.diff(m1_val) < 1e-3, axis=-1))))  # True
print('m2 rows come from the same component? {}'.
      format(all(np.all(np.diff(m2_val) < 1e-3, axis=-1))))  # False

with pm.Model():
    mus = tt.as_tensor_variable(np.tile(np.arange(npop)[None, :], (nd, 1)))
    z1 = pm.Categorical('z1', p=np.ones(npop) / npop)
    m1 = pm.Normal('m1', mu=mus[..., z1], sigma=1e-5, shape=nd)
    z2 = pm.Categorical('z2', p=np.ones(npop) / npop, shape=nd)
    mu2 = tt.as_tensor_variable([mus[i, z2[i]] for i in range(nd)])
    m2 = pm.Normal('m2', mu=mu2, sigma=1e-5, shape=nd)
m1_val = m1.random(size=100)
m2_val = m2.random(size=100)
print('m1 rows come from the same component? {}'.
      format(all(np.all(np.diff(m1_val) < 1e-3, axis=-1))))  # True
print('m2 rows come from the same component? {}'.
      format(all(np.all(np.diff(m2_val) < 1e-3, axis=-1))))  # False
That's for a different use case, and it only makes sense for repeated measurements. So it would be cases like: 2 different 3-component mixtures stacked together. I think these cases are subtle and potentially easy to misuse, and I'm not sure what would be the best way to handle them (oh god, I so much wish we had the concept of batch shape and sample shape so we can distinguish this explicitly). So to reason about the 2D w, think of this use case: we have another mixture, also with 3 components, each of which is a 5D Gaussian, and we can stack the two together. The expected shapes would be: the mixture components are (5, 3, 2), the weights are (3, 2), the observations are (n, 5, 2).
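A small numpy sketch (not from the comment) of the shapes in that stacked-mixture use case, checking only that the arrays line up:

import numpy as np

dim, ncomp, nmix = 5, 3, 2
comps = np.random.standard_normal((dim, ncomp, nmix))  # components: (5, 3, 2)
w = np.random.dirichlet(np.ones(ncomp), size=nmix).T   # weights: (3, 2), each column sums to 1

mean = np.einsum('dkm,km->dm', comps, w)  # weighted mean of each stacked mixture
print(mean.shape)  # (5, 2); with n draws the observations would have shape (n, 5, 2)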
And don't get me started on mixtures of mixtures...
But isn't this exactly the same as the …
As long as the components are sampled as a whole, it is fine. In other words, the latent label does not break up the components. I still think that in your example above it doesn't make sense, as each dimension is from a different component.
-------
List of broadcasted sample arrays

Examples
This is an amazing doc-string.
Let me rephrase the model with the latent categories like this:

import numpy as np
from theano import tensor as tt
import pymc3 as pm

npop = 5
nd = 3
with pm.Model():
    mus = tt.as_tensor_variable(np.tile(np.arange(npop)[None, :], (nd, 1)))
    z1 = pm.Categorical('z1', p=np.ones(npop) / npop)
    m1 = pm.Normal('m1', mu=mus[..., z1], sigma=1e-5, shape=nd)
    z2s = [pm.Categorical('z2_{}'.format(i), p=np.ones(npop) / npop)
           for i in range(nd)]
    m2s = [pm.Normal('m2_{}'.format(i), mu=mus[i, z2s[i]], sigma=1e-5)
           for i in range(nd)]
    z2 = pm.Deterministic('z2', tt.stack(z2s))
    m2 = pm.Deterministic('m2', tt.stack(m2s))
    vals = pm.sample_prior_predictive(100, vars=['m1', 'm2'])
m1_val = vals['m1']
m2_val = vals['m2']
print('m1 rows come from the same component? {}'.
      format(all(np.all(np.diff(m1_val) < 1e-3, axis=-1))))  # True
print('m2 rows come from the same component? {}'.
      format(all(np.all(np.diff(m2_val) < 1e-3, axis=-1))))  # False

Here I explicitly use a separate latent category for each dimension.
So it is 3 mixtures, each with a 1D component... Well, I guess this could also be a valid use case, but my concern is that it would be too easy to misuse. I will have another look at the docstring.
I pushed a few more tests and added some examples into …
LGTM. Great job @lucianopaz.
woohoo!
In PR #3293, I had tried to fix the Mixture.random method (issue #3270). I had mistakenly not added WIP to the PR and it was merged with the tests passing but with some rough edges. This PR addresses one important thing that was poorly implemented in PR #3293: vectorization of comp_dists.random.

The most important changes are:
- A comp_dists shape inference method.
- The _comp_samples method was rewritten to get a uniform interface with distributions.distribution.generate_samples.
- size, dist_shape and broadcast_shape are handled more clearly.
- Multinomial.random, which may be the cause of "sample_posterior_predictive returns the wrong dimension" (#3343).

Points that I would like to discuss:
- test_sampling.py::TestSamplePPC::test_normal_scalar. The strange thing was that it was also failing for me on master. I made a slight edit to it and would like your opinion.
- The generator decorator function around comp_dist.random, used to provide the raw size instead of the size that generate_samples combines with dist_shape and broadcast_shape before giving it to the generators, required me to ignore a pylint warning. Could you suggest an alternative that does not have to do that, or something less hackish regarding the size parameter?
- A sample_prior and sample_posterior_predictive test for NormalMixture too?

This PR does not address another point that was lacking from #3293, which is that the LKJCholeskyCorr.random function is not tested. I will work on it and submit a separate PR.