Replace Hessian scaling guessing by an identity matrix #2236


Merged
merged 1 commit into pymc-devs:master on May 31, 2017

Conversation


@ghost ghost commented May 29, 2017

New PR instead of #2232, with np.ones cast to floatX.

If I understand it right, it seems like Stan uses identity scaling by default. The CmdStan manual states that:

All HMC implementations in Stan utilize quadratic kinetic energy functions which are specified up to the choice of a symmetric, positive-definite matrix known as a mass matrix or, more formally, a metric (Betancourt and Stein, 2011).

If the metric is constant then the resulting implementation is known as Euclidean HMC. Stan allows for three Euclidean HMC implementations: a unit metric, a diagonal metric, and a dense metric. These can be specified with the values unit_e, diag_e, and dense_e, respectively.

Future versions of Stan will also include dynamic metrics associated with Riemannian HMC (Girolami and Calderhead, 2011; Betancourt, 2012).

metric=<list element>
Geometry of base manifold
Valid values: unit_e, diag_e, dense_e (Defaults to diag_e)
    unit_e
        Euclidean manifold with unit metric
    diag_e
        Euclidean manifold with diag metric
    dense_e
        Euclidean manifold with dense metric

And the source code corresponding to diag_e shows that by default the metric is an identity matrix.
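As a rough sketch of what this PR's fallback amounts to, the unit metric can be represented as the diagonal of an identity mass matrix, cast to the configured float type. The names below are illustrative only, not pymc3's actual internals:

```python
import numpy as np

# Hypothetical sketch: identity scaling as a vector of ones for a model
# with `n_params` free parameters, cast to a dtype analogous to theano's
# floatX setting. `floatX` and `n_params` are assumptions for illustration.
floatX = "float32"                         # assumption: mimics theano.config.floatX
n_params = 5                               # assumption: number of free parameters
scaling = np.ones(n_params, dtype=floatX)  # diagonal of a unit (identity) metric
```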


twiecki commented May 29, 2017

While they start with identity, they also adapt during sampling according to the recent samples.

I think this still makes sense, but we should test it on some models first (like in the examples folder) to see if it affects sampling in a bad way.
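The adaptation mentioned above can be sketched in miniature: a diagonal metric is typically re-estimated from the per-dimension variance of recent warmup draws. This is a toy illustration, not Stan's or pymc3's actual windowed estimator:

```python
import numpy as np

# Toy sketch of diagonal metric adaptation: estimate per-dimension
# variance from recent (fake) warmup draws and use it as the diagonal
# of the inverse mass matrix. The draws here are synthetic stand-ins.
rng = np.random.default_rng(0)
recent_draws = rng.normal(scale=[1.0, 3.0], size=(500, 2))  # fake warmup draws

diag_metric = recent_draws.var(axis=0)  # adapted diagonal metric estimate
```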

@junpenglao (Member)

The thing is, compared to the ADVI init, the fallback option (the Hessian) kind of sucks right now.


twiecki commented May 31, 2017

I completely agree, the Hessian at the test point is a poor default choice if nothing is passed in. I was mainly worried about breaking people's code or unanticipated effects. Thinking more about this, however, I think this should be pretty safe to merge: most people use pm.sample() with auto-tuning, or they pass in a scaling or point when instantiating NUTS.

@aseyboldt?

@aseyboldt (Member)

I think so too. Both are pretty bad, but using the Hessian is just much more trouble and probably also more surprising. I don't think it will break models either.

@twiecki twiecki merged commit 6f58dbf into pymc-devs:master May 31, 2017

ghost commented May 31, 2017

For now, I checked only the .py examples (i.e. not the notebooks) in pymc3/examples.

It turned out that only two of them create a NUTS sampler explicitly without specifying the scaling parameter: simpletest.py and gelman_bioassay.py. I ran Kolmogorov-Smirnov tests to compare the marginal distributions of all sampled variables in the traces for the master branch and for this branch rebased onto master. The KS test passed with a p-value larger than 10% for all of them except x in simpletest.py. However, when I plotted the traces for this x variable, it turned out to be much closer to normal with identity scaling than with the Hessian.
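The comparison described above can be sketched with scipy's two-sample KS test. The traces here are synthetic stand-ins, not the actual pymc3 traces:

```python
import numpy as np
from scipy import stats

# Stand-in traces: the real comparison would use draws of one variable
# from the master branch and from this branch, respectively.
rng = np.random.default_rng(42)
trace_master = rng.normal(size=2000)   # stand-in: draws from master
trace_branch = rng.normal(size=2000)   # stand-in: draws from this branch

# Two-sample KS test on the marginal distributions; the comparison above
# treated a p-value larger than 10% as a match.
stat, pvalue = stats.ks_2samp(trace_master, trace_branch)
marginals_match = pvalue > 0.10
```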


ghost commented May 31, 2017

@twiecki @aseyboldt
Just to clarify: so now we use the identity when no scaling is provided at all.

But we still support providing the scaling parameter as a test point, which is used in examples with find_MAP (like lasso_missing.py). In that case the Hessian at the test point is still computed and used as the scaling.

That is what we want, right? I mean, when we get an explicitly specified start point (most probably the MAP), we should still use the Hessian and not the identity, right?
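To illustrate why the Hessian at an explicit start point is a sensible scaling: for a Gaussian log-posterior, the negative Hessian at the MAP is exactly the inverse posterior covariance. A toy analytic example (not pymc3 code):

```python
import numpy as np

# Toy example: logp(x) = -0.5 * x^T A x has its MAP at x = 0 and a
# constant Hessian of -A, so the negative Hessian evaluated at the MAP
# recovers A, the inverse posterior covariance.
A = np.array([[2.0, 0.3],
              [0.3, 1.0]])

x_map = np.zeros(2)            # MAP of this Gaussian log-posterior
neg_hessian_at_map = A.copy()  # -(d^2 logp / dx^2) evaluated at x_map
scaling = neg_hessian_at_map   # what would be used as mass-matrix scaling
```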

@junpenglao (Member)

Just FYI: models built with pm.glm.GLM are mostly OK with find_MAP and the Hessian, as I gather from the limited testing yesterday.


twiecki commented May 31, 2017

Our examples folder is in poor health to begin with. I wonder if we should rather delete them at this point and turn the ones we think are useful into proper notebooks. Currently they are either broken or show incorrect usage of pymc3.

@fonnesbeck (Member)

Let's fix them. I think there is merit to having some examples that are not in notebook format.

@junpenglao (Member)

@twiecki @fonnesbeck I can go through them.
Many of them use Metropolis; should I change the default to NUTS with ADVI init, and add a new kwarg "legacy" for the old way of sampling?

@fonnesbeck (Member)

@junpenglao I would fully update everything to the current best practices (i.e. let PyMC choose), unless it is explicitly demonstrating an alternative.

@ghost ghost mentioned this pull request Jun 1, 2017
22 tasks
@junpenglao junpenglao mentioned this pull request Jun 1, 2017
@ghost ghost changed the title WIP: replace Hessian scaling guessing by an identity matrix Replace Hessian scaling guessing by an identity matrix Jun 1, 2017