WIP: replace Hessian scaling guessing by an identity matrix #2232


Merged: 2 commits into pymc-devs:master, May 29, 2017

Conversation

@ghost commented May 27, 2017

In #2209 there was a discussion between @aseyboldt, @twiecki and @fonnesbeck about the default selection of scaling for HMC step methods. The conclusion seemed to be that Hessian scaling guessing should be replaced by an identity matrix when the scaling is not provided to the step method explicitly.

I would like to implement this replacement. However, some things are not clear to me. One of them: is Hessian scaling still necessary in some cases, and if so, when? Another, related to the first: should a scaling parameter provided as a point (i.e. a dict object) be supported or not? It is not documented right now, but it is supported here: https://github.com/pymc-devs/pymc3/blob/master/pymc3/step_methods/hmc/base_hmc.py#L46

@@ -41,7 +42,7 @@ def __init__(self, vars=None, scaling=None, step_scale=0.25, is_cov=False,
         vars = inputvars(vars)

     if scaling is None and potential is None:
-        scaling = model.test_point
+        scaling = np.ones(DictToArrayBijection(model.test_point).size)
Member

test is failing:

      scaling = np.ones(DictToArrayBijection(model.test_point).size)

E NameError: global name 'np' is not defined

@junpenglao (Member)

I am not too familiar with the literature: do you have a reference on what the default should be in these situations? When available, is Hessian scaling better than the identity matrix?

@ghost (Author) commented May 29, 2017

As far as I understand from the previous discussions, the most robust initialization is ADVI, and Hessian scaling doesn't always work very well anyway. The main issue with Hessian scaling is that it computes only the diagonal elements of the Hessian, which makes it heavily dependent on the particular parametrization of the model.

So in some well-behaved cases it is better than the identity matrix, but it doesn't always work well, and that's why it doesn't seem to be the best default.

I wasn't able to find literature about it though.
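For illustration, a standalone NumPy sketch (not PyMC3 code; the 2-D correlated Gaussian is a made-up example) of why diagonal-Hessian scaling is sensitive to parametrization:

```python
import numpy as np

# Toy example: 2-D Gaussian negative log density with correlation rho.
# The Hessian of -log p at the mode is the precision matrix inv(cov).
rho = 0.9
cov = np.array([[1.0, rho], [rho, 1.0]])
hess = np.linalg.inv(cov)

# Diagonal-Hessian scaling keeps only diag(hess), so it recovers the
# conditional variances (1 - rho**2), not the marginal ones (1.0).
diag_scaling = 1.0 / np.diag(hess)
identity_scaling = np.ones(2)

print(diag_scaling)      # [0.19 0.19] for rho = 0.9, far from the marginals
print(identity_scaling)  # [1. 1.]
```

Under a rotated (decorrelated) parametrization of the same model the diagonal guess would be exact, which is the parametrization dependence mentioned above.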

@junpenglao (Member)

ADVI is definitely the best we have so far. However, for models with discrete RVs we cannot use it. I guess we will keep experimenting with the alternatives.
I cannot find any reference on that either. Maybe we should write a paper on it when we have enough experience @twiecki @fonnesbeck @aseyboldt

test is still failing btw:

        if scaling is None and potential is None:
>           scaling = np.ones(DictToArrayBijection(model.test_point).size)
E           TypeError: __init__() takes exactly 3 arguments (2 given)
pymc3/step_methods/hmc/base_hmc.py:46: TypeError

@ghost (Author) commented May 29, 2017

Thank you, the tests pass now :)

Yes, it is a problem that ADVI doesn't work with models that have both discrete and continuous RVs. On the other hand, using HMC/NUTS with discrete variables is still not very well researched (see this issue).

I like the idea of writing a paper about the initialization problem.

@junpenglao (Member) commented May 29, 2017

Yep. Also, HMC/NUTS do not always play well with others (FYI, see also #1990); it really feels like a case-by-case situation.
Something I tried before that seems to work OK when the model contains only a few discrete RVs: fit the model with the discrete RVs semi-fixed using ADVI, then use the output as start and scaling for NUTS on the full model (http://nbviewer.jupyter.org/github/junpenglao/Bayesian-Cognitive-Modeling-in-Pymc3/blob/master/CaseStudies/PsychophysicalFunctions.ipynb#12.2-Psychophysical-functions-under-contamination).

@@ -41,7 +43,8 @@ def __init__(self, vars=None, scaling=None, step_scale=0.25, is_cov=False,
         vars = inputvars(vars)

     if scaling is None and potential is None:
-        scaling = model.test_point
+        bij = DictToArrayBijection(ArrayOrdering(vars), model.test_point)
+        scaling = np.ones(bij.map(model.test_point).size)
Member

It might be faster to do:

scaling = np.ones(model.dict_to_array(model.test_point).size)

which doesn't require importing DictToArrayBijection and ArrayOrdering either
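For illustration, a minimal NumPy-only sketch of what either expression computes (the `test_point` dict here is hypothetical; both `DictToArrayBijection(...).map(...)` and `model.dict_to_array(...)` just flatten the test point into one vector):

```python
import numpy as np

# Hypothetical test point: a 3-vector 'mu' and a scalar 'sigma_log_'.
test_point = {'mu': np.zeros(3), 'sigma_log_': np.array(0.0)}

# Flatten the dict into a single vector, as the bijection would do,
# and build an identity scaling (mass-matrix diagonal) of that length.
flat = np.concatenate([np.asarray(v).ravel() for v in test_point.values()])
scaling = np.ones(flat.size)

print(scaling)  # [1. 1. 1. 1.]
```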

@junpenglao junpenglao merged commit eb6042f into pymc-devs:master May 29, 2017
@junpenglao (Member)

Thanks @a-rodin !

@springcoil (Contributor)

Would love to see some literature on this!

@twiecki (Member) commented May 29, 2017

Hm, this is quite a severe change; I think we should test more thoroughly which models this works for and which it doesn't.

twiecki added a commit that referenced this pull request May 29, 2017
…tity"

This reverts commit eb6042f, reversing
changes made to c3afa00.
@twiecki (Member) commented May 29, 2017

I reverted this merge as we need to think a bit harder about how to do this and test a few models to make sure things aren't breaking. I agree that the current default is not optimal, though.

Moreover, the np.ones needs to be cast to floatX.

@a-rodin Can you open a new PR?
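The floatX cast mentioned above could look like the following sketch (`'float32'` here is an assumption standing in for `theano.config.floatX`, which may also be `'float64'` depending on configuration):

```python
import numpy as np

# np.ones defaults to float64; on a float32 Theano configuration an
# uncast scaling vector would upcast the whole graph, hence the cast.
floatX = 'float32'  # assumption: stand-in for theano.config.floatX
scaling = np.ones(4, dtype=floatX)

print(scaling.dtype)  # float32
```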
