WIP: replace Hessian scaling guessing by an identity matrix #2232


Merged: 2 commits into pymc-devs:master, May 29, 2017

Conversation

@ghost commented May 27, 2017

In #2209 there was a discussion between @aseyboldt, @twiecki and @fonnesbeck about the default selection of scaling for HMC step methods. The conclusion seemed to be that Hessian scaling guessing should be replaced by an identity matrix when the scaling is not provided to the step method explicitly.

I would like to implement this replacement. However, some things are not clear to me. One of them: is Hessian scaling still necessary in some cases, and if so, when? Another, related to the first: should a scaling parameter provided as a point (i.e. a dict object) be supported or not? It is not documented right now, but it is supported here: https://github.com/pymc-devs/pymc3/blob/master/pymc3/step_methods/hmc/base_hmc.py#L46

@@ -41,7 +42,7 @@ def __init__(self, vars=None, scaling=None, step_scale=0.25, is_cov=False,
         vars = inputvars(vars)

     if scaling is None and potential is None:
-        scaling = model.test_point
+        scaling = np.ones(DictToArrayBijection(model.test_point).size)
Member

test is failing:

      scaling = np.ones(DictToArrayBijection(model.test_point).size)

E NameError: global name 'np' is not defined

@junpenglao (Member)

I am not too familiar with the literature: do you have a reference on what the default should be in these situations? When available, is Hessian scaling better than the identity matrix?

@ghost (Author) commented May 29, 2017

As far as I understand from the previous discussions, the most robust initialization is ADVI, and Hessian scaling doesn't always work very well anyway. The main issue with Hessian scaling is that it computes only the diagonal elements of the Hessian, which makes it heavily dependent on the particular parametrization of the model.

So in some well-behaved cases it is better than the identity matrix, but it doesn't always work well, and that's why it doesn't seem to be the best default.

I wasn't able to find literature about it though.
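For illustration, a standalone NumPy sketch (not PyMC3 code; the 2-D correlated Gaussian is a made-up example) of why diagonal-Hessian scaling is sensitive to parametrization:

```python
import numpy as np

# Toy example: 2-D Gaussian negative log density with correlation rho.
# The Hessian of -log p at the mode is the precision matrix inv(cov).
rho = 0.9
cov = np.array([[1.0, rho], [rho, 1.0]])
hess = np.linalg.inv(cov)

# Diagonal-Hessian scaling keeps only diag(hess), so it recovers the
# conditional variances (1 - rho**2), not the marginal ones (1.0).
diag_scaling = 1.0 / np.diag(hess)
identity_scaling = np.ones(2)

print(diag_scaling)      # [0.19 0.19] for rho = 0.9, far from the marginals
print(identity_scaling)  # [1. 1.]
```

Under a rotated (decorrelated) parametrization of the same model the diagonal guess would be exact, which is the parametrization dependence mentioned above.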

@junpenglao (Member)

ADVI is definitely the best we have so far. However, for models with discrete RVs we cannot use it. I guess we will keep experimenting with the alternatives.
I cannot find any reference on that either. Maybe we should write a paper on it when we have enough experience @twiecki @fonnesbeck @aseyboldt

test is still failing btw:

        if scaling is None and potential is None:
>           scaling = np.ones(DictToArrayBijection(model.test_point).size)
E           TypeError: __init__() takes exactly 3 arguments (2 given)
pymc3/step_methods/hmc/base_hmc.py:46: TypeError

@ghost (Author) commented May 29, 2017

Thank you, the tests pass now :)

Yes, it is a problem that ADVI doesn't work with models that have both discrete and continuous RVs. On the other hand, using HMC/NUTS with discrete variables is still not very well researched (see this issue).

I like the idea of writing a paper about the initialization problem.

@junpenglao (Member) commented May 29, 2017

Yep. Also, HMC/NUTS do not always play well with others (FYI, see also #1990); it really feels like a case-by-case situation.
Something I tried before that seems to work OK when the model contains only a few discrete RVs: fit the model with the discrete RVs semi-fixed using ADVI, then use the output as start and scaling for NUTS on the full model (http://nbviewer.jupyter.org/github/junpenglao/Bayesian-Cognitive-Modeling-in-Pymc3/blob/master/CaseStudies/PsychophysicalFunctions.ipynb#12.2-Psychophysical-functions-under-contamination).

@@ -41,7 +43,8 @@ def __init__(self, vars=None, scaling=None, step_scale=0.25, is_cov=False,
         vars = inputvars(vars)

     if scaling is None and potential is None:
-        scaling = model.test_point
+        bij = DictToArrayBijection(ArrayOrdering(vars), model.test_point)
+        scaling = np.ones(bij.map(model.test_point).size)
Member

It might be faster to do:

scaling = np.ones(model.dict_to_array(model.test_point).size)

which doesn't require importing DictToArrayBijection and ArrayOrdering either
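For illustration, a minimal NumPy-only sketch of what either expression computes (the `test_point` dict here is hypothetical; both `DictToArrayBijection(...).map(...)` and `model.dict_to_array(...)` just flatten the test point into one vector):

```python
import numpy as np

# Hypothetical test point: a 3-vector 'mu' and a scalar 'sigma_log_'.
test_point = {'mu': np.zeros(3), 'sigma_log_': np.array(0.0)}

# Flatten the dict into a single vector, as the bijection would do,
# and build an identity scaling (mass-matrix diagonal) of that length.
flat = np.concatenate([np.asarray(v).ravel() for v in test_point.values()])
scaling = np.ones(flat.size)

print(scaling)  # [1. 1. 1. 1.]
```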

@junpenglao junpenglao merged commit eb6042f into pymc-devs:master May 29, 2017
@junpenglao (Member)

Thanks @a-rodin !

@springcoil (Contributor)

Would love to see some literature on this!

@twiecki (Member) commented May 29, 2017

Hm, this is quite a severe change; I think we should test more thoroughly which models this works for and which it doesn't.

twiecki added a commit that referenced this pull request May 29, 2017
…tity"

This reverts commit eb6042f, reversing
changes made to c3afa00.
@twiecki (Member) commented May 29, 2017

I reverted this merge as we need to think a bit harder about how to do this and test a few models to make sure things aren't breaking. I agree that the current default is not optimal, though.

Moreover, the np.ones needs to be cast to floatX.

@a-rodin Can you open a new PR?
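The floatX cast mentioned above could look like the following sketch (`'float32'` here is an assumption standing in for `theano.config.floatX`, which may also be `'float64'` depending on configuration):

```python
import numpy as np

# np.ones defaults to float64; on a float32 Theano configuration an
# uncast scaling vector would upcast the whole graph, hence the cast.
floatX = 'float32'  # assumption: stand-in for theano.config.floatX
scaling = np.ones(4, dtype=floatX)

print(scaling.dtype)  # float32
```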
