-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
WIP: replace Hessian scaling guessing by an identity matrix #2232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pymc3/step_methods/hmc/base_hmc.py
Outdated
@@ -41,7 +42,7 @@ def __init__(self, vars=None, scaling=None, step_scale=0.25, is_cov=False, | |||
vars = inputvars(vars) | |||
|
|||
if scaling is None and potential is None: | |||
scaling = model.test_point | |||
scaling = np.ones(DictToArrayBijection(model.test_point).size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test is failing:
scaling = np.ones(DictToArrayBijection(model.test_point).size)
E NameError: global name 'np' is not defined
I am not too familiar with the literature: do yo have a reference on what the default should be in these situations? When available, is the Hessian scaling better than the identity matrix? |
As far as I understand from the previous discussions, the most robust initialization is ADVI, and Hessian scaling doesn't work always very well anyway. The main issue with Hessian scaling is that it calculates only diagonal elements of the Hessian which makes it heavily dependent on the particular parametrization of the model. So in some well-behaved cases, it's better than the identity matrix, but it doesn't always work well and that's why it seems to be not the best solution to use it by default. I wasn't able to find literature about it though. |
ADVI is definitely the best we have so far. However, for models with discrete RV we cannot use it. I guess we will keep experimenting on the alternatives. test is still failing btw: if scaling is None and potential is None:
> scaling = np.ones(DictToArrayBijection(model.test_point).size)
E TypeError: __init__() takes exactly 3 arguments (2 given)
pymc3/step_methods/hmc/base_hmc.py:46: TypeError |
Thank you, the tests pass now :) Yes, it is a problem that ADVI doesn't work with models that have both discrete and continuous RVs. On the other hand, using HMC/NUTS with discrete variables is still something that is not very well researched (see this issue). I like the idea of writing a paper about the initialization problem. |
Yep. Also HMC/NUTS do not always play well with others (FYI, see also #1990), it really feels a case-by-case situation. |
pymc3/step_methods/hmc/base_hmc.py
Outdated
@@ -41,7 +43,8 @@ def __init__(self, vars=None, scaling=None, step_scale=0.25, is_cov=False, | |||
vars = inputvars(vars) | |||
|
|||
if scaling is None and potential is None: | |||
scaling = model.test_point | |||
bij = DictToArrayBijection(ArrayOrdering(vars), model.test_point) | |||
scaling = np.ones(bij.map(model.test_point).size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be faster to do:
scaling = np.ones(model.dict_to_array(model.test_point).size)
which don't need to import DictToArrayBijection, ArrayOrdering as well
Thanks @a-rodin ! |
Would love to see some literature on this! |
Hm, this is quite a severe change, I think we should test this more thoroughly on which models this works or doesn't work. |
I reverted this merge as we need to think a bit harder of how to do this and test a few models to make sure things aren't breaking. I agree that the current default is not optimal though. Moreover, the np.ones need to be cast to floatX. @a-rodin Can you open a new PR? |
In #2209 there was a discussion between @aseyboldt @twiecki and @fonnesbeck about the default selection of scaling for HMC step meethods. It seemed to conclude that there is a need to replace Hessian scaling guessing by identity matrix in case if the scaling is not provided to the step method explicitly.
I would like to implement this replacement. However, there are some things that are not clear to me. One of them is: is Hessian scaling still necessary in some cases, and if it is, when? Another one, connected to the previous, is should
scaling
parameter provided as a point (i.e. adict
object) be supported or not, as it is not documented right now, but is supported here: https://github.com/pymc-devs/pymc3/blob/master/pymc3/step_methods/hmc/base_hmc.py#L46?