-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
comment: code complexity somewhat overwhelming #3671
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
There are a number of "teaching" HMC implementations out there (e.g., https://github.com/rmcgibbo/pyhmc). I think it's unavoidable to have pretty ugly type-checking and class hierarchies in production/general-purpose HMC codebases. |
The design goals of PyMC3 center around making models easy to implement and be highly performant, and neither of these goals are necessarily consistent with being able to easily digest the implementation. We are mainly concerned with steering our users clear of having to write Theano code, while still enjoying all the benefits of having their model run on Theano as a computational backend. That said, PyMC4 will involve a rewrite of the PyMC codebase (this time, around TF and TF Probability), so it offers an opportunity to go back and change things that perhaps ought to have been done differently in the first place (though the design goals are largely the same). We would greatly welcome your feedback on our proposed approach(es) as that work starts to come together! |
out of curiosity, what do you see as the primary driver of complexity in this codebase?
asking in order to learn from your experience as have often faced similar issues with python projects as they grow larger. thanks! |
One of the things that drives complexity in debugging is that what look like the classes (e.g., Another issue is that the class hierarchy doesn't line up nicely with Theano. So if you look at the code for One thing I was wondering was why we don't extend the Theano class hierarchy so that, for example, |
There are indeed a lot of complexity, personally I think historically it is designed to have the (conditional) log_prob function with its gradient easily generated, then we found edge cases and need to add more complexity to fix bugs and maintain backward compatibility. Some of which are because design "mistake" (shape handling, mostly). In case you missed it, https://docs.pymc.io/developer_guide.html might help understanding the design. |
A few more thoughts:
Wait until you check out the variational inference part. 😬 Part of the complexity I would blame it on premature optimization. The class hierarchy in NUTS and VI is because we want to have more abstraction so that we can modify and add sub method easier. By submethod I mean things like adaptation in NUTS/HMC, loss function and approximation in VI. In retrospect it might clearer and more standalone if we implement each method as its own class, the con is that we will have much more code redundancy and addition of new method much more difficult (well not that it is easy now, but if you know the system well modification became much easier like #3605 #3656). |
@rpgoldman It is: https://github.com/pymc-devs/pymc3/blob/cc5527988f72bbd2d62c7248226f62dcfbd7403d/pymc3/model.py#L34 |
As soon as you start going in that direction, you'll end up here, for better or worse. |
@jstrong-tios for my two cents, and having just gone through the process of learning HMC, I found the following resources to be very helpful in understanding HMC and NUTS at a higher level.
Hope this helps! Since this isn't a development issue, I'll close this issue, but please feel free to continue the discussion. |
@eigenfoo thank you! I had already come across minimc and it is outstanding! That was the one that got everything to click for me - Colin Carroll is awesome. I have not come across the Neal handbook - so that may be a big assist. Was just starting to dig into the NUTS paper earlier today and felt I would be able to make some sense of it this time. Thanks again for your help! |
this is not a bug, and feel free to immediately close this issue. I cloned pymc3 in an effort to understand how HMC/NUTS sampling works. What I found was in order to understand how the
NUTS
class works, I would need to understand:BlockedStep
>GradientSharedStep
>BaseHMC
>NUTS
)Model
class that inherits from three other classes in addition to a custom metaclass (class Model(Context, Factor, WithMemoization, metaclass=InitContextMeta):
)Point
- an object-creating function with capitalized name (expectedclass Point
, searching initially failed)QuadPotential
>QuadPotentialDiagAdapt
CpuLeapFrogIntegrator
ArrayOrdering
,DictToArrayBijection
IntEnum
>Competence
(alsounique
class decorator)Needless to say, it was a bit overwhelming. I'm not complaining about it, or urging you to "fix" it. This is just an anecdotal aside about what it's like for someone coming at pymc3 fresh to try to dive into the code.
The text was updated successfully, but these errors were encountered: