Raise NotImplementedError for SplineWrapper gradient operation #2211
Conversation
A better design is to create the new spline in grad without subclassing.

@ferrine I'm not sure what you mean.

I do a similar thing here: https://github.com/Theano/Theano/pull/5963/files#diff-ce7341374a28f4a1b95def7fb580987eR572

@ferrine Yes, it is a good idea to instantiate objects of a single class recursively instead of having two separate classes.
pymc3/distributions/dist_math.py (outdated):

     itypes = [tt.dscalar]
     otypes = [tt.dscalar]

-    def __init__(self, spline):
+    def __init__(self, spline, n_derivatives):
Why n_derivatives? Recursion is OK.
n_derivatives is the number of derivatives that a given SplineWrapper object should implement. One wouldn't want it to be larger than necessary, because invocations of UnivariateSpline.derivative() are slow and have O(n) complexity; unbounded recursion doesn't seem to be a good choice here.
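For reference, here is a minimal sketch of the bounded-recursion design being discussed; the structure is illustrative rather than the exact PR code, and perform() is omitted for brevity:

import theano
import theano.tensor as tt


class SplineWrapper(theano.Op):
    itypes = [tt.dscalar]
    otypes = [tt.dscalar]

    def __init__(self, spline, n_derivatives):
        self.spline = spline
        # Pre-create gradient ops recursively, but only down to a fixed
        # depth, so the costly UnivariateSpline.derivative() is called
        # a bounded number of times.
        if n_derivatives > 0:
            self.grad_op = SplineWrapper(spline.derivative(),
                                         n_derivatives - 1)

    # perform() omitted; it would evaluate self.spline at the input x.

    def grad(self, inputs, grads):
        x, = inputs
        x_grad, = grads
        return [x_grad * self.grad_op(x)]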
pymc3/distributions/dist_math.py (outdated):

         x, = inputs
         x_grad, = grads
-        return [x_grad * self.spline_grad(x)]
+        return [x_grad * self.grad_op(x)]
You can create a new op right here. Pure Theano code is expected in the grad method.
To avoid O(n) calculations on each call of the grad method, the SciPy spline for the gradient (created by self.spline.derivative()) should be pre-calculated and stored inside the object. But in this case it is simpler to also store a pre-created Theano operation for the gradient, rather than somehow maintain a list of pre-created SciPy splines and pass them to the constructor of a Theano operation in the grad method, so that the resulting operation also uses a pre-calculated SciPy spline for the n-th order derivative.
You can memoize the call.
Memoize the op creation, to be more accurate.
Yes, I thought about it. But I'm concerned that in this case the gradient calculation time becomes non-deterministic. For example, it might significantly bias the tqdm estimate of the sampling time.
Why? Functions are compiled after the graph is constructed. That will not affect runtime.
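To illustrate the point (a sketch; spline_op stands for any already-constructed SplineWrapper instance):

import theano
import theano.tensor as tt

x = tt.dscalar('x')
y = spline_op(x)
dy = tt.grad(y, x)            # SplineWrapper.grad() runs here, at graph-construction time
f = theano.function([x], dy)  # the graph is compiled once
f(0.5)                        # each call only evaluates the already-compiled graph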
Hmm, you are right. I added lazy creation of the derivatives.
LGTM
pymc3/distributions/dist_math.py (outdated):

        if not hasattr(self, 'grad_op'):
            try:
                self.grad_op = SplineWrapper(self.spline.derivative())
What about using a property?
I think we don't want to add grad_op to __props__ because it would break __hash__() and __eq__(). For example, this code works now as expected, but wouldn't work if grad_op were added to __props__:

s = InterpolatedUnivariateSpline(...)
op1 = SplineWrapper(s)
op2 = SplineWrapper(s)
op2.grad(...)  # calculate gradient
assert op1 == op2

One would expect op1 to be equal to op2 because they wrap the same spline, but if grad_op were listed in __props__, that wouldn't be the case.
As far as I understand, the purpose of __props__ is to be used for the implementation of correct comparisons:

    __props__ enables the automatic generation of appropriate __eq__() and __hash__(). Given the method __eq__(), automatically generated from __props__, two ops will be equal if they have the same values for all the properties listed in __props__. Given the method __hash__(), automatically generated from __props__, two ops will have the same hash if they have the same values for all the properties listed in __props__. __props__ will also generate a suitable __str__() for your op. This requires a development version after September 1st, 2014, or version 0.7.

http://deeplearning.net/software/theano/extending/extending_theano.html#op-s-auxiliary-methods
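A small sketch of what the __props__-generated comparisons buy us, assuming __props__ = ('spline',) as in this PR:

import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

from pymc3.distributions.dist_math import SplineWrapper

x = np.linspace(0, 1, 100)
s = InterpolatedUnivariateSpline(x, x ** 2, k=3)

op1 = SplineWrapper(s)
op2 = SplineWrapper(s)

# __eq__ and __hash__ are generated from the spline attribute alone, so
# two wrappers around the same spline stay equal and hash identically,
# even if grad_op has been lazily created on only one of them.
assert op1 == op2
assert hash(op1) == hash(op2)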
Good point. Isn't the derivative deterministic? In that case, the only required prop is spline.
Yes, that's why we have only one Theano prop, spline. And grad_op is just a helper property for internal usage, needed to avoid unnecessary recalculations of the derivative spline; it is not a Theano prop.
By "property" I meant:

@property
def grad_op(self): ...
But we don't want to expose it to the user, do we?
I mean, what would the possible usages be? This class is internal to PyMC3 anyway, so I think it only needs the functionality that PyMC3 actually uses.
Why not?
Moreover, it would be prettier.
@ferrine Hmm, why not indeed :) Added it in a new commit.
@@ -378,6 +378,18 @@ class SplineWrapper (theano.Op):
     def __init__(self, spline):
         self.spline = spline

+    @property
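For context, here is a sketch of how the pieces fit together at this point in the review, with the gradient op created lazily behind a property; this is close to, but not necessarily verbatim, the committed code:

import numpy as np
import theano
import theano.tensor as tt


class SplineWrapper(theano.Op):
    """Wrap a SciPy spline as a Theano scalar op."""

    __props__ = ('spline',)
    itypes = [tt.dscalar]
    otypes = [tt.dscalar]

    def __init__(self, spline):
        self.spline = spline

    @property
    def grad_op(self):
        # Lazily create and cache the wrapper for the derivative spline,
        # so that UnivariateSpline.derivative() runs at most once per op.
        if not hasattr(self, '_grad_op'):
            try:
                self._grad_op = SplineWrapper(self.spline.derivative())
            except ValueError:
                self._grad_op = None
        if self._grad_op is None:
            raise NotImplementedError('Spline gradient is not available')
        return self._grad_op

    def perform(self, node, inputs, output_storage):
        x, = inputs
        output_storage[0][0] = np.asarray(self.spline(x))

    def grad(self, inputs, grads):
        x, = inputs
        x_grad, = grads
        return [x_grad * self.grad_op(x)]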
Nice!

Could you please add a simple test that ensures that the problem in #2209 is solved?
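A minimal test along these lines could use Theano's numeric gradient checker; this is a sketch, and the test actually added to the PR may differ:

import numpy as np
import theano.tests.unittest_tools as utt
from scipy.interpolate import InterpolatedUnivariateSpline

from pymc3.distributions.dist_math import SplineWrapper


class TestSplineWrapper(object):
    def test_grad(self):
        x = np.linspace(0, 1, 100)
        y = x ** 2
        spline = SplineWrapper(InterpolatedUnivariateSpline(x, y, k=1))
        # Numerically verify SplineWrapper.grad() against finite differences.
        utt.verify_grad(spline, [0.5])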
Code looks really great!! I've left some code-style nitpicks, though.
pymc3/distributions/dist_math.py (outdated):

@@ -365,6 +365,7 @@ def conjugate_solve_triangular(outer, inner):
     grad = tt.triu(s + s.T) - tt.diag(tt.diagonal(s))
     return [tt.switch(ok, grad, floatX(np.nan))]


 class SplineWrapper (theano.Op):
PEP 8 does not recommend a space before the opening bracket.
I've fixed all these issues. But just to argue about PEP 8... :)

PEP 8 forbids a space before the argument list of a function call, but says nothing about whitespace before the base class:

    Avoid extraneous whitespace in the following situations:
    ... (nothing interesting)
    Immediately before the open parenthesis that starts the argument list of a function call
    ... (nothing interesting)

Also, it says that

    Method definitions inside a class are surrounded by a single blank line

so a blank line before the first method in a class is legitimate, as it surrounds a class method; it is even possible to argue that PEP 8 requires it there.

Actually, the pep8 linter seems to agree with both ways:
$ cat t.py
class Foo (object):
    def foo(self):
        print('hi')


class Bar(object):
    def bar(self):
        print('hi')
$ pep8 t.py
$ # no errors reported
And as PEP 8 is a formalized description of the code style used in the Python standard library, we can check what is used there, and it turns out that both variants are present:

- class ZipInfo (object) and class LZMAFile(_compression.BaseStream), although class Foo(object) is more frequent than class Foo (object);
- class _Outcome(object): with def __init__(self, result=None): immediately after it, and class _BaseTestCaseContext: with a blank line before def __init__(self, test_case):, although the variant without the blank line is more frequent than the one with it.

However, that was more just for the sake of argument. I've already committed the fixes, and I see that we need not only to follow PEP 8 but also to be consistent with other parts of PyMC3, so I agree with your nitpicks and find them absolutely reasonable :)
As an off-topic note, I actually think it would be nice to have pep8 or flake8 run automatically in Travis and fail builds in case of PEP 8 violations. However, I briefly investigated this possibility, and it seems it would be necessary either to fix some rather large parts of the existing code and/or to agree on exceptions from PEP 8 that we don't want to follow, like, for example, the maximum line length.
pymc3/distributions/continuous.py (outdated):

@@ -16,7 +16,7 @@
 from pymc3.theanof import floatX
 from . import transforms

-from .dist_math import bound, logpow, gammaln, betaln, std_cdf, i0, i1, alltrue_elemwise, DifferentiableSplineWrapper
+from .dist_math import bound, logpow, gammaln, betaln, std_cdf, i0, i1, alltrue_elemwise, SplineWrapper
I think this line is too long; we can wrap this import in brackets and make it a multiline import.
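For example, the suggested wrapping could look like this:

from .dist_math import (
    bound, logpow, gammaln, betaln, std_cdf, i0, i1,
    alltrue_elemwise, SplineWrapper,
)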
pymc3/tests/test_dist_math.py (outdated):

class TestSplineWrapper:

The blank line is not needed; also, it is good to inherit from object.
Once tests pass I'll merge. Good job, thanks!
Thank you!
It is necessary to allow guess_scaling to fall back to fixed_hessian here instead of just failing. Fixes #2209.