-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ReverseAD] Add support for user-defined hessians #1819
Conversation
493b86d
to
9e2fd5b
Compare
9e2fd5b
to
4101a43
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! As far as I understand, we assume by default that the Hessian of a custom operator is dense, right?
If so, would it be possible to emphasize this in the documentation (I can do it if that helps)?
So I did some benchmarking of this (a diagonal quadratic is also effectively a worst-case usage of this): julia> using JuMP, Ipopt
julia> function main(N)
model = Model(Ipopt.Optimizer)
set_silent(model)
@variable(model, x[1:N])
function foo(x...)
return sum((x[i] - 1.1*i)^2 for i in 1:N)
end
register(model, :foo, N, foo; autodiff = true)
@NLobjective(model, Min, foo(x...))
optimize!(model)
return value.(x)
end
main (generic function with 1 method)
julia> main(10); # compilation
julia> for i in [10, 100, 1_000, 10_000]
GC.gc()
@time main(i)
end
0.003252 seconds (1.53 k allocations: 98.891 KiB)
0.830141 seconds (3.08 M allocations: 175.975 MiB, 3.97% gc time)
0.186039 seconds (591.61 k allocations: 153.440 MiB, 10.86% gc time)
7.209030 seconds (33.99 M allocations: 13.457 GiB, 12.40% gc time)
julia> function main_1(N)
model = Model(Ipopt.Optimizer)
set_silent(model)
@variable(model, x[1:N])
function foo(x...)
return sum((x[i] - 1.1*i)^2 for i in 1:N)
end
function ∇foo(g, x...)
for i in 1:N
g[i] = 2 * (x[i] - 1.1 * i)
end
return
end
register(model, :foo, N, foo, ∇foo)
@NLobjective(model, Min, foo(x...))
optimize!(model)
return value.(x)
end
main_1 (generic function with 1 method)
julia> main_1(10); # compilation
julia> for i in [10, 100, 1_000, 10_000]
GC.gc()
@time main_1(i)
end
0.002592 seconds (1.44 k allocations: 71.875 KiB)
0.028930 seconds (80.08 k allocations: 4.595 MiB, 88.17% compilation time)
0.032589 seconds (117.40 k allocations: 6.269 MiB, 75.08% compilation time)
0.083687 seconds (495.45 k allocations: 23.409 MiB, 30.14% compilation time)
julia> function main_2(N)
model = Model(Ipopt.Optimizer)
set_silent(model)
@variable(model, x[1:N])
function foo(x...)
return sum((x[i] - 1.1*i)^2 for i in 1:N)
end
function ∇foo(g, x...)
for i in 1:N
g[i] = 2 * (x[i] - 1.1 * i)
end
return
end
function ∇²foo(H, x...)
for i in 1:N
H[i, i] = 2.0
end
return
end
register(model, :foo, N, foo, ∇foo, ∇²foo)
@NLobjective(model, Min, foo(x...))
optimize!(model)
return value.(x)
end
main_2 (generic function with 1 method)
julia> main_2(10); # compilation
julia> for i in [10, 100, 1_000] # , 10_000]
GC.gc()
@time main_2(i)
end
0.002018 seconds (2.67 k allocations: 162.812 KiB)
0.047883 seconds (216.95 k allocations: 13.330 MiB, 71.50% compilation time)
5.314729 seconds (13.42 M allocations: 822.473 MiB, 7.54% gc time, 0.63% compilation time) As could be expected, it scales pretty poorly if you have large input vectors. But there's an easy escape hatch: just don't define the hessian and use first-order methods. This is really targeting problems which have lots of other structure, and a small user-defined function in one part that is disabling second-order information for the full problem. |
Conclusion was if we wanted to add a sparse method in future, we could add |
f_input[i] = ex.forward_storage[children_arr[c]] | ||
end | ||
H = _UnsafeLowerTriangularMatrixView( | ||
d.user_output_buffer, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is it enforced that user_output_buffer is large enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the constructor could break other unsafe views on the same array. That sounds extra unsafe... At least needs to be documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. For now though, the only usage of this is evaluating the user-defined hessians, and we use the .user_output_buffer
as storage, so it's a safe usage of the unsafe behavior.
305a157
to
e815011
Compare
Refactored from jump-dev/JuMP.jl#2953