[ReverseAD] Add support for user-defined hessians #1819

Merged: 5 commits into master on May 30, 2022

Conversation

@odow (Member) commented Apr 26, 2022

Refactored from jump-dev/JuMP.jl#2953

@odow added the "Project: next-gen nonlinear support" label Apr 26, 2022
@odow mentioned this pull request May 2, 2022
@odow force-pushed the od/multivariate-hessians branch from 493b86d to 9e2fd5b on May 3, 2022
Base automatically changed from od/nonlinear to master on May 5, 2022
@odow force-pushed the od/multivariate-hessians branch from 9e2fd5b to 4101a43 on May 5, 2022
@frapac (Contributor) left a comment

Nice! As far as I understand, we assume by default that the Hessian of a custom operator is dense, right?

If so, would it be possible to emphasize this in the documentation (I can do it if that helps)?
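For concreteness, "dense" here means that every lower-triangular entry is treated as structurally nonzero, so an operator with N arguments contributes N(N + 1)/2 Hessian entries regardless of the true sparsity. A minimal REPL illustration of the assumed pattern:

julia> N = 3;

julia> [(i, j) for j in 1:N for i in j:N]
6-element Vector{Tuple{Int64, Int64}}:
 (1, 1)
 (2, 1)
 (3, 1)
 (2, 2)
 (3, 2)
 (3, 3)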

@odow mentioned this pull request May 23, 2022
@odow (Member, Author) commented May 23, 2022

So I did some benchmarking of this (a diagonal quadratic is effectively a worst-case usage: the true Hessian has only N nonzeros, but it is treated as dense):

julia> using JuMP, Ipopt

julia> function main(N)
           model = Model(Ipopt.Optimizer)
           set_silent(model)
           @variable(model, x[1:N])
           function foo(x...)
               return sum((x[i] - 1.1*i)^2 for i in 1:N)
           end
           register(model, :foo, N, foo; autodiff = true)
           @NLobjective(model, Min, foo(x...))
           optimize!(model)
           return value.(x)
       end
main (generic function with 1 method)

julia> main(10); # compilation

julia> for i in [10, 100, 1_000, 10_000]
           GC.gc()
           @time main(i)
       end
  0.003252 seconds (1.53 k allocations: 98.891 KiB)
  0.830141 seconds (3.08 M allocations: 175.975 MiB, 3.97% gc time)
  0.186039 seconds (591.61 k allocations: 153.440 MiB, 10.86% gc time)
  7.209030 seconds (33.99 M allocations: 13.457 GiB, 12.40% gc time)

julia> function main_1(N)
           model = Model(Ipopt.Optimizer)
           set_silent(model)
           @variable(model, x[1:N])
           function foo(x...)
               return sum((x[i] - 1.1*i)^2 for i in 1:N)
           end
           function ∇foo(g, x...)
               for i in 1:N
                   g[i] = 2 * (x[i] - 1.1 * i)
               end
               return
           end
           register(model, :foo, N, foo, ∇foo)
           @NLobjective(model, Min, foo(x...))
           optimize!(model)
           return value.(x)
       end
main_1 (generic function with 1 method)

julia> main_1(10); # compilation

julia> for i in [10, 100, 1_000, 10_000]
           GC.gc()
           @time main_1(i)
       end
  0.002592 seconds (1.44 k allocations: 71.875 KiB)
  0.028930 seconds (80.08 k allocations: 4.595 MiB, 88.17% compilation time)
  0.032589 seconds (117.40 k allocations: 6.269 MiB, 75.08% compilation time)
  0.083687 seconds (495.45 k allocations: 23.409 MiB, 30.14% compilation time)

julia> function main_2(N)
           model = Model(Ipopt.Optimizer)
           set_silent(model)
           @variable(model, x[1:N])
           function foo(x...)
               return sum((x[i] - 1.1*i)^2 for i in 1:N)
           end
           function ∇foo(g, x...)
               for i in 1:N
                   g[i] = 2 * (x[i] - 1.1 * i)
               end
               return
           end
           function ∇²foo(H, x...)
               for i in 1:N
                   H[i, i] = 2.0
               end
               return
           end
           register(model, :foo, N, foo, ∇foo, ∇²foo)
           @NLobjective(model, Min, foo(x...))
           optimize!(model)
           return value.(x)
       end
main_2 (generic function with 1 method)

julia> main_2(10); # compilation

julia> for i in [10, 100, 1_000] # , 10_000]
           GC.gc()
           @time main_2(i)
       end
  0.002018 seconds (2.67 k allocations: 162.812 KiB)
  0.047883 seconds (216.95 k allocations: 13.330 MiB, 71.50% compilation time)
  5.314729 seconds (13.42 M allocations: 822.473 MiB, 7.54% gc time, 0.63% compilation time)

As expected, this scales pretty poorly for large input vectors. But there's an easy escape hatch: just don't define the Hessian and use first-order methods. This feature is really targeting problems that have lots of other structure, plus a small user-defined function in one part that would otherwise disable second-order information for the whole problem.
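For a sense of scale, a back-of-the-envelope count of the dense lower triangle at N = 10_000:

julia> N = 10_000; N * (N + 1) ÷ 2
50005000

That is about 5 × 10⁷ structural Hessian entries, i.e. roughly 400 MB of Float64 values per Hessian evaluation.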

@odow (Member, Author) commented May 26, 2022

The conclusion was that if we want to add a sparse method in the future, we could add register(model, :foo, N, foo, ∇foo, ∇²foo; hessian_sparsity = [(1, 1), (2, 2)]), in which case H would be a Vector{Float64} input. This would be non-breaking.
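To make that concrete, here is a sketch of what the diagonal-quadratic example above might look like under the hypothetical API (hessian_sparsity does not exist yet; the slot-per-declared-entry convention for H is an assumption about the proposal):

# Hypothetical sparse variant of ∇²foo from main_2 above. Under the
# proposed API, H would arrive as a Vector{Float64} with one slot per
# entry declared in hessian_sparsity, instead of a matrix view.
function ∇²foo_sparse(H, x...)
    for i in 1:length(x)
        H[i] = 2.0  # slot i maps to the declared entry (i, i)
    end
    return
end

# register(model, :foo, N, foo, ∇foo, ∇²foo_sparse;
#          hessian_sparsity = [(i, i) for i in 1:N])  # hypothetical keyword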

    f_input[i] = ex.forward_storage[children_arr[c]]
end
H = _UnsafeLowerTriangularMatrixView(
    d.user_output_buffer,
A reviewer (Member) commented:

Where is it enforced that user_output_buffer is large enough?

@odow (Member, Author) replied May 29, 2022

The reviewer (Member) replied:

So the constructor could break other unsafe views on the same array. That sounds extra unsafe... At the very least, it needs to be documented.

@odow (Member, Author) replied:

Yes. For now though, the only usage of this is evaluating the user-defined hessians, and we use the .user_output_buffer as storage, so it's a safe usage of the unsafe behavior.
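For context, the idea behind the view is roughly the following. This is a minimal sketch, not the actual MOI implementation: the packed column-major layout and the safe Vector backing are assumptions here, and per the discussion above the real _UnsafeLowerTriangularMatrixView is "unsafe" because constructing a new view can invalidate existing views on the same array.

# Sketch: a lower-triangular N-by-N view over a flat, reused scratch buffer.
struct LowerTriangularView
    data::Vector{Float64}  # shared scratch, e.g. user_output_buffer
    N::Int
end

# Map (i, j) with i >= j into the packed buffer: the j - 1 columns before
# column j hold (j - 1) * (N + 1) - j * (j - 1) / 2 entries in total.
function _offset(v::LowerTriangularView, i::Int, j::Int)
    @assert 1 <= j <= i <= v.N
    return (j - 1) * (v.N + 1) - div(j * (j - 1), 2) + (i - j + 1)
end

function Base.setindex!(v::LowerTriangularView, x, i::Int, j::Int)
    return v.data[_offset(v, i, j)] = x
end

Base.getindex(v::LowerTriangularView, i::Int, j::Int) = v.data[_offset(v, i, j)]

With N arguments the buffer must hold at least N * (N + 1) ÷ 2 entries, which is exactly the "large enough" requirement raised above.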

@odow force-pushed the od/multivariate-hessians branch from 305a157 to e815011 on May 30, 2022
@odow merged commit 5802209 into master on May 30, 2022
@odow deleted the od/multivariate-hessians branch on May 30, 2022