Commit 8d7e89f
Add support for user-defined multivariate hessians (#2961)
1 parent 0e37702 commit 8d7e89f

5 files changed: +349 −20 lines changed

docs/make.jl

+1
@@ -138,6 +138,7 @@ const _PAGES = [
 "tutorials/nonlinear/rosenbrock.md",
 "tutorials/nonlinear/mle.md",
 "tutorials/nonlinear/clnlbeam.md",
+"tutorials/nonlinear/user_defined_hessians.md",
 "tutorials/nonlinear/querying_hessians.md",
 ],
 "Conic programs" => [

docs/src/manual/nlp.md

+42 −9

@@ -356,12 +356,6 @@ The above code creates a JuMP model with the objective function
 However, it's more readable if it does. Make sure you use `my_f`
 and not `f` in the macros.

-!!! warning
-    If you use multi-variate user-defined functions, JuMP will disable
-    second-derivative information. This can lead to significant slow-downs in
-    some cases. Only use a user-defined function if you cannot write out the
-    expression algebraically in the macro.
-
 !!! warning
     User-defined functions cannot be re-registered and will not update if you
     modify the underlying Julia function. If you want to change a user-defined
@@ -417,9 +411,11 @@ register(model, :my_square, 2, f, ∇f)

 ### Register a function, gradient, and hessian

-!!! warning
-    The ability to explicitly register a hessian is only available for
-    univariate functions.
+You can also register a function with second-order derivative information:
+a scalar for univariate functions, or a symmetric matrix for multivariate
+functions.
+
+#### Univariate functions

 Instead of automatically differentiating the hessian, you can instead pass a
 function which returns a number representing the second-order derivative.
@@ -435,6 +431,43 @@ register(model, :my_square, 1, f, ∇f, ∇²f)
 @NLobjective(model, Min, my_square(x))
 ```

+#### Multivariate functions
+
+For multivariate functions, the hessian function `∇²f` must take an
+`AbstractMatrix` as its first argument, the lower triangle of which is filled
+in-place:
+```@example
+using JuMP #hide
+f(x...) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
+function ∇f(g, x...)
+    g[1] = 400 * x[1]^3 - 400 * x[1] * x[2] + 2 * x[1] - 2
+    g[2] = 200 * (x[2] - x[1]^2)
+    return
+end
+function ∇²f(H, x...)
+    H[1, 1] = 1200 * x[1]^2 - 400 * x[2] + 2
+    # H[1, 2] = -400 * x[1] <-- Not needed. Fill the lower triangle only.
+    H[2, 1] = -400 * x[1]
+    H[2, 2] = 200.0
+    return
+end
+
+model = Model()
+register(model, :rosenbrock, 2, f, ∇f, ∇²f)
+@variable(model, x[1:2])
+@NLobjective(model, Min, rosenbrock(x[1], x[2]))
+```
+
+!!! warning
+    You may assume the Hessian matrix `H` is initialized with zeros, and
+    because `H` is symmetric, you need only fill in the non-zero terms of the
+    lower triangle. The matrix type passed in as `H` depends on the automatic
+    differentiation system, so make sure the first argument to the Hessian
+    function accepts an `AbstractMatrix` (it may be something other than
+    `Matrix{Float64}`). However, you may assume only that `H` supports
+    `size(H)` and `setindex!`. Finally, the matrix is treated as dense, so
+    performance will be poor on functions with high-dimensional input.
+
 ### User-defined functions with vector inputs

 User-defined functions which take vectors as input arguments (for example,
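The warning above says a Hessian callback may rely only on `size(H)` and `setindex!`, and should touch only the lower triangle. As a minimal sketch of what that means in practice (not part of this commit; the use of ForwardDiff and the `:my_rosenbrock` name are illustrative assumptions):

```julia
using JuMP
import ForwardDiff  # assumed available; not a dependency of this commit

f(x...) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2

function ∇f(g, x...)
    # Fill the gradient element-by-element; `g` is only assumed to support `setindex!`.
    grad = ForwardDiff.gradient(y -> f(y...), collect(x))
    for i in eachindex(grad)
        g[i] = grad[i]
    end
    return
end

function ∇²f(H, x...)
    # Compute a dense Hessian, then copy only the lower triangle into `H`,
    # using nothing beyond `size(H)` and `setindex!`.
    h = ForwardDiff.hessian(y -> f(y...), collect(x))
    n, _ = size(H)
    for i in 1:n, j in 1:i
        H[i, j] = h[i, j]
    end
    return
end

model = Model()
register(model, :my_rosenbrock, 2, f, ∇f, ∇²f)
@variable(model, x[1:2])
@NLobjective(model, Min, my_rosenbrock(x[1], x[2]))
```

For a function this small the hand-written Hessian in the diff above is simpler; the generic copy loop only matters when `H` is not a plain `Matrix{Float64}`.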

docs/src/tutorials/nonlinear/user_defined_hessians.jl

+269 (new file)

# Copyright (c) 2022 Oscar Dowson and contributors #src
# #src
# Permission is hereby granted, free of charge, to any person obtaining a copy #src
# of this software and associated documentation files (the "Software"), to deal #src
# in the Software without restriction, including without limitation the rights #src
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell #src
# copies of the Software, and to permit persons to whom the Software is #src
# furnished to do so, subject to the following conditions: #src
# #src
# The above copyright notice and this permission notice shall be included in all #src
# copies or substantial portions of the Software. #src
# #src
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR #src
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, #src
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE #src
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER #src
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, #src
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE #src
# SOFTWARE. #src

# # User-defined Hessians

# In this tutorial, we explain how to write a user-defined function (see
# [User-defined Functions](@ref)) with a Hessian matrix explicitly provided by
# the user.

# This tutorial uses the following packages:

using JuMP
import Ipopt

# ## Rosenbrock example

# As a simple example, we first consider the Rosenbrock function:

rosenbrock(x...) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2

# which has the gradient vector:

function ∇rosenbrock(g::AbstractVector, x...)
    g[1] = 400 * x[1]^3 - 400 * x[1] * x[2] + 2 * x[1] - 2
    g[2] = 200 * (x[2] - x[1]^2)
    return
end

# and the Hessian matrix:

function ∇²rosenbrock(H::AbstractMatrix, x...)
    H[1, 1] = 1200 * x[1]^2 - 400 * x[2] + 2
    ## H[1, 2] = -400 * x[1] <-- not needed because the Hessian is symmetric
    H[2, 1] = -400 * x[1]
    H[2, 2] = 200.0
    return
end

# You may assume the Hessian matrix `H` is initialized with zeros, and
# because it is symmetric you need only fill in the non-zero terms of the
# lower triangle.

# The matrix type passed in as `H` depends on the automatic differentiation
# system, so make sure the first argument to the Hessian function accepts an
# `AbstractMatrix` (it may be something other than `Matrix{Float64}`). However,
# you may assume only that `H` supports `size(H)` and `setindex!`.

# Now that we have the function, its gradient, and its Hessian, we can construct
# a JuMP model, register the function, and use it in a `@NL` macro:

model = Model(Ipopt.Optimizer)
@variable(model, x[1:2])
register(model, :rosenbrock, 2, rosenbrock, ∇rosenbrock, ∇²rosenbrock)
@NLobjective(model, Min, rosenbrock(x[1], x[2]))
optimize!(model)
solution_summary(model; verbose = true)

# ## Bilevel optimization

# User-defined Hessian functions can be useful when solving more complicated
# problems. In the rest of this tutorial, our goal is to solve the bilevel
# optimization problem:

# ```math
# \begin{array}{r l}
# \min\limits_{x,z} & x_1^2 + x_2^2 + z \\
# s.t. & \begin{array}{r l}
#       z \ge \max\limits_{y} & x_1^2 y_1 + x_2^2 y_2 - x_1 y_1^4 - 2 x_2 y_2^4 \\
#       s.t. & (y_1 - 10)^2 + (y_2 - 10)^2 \le 25
#       \end{array} \\
# & x \ge 0.
# \end{array}
# ```

# This bilevel optimization problem is composed of two nested optimization
# problems: an _upper_ level, involving variables ``x``, and a _lower_ level,
# involving variables ``y``. From the perspective of the lower-level problem,
# the values of ``x`` are fixed parameters, and so the model optimizes ``y``
# given those fixed parameters. Simultaneously, the upper-level problem
# optimizes ``x`` and ``z`` given the response of ``y``.

# ## Decomposition

# There are a few ways to solve this problem, but we are going to use a
# nonlinear decomposition method. The first step is to write a function to
# compute the lower-level problem:

# ```math
# \begin{array}{r l}
# V(x_1, x_2) = \max\limits_{y} & x_1^2 y_1 + x_2^2 y_2 - x_1 y_1^4 - 2 x_2 y_2^4 \\
# s.t. & (y_1 - 10)^2 + (y_2 - 10)^2 \le 25
# \end{array}
# ```

function solve_lower_level(x...)
    model = Model(Ipopt.Optimizer)
    set_silent(model)
    @variable(model, y[1:2])
    @NLobjective(
        model,
        Max,
        x[1]^2 * y[1] + x[2]^2 * y[2] - x[1] * y[1]^4 - 2 * x[2] * y[2]^4,
    )
    @constraint(model, (y[1] - 10)^2 + (y[2] - 10)^2 <= 25)
    optimize!(model)
    @assert termination_status(model) == LOCALLY_SOLVED
    return objective_value(model), value.(y)
end

# This function takes a value of ``x`` and returns the optimal lower-level
# objective value and the optimal response ``y``. The reason why we need both
# the objective and the optimal ``y`` will be made clear shortly, but for now
# let us define:

function V(x...)
    f, _ = solve_lower_level(x...)
    return f
end

# Then, we can substitute ``V`` into our full problem to create:

# ```math
# \begin{array}{r l}
# \min\limits_{x} & x_1^2 + x_2^2 + V(x_1, x_2) \\
# s.t. & x \ge 0.
# \end{array}
# ```

# This looks like a nonlinear optimization problem with a user-defined function
# ``V``! However, because ``V`` solves an optimization problem internally, we
# can't use automatic differentiation to compute the first and second
# derivatives. Instead, we can use JuMP's ability to pass callback functions
# for the gradient and Hessian.

# First up, we need to define the gradient of ``V`` with respect to ``x``. In
# general, this may be difficult to compute, but because ``x`` appears only in
# the objective, we can just differentiate the objective function with respect
# to ``x``, giving:

function ∇V(g::AbstractVector, x...)
    _, y = solve_lower_level(x...)
    g[1] = 2 * x[1] * y[1] - y[1]^4
    g[2] = 2 * x[2] * y[2] - 2 * y[2]^4
    return
end

# Second, we need to define the Hessian of ``V`` with respect to ``x``. This is
# a symmetric matrix, but in our example only the diagonal elements are
# non-zero:

function ∇²V(H::AbstractMatrix, x...)
    _, y = solve_lower_level(x...)
    H[1, 1] = 2 * y[1]
    H[2, 2] = 2 * y[2]
    return
end

# We now have enough to define our bilevel optimization problem:

model = Model(Ipopt.Optimizer)
@variable(model, x[1:2] >= 0)
register(model, :V, 2, V, ∇V, ∇²V)
@NLobjective(model, Min, x[1]^2 + x[2]^2 + V(x[1], x[2]))
optimize!(model)
solution_summary(model)

# The optimal objective value is:

objective_value(model)

# and the optimal upper-level decision variables ``x`` are:

value.(x)

# To compute the optimal lower-level decision variables, we need to call
# `solve_lower_level` with the optimal upper-level decision variables:

_, y = solve_lower_level(value.(x)...)
y

# ## Improving performance

# Our solution approach works, but it has a performance problem: every time
# we need to compute the value, gradient, or Hessian of ``V``, we have to
# re-solve the lower-level optimization problem! This is wasteful, because we
# will often call the gradient and Hessian at the same point, and so solving the
# problem twice with the same input repeats work unnecessarily.

# We can work around this by using a cache:

mutable struct Cache
    x::Any
    f::Float64
    y::Vector{Float64}
end

# with a function to update the cache if needed:

function _update_if_needed(cache::Cache, x...)
    if cache.x !== x
        cache.f, cache.y = solve_lower_level(x...)
        cache.x = x
    end
    return
end

# Then, we define cached versions of our three functions which call
# `_update_if_needed` and return values from the cache.

function cached_f(cache::Cache, x...)
    _update_if_needed(cache, x...)
    return cache.f
end

function cached_∇f(cache::Cache, g::AbstractVector, x...)
    _update_if_needed(cache, x...)
    g[1] = 2 * x[1] * cache.y[1] - cache.y[1]^4
    g[2] = 2 * x[2] * cache.y[2] - 2 * cache.y[2]^4
    return
end

function cached_∇²f(cache::Cache, H::AbstractMatrix, x...)
    _update_if_needed(cache, x...)
    H[1, 1] = 2 * cache.y[1]
    H[2, 2] = 2 * cache.y[2]
    return
end

# Now we're ready to set up and solve the upper-level optimization problem:

model = Model(Ipopt.Optimizer)
@variable(model, x[1:2] >= 0)
cache = Cache(Float64[], NaN, Float64[])
register(
    model,
    :V,
    2,
    (x...) -> cached_f(cache, x...),
    (g, x...) -> cached_∇f(cache, g, x...),
    (H, x...) -> cached_∇²f(cache, H, x...),
)
@NLobjective(model, Min, x[1]^2 + x[2]^2 + V(x[1], x[2]))
optimize!(model)
solution_summary(model)

# and we can check that we get the same objective value:

objective_value(model)

# and the same upper-level decision variables ``x``:

value.(x)
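As a quick illustration of the caching behaviour described above (not part of the committed tutorial; it assumes the `cache`, `cached_f`, and `cached_∇f` defined above are in scope), evaluating the value and the gradient at the same point triggers only one lower-level solve, because equal input tuples compare `===`:

```julia
x_test = (2.0, 3.0)              # hypothetical test point
g = zeros(2)
cached_f(cache, x_test...)       # first call at this point: solves the lower-level problem
cached_∇f(cache, g, x_test...)   # same point: `cache.x !== x` is false, so no re-solve
cache.x === x_test               # true: the cache now holds this point
```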

src/nlp.jl

-5
@@ -801,11 +801,6 @@ function register(
     ∇f::Function,
     ∇²f::Function,
 )
-    if dimension > 1
-        error(
-            "Providing hessians for multivariate functions is not yet supported",
-        )
-    end
     _init_NLP(model)
     MOI.Nonlinear.register_operator(model.nlp_model, op, dimension, f, ∇f, ∇²f)
     return

0 commit comments