
Commit 1ab5078

Add [Nonlinear.ReverseAD] submodule (#1803)
The majority of this development was carried out in the JuMP PRs:

* jump-dev/JuMP.jl#2939
* jump-dev/JuMP.jl#2942
* jump-dev/JuMP.jl#2943

Nonlinear.ReverseAD is a minor refactoring of code that previously existed in
JuMP under the _Derivatives submodule, and prior to that in the
ReverseDiffSparse.jl package.
1 parent bc7b156 commit 1ab5078

17 files changed: +3632 −4 lines


Project.toml

Lines changed: 2 additions & 0 deletions
@@ -6,6 +6,7 @@ version = "1.2.0"
 BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
 CodecBzip2 = "523fee87-0ab8-5b00-afb7-3ecf72e48cfd"
 CodecZlib = "944b1d66-785c-5afd-91f1-9de20f533193"
+DataStructures = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
 ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
 JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
 LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
@@ -22,6 +23,7 @@ Unicode = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5"
 BenchmarkTools = "1"
 CodecBzip2 = "~0.6, 0.7"
 CodecZlib = "~0.6, 0.7"
+DataStructures = "0.18"
 ForwardDiff = "0.5, 0.6, 0.7, 0.8, 0.9, 0.10"
 JSON = "~0.21"
 JSONSchema = "1"

docs/src/submodules/Nonlinear/overview.md

Lines changed: 136 additions & 4 deletions
@@ -285,6 +285,7 @@ The following backends are available to choose from within MOI, although other
 packages may add more options by sub-typing
 [`Nonlinear.AbstractAutomaticDifferentiation`](@ref):
 * [`Nonlinear.ExprGraphOnly`](@ref)
+* [`Nonlinear.SparseReverseMode`](@ref).

 ```jldoctest nonlinear_developer
 julia> evaluator = Nonlinear.Evaluator(model, Nonlinear.ExprGraphOnly(), [x])
@@ -301,6 +302,50 @@ However, we cannot call gradient terms such as
 [`eval_objective_gradient`](@ref) because [`Nonlinear.ExprGraphOnly`](@ref) does
 not have the capability to differentiate a nonlinear expression.

+If, instead, we pass [`Nonlinear.SparseReverseMode`](@ref), then we get access
+to `:Grad`, the gradient of the objective function, `:Jac`, the Jacobian matrix
+of the constraints, `:JacVec`, the ability to compute Jacobian-vector products,
+and `:ExprGraph`.
+```jldoctest nonlinear_developer
+julia> evaluator = Nonlinear.Evaluator(
+           model,
+           Nonlinear.SparseReverseMode(),
+           [x],
+       )
+Nonlinear.Evaluator with available features:
+  * :Grad
+  * :Jac
+  * :JacVec
+  * :ExprGraph
+```
+
+However, before using the evaluator, we need to call [`initialize`](@ref):
+```jldoctest nonlinear_developer
+julia> MOI.initialize(evaluator, [:Grad, :Jac, :JacVec, :ExprGraph])
+```
+
+Now we can call methods like [`eval_objective`](@ref):
+```jldoctest nonlinear_developer
+julia> x = [1.0]
+1-element Vector{Float64}:
+ 1.0
+
+julia> MOI.eval_objective(evaluator, x)
+7.268073418273571
+```
+and [`eval_objective_gradient`](@ref):
+```jldoctest nonlinear_developer
+julia> grad = [0.0]
+1-element Vector{Float64}:
+ 0.0
+
+julia> MOI.eval_objective_gradient(evaluator, grad, x)
+
+julia> grad
+1-element Vector{Float64}:
+ 1.909297426825682
+```
+
 Instead of passing [`Nonlinear.Evaluator`](@ref) directly to solvers,
 solvers query the [`NLPBlock`](@ref) attribute, which returns an
 [`NLPBlockData`](@ref). This object wraps a [`Nonlinear.Evaluator`](@ref) and
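
The doctests above exercise `:Grad` only. A hedged sketch of how the remaining
features are queried through the same evaluator API (this is not part of the
diff, and it assumes the model contains at least one nonlinear constraint,
which the doctest model may not):

```julia
# Sketch only: assumes `evaluator` was initialized with [:Jac, :JacVec] and
# that the underlying model has at least one nonlinear constraint.
J_structure = MOI.jacobian_structure(evaluator)  # Vector of (row, col) pairs
J = zeros(length(J_structure))
MOI.eval_constraint_jacobian(evaluator, J, x)    # fills J in structure order

# :JacVec computes y = J(x) * w without materializing the sparse Jacobian.
num_constraints = 1  # hypothetical: one entry per constraint
y, w = zeros(num_constraints), ones(length(x))
MOI.eval_constraint_jacobian_product(evaluator, y, x, w)
```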
@@ -317,11 +362,36 @@ julia> MOI.set(model, MOI.NLPBlock(), block);
     Only call [`NLPBlockData`](@ref) once you have finished modifying the
     problem in `model`.

+Putting everything together, you can create a nonlinear optimization problem in
+MathOptInterface as follows:
+```@example
+import MathOptInterface
+const MOI = MathOptInterface
+
+function build_model(
+    model::MOI.ModelLike;
+    backend::MOI.Nonlinear.AbstractAutomaticDifferentiation,
+)
+    x = MOI.add_variable(model)
+    y = MOI.add_variable(model)
+    MOI.set(model, MOI.ObjectiveSense(), MOI.MIN_SENSE)
+    nl_model = MOI.Nonlinear.Model()
+    MOI.Nonlinear.set_objective(nl_model, :($x^2 + $y^2))
+    evaluator = MOI.Nonlinear.Evaluator(nl_model, backend, [x, y])
+    MOI.set(model, MOI.NLPBlock(), MOI.NLPBlockData(evaluator))
+    return
+end
+
+# Replace `model` and `backend` with your optimizer and backend of choice.
+model = MOI.Utilities.UniversalFallback(MOI.Utilities.Model{Float64}())
+build_model(model; backend = MOI.Nonlinear.SparseReverseMode())
+```
+
 ## Expression-graph representation

 [`Nonlinear.Model`](@ref) stores nonlinear expressions in
 [`Nonlinear.Expression`](@ref)s. This section explains the design of
-the expression graph datastructure in [`Nonlinear.Expression`](@ref).
+the expression graph data structure in [`Nonlinear.Expression`](@ref).

 Given a nonlinear function like `f(x) = sin(x)^2 + x`, a conceptual aid for
 thinking about the graph representation of the expression is to convert it into
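
On the consuming side, a solver retrieves the block that `build_model` set.
The following hedged sketch (not part of the diff) uses only documented
fields of [`NLPBlockData`](@ref) to continue the `@example` above:

```julia
# Query the NLPBlock attribute that build_model set on the model.
block = MOI.get(model, MOI.NLPBlock())    # an MOI.NLPBlockData
@assert block.has_objective               # we set a nonlinear objective
MOI.initialize(block.evaluator, [:Grad])  # request only the features we need
g = zeros(2)
MOI.eval_objective_gradient(block.evaluator, g, [1.0, 2.0])
g  # the gradient of x^2 + y^2 at (1, 2), that is, [2.0, 4.0]
```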
@@ -344,7 +414,7 @@ julia> struct ExprNode
 julia> expr = ExprNode(:+, [ExprNode(:^, [ExprNode(:sin, [x]), 2.0]), x]);
 ```

-This datastructure follows our Polish prefix notation very closely, and we can
+This data structure follows our Polish prefix notation very closely, and we can
 easily identify the arguments to an operator. However, it has a significant
 drawback: each node in the graph requires a `Vector`, which is heap-allocated
 and tracked by Julia's garbage collector (GC). For large models, we can expect
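
The hunk above shows only part of the `ExprNode` definition. Here is a
self-contained sketch consistent with the constructor call it does show; the
field names are assumptions, because the diff omits them:

```julia
# Hedged reconstruction of the node type discussed above; field names assumed.
struct ExprNode
    head::Symbol       # the operator, for example :+, :^, or :sin
    args::Vector{Any}  # child ExprNodes, constants, or variables
end

x = :x  # stand-in for a variable; the hidden doctest context defines it
expr = ExprNode(:+, [ExprNode(:^, [ExprNode(:sin, [x]), 2.0]), x])
# Every node carries its own heap-allocated `args` Vector, which is the
# GC-pressure drawback described in the surrounding prose.
```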
@@ -380,7 +450,7 @@ challenges:

 ### Sketch of the design in Nonlinear

-`Nonlinear` overcomes these problems by decomposing the datastructure into a
+`Nonlinear` overcomes these problems by decomposing the data structure into a
 number of different concrete-typed vectors.

 First, we create vectors of the supported uni- and multivariate operators.
@@ -443,7 +513,7 @@ julia> expr = Expression(
        );
 ```

-This is less readable than the other options, but does this datastructure meet
+This is less readable than the other options, but does this data structure meet
 our design goals?

 Instead of a heap-allocated object for each node, we only have two `Vector`s for
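
The hunk shows only the tail of the `Expression` constructor. A hedged sketch
of the flattened design this section describes, with illustrative field names
rather than MOI's exact definitions:

```julia
# Illustrative only: a node-per-row tape built from two concrete-typed vectors.
struct Node
    type::Int    # what kind of node this is: operator, variable, or constant
    index::Int   # operator id, variable index, or position in `values`
    parent::Int  # row of the parent node; -1 marks the root
end

struct Expression
    nodes::Vector{Node}      # the expression graph, one row per node
    values::Vector{Float64}  # numeric constants referenced by `nodes`
end
# Two contiguous allocations replace one heap-allocated Vector per node,
# which is how the design addresses the GC pressure described earlier.
```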
@@ -477,3 +547,65 @@ user-defined functions using [`Nonlinear.register_operator`](@ref).
 [`Nonlinear.Model`](@ref) is a struct that stores the
 [`Nonlinear.OperatorRegistry`](@ref), as well as a list of parameters and
 subexpressions in the model.
+
+## ReverseAD
+
+`Nonlinear.ReverseAD` is a submodule for computing derivatives of a nonlinear
+optimization problem using sparse reverse-mode automatic differentiation (AD).
+
+This section does not attempt to explain how sparse reverse-mode AD works, but
+instead explains why MOI contains its own implementation, and highlights
+notable differences from similar packages.
+
+!!! warning
+    Don't use the API in `ReverseAD` to compute derivatives. Instead, create a
+    [`Nonlinear.Evaluator`](@ref) object with [`Nonlinear.SparseReverseMode`](@ref)
+    as the backend, and then query the MOI API methods.
+
+### Design goals
+
+The JuliaDiff organization maintains a [list of packages](https://juliadiff.org)
+for doing AD in Julia. At last count, there were at least ten packages (not
+including `ReverseAD`) for reverse-mode AD in Julia. `ReverseAD` exists because
+it has a different set of design goals.
+
+* **Goal: handle scale and sparsity**
+  The types of nonlinear optimization problems that MOI represents can be large
+  scale (10^5 or more functions across 10^5 or more variables) with very sparse
+  derivatives. The ability to compute a sparse Hessian matrix is essential. To
+  the best of our knowledge, `ReverseAD` is the only reverse-mode AD system in
+  Julia that handles sparsity by default.
+* **Goal: limit the scope to improve robustness**
+  Most other AD packages accept arbitrary Julia functions as input and then
+  trace an expression graph using operator overloading. This means they must
+  deal with (or detect and ignore) control flow, I/O, and other vagaries of
+  Julia. In contrast, `ReverseAD` only accepts functions in the form of
+  [`Nonlinear.Expression`](@ref), which greatly limits the range of syntax that
+  it must deal with. By reducing the scope of what we accept as input to
+  functions relevant for mathematical optimization, we can provide a simpler
+  implementation with various performance optimizations.
+* **Goal: provide outputs which match what solvers expect**
+  Other AD packages focus on differentiating individual Julia functions. In
+  contrast, `ReverseAD` has a very specific use case: to generate outputs
+  needed by the MOI nonlinear API. This means it needs to efficiently compute
+  sparse Hessians, and it needs subexpression handling to avoid recomputing
+  subexpressions that are shared between functions.
+
+### History
+
+`ReverseAD` started life as [ReverseDiffSparse.jl](https://github.com/mlubin/ReverseDiffSparse.jl),
+development of which began in early 2014(!). This was well before the other
+packages started development. Because we had a well-tested, working AD in JuMP,
+there was less motivation to contribute to and explore other AD packages. The
+lack of historical interaction also meant that other packages were not optimized
+for the types of problems that JuMP is built for (i.e., large-scale sparse
+problems). When we first created MathOptInterface, we kept the AD in JuMP to
+simplify the transition, and postponed the development of a first-class
+nonlinear interface in MathOptInterface.
+
+Prior to the introduction of `Nonlinear`, JuMP's nonlinear implementation was a
+confusing mix of functions and types spread across the code base and in the
+private `_Derivatives` submodule. This made it hard to swap the AD system for
+another. The main motivation for refactoring JuMP to create the `Nonlinear`
+submodule in MathOptInterface was to abstract the interface between JuMP and the
+AD system, allowing us to swap in and test new AD systems in the future.
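
To make the sparsity goal concrete, here is a minimal end-to-end sketch of the
workflow the warning above recommends: build an evaluator backed by
`SparseReverseMode` and query a sparse Hessian through the MOI API. The model
and evaluation point are invented for illustration; the method names are MOI's
documented evaluator interface.

```julia
import MathOptInterface
const MOI = MathOptInterface

# A hypothetical model: min sin(x)^2 + y^2, chosen only for illustration.
nl_model = MOI.Nonlinear.Model()
x, y = MOI.VariableIndex(1), MOI.VariableIndex(2)
MOI.Nonlinear.set_objective(nl_model, :(sin($x)^2 + $y^2))
evaluator = MOI.Nonlinear.Evaluator(
    nl_model,
    MOI.Nonlinear.SparseReverseMode(),
    [x, y],
)
MOI.initialize(evaluator, [:Hess])

# The Hessian of the Lagrangian is returned in sparse coordinate form.
H_structure = MOI.hessian_lagrangian_structure(evaluator)
H = zeros(length(H_structure))
# σ = 1.0 scales the objective term; μ is empty because this model has no
# constraints.
MOI.eval_hessian_lagrangian(evaluator, H, [1.0, 2.0], 1.0, Float64[])
```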

docs/src/submodules/Nonlinear/reference.md

Lines changed: 1 addition & 0 deletions
@@ -70,6 +70,7 @@ Nonlinear.eval_comparison_function
 Nonlinear.Evaluator
 Nonlinear.AbstractAutomaticDifferentiation
 Nonlinear.ExprGraphOnly
+Nonlinear.SparseReverseMode
 ```

 ## Data-structure

src/Nonlinear/Nonlinear.jl

Lines changed: 2 additions & 0 deletions
@@ -50,4 +50,6 @@ include("parse.jl")
 include("model.jl")
 include("evaluator.jl")

+include("ReverseAD/ReverseAD.jl")
+
 end # module
