Add [Nonlinear.ReverseAD] submodule #1803

Merged 16 commits on May 5, 2022

Changes from 11 commits

2 changes: 2 additions & 0 deletions Project.toml
@@ -6,6 +6,7 @@ version = "1.2.0"
BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
CodecBzip2 = "523fee87-0ab8-5b00-afb7-3ecf72e48cfd"
CodecZlib = "944b1d66-785c-5afd-91f1-9de20f533193"
DataStructures = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
@@ -22,6 +23,7 @@ Unicode = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5"
BenchmarkTools = "1"
CodecBzip2 = "~0.6, 0.7"
CodecZlib = "~0.6, 0.7"
DataStructures = "0.18"
ForwardDiff = "0.5, 0.6, 0.7, 0.8, 0.9, 0.10"
JSON = "~0.21"
JSONSchema = "1"
131 changes: 127 additions & 4 deletions docs/src/submodules/Nonlinear/overview.md
@@ -285,6 +285,7 @@ The following backends are available to choose from within MOI, although other
packages may add more options by sub-typing
[`Nonlinear.AbstractAutomaticDifferentiation`](@ref):
* [`Nonlinear.ExprGraphOnly`](@ref)
* [`Nonlinear.SparseReverseMode`](@ref).

```jldoctest nonlinear_developer
julia> evaluator = Nonlinear.Evaluator(model, Nonlinear.ExprGraphOnly(), [x])
@@ -301,6 +302,50 @@ However, we cannot call gradient methods such as
[`eval_objective_gradient`](@ref) because [`Nonlinear.ExprGraphOnly`](@ref) does
not have the capability to differentiate a nonlinear expression.

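What `:ExprGraph` does let us do is retrieve the expression graph. As a hedged
sketch (not a doctest; the exact output depends on the expressions in `model`):
```julia
MOI.initialize(evaluator, [:ExprGraph])

# Returns the objective as a Julia `Expr`, in which variables appear as
# `x[MOI.VariableIndex(i)]` references.
MOI.objective_expr(evaluator)
```
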
If, instead, we pass [`Nonlinear.SparseReverseMode`](@ref), then we get access
to `:Grad`, the gradient of the objective function, `:Jac`, the Jacobian matrix
of the constraints, `:JacVec`, the ability to compute Jacobian-vector products,
and `:ExprGraph`.
```jldoctest nonlinear_developer
julia> evaluator = Nonlinear.Evaluator(
model,
Nonlinear.SparseReverseMode(),
[x],
)
Nonlinear.Evaluator with available features:
* :Grad
* :Jac
* :JacVec
* :ExprGraph
```

However, before calling anything, we need to call [`initialize`](@ref):
```jldoctest nonlinear_developer
julia> MOI.initialize(evaluator, [:Grad, :Jac, :JacVec, :ExprGraph])
```

Now we can call methods like [`eval_objective`](@ref):
```jldoctest nonlinear_developer
julia> x = [1.0]
1-element Vector{Float64}:
1.0

julia> MOI.eval_objective(evaluator, x)
7.268073418273571
```
and [`eval_objective_gradient`](@ref):
```jldoctest nonlinear_developer
julia> grad = [NaN]
1-element Vector{Float64}:
NaN

julia> MOI.eval_objective_gradient(evaluator, grad, x)

julia> grad
1-element Vector{Float64}:
1.909297426825682
```

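The `:JacVec` feature advertised above computes Jacobian-vector products
without materializing the Jacobian. A sketch only (not a doctest): the names
`m`, `w`, and `y` below are hypothetical, and the call assumes the model has
nonlinear constraints.
```julia
m = 1                 # hypothetical number of constraints
w = ones(length(x))   # direction vector in variable space
y = zeros(m)          # output buffer, one entry per constraint
# Computes y = J(x) * w, where J(x) is the constraint Jacobian at x.
MOI.eval_constraint_jacobian_product(evaluator, y, x, w)
```
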
Instead of passing [`Nonlinear.Evaluator`](@ref) directly to solvers,
solvers query the [`NLPBlock`](@ref) attribute, which returns an
[`NLPBlockData`](@ref). This object wraps a [`Nonlinear.Evaluator`](@ref) and
@@ -317,11 +362,33 @@ julia> MOI.set(model, MOI.NLPBlock(), block);
Only call [`NLPBlockData`](@ref) once you have finished modifying the
problem in `model`.

Putting everything together, you can create a nonlinear optimization problem in
MathOptInterface as follows:
```@example
import MathOptInterface
const MOI = MathOptInterface

function build_model(model; backend)
x = MOI.add_variable(model)
y = MOI.add_variable(model)
MOI.set(model, MOI.ObjectiveSense(), MOI.MIN_SENSE)
nl_model = MOI.Nonlinear.Model()
MOI.Nonlinear.set_objective(nl_model, :($x^2 + $y^2))
evaluator = MOI.Nonlinear.Evaluator(nl_model, backend, [x, y])
MOI.set(model, MOI.NLPBlock(), MOI.NLPBlockData(evaluator))
return
end

# Replace `model` and `backend` with your optimizer and backend of choice.
model = MOI.Utilities.UniversalFallback(MOI.Utilities.Model{Float64}())
build_model(model; backend = MOI.Nonlinear.SparseReverseMode())
```

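To actually solve the problem, swap the `MOI.Utilities` cache for a real
optimizer. A sketch, assuming the Ipopt.jl package is installed (Ipopt is not
a dependency of MathOptInterface):
```julia
import Ipopt

model = Ipopt.Optimizer()
build_model(model; backend = MOI.Nonlinear.SparseReverseMode())
MOI.optimize!(model)
```
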
## Expression-graph representation

[`Nonlinear.Model`](@ref) stores nonlinear expressions in
[`Nonlinear.Expression`](@ref)s. This section explains the design of
the expression graph datastructure in [`Nonlinear.Expression`](@ref).
the expression graph data structure in [`Nonlinear.Expression`](@ref).

Given a nonlinear function like `f(x) = sin(x)^2 + x`, a conceptual aid for
thinking about the graph representation of the expression is to convert it into
@@ -344,7 +411,7 @@ julia> struct ExprNode
julia> expr = ExprNode(:+, [ExprNode(:^, [ExprNode(:sin, [x]), 2.0]), x]);
```

This datastructure follows our Polish prefix notation very closely, and we can
This data structure follows our Polish prefix notation very closely, and we can
easily identify the arguments to an operator. However, it has a significant
drawback: each node in the graph requires a `Vector`, which is heap-allocated
and tracked by Julia's garbage collector (GC). For large models, we can expect
@@ -380,7 +447,7 @@ challenges:

### Sketch of the design in Nonlinear

`Nonlinear` overcomes these problems by decomposing the datastructure into a
`Nonlinear` overcomes these problems by decomposing the data structure into a
number of different concrete-typed vectors.

First, we create vectors of the supported uni- and multivariate operators.
@@ -443,7 +510,7 @@ julia> expr = Expression(
);
```

This is less readable than the other options, but does this datastructure meet
This is less readable than the other options, but does this data structure meet
our design goals?

Instead of a heap-allocated object for each node, we only have two `Vector`s for
@@ -477,3 +544,59 @@ user-defined functions using [`Nonlinear.register_operator`](@ref).
[`Nonlinear.Model`](@ref) is a struct that stores the
[`Nonlinear.OperatorRegistry`](@ref), as well as a list of parameters and
subexpressions in the model.

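As a rough mental model only (the field names below are illustrative, not
MOI's actual internals), you can picture the layout like this:
```julia
import MathOptInterface
const MOI = MathOptInterface

# Illustrative sketch: the real `MOI.Nonlinear.Model` has more fields and
# different names; see the MOI source for the authoritative definition.
struct SketchOfNonlinearModel
    operators::MOI.Nonlinear.OperatorRegistry      # registered operators
    parameters::Vector{Float64}                    # parameter values, by index
    expressions::Vector{MOI.Nonlinear.Expression}  # objective and subexpressions
end
```
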
## ReverseAD

`Nonlinear.ReverseAD` is a submodule for computing derivatives of a nonlinear
optimization problem using sparse reverse-mode automatic differentiation (AD).

This section does not attempt to explain how sparse reverse-mode AD works, but
instead explains why MOI contains its own implementation, and highlights
notable differences from similar packages.

!!! warning
You should not interact with `ReverseAD` directly. Instead, you should
create a [`Nonlinear.Evaluator`](@ref) object with
[`Nonlinear.SparseReverseMode`](@ref), and then query the MOI API methods.

### Why another AD package?
> **Review comment (Member):** I reject the premise of the question because
> this code was one of the first implementations of reverse-mode AD in Julia.
> This discussion as worded doesn't really fit as package documentation. What
> would fit is a discussion of the design goals of ReverseAD. It's fair to
> mention the history of the code as well.
>
> **Reply (Member, Author):** How's this now? I only made minor changes to
> split the history and identify design goals, but I can make larger changes
> if you want.

The JuliaDiff organization maintains a [list of packages](https://juliadiff.org)
for doing AD in Julia. At last count, there were at least ten packages, not
including `ReverseAD`, for reverse-mode AD in Julia. Given this multitude, why
does MOI maintain another implementation instead of re-using existing tooling?

Here are four reasons:

* **Scale and sparsity:** the types of functions that MOI computes derivatives
of have two key characteristics: they can be very large scale (10^5 or more
functions across 10^5 or more variables) and they are very sparse. For large
problems, it is common for the Hessian to have `O(n)` non-zero elements
instead of the `O(n^2)` it would have if it were dense. To the best of our
knowledge, `ReverseAD` is the only reverse-mode AD system in Julia that
handles sparsity by default (see the sketch after this list). The lack of
sparsity support is _the_ main reason why we do not use a generic package.
* **Limited scope:** most other AD packages accept arbitrary Julia functions as
input and then trace an expression graph using operator overloading. This
means they must deal with (or detect and ignore) control flow, I/O, and other
vagaries of Julia. In contrast, `ReverseAD` only accepts functions in the
form of [`Nonlinear.Expression`](@ref), which greatly limits the range of
syntax that it must deal with. By reducing the scope of what we accept as
input to functions relevant for mathematical optimization, we can provide a
simpler implementation with various performance optimizations.
* **Historical:** `ReverseAD` started life as [ReverseDiffSparse.jl](https://github.com/mlubin/ReverseDiffSparse.jl),
development of which began in early 2014(!). This was well before the other
packages started development. Because we had a well-tested, working AD in
JuMP, there was less motivation to contribute to and explore other AD
packages. The lack of historical interaction also meant that other packages
were not optimized for the types of problems that JuMP is built for (i.e.,
large-scale sparse problems). When we first created MathOptInterface, we kept
the AD in JuMP to simplify the transition, and postponed the development of
a first-class nonlinear interface in MathOptInterface.
* **Technical debt:** Prior to the introduction of `Nonlinear`, JuMP's nonlinear
implementation was a confusing mix of functions and types spread across the
code base and in the private `_Derivatives` submodule. This made it hard to
swap the AD system for another. The main motivation for refactoring JuMP to
create the `Nonlinear` submodule in MathOptInterface was to abstract the
interface between JuMP and the AD system, allowing us to swap in and test new
AD systems in the future.
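
As an illustration of the sparsity point above, MOI's evaluator API reports
derivative structure explicitly. A hedged sketch, reusing the `evaluator` and
`x` from earlier and assuming the model has nonlinear constraints:
```julia
# Only the structurally non-zero Jacobian entries are reported, as
# (row, column) tuples; a dense AD system would store all m * n entries.
structure = MOI.jacobian_structure(evaluator)

# Evaluate just those entries at `x`.
J = zeros(length(structure))
MOI.eval_constraint_jacobian(evaluator, J, x)
```
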
1 change: 1 addition & 0 deletions docs/src/submodules/Nonlinear/reference.md
@@ -70,6 +70,7 @@ Nonlinear.eval_comparison_function
Nonlinear.Evaluator
Nonlinear.AbstractAutomaticDifferentiation
Nonlinear.ExprGraphOnly
Nonlinear.SparseReverseMode
```

## Data-structure
2 changes: 2 additions & 0 deletions src/Nonlinear/Nonlinear.jl
@@ -50,4 +50,6 @@ include("parse.jl")
include("model.jl")
include("evaluator.jl")

include("ReverseAD/ReverseAD.jl")

end # module