# Add `Nonlinear.ReverseAD` submodule (#1803)
The following backends are available to choose from within MOI, although other
packages may add more options by sub-typing
[`Nonlinear.AbstractAutomaticDifferentiation`](@ref):

* [`Nonlinear.ExprGraphOnly`](@ref)
* [`Nonlinear.SparseReverseMode`](@ref)

```jldoctest nonlinear_developer
julia> evaluator = Nonlinear.Evaluator(model, Nonlinear.ExprGraphOnly(), [x])
Nonlinear.Evaluator with available features:
  * :ExprGraph
```
However, we cannot query derivative information, such as
[`eval_objective_gradient`](@ref), because [`Nonlinear.ExprGraphOnly`](@ref) does
not have the capability to differentiate a nonlinear expression.
If, instead, we pass [`Nonlinear.SparseReverseMode`](@ref), then we get access
to `:Grad`, the gradient of the objective function, `:Jac`, the Jacobian matrix
of the constraints, `:JacVec`, the ability to compute Jacobian-vector products,
and `:ExprGraph`.

```jldoctest nonlinear_developer
julia> evaluator = Nonlinear.Evaluator(
           model,
           Nonlinear.SparseReverseMode(),
           [x],
       )
Nonlinear.Evaluator with available features:
  * :Grad
  * :Jac
  * :JacVec
  * :ExprGraph
```
However, before calling anything, we need to call [`initialize`](@ref):

```jldoctest nonlinear_developer
julia> MOI.initialize(evaluator, [:Grad, :Jac, :JacVec, :ExprGraph])
```
Now we can call methods like [`eval_objective`](@ref):

```jldoctest nonlinear_developer
julia> x = [1.0]
1-element Vector{Float64}:
 1.0

julia> MOI.eval_objective(evaluator, x)
7.268073418273571
```
and [`eval_objective_gradient`](@ref):

```jldoctest nonlinear_developer
julia> grad = [NaN]
1-element Vector{Float64}:
 NaN

julia> MOI.eval_objective_gradient(evaluator, grad, x)

julia> grad
1-element Vector{Float64}:
 1.909297426825682
```
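Because we also initialized `:JacVec`, the evaluator can compute Jacobian-vector
products. The following is a sketch rather than doctest output: the length of
`y` assumes the model has exactly one nonlinear constraint, which this excerpt
does not show.

```julia
# Sketch: compute y = J(x) * w, the product of the constraint Jacobian
# at x with a seed vector w.
w = [1.0]  # one entry per variable
y = [NaN]  # one entry per constraint (assumed: a single constraint)
MOI.eval_constraint_jacobian_product(evaluator, y, x, w)
```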
Instead of passing [`Nonlinear.Evaluator`](@ref) directly to solvers,
solvers query the [`NLPBlock`](@ref) attribute, which returns an
[`NLPBlockData`](@ref). This object wraps a [`Nonlinear.Evaluator`](@ref) and
adds information such as the constraint bounds.
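To build the `block` used below, a minimal sketch, assuming the convenience
constructor that takes the evaluator directly:

```jldoctest nonlinear_developer
julia> block = MOI.NLPBlockData(evaluator);
```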
```jldoctest nonlinear_developer
julia> MOI.set(model, MOI.NLPBlock(), block);
```

!!! warning
    Only call [`NLPBlockData`](@ref) once you have finished modifying the
    problem in `model`.
Putting everything together, you can create a nonlinear optimization problem in
MathOptInterface as follows:

```@example
import MathOptInterface
const MOI = MathOptInterface

function build_model(model; backend)
    x = MOI.add_variable(model)
    y = MOI.add_variable(model)
    MOI.set(model, MOI.ObjectiveSense(), MOI.MIN_SENSE)
    nl_model = MOI.Nonlinear.Model()
    MOI.Nonlinear.set_objective(nl_model, :($x^2 + $y^2))
    evaluator = MOI.Nonlinear.Evaluator(nl_model, backend, [x, y])
    MOI.set(model, MOI.NLPBlock(), MOI.NLPBlockData(evaluator))
    return
end

# Replace `model` and `backend` with your optimizer and backend of choice.
model = MOI.Utilities.UniversalFallback(MOI.Utilities.Model{Float64}())
build_model(model; backend = MOI.Nonlinear.SparseReverseMode())
```
## Expression-graph representation

[`Nonlinear.Model`](@ref) stores nonlinear expressions in
[`Nonlinear.Expression`](@ref)s. This section explains the design of
the expression graph data structure in [`Nonlinear.Expression`](@ref).

Given a nonlinear function like `f(x) = sin(x)^2 + x`, a conceptual aid for
thinking about the graph representation of the expression is to convert it into
Polish prefix notation: `(+ (^ (sin x) 2.0) x)`.
One way to encode this notation is with nested node objects. A minimal sketch
(the exact fields of `ExprNode` are assumed for illustration):

```jldoctest
julia> struct ExprNode
           op::Symbol
           children::Vector{Any}  # child nodes, constants, or variable symbols
       end

julia> x = :x;

julia> expr = ExprNode(:+, [ExprNode(:^, [ExprNode(:sin, [x]), 2.0]), x]);
```
This data structure follows our Polish prefix notation very closely, and we can
easily identify the arguments to an operator. However, it has a significant
drawback: each node in the graph requires a `Vector`, which is heap-allocated
and tracked by Julia's garbage collector (GC). For large models, we can expect
millions of nodes, so the allocation and GC overhead quickly becomes a
bottleneck.
### Sketch of the design in Nonlinear

`Nonlinear` overcomes these problems by decomposing the data structure into a
number of different concrete-typed vectors.

First, we create vectors of the supported uni- and multivariate operators; a
sketch follows.
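The registry names and operators below are illustrative assumptions, not the
exact `Nonlinear` internals; the point is that an operator is identified by an
integer index into a concretely-typed vector:

```julia
# Illustrative sketch: registries of supported operators.
const UNIVARIATE_OPERATORS = [:sin, :cos, :exp]
const MULTIVARIATE_OPERATORS = [:+, :-, :*, :^]

# An operator is referenced by its index, for example, `:sin` is
# univariate operator 1:
findfirst(==(:sin), UNIVARIATE_OPERATORS)  # returns 1
```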
With this design, the same expression is built by passing flat vectors of nodes
and values:

```julia
julia> expr = Expression(
           # ... node and value vectors elided ...
       );
```

This is less readable than the other options, but does this data structure meet
our design goals?

Instead of a heap-allocated object for each node, we only have two `Vector`s for
each expression.
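A minimal sketch of such a flattened representation (the `Node` fields and the
`Expression` layout are assumptions for illustration, not the exact `Nonlinear`
definitions):

```julia
# Illustrative sketch of a flattened expression tape.
struct Node
    type::Int32    # kind of node: operator call, variable, constant, ...
    index::Int32   # index into an operator registry or the `values` vector
    parent::Int32  # position of this node's parent in `nodes` (-1 for root)
end

struct Expression
    nodes::Vector{Node}      # the expression tape, parents before children
    values::Vector{Float64}  # numeric constants referenced by `index`
end
```

Because `Node` is an immutable bits-type, a `Vector{Node}` is a single
contiguous block of memory, so the GC tracks two allocations per expression
rather than one per node.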
Users can register additional user-defined functions using
[`Nonlinear.register_operator`](@ref).

[`Nonlinear.Model`](@ref) is a struct that stores the
[`Nonlinear.OperatorRegistry`](@ref), as well as a list of parameters and
subexpressions in the model.
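For example, here is a sketch of registering a hypothetical two-argument
operator `my_f` (the name and function are made up for illustration):

```julia
import MathOptInterface
const MOI = MathOptInterface

model = MOI.Nonlinear.Model()
x = MOI.VariableIndex(1)
# Register `my_f` as a new operator that takes two arguments.
MOI.Nonlinear.register_operator(model, :my_f, 2, (a, b) -> a^2 + b^2)
# The operator can now appear in expressions:
MOI.Nonlinear.set_objective(model, :(my_f($x, 2.0)))
```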
## ReverseAD

`Nonlinear.ReverseAD` is a submodule for computing derivatives of a nonlinear
optimization problem using sparse reverse-mode automatic differentiation (AD).

This section does not attempt to explain how sparse reverse-mode AD works, but
instead explains why MOI contains its own implementation, and highlights
notable differences from similar packages.
!!! warning
    You should not interact with `ReverseAD` directly. Instead, you should
    create a [`Nonlinear.Evaluator`](@ref) object with
    [`Nonlinear.SparseReverseMode`](@ref), and then query the MOI API methods.
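Concretely, the supported entry point is the pattern shown earlier in this
guide; a minimal sketch, reusing the `model` and `x` from those examples:

```julia
evaluator = MOI.Nonlinear.Evaluator(
    model,                              # a MOI.Nonlinear.Model
    MOI.Nonlinear.SparseReverseMode(),
    [x],                                # the ordered decision variables
)
MOI.initialize(evaluator, [:Grad])
MOI.eval_objective(evaluator, [1.0])  # evaluate at x = [1.0]
```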
### Why another AD package? | ||
The JuliaDiff organization maintains a [list of packages](https://juliadiff.org)
for doing AD in Julia. At last count, there were at least ten packages, not
including `ReverseAD`, for reverse-mode AD in Julia. Given this multitude, why
does MOI maintain another implementation instead of re-using existing tooling?

Here are four reasons:
* **Scale and Sparsity:** the types of functions that MOI computes derivatives
  of have two key characteristics: they can be very large scale (10^5 or more
  functions across 10^5 or more variables) and they are very sparse. For large
  problems, it is common for the Hessian to have `O(n)` non-zero elements
  rather than the `O(n^2)` of a dense matrix. To the best of our knowledge,
  `ReverseAD` is the only reverse-mode AD system in Julia that handles sparsity
  by default; see the sketch after this list. The lack of sparsity support is
  _the_ main reason why we do not use a generic package.
* **Limited scope:** most other AD packages accept arbitrary Julia functions as
  input and then trace an expression graph using operator overloading. This
  means they must deal with (or detect and ignore) control flow, I/O, and other
  vagaries of Julia. In contrast, `ReverseAD` only accepts functions in the
  form of [`Nonlinear.Expression`](@ref), which greatly limits the range of
  syntax that it must deal with. By reducing the scope of what we accept as
  input to functions relevant for mathematical optimization, we can provide a
  simpler implementation with various performance optimizations.
* **Historical:** `ReverseAD` started life as [ReverseDiffSparse.jl](https://github.com/mlubin/ReverseDiffSparse.jl),
  development of which began in early 2014(!). This was well before the other
  packages started development. Because we had a well-tested, working AD in
  JuMP, there was less motivation to contribute to and explore other AD
  packages. The lack of historical interaction also meant that other packages
  were not optimized for the types of problems that JuMP is built for (i.e.,
  large-scale sparse problems). When we first created MathOptInterface, we kept
  the AD in JuMP to simplify the transition, and postponed the development of
  a first-class nonlinear interface in MathOptInterface.
* **Technical debt:** prior to the introduction of `Nonlinear`, JuMP's nonlinear
  implementation was a confusing mix of functions and types spread across the
  code base and in the private `_Derivatives` submodule. This made it hard to
  swap the AD system for another. The main motivation for refactoring JuMP to
  create the `Nonlinear` submodule in MathOptInterface was to abstract the
  interface between JuMP and the AD system, allowing us to swap in and test new
  AD systems in the future.
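To make the sparsity point concrete, solvers ask the evaluator for an explicit
sparsity pattern and evaluate only the structurally non-zero entries. A minimal
sketch, assuming the `evaluator` and `x` constructed earlier in this guide
(with `:Jac` initialized):

```julia
# Query the (row, column) indices of structural non-zeros in the
# constraint Jacobian.
J_structure = MOI.jacobian_structure(evaluator)

# Evaluate only those entries: storage is O(nnz), not O(m * n).
J = zeros(length(J_structure))
MOI.eval_constraint_jacobian(evaluator, J, x)
```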
Finally, the new submodule is wired into `Nonlinear` by a single `include` at
the end of the module definition:

```julia
include("parse.jl")
include("model.jl")
include("evaluator.jl")

include("ReverseAD/ReverseAD.jl")

end # module
```