@@ -208,6 +208,7 @@ before we can call [`initialize`](@ref), we need to set an
There are two to choose from within MOI, although other packages may add more
options by sub-typing [`Nonlinear.AbstractAutomaticDifferentiation`](@ref):
 * [`Nonlinear.ExprGraphOnly`](@ref)
+ * [`Nonlinear.SparseReverseMode`](@ref).

If we set [`Nonlinear.ExprGraphOnly`](@ref), then we get access to `:ExprGraph`:
```jldoctest nonlinear_developer
@@ -229,6 +230,52 @@ However, we cannot call gradient terms such as
[`eval_objective_gradient`](@ref) because [`Nonlinear.ExprGraphOnly`](@ref) does
not know how to differentiate a nonlinear expression.

+ If, instead, we set [`Nonlinear.SparseReverseMode`](@ref), then we get access to
+ `:Grad`, the gradient of the objective function, `:Jac`, the Jacobian matrix of
+ the constraints, `:JacVec`, the ability to compute Jacobian-vector products, and
+ `:ExprGraph`.
+ ```jldoctest nonlinear_developer
+ julia> Nonlinear.set_differentiation_backend(
+            data,
+            Nonlinear.SparseReverseMode(),
+            [x],
+        )
+
+ julia> data
+ NonlinearData with available features:
+   * :Grad
+   * :Jac
+   * :JacVec
+   * :ExprGraph
+ ```
+
+ However, before calling anything, we need to call [`initialize`](@ref):
+ ```jldoctest nonlinear_developer
+ julia> MOI.initialize(data, [:Grad, :Jac, :JacVec, :ExprGraph])
+ ```
+
+ Now we can call methods like [`eval_objective`](@ref):
+ ```jldoctest nonlinear_developer
+ julia> x = [1.0]
+ 1-element Vector{Float64}:
+  1.0
+
+ julia> MOI.eval_objective(data, x)
+ 7.268073418273571
+ ```
+ and [`eval_objective_gradient`](@ref):
+ ```jldoctest nonlinear_developer
+ julia> grad = [NaN]
+ 1-element Vector{Float64}:
+  NaN
+
+ julia> MOI.eval_objective_gradient(data, grad, x)
+
+ julia> grad
+ 1-element Vector{Float64}:
+  1.909297426825682
+ ```
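+
+ The `:Jac` and `:JacVec` features relate to the nonlinear constraints. The
+ example above has no nonlinear constraints, but if it did, a sketch of the
+ corresponding calls (using the standard MOI nonlinear evaluator API; the
+ constraint count `m` below is a placeholder) would look like this:
+ ```julia
+ # A sketch only: assumes `data` holds `m` nonlinear constraints, which the
+ # example above does not.
+ m = 1                                         # placeholder constraint count
+ # `:Jac` enables the sparse Jacobian API:
+ jac_structure = MOI.jacobian_structure(data)  # Vector of (row, col) pairs
+ J = zeros(length(jac_structure))
+ MOI.eval_constraint_jacobian(data, J, x)      # fills the structural non-zeros
+ # `:JacVec` enables Jacobian-vector products without forming J explicitly:
+ w = ones(length(x))
+ Jw = zeros(m)                                 # one entry per constraint
+ MOI.eval_constraint_jacobian_product(data, Jw, x, w)
+ ```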
+
## Expression-graph representation

[`Nonlinear.NonlinearData`](@ref) stores nonlinear expressions in
@@ -388,3 +435,61 @@ user-defined functions using [`Nonlinear.register_operator`](@ref).
[`Nonlinear.NonlinearData`](@ref) is a struct that stores the
[`Nonlinear.OperatorRegistry`](@ref), as well as a list of parameters and
subexpressions in the model.
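
As a rough, purely illustrative sketch (these are not the actual MOI types),
you can picture those three pieces like this:
```julia
# Purely illustrative; not the actual MOI data structures. An operator
# registry maps operator names to Julia functions, and the data object also
# carries the parameters and shared subexpressions that expressions in the
# model reference by index.
struct ToyOperatorRegistry
    univariate::Dict{Symbol,Function}    # e.g. :sin => sin
    multivariate::Dict{Symbol,Function}  # e.g. :+ => +
end

struct ToyNonlinearData
    operators::ToyOperatorRegistry
    parameters::Vector{Float64}
    subexpressions::Vector{Any}
end
```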
+
+ ## ReverseAD
+
+ `Nonlinear.ReverseAD` is a submodule for computing derivatives of the problem
+ inside [`Nonlinear.NonlinearData`](@ref) using sparse reverse-mode automatic
+ differentiation (AD).
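+
+ As a one-expression illustration of what "reverse mode" means, the following
+ toy sketch (ours, not the `ReverseAD` code) evaluates `f(x, y) = sin(x) * y`
+ in a forward sweep and then propagates adjoints backwards through the same
+ computation to recover the gradient; the real implementation additionally
+ exploits sparsity and works on the expression graphs described above:
+ ```julia
+ # Toy illustration of reverse-mode differentiation; not the ReverseAD code.
+ x, y = 1.0, 2.0
+ # Forward sweep: evaluate the expression, recording the intermediate value.
+ t = sin(x)
+ f = t * y
+ # Reverse sweep: propagate adjoints from the output back to the inputs.
+ f̄ = 1.0         # seed: ∂f/∂f
+ t̄ = f̄ * y       # ∂f/∂t
+ ȳ = f̄ * t       # ∂f/∂y = sin(x)
+ x̄ = t̄ * cos(x)  # ∂f/∂x = y * cos(x), by the chain rule
+ ```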
+
+ This section does not attempt to explain how sparse reverse-mode AD works, but
+ instead explains why MOI contains its own implementation, and highlights
+ notable differences from similar packages.
+
+ !!! warning
+     You should not interact with `ReverseAD` directly. Instead, you should
+     create a [`Nonlinear.NonlinearData`](@ref) object, call
+     [`Nonlinear.set_differentiation_backend`](@ref) with
+     [`Nonlinear.SparseReverseMode`](@ref), and then query the MOI API methods.
+
+ ### Why another AD package?
+
+ The JuliaDiff organization maintains a [list of packages](https://juliadiff.org)
+ for doing AD in Julia. At last count, there were at least ten packages (not
+ including `ReverseAD`) for reverse-mode AD in Julia. Given this multitude, why
+ does MOI maintain another implementation instead of re-using existing tooling?
+
+ Here are four reasons:
+
+ * **Scale and Sparsity:** the types of functions that MOI computes derivatives
+   of have two key characteristics: they can be very large scale (10^5 or more
+   functions across 10^5 or more variables) and they are very sparse. For large
+   problems, it is common for the Hessian to have `O(n)` non-zero elements,
+   instead of the `O(n^2)` it would have if it were dense (see the sketch after
+   this list). To the best of our knowledge, `ReverseAD` is the only
+   reverse-mode AD system in Julia that handles sparsity by default. The lack
+   of sparsity support is _the_ main reason why we do not use a generic
+   package.
+ * **Limited scope:** most other AD packages accept arbitrary Julia functions
+   as input and then trace an expression graph using operator overloading.
+   This means they must deal with (or detect and ignore) control flow, I/O,
+   and other vagaries of Julia. In contrast, `ReverseAD` only accepts functions
+   in the form of [`Nonlinear.NonlinearExpression`](@ref), which greatly limits
+   the range of syntax that it must deal with. By reducing the scope of what we
+   accept as input to functions relevant for mathematical optimization, we can
+   provide a simpler implementation with various performance optimizations.
+ * **Historical:** `ReverseAD` started life as [ReverseDiffSparse.jl](https://github.com/mlubin/ReverseDiffSparse.jl),
+   development of which began in early 2014(!). This was well before the other
+   packages started development. Because we had a well-tested, working AD in
+   JuMP, there was less motivation to contribute to and explore other AD
+   packages. The lack of historical interaction also meant that other packages
+   were not optimized for the types of problems that JuMP is built for (i.e.,
+   large-scale sparse problems). When we first created MathOptInterface, we
+   kept the AD in JuMP to simplify the transition, and postponed the
+   development of a first-class nonlinear interface in MathOptInterface.
+ * **Technical debt:** prior to the introduction of `Nonlinear`, JuMP's
+   nonlinear implementation was a confusing mix of functions and types spread
+   across the code base and in the private `_Derivatives` submodule. This made
+   it hard to swap the AD system for another. The main motivation for
+   refactoring JuMP to create the `Nonlinear` submodule in MathOptInterface was
+   to abstract the interface between JuMP and the AD system, allowing us to
+   swap in and test new AD systems in the future.
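+
+ To make the sparsity point from the first bullet above concrete, here is a
+ small sketch (ours, not `ReverseAD` output) of why a separable objective has
+ an `O(n)` Hessian:
+ ```julia
+ # For the separable function f(x) = sum(sin, x), the Hessian is diagonal:
+ # d²f/dxᵢdxⱼ = 0 whenever i != j, and -sin(xᵢ) on the diagonal.
+ n = 10_000
+ x = rand(n)
+ # A sparsity-aware AD system stores only the n structural non-zeros, for
+ # example as (row, col) pairs plus a vector of values:
+ hessian_structure = [(i, i) for i in 1:n]   # n entries, not n^2
+ hessian_values = [-sin(x[i]) for i in 1:n]
+ # A dense Hessian would need n^2 = 10^8 Float64s (about 800 MB); the sparse
+ # representation needs only n of them.
+ ```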