The following backends are available to choose from within MOI, although other
packages may add more options by sub-typing
[`Nonlinear.AbstractAutomaticDifferentiation`](@ref):
 * [`Nonlinear.ExprGraphOnly`](@ref)
 * [`Nonlinear.SparseReverseMode`](@ref).

```jldoctest nonlinear_developer
julia> evaluator = Nonlinear.Evaluator(model, Nonlinear.ExprGraphOnly(), [x])
Nonlinear.Evaluator with available features:
  * :ExprGraph
```

However, we cannot call gradient terms such as
[`eval_objective_gradient`](@ref) because [`Nonlinear.ExprGraphOnly`](@ref) does
not have the capability to differentiate a nonlinear expression.

If, instead, we pass [`Nonlinear.SparseReverseMode`](@ref), then we get access
to `:Grad`, the gradient of the objective function, `:Jac`, the Jacobian matrix
of the constraints, `:JacVec`, the ability to compute Jacobian-vector products,
and `:ExprGraph`.
```jldoctest nonlinear_developer
julia> evaluator = Nonlinear.Evaluator(
           model,
           Nonlinear.SparseReverseMode(),
           [x],
       )
Nonlinear.Evaluator with available features:
  * :Grad
  * :Jac
  * :JacVec
  * :ExprGraph
```

However, before using the evaluator, we need to call [`initialize`](@ref):
```jldoctest nonlinear_developer
julia> MOI.initialize(evaluator, [:Grad, :Jac, :JacVec, :ExprGraph])
```

Now we can call methods like [`eval_objective`](@ref):
```jldoctest nonlinear_developer
julia> x = [1.0]
1-element Vector{Float64}:
 1.0

julia> MOI.eval_objective(evaluator, x)
7.268073418273571
```
and [`eval_objective_gradient`](@ref):
```jldoctest nonlinear_developer
julia> grad = [0.0]
1-element Vector{Float64}:
 0.0

julia> MOI.eval_objective_gradient(evaluator, grad, x)

julia> grad
1-element Vector{Float64}:
 1.909297426825682
```

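The `:Jac` and `:JacVec` features follow the same pattern. As a rough sketch
(not a doctest, and assuming `model` contains at least one nonlinear
constraint), a solver first queries the sparsity structure of the constraint
Jacobian and then fills a vector of nonzeros:
```julia
# Sparsity pattern of the constraint Jacobian: a vector of (row, column) pairs.
structure = MOI.jacobian_structure(evaluator)

# Fill a pre-allocated buffer with the nonzeros of the Jacobian at `x`.
J = zeros(length(structure))
MOI.eval_constraint_jacobian(evaluator, J, x)

# With `:JacVec`, compute y = J(x) * w without materializing the Jacobian.
m = 1  # assumed number of nonlinear constraints in `model`
y, w = zeros(m), ones(length(x))
MOI.eval_constraint_jacobian_product(evaluator, y, x, w)
```
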
Instead of passing [`Nonlinear.Evaluator`](@ref) directly to solvers,
solvers query the [`NLPBlock`](@ref) attribute, which returns an
[`NLPBlockData`](@ref). This object wraps an [`Nonlinear.Evaluator`](@ref) and
adds information such as the constraint bounds and whether the problem has a
nonlinear objective.

For example:
```jldoctest nonlinear_developer
julia> block = MOI.NLPBlockData(evaluator);

julia> MOI.set(model, MOI.NLPBlock(), block);
```

!!! warning
    Only call [`NLPBlockData`](@ref) once you have finished modifying the
    problem in `model`.

Putting everything together, you can create a nonlinear optimization problem in
MathOptInterface as follows:
```@example
import MathOptInterface
const MOI = MathOptInterface

function build_model(
    model::MOI.ModelLike;
    backend::MOI.Nonlinear.AbstractAutomaticDifferentiation,
)
    x = MOI.add_variable(model)
    y = MOI.add_variable(model)
    MOI.set(model, MOI.ObjectiveSense(), MOI.MIN_SENSE)
    nl_model = MOI.Nonlinear.Model()
    MOI.Nonlinear.set_objective(nl_model, :($x^2 + $y^2))
    evaluator = MOI.Nonlinear.Evaluator(nl_model, backend, [x, y])
    MOI.set(model, MOI.NLPBlock(), MOI.NLPBlockData(evaluator))
    return
end

# Replace `model` and `backend` with your optimizer and backend of choice.
model = MOI.Utilities.UniversalFallback(MOI.Utilities.Model{Float64}())
build_model(model; backend = MOI.Nonlinear.SparseReverseMode())
```

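On the solver side, the flow is reversed: the solver queries the
[`NLPBlock`](@ref) attribute and works with the evaluator it wraps. A minimal
sketch, continuing from the example above (a real solver would also inspect
`block.constraint_bounds` and other attributes):
```julia
# Sketch: how a solver might consume the NLPBlock set above.
block = MOI.get(model, MOI.NLPBlock())
MOI.initialize(block.evaluator, [:Grad])
x_val = [1.0, 2.0]  # one entry per variable, in the order passed to Evaluator
f = MOI.eval_objective(block.evaluator, x_val)
g = zeros(2)
MOI.eval_objective_gradient(block.evaluator, g, x_val)
```
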
## Expression-graph representation

[`Nonlinear.Model`](@ref) stores nonlinear expressions in
[`Nonlinear.Expression`](@ref)s. This section explains the design of
the expression graph data structure in [`Nonlinear.Expression`](@ref).

Given a nonlinear function like `f(x) = sin(x)^2 + x`, a conceptual aid for
thinking about the graph representation of the expression is to convert it
into Polish prefix notation.

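For example, written in prefix form (the operator first, followed by its
arguments), the function above becomes:
```
(+ (^ (sin x) 2) x)
```
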
One way to represent this notation in Julia is with a recursive node type,
where each node stores an operator and a vector of its children. The expression
is then constructed as:
```julia
julia> expr = ExprNode(:+, [ExprNode(:^, [ExprNode(:sin, [x]), 2.0]), x]);
```

This data structure follows our Polish prefix notation very closely, and we can
easily identify the arguments to an operator. However, it has a significant
drawback: each node in the graph requires a `Vector`, which is heap-allocated
and tracked by Julia's garbage collector (GC). For large models, we can expect
this overhead to become significant.

### Sketch of the design in Nonlinear

`Nonlinear` overcomes these problems by decomposing the data structure into a
number of different concrete-typed vectors.

First, we create vectors of the supported uni- and multivariate operators.

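To make the idea concrete, here is a simplified sketch of the layout. The names
and fields are illustrative stand-ins, not the actual types in `Nonlinear`:
each node records what it is (an operator, a variable, or a constant) and the
index of its parent, so an entire expression can be stored in flat, concretely
typed vectors:
```julia
# Illustrative stand-in for the flattened design (not MOI's actual types).
struct FlatNode
    head::Symbol  # the operator, or :variable / :value for leaf nodes
    index::Int    # for :value nodes, an index into `values`; 0 otherwise
    parent::Int   # index of the parent node; -1 for the root
end

# sin(x)^2 + x stored as two concretely typed vectors.
nodes = [
    FlatNode(:+, 0, -1),        # 1: the root, +
    FlatNode(:^, 0, 1),         # 2: ^, child of node 1
    FlatNode(:sin, 0, 2),       # 3: sin, child of node 2
    FlatNode(:variable, 1, 3),  # 4: x, child of node 3
    FlatNode(:value, 1, 2),     # 5: the constant values[1] = 2.0, child of node 2
    FlatNode(:variable, 1, 1),  # 6: x, child of node 1
]
values = [2.0]
```
In `Nonlinear`, the nodes store integer indices into the operator vectors
created above rather than raw `Symbol`s.
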
This is less readable than the other options, but does this data structure meet
our design goals?

Instead of a heap-allocated object for each node, we only have two `Vector`s
for each expression: a vector of nodes and a vector of constant values.

Users can register user-defined functions using
[`Nonlinear.register_operator`](@ref).

[`Nonlinear.Model`](@ref) is a struct that stores the
[`Nonlinear.OperatorRegistry`](@ref), as well as a list of parameters and
subexpressions in the model.

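As a brief sketch of those extra pieces (here `x` is a stand-in
`MOI.VariableIndex`, and the values are arbitrary), parameters and
subexpressions are added with `MOI.Nonlinear.add_parameter` and
`MOI.Nonlinear.add_expression`, and can then be interpolated into other
expressions:
```julia
import MathOptInterface as MOI

nl_model = MOI.Nonlinear.Model()
x = MOI.VariableIndex(1)
# A parameter's value can be changed later without rebuilding the model.
p = MOI.Nonlinear.add_parameter(nl_model, 3.0)
# A subexpression is stored once and can be reused in several functions.
ex = MOI.Nonlinear.add_expression(nl_model, :($x^2 + $p))
MOI.Nonlinear.set_objective(nl_model, :(sin($ex)))
```
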
## ReverseAD

`Nonlinear.ReverseAD` is a submodule for computing derivatives of a nonlinear
optimization problem using sparse reverse-mode automatic differentiation (AD).

This section does not attempt to explain how sparse reverse-mode AD works, but
instead explains why MOI contains its own implementation, and highlights
notable differences from similar packages.

!!! warning
    Don't use the API in `ReverseAD` to compute derivatives. Instead, create a
    [`Nonlinear.Evaluator`](@ref) object with [`Nonlinear.SparseReverseMode`](@ref)
    as the backend, and then query the MOI API methods.

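For example, the supported pattern is to check what the backend offers with
`MOI.features_available` and then initialize only what you need (a short
sketch, reusing the `evaluator` constructed earlier on this page):
```julia
# Go through the documented MOI API rather than ReverseAD internals.
features = MOI.features_available(evaluator)  # for example, [:Grad, :Jac, :JacVec, :ExprGraph]
MOI.initialize(evaluator, [:Grad, :Jac])
```
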
566
+
567
+ The JuliaDiff organization maintains a [ list of packages] ( https://juliadiff.org )
568
+ for doing AD in Julia. At last count, there were at least ten packages–-not
569
+ including ` ReverseAD ` -–for reverse-mode AD in Julia. ` ReverseAD ` exists because
570
+ it has a different set of design goals.
571
+
572
+ * ** Goal: handle scale and sparsity**
573
+ The types of nonlinear optimization problems that MOI represents can be large
574
+ scale (10^5 or more functions across 10^5 or more variables) with very sparse
575
+ derivatives. The ability to compute a sparse Hessian matrix is essential. To
576
+ the best of our knowledge, ` ReverseAD ` is the only reverse-mode AD system in
577
+ Julia that handles sparsity by default.
578
+ * ** Goal: limit the scope to improve robustness**
579
+ Most other AD packages accept arbitrary Julia functions as input and then
580
+ trace an expression graph using operator overloading. This means they must
581
+ deal (or detect and ignore) with control flow, I/O, and other vagaries of
582
+ Julia. In contrast, ` ReverseAD ` only accepts functions in the form of
583
+ [ ` Nonlinear.Expression ` ] ( @ref ) , which greatly limits the range of syntax that
584
+ it must deal with. By reducing the scope of what we accept as input to
585
+ functions relevant for mathematical optimization, we can provide a simpler
586
+ implementation with various performance optimizations.
587
+ * ** Goal: provide outputs which match what solvers expect**
588
+ Other AD packages focus on differentiating individual Julia functions. In
589
+ constrast, ` ReverseAD ` has a very specific use-case: to generate outputs
590
+ needed by the MOI nonlinear API. This means it needs to efficiently compute
591
+ sparse Hessians, and it needs subexpression handling to avoid recomputing
592
+ subexpressions that are shared between functions.
593
+
594
+ ### History
595
+
596
+ ` ReverseAD ` started life as [ ReverseDiffSparse.jl] ( https://github.com/mlubin/ReverseDiffSparse.jl ) ,
597
+ development of which begain in early 2014(!). This was well before the other
598
+ packages started development. Because we had a well-tested, working AD in JuMP,
599
+ there was less motivation to contribute to and explore other AD packages. The
600
+ lack of historical interaction also meant that other packages were not optimized
601
+ for the types of problems that JuMP is built for (i.e., large-scale sparse
602
+ problems). When we first created MathOptInterface, we kept the AD in JuMP to
603
+ simplify the transition, and post-poned the development of a first-class
604
+ nonlinear interface in MathOptInterface.
605
+
606
+ Prior to the introduction of ` Nonlinear ` , JuMP's nonlinear implementation was a
607
+ confusing mix of functions and types spread across the code base and in the
608
+ private ` _Derivatives ` submodule. This made it hard to swap the AD system for
609
+ another. The main motivation for refactoring JuMP to create the ` Nonlinear `
610
+ submodule in MathOptInterface was to abstract the interface between JuMP and the
611
+ AD system, allowing us to swap-in and test new AD systems in the future.