@@ -208,6 +208,7 @@ before we can call [`initialize`](@ref), we need to set an
There are two to choose from within MOI, although other packages may add more
options by sub-typing [`Nonlinear.AbstractAutomaticDifferentiation`](@ref):
 * [`Nonlinear.ExprGraphOnly`](@ref)
+ * [`Nonlinear.SparseReverseMode`](@ref).

If we set [`Nonlinear.ExprGraphOnly`](@ref), then we get access to `:ExprGraph`:
```jldoctest nonlinear_developer
@@ -229,6 +230,52 @@ However, we cannot call gradient terms such as
[`eval_objective_gradient`](@ref) because [`Nonlinear.ExprGraphOnly`](@ref) does
not know how to differentiate a nonlinear expression.

+ If, instead, we set [`Nonlinear.SparseReverseMode`](@ref), then we get access to
+ `:Grad`, the gradient of the objective function, `:Jac`, the Jacobian matrix of
+ the constraints, `:JacVec`, the ability to compute Jacobian-vector products, and
+ `:ExprGraph`.
+ ```jldoctest nonlinear_developer
+ julia> Nonlinear.set_differentiation_backend(
+            data,
+            Nonlinear.SparseReverseMode(),
+            [x],
+        )
+
+ julia> data
+ NonlinearData with available features:
+   * :Grad
+   * :Jac
+   * :JacVec
+   * :ExprGraph
+ ```
+
+ However, before calling anything, we need to call [`initialize`](@ref):
+ ```jldoctest nonlinear_developer
+ julia> MOI.initialize(data, [:Grad, :Jac, :JacVec, :ExprGraph])
+ ```
+
+ Now we can call methods like [`eval_objective`](@ref):
+ ```jldoctest nonlinear_developer
+ julia> x = [1.0]
+ 1-element Vector{Float64}:
+  1.0
+
+ julia> MOI.eval_objective(data, x)
+ 7.268073418273571
+ ```
+ and [`eval_objective_gradient`](@ref):
+ ```jldoctest nonlinear_developer
+ julia> grad = [NaN]
+ 1-element Vector{Float64}:
+  NaN
+
+ julia> MOI.eval_objective_gradient(data, grad, x)
+
+ julia> grad
+ 1-element Vector{Float64}:
+  1.909297426825682
+ ```
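+
+ The `:Jac` and `:JacVec` features relate to the nonlinear constraints. The
+ example above has no nonlinear constraints, but if it did, a sketch of the
+ corresponding calls (using the standard MOI nonlinear evaluator API; the
+ constraint count `m` below is a placeholder) would look like this:
+ ```julia
+ # A sketch only: assumes `data` holds `m` nonlinear constraints, which the
+ # example above does not.
+ m = 1                                         # placeholder constraint count
+ # `:Jac` enables the sparse Jacobian API:
+ jac_structure = MOI.jacobian_structure(data)  # Vector of (row, col) pairs
+ J = zeros(length(jac_structure))
+ MOI.eval_constraint_jacobian(data, J, x)      # fills the structural non-zeros
+ # `:JacVec` enables Jacobian-vector products without forming J explicitly:
+ w = ones(length(x))
+ Jw = zeros(m)                                 # one entry per constraint
+ MOI.eval_constraint_jacobian_product(data, Jw, x, w)
+ ```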
+
## Expression-graph representation

[`Nonlinear.NonlinearData`](@ref) stores nonlinear expressions in
@@ -388,3 +435,61 @@ user-defined functions using [`Nonlinear.register_operator`](@ref).
[`Nonlinear.NonlinearData`](@ref) is a struct that stores the
[`Nonlinear.OperatorRegistry`](@ref), as well as a list of parameters and
subexpressions in the model.
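
As a rough, purely illustrative sketch (these are not the actual MOI types),
you can picture those three pieces like this:
```julia
# Purely illustrative; not the actual MOI data structures. An operator
# registry maps operator names to Julia functions, and the data object also
# carries the parameters and shared subexpressions that expressions in the
# model reference by index.
struct ToyOperatorRegistry
    univariate::Dict{Symbol,Function}    # e.g. :sin => sin
    multivariate::Dict{Symbol,Function}  # e.g. :+ => +
end

struct ToyNonlinearData
    operators::ToyOperatorRegistry
    parameters::Vector{Float64}
    subexpressions::Vector{Any}
end
```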
+
+ ## ReverseAD
+
+ `Nonlinear.ReverseAD` is a submodule for computing derivatives of the problem
+ inside [`Nonlinear.NonlinearData`](@ref) using sparse reverse-mode automatic
+ differentiation (AD).
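+
+ As a one-expression illustration of what "reverse mode" means, the following
+ toy sketch (ours, not the `ReverseAD` code) evaluates `f(x, y) = sin(x) * y`
+ in a forward sweep and then propagates adjoints backwards through the same
+ computation to recover the gradient; the real implementation additionally
+ exploits sparsity and works on the expression graphs described above:
+ ```julia
+ # Toy illustration of reverse-mode differentiation; not the ReverseAD code.
+ x, y = 1.0, 2.0
+ # Forward sweep: evaluate the expression, recording the intermediate value.
+ t = sin(x)
+ f = t * y
+ # Reverse sweep: propagate adjoints from the output back to the inputs.
+ f̄ = 1.0         # seed: ∂f/∂f
+ t̄ = f̄ * y       # ∂f/∂t
+ ȳ = f̄ * t       # ∂f/∂y = sin(x)
+ x̄ = t̄ * cos(x)  # ∂f/∂x = y * cos(x), by the chain rule
+ ```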
+
+ This section does not attempt to explain how sparse reverse-mode AD works, but
+ instead explains why MOI contains its own implementation, and highlights
+ notable differences from similar packages.
+
+ !!! warning
+     You should not interact with `ReverseAD` directly. Instead, you should
+     create a [`Nonlinear.NonlinearData`](@ref) object, call
+     [`Nonlinear.set_differentiation_backend`](@ref) with
+     [`Nonlinear.SparseReverseMode`](@ref), and then query the MOI API methods.
+
+ ### Why another AD package?
+
+ The JuliaDiff organization maintains a [list of packages](https://juliadiff.org)
+ for doing AD in Julia. At last count, there were at least ten packages (not
+ including `ReverseAD`) for reverse-mode AD in Julia. Given this multitude, why
+ does MOI maintain another implementation instead of re-using existing tooling?
+
+ Here are four reasons:
+
+ * **Scale and Sparsity:** the types of functions that MOI computes derivatives
+   of have two key characteristics: they can be very large scale (10^5 or more
+   functions across 10^5 or more variables) and they are very sparse. For large
+   problems, it is common for the Hessian to have `O(n)` non-zero elements,
+   instead of the `O(n^2)` it would have if it were dense (see the sketch after
+   this list). To the best of our knowledge, `ReverseAD` is the only
+   reverse-mode AD system in Julia that handles sparsity by default. The lack
+   of sparsity support is _the_ main reason why we do not use a generic
+   package.
+ * **Limited scope:** most other AD packages accept arbitrary Julia functions
+   as input and then trace an expression graph using operator overloading.
+   This means they must deal with (or detect and ignore) control flow, I/O,
+   and other vagaries of Julia. In contrast, `ReverseAD` only accepts functions
+   in the form of [`Nonlinear.NonlinearExpression`](@ref), which greatly limits
+   the range of syntax that it must deal with. By reducing the scope of what we
+   accept as input to functions relevant for mathematical optimization, we can
+   provide a simpler implementation with various performance optimizations.
+ * **Historical:** `ReverseAD` started life as [ReverseDiffSparse.jl](https://github.com/mlubin/ReverseDiffSparse.jl),
+   development of which began in early 2014(!). This was well before the other
+   packages started development. Because we had a well-tested, working AD in
+   JuMP, there was less motivation to contribute to and explore other AD
+   packages. The lack of historical interaction also meant that other packages
+   were not optimized for the types of problems that JuMP is built for (i.e.,
+   large-scale sparse problems). When we first created MathOptInterface, we
+   kept the AD in JuMP to simplify the transition, and postponed the
+   development of a first-class nonlinear interface in MathOptInterface.
+ * **Technical debt:** prior to the introduction of `Nonlinear`, JuMP's
+   nonlinear implementation was a confusing mix of functions and types spread
+   across the code base and in the private `_Derivatives` submodule. This made
+   it hard to swap the AD system for another. The main motivation for
+   refactoring JuMP to create the `Nonlinear` submodule in MathOptInterface was
+   to abstract the interface between JuMP and the AD system, allowing us to
+   swap in and test new AD systems in the future.
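+
+ To make the sparsity point from the first bullet above concrete, here is a
+ small sketch (ours, not `ReverseAD` output) of why a separable objective has
+ an `O(n)` Hessian:
+ ```julia
+ # For the separable function f(x) = sum(sin, x), the Hessian is diagonal:
+ # d²f/dxᵢdxⱼ = 0 whenever i != j, and -sin(xᵢ) on the diagonal.
+ n = 10_000
+ x = rand(n)
+ # A sparsity-aware AD system stores only the n structural non-zeros, for
+ # example as (row, col) pairs plus a vector of values:
+ hessian_structure = [(i, i) for i in 1:n]   # n entries, not n^2
+ hessian_values = [-sin(x[i]) for i in 1:n]
+ # A dense Hessian would need n^2 = 10^8 Float64s (about 800 MB); the sparse
+ # representation needs only n of them.
+ ```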