From 907080abf7454d6908f8c98d0f70abd1d4a33a86 Mon Sep 17 00:00:00 2001
From: odow
Date: Tue, 19 Apr 2022 06:20:05 +1200
Subject: [PATCH 01/16] [Nonlinear] add the Nonlinear.ReverseAD submodule

The majority of this development was carried out in the JuMP PRs:
* https://github.com/jump-dev/JuMP.jl/pull/2939
* https://github.com/jump-dev/JuMP.jl/pull/2942
* https://github.com/jump-dev/JuMP.jl/pull/2943

Nonlinear.ReverseAD is a minor refactoring of code that previously existed in
JuMP under the _Derivatives submodule, and prior to that the
ReverseDiffSparse.jl package.
---
 docs/src/submodules/Nonlinear/overview.md    | 105 +++
 docs/src/submodules/Nonlinear/reference.md   |   1 +
 src/Nonlinear/Nonlinear.jl                   |   2 +
 src/Nonlinear/ReverseAD/Coloring/Coloring.jl | 548 ++++++++++++
 .../ReverseAD/Coloring/topological_sort.jl   | 106 +++
 src/Nonlinear/ReverseAD/ReverseAD.jl         |  40 +
 .../ReverseAD/forward_over_reverse.jl        | 396 +++++++++
 src/Nonlinear/ReverseAD/graph_tools.jl       | 486 +++++++++++
 .../ReverseAD/mathoptinterface_api.jl        | 355 ++++++++
 src/Nonlinear/ReverseAD/precompile.jl        |  46 +
 src/Nonlinear/ReverseAD/reverse_mode.jl      | 259 ++++++
 src/Nonlinear/ReverseAD/types.jl             | 188 ++++
 src/Nonlinear/ReverseAD/utils.jl             |  63 ++
 src/Nonlinear/types.jl                       |  18 +
 test/Nonlinear/Nonlinear.jl                  |   2 +
 test/Nonlinear/ReverseAD.jl                  | 803 ++++++++++++++++++
 16 files changed, 3418 insertions(+)
 create mode 100644 src/Nonlinear/ReverseAD/Coloring/Coloring.jl
 create mode 100644 src/Nonlinear/ReverseAD/Coloring/topological_sort.jl
 create mode 100644 src/Nonlinear/ReverseAD/ReverseAD.jl
 create mode 100644 src/Nonlinear/ReverseAD/forward_over_reverse.jl
 create mode 100644 src/Nonlinear/ReverseAD/graph_tools.jl
 create mode 100644 src/Nonlinear/ReverseAD/mathoptinterface_api.jl
 create mode 100644 src/Nonlinear/ReverseAD/precompile.jl
 create mode 100644 src/Nonlinear/ReverseAD/reverse_mode.jl
 create mode 100644 src/Nonlinear/ReverseAD/types.jl
 create mode 100644 src/Nonlinear/ReverseAD/utils.jl
 create mode 100644 test/Nonlinear/ReverseAD.jl

diff --git a/docs/src/submodules/Nonlinear/overview.md b/docs/src/submodules/Nonlinear/overview.md
index d0bd4463cb..51e2fa2428 100644
--- a/docs/src/submodules/Nonlinear/overview.md
+++ b/docs/src/submodules/Nonlinear/overview.md
@@ -285,6 +285,7 @@ The following backends are available to choose from within MOI, although
 other packages may add more options by sub-typing
 [`Nonlinear.AbstractAutomaticDifferentiation`](@ref):
  * [`Nonlinear.ExprGraphOnly`](@ref)
+ * [`Nonlinear.SparseReverseMode`](@ref).
 
 ```jldoctest nonlinear_developer
 julia> evaluator = Nonlinear.Evaluator(model, Nonlinear.ExprGraphOnly(), [x])
@@ -317,6 +318,52 @@ julia> MOI.set(model, MOI.NLPBlock(), block);
 Only call [`NLPBlockData`](@ref) once you have finished modifying the problem
 in `model`.
 
+If, instead, we set [`Nonlinear.SparseReverseMode`](@ref), then we get access to
+`:Grad`, the gradient of the objective function, `:Jac`, the Jacobian matrix of
+the constraints, `:JacVec`, the ability to compute Jacobian-vector products, and
+`:ExprGraph`.
+```jldoctest nonlinear_developer
+julia> Nonlinear.set_differentiation_backend(
+           data,
+           Nonlinear.SparseReverseMode(),
+           [x],
+       )
+
+julia> data
+NonlinearData with available features:
+  * :Grad
+  * :Jac
+  * :JacVec
+  * :ExprGraph
+```
+
+However, before calling anything, we need to call [`initialize`](@ref):
+```jldoctest nonlinear_developer
+julia> MOI.initialize(data, [:Grad, :Jac, :JacVec, :ExprGraph])
+```
+
+Now we can call methods like [`eval_objective`](@ref):
+```jldoctest nonlinear_developer
+julia> x = [1.0]
+1-element Vector{Float64}:
+ 1.0
+
+julia> MOI.eval_objective(data, x)
+7.268073418273571
+```
+and [`eval_objective_gradient`](@ref):
+```jldoctest nonlinear_developer
+julia> grad = [NaN]
+1-element Vector{Float64}:
+ NaN
+
+julia> MOI.eval_objective_gradient(data, grad, x)
+
+julia> grad
+1-element Vector{Float64}:
+ 1.909297426825682
+```
+
 ## Expression-graph representation
 
 [`Nonlinear.Model`](@ref) stores nonlinear expressions in
@@ -477,3 +524,61 @@ user-defined functions using [`Nonlinear.register_operator`](@ref).
 [`Nonlinear.Model`](@ref) is a struct that stores the
 [`Nonlinear.OperatorRegistry`](@ref), as well as a list of parameters and
 subexpressions in the model.
+
+## ReverseAD
+
+`Nonlinear.ReverseAD` is a submodule for computing derivatives of the problem
+inside [`Nonlinear.NonlinearData`](@ref) using sparse reverse-mode automatic
+differentiation (AD).
+
+This section does not attempt to explain how sparse reverse-mode AD works, but
+instead explains why MOI contains its own implementation, and highlights
+notable differences from similar packages.
+
+!!! warning
+    You should not interact with `ReverseAD` directly. Instead, you should
+    create a [`Nonlinear.NonlinearData`](@ref) object, call
+    [`Nonlinear.set_differentiation_backend`](@ref) with
+    [`Nonlinear.SparseReverseMode`](@ref), and then query the MOI API methods.
+
+### Why another AD package?
+
+The JuliaDiff organization maintains a [list of packages](https://juliadiff.org)
+for doing AD in Julia. At last count, there were at least ten packages–not
+including `ReverseAD`–for reverse-mode AD in Julia. Given this multitude, why
+does MOI maintain another implementation instead of re-using existing tooling?
+
+Here are four reasons:
+
+ * **Scale and Sparsity:** the types of functions that MOI computes derivatives
+   of have two key characteristics: they can be very large scale (10^5 or more
+   functions across 10^5 or more variables) and they are very sparse. For large
+   problems, it is common for the Hessian to have `O(n)` non-zero elements
+   instead of `O(n^2)` if it were dense. To the best of our knowledge,
+   `ReverseAD` is the only reverse-mode AD system in Julia that handles sparsity
+   by default. The lack of sparsity support is _the_ main reason why we do not
+   use a generic package.
+ * **Limited scope:** most other AD packages accept arbitrary Julia functions as
+   input and then trace an expression graph using operator overloading. This
+   means they must deal with (or detect and ignore) control flow, I/O, and other
+   vagaries of Julia. In contrast, `ReverseAD` only accepts functions in the
+   form of [`Nonlinear.NonlinearExpression`](@ref), which greatly limits the
+   range of syntax that it must deal with. By reducing the scope of what we
+   accept as input to functions relevant for mathematical optimization, we can
+   provide a simpler implementation with various performance optimizations.
+ * **Historical:** `ReverseAD` started life as [ReverseDiffSparse.jl](https://github.com/mlubin/ReverseDiffSparse.jl),
+   development of which began in early 2014(!). This was well before the other
+   packages started development. Because we had a well-tested, working AD in
+   JuMP, there was less motivation to contribute to and explore other AD
+   packages. The lack of historical interaction also meant that other packages
+   were not optimized for the types of problems that JuMP is built for (i.e.,
+   large-scale sparse problems). When we first created MathOptInterface, we kept
+   the AD in JuMP to simplify the transition, and postponed the development of
+   a first-class nonlinear interface in MathOptInterface.
+ * **Technical debt:** Prior to the introduction of `Nonlinear`, JuMP's nonlinear
+   implementation was a confusing mix of functions and types spread across the
+   code base and in the private `_Derivatives` submodule. This made it hard to
+   swap the AD system for another. The main motivation for refactoring JuMP to
+   create the `Nonlinear` submodule in MathOptInterface was to abstract the
+   interface between JuMP and the AD system, allowing us to swap in and test new
+   AD systems in the future.
diff --git a/docs/src/submodules/Nonlinear/reference.md b/docs/src/submodules/Nonlinear/reference.md
index f1d0124624..f73c3f773a 100644
--- a/docs/src/submodules/Nonlinear/reference.md
+++ b/docs/src/submodules/Nonlinear/reference.md
@@ -70,6 +70,7 @@ Nonlinear.eval_comparison_function
 Nonlinear.Evaluator
 Nonlinear.AbstractAutomaticDifferentiation
 Nonlinear.ExprGraphOnly
+Nonlinear.SparseReverseMode
 ```
 
 ## Data-structure
diff --git a/src/Nonlinear/Nonlinear.jl b/src/Nonlinear/Nonlinear.jl
index b45f4523f5..cb54617d8b 100644
--- a/src/Nonlinear/Nonlinear.jl
+++ b/src/Nonlinear/Nonlinear.jl
@@ -50,4 +50,6 @@ include("parse.jl")
 include("model.jl")
 include("evaluator.jl")
 
+include("ReverseAD/ReverseAD.jl")
+
 end # module
diff --git a/src/Nonlinear/ReverseAD/Coloring/Coloring.jl b/src/Nonlinear/ReverseAD/Coloring/Coloring.jl
new file mode 100644
index 0000000000..6a2b5f3eb5
--- /dev/null
+++ b/src/Nonlinear/ReverseAD/Coloring/Coloring.jl
@@ -0,0 +1,548 @@
+module Coloring
+
+import DataStructures
+
+include("topological_sort.jl")
+
+# indexed sparse set of integers
+mutable struct IndexedSet
+    nzidx::Vector{Int}
+    empty::BitVector
+    nnz::Int
+end
+
+IndexedSet(n::Integer) = IndexedSet(zeros(Int, n), trues(n), 0)
+
+function Base.push!(v::IndexedSet, i::Integer)
+    if v.empty[i] # new index
+        v.nzidx[v.nnz+=1] = i
+        v.empty[i] = false
+    end
+    return
+end
+
+function Base.empty!(v::IndexedSet)
+    nzidx = v.nzidx
+    empty = v.empty
+    for i in 1:v.nnz
+        empty[nzidx[i]] = true
+    end
+    v.nnz = 0
+    return v
+end
+
+Base.length(v::IndexedSet) = length(v.nzidx)
+
+function Base.resize!(v::IndexedSet, n::Integer)
+    if n > length(v)
+        @assert v.nnz == 0 # only resize empty vector
+        resize!(v.nzidx, n)
+        resize!(v.empty, n)
+        fill!(v.empty, true)
+    end
+    return
+end
+
+Base.collect(v::IndexedSet) = v.nzidx[1:v.nnz]
+
+function Base.union!(v::IndexedSet, s)
+    for x in s
+        push!(v, x)
+    end
+    return
+end
+
+# compact storage for an undirected graph
+# neighbors of vertex i start at adjlist[offsets[i]]
+struct UndirectedGraph
+    adjlist::Vector{Int}
+    edgeindex::Vector{Int} # corresponding edge number, indexed as adjlist
+    offsets::Vector{Int}
+    edges::Vector{Tuple{Int,Int}}
+end
+
+_num_vertices(g::UndirectedGraph) = length(g.offsets) - 1
+
+_num_edges(g::UndirectedGraph) = length(g.edges)
+
+_num_neighbors(i::Int,
g::UndirectedGraph) = g.offsets[i+1] - g.offsets[i] + +_start_neighbors(i::Int, g::UndirectedGraph) = g.offsets[i] + +function UndirectedGraph(I, J, nel) + adjcount = zeros(Int, nel) + n_edges = 0 + for k in 1:length(I) + i = I[k] + j = J[k] + if i == j + continue + end + n_edges += 1 + adjcount[i] += 1 + adjcount[j] += 1 + end + offsets = Vector{Int}(undef, nel + 1) + offsets[1] = 1 + for k in 1:nel + offsets[k+1] = offsets[k] + adjcount[k] + end + fill!(adjcount, 0) + edges = Array{Tuple{Int,Int}}(undef, n_edges) + adjlist = Vector{Int}(undef, offsets[nel+1] - 1) + edgeindex = Vector{Int}(undef, length(adjlist)) + edge_count = 0 + for k in 1:length(I) + i = I[k] + j = J[k] + if i == j + continue + end + edge_count += 1 + adjlist[offsets[i]+adjcount[i]] = j + edgeindex[offsets[i]+adjcount[i]] = edge_count + adjcount[i] += 1 + adjlist[offsets[j]+adjcount[j]] = i + edgeindex[offsets[j]+adjcount[j]] = edge_count + adjcount[j] += 1 + edges[edge_count] = (i, j) + end + @assert edge_count == n_edges + return UndirectedGraph(adjlist, edgeindex, offsets, edges) +end + +struct _Edge + index::Int + source::Int + target::Int +end + +function _prevent_cycle( + v, + w, + x, + e_idx1, + e_idx2, + S, + firstVisitToTree, + forbiddenColors, + color, +) + er = DataStructures.find_root!(S, e_idx2) + @inbounds first = firstVisitToTree[er] + p = first.source # but this depends on the order? + q = first.target + @inbounds if p != v + firstVisitToTree[er] = _Edge(e_idx1, v, w) + elseif q != w + forbiddenColors[color[x]] = v + end + return +end + +function _grow_star(v, w, e_idx, firstNeighbor, color, S) + @inbounds e = firstNeighbor[color[w]] + p = e.source + @inbounds if p != v + firstNeighbor[color[w]] = _Edge(e_idx, v, w) + else + union!(S, e_idx, e.index) + end + return +end + +function _merge_trees(eg, eg1, S) + e1 = DataStructures.find_root!(S, eg) + e2 = DataStructures.find_root!(S, eg1) + if e1 != e2 + union!(S, eg, eg1) + end + return +end + +""" + acyclic_coloring(g::UndirectedGraph) + +Implement the acyclic coloring algorithm of Gebremdehin, Tarafdar, Manne, and +Pothen, "New Acyclic and Star Coloring Algorithms with Application to Computing +Hessians." SIAM J. Sci. Comput. 2007. + +Returns `Tuple{Vector{Int},Int}` giving the color index of each node in `g`, as +well as the total number of colors used. 
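+
+## Example
+
+An illustrative sketch (the graph below is made up for this example, and the
+exact colors assigned depend on the traversal order):
+
+```julia
+# A path graph 1 - 2 - 3, given as edge endpoint lists over 3 vertices.
+g = UndirectedGraph([1, 2], [2, 3], 3)
+color, num_colors = acyclic_coloring(g)
+# For a path graph two colors suffice, e.g. color == [1, 2, 1], num_colors == 2.
+```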
+""" +function acyclic_coloring(g::UndirectedGraph) + if _num_edges(g) == 0 + return fill(1, _num_vertices(g)), 1 + end + num_colors = 0 + forbiddenColors = Int[] + firstNeighbor = _Edge[] + firstVisitToTree = fill(_Edge(0, 0, 0), _num_edges(g)) + color = fill(0, _num_vertices(g)) + # disjoint set forest of edges in the graph + S = DataStructures.IntDisjointSets(_num_edges(g)) + @inbounds for v in 1:_num_vertices(g) + n_neighbor = _num_neighbors(v, g) + start_neighbor = _start_neighbors(v, g) + for k in 0:(n_neighbor-1) + w = g.adjlist[start_neighbor+k] + if color[w] == 0 + continue + end + forbiddenColors[color[w]] = v + end + for k in 0:(n_neighbor-1) + w = g.adjlist[start_neighbor+k] + e_idx = g.edgeindex[start_neighbor+k] + if color[w] == 0 + continue + end + n_neighbor_w = _num_neighbors(w, g) + start_neighbor_w = _start_neighbors(w, g) + for k2 in 0:(n_neighbor_w-1) + x = g.adjlist[start_neighbor_w+k2] + e2_idx = g.edgeindex[start_neighbor_w+k2] + if color[x] == 0 + continue + end + if forbiddenColors[color[x]] != v + _prevent_cycle( + v, + w, + x, + e_idx, + e2_idx, + S, + firstVisitToTree, + forbiddenColors, + color, + ) + end + end + end + # find feasible color + found = false + for k in 1:num_colors + if forbiddenColors[k] != v + color[v] = k + found = true + break + end + end + if !found + num_colors += 1 + push!(forbiddenColors, 0) + push!(firstNeighbor, _Edge(0, 0, 0)) + color[v] = num_colors + end + for k in 0:(n_neighbor-1) + w = g.adjlist[start_neighbor+k] + e_idx = g.edgeindex[start_neighbor+k] + if color[w] == 0 + continue + end + _grow_star(v, w, e_idx, firstNeighbor, color, S) + end + for k in 0:(n_neighbor-1) + w = g.adjlist[start_neighbor+k] + e_idx = g.edgeindex[start_neighbor+k] + if color[w] == 0 + continue + end + n_neighbor_w = _num_neighbors(w, g) + start_neighbor_w = _start_neighbors(w, g) + for k2 in 0:(n_neighbor_w-1) + x = g.adjlist[start_neighbor_w+k2] + e2_idx = g.edgeindex[start_neighbor_w+k2] + if color[x] == 0 || x == v + continue + end + if color[x] == color[v] + _merge_trees(e_idx, e2_idx, S) + end + end + end + end + return color, num_colors +end + +struct RecoveryInfo + vertexmap::Vector{Vector{Int}} + postorder::Vector{Vector{Int}} + parents::Vector{Vector{Int}} + color::Vector{Int} + num_colors::Int + nnz::Int # number of off-diagonal nonzeros + local_indices::Vector{Int} # map back to global indices +end + +function RecoveryInfo() + return RecoveryInfo( + Vector{Vector{Int}}(undef, 0), + Vector{Vector{Int}}(undef, 0), + Vector{Vector{Int}}(undef, 0), + Vector{Int}(undef, 0), + 0, + 0, + Vector{Int}(undef, 0), + ) +end + +function recovery_preprocess( + g::UndirectedGraph, + color, + num_colors, + local_indices, +) + # represent two-color subgraph as: + # list of vertices (with map to global indices) + # adjacency list in a single vector (with list of offsets) + # linear index of pair of colors + twocolorindex = zeros(Int32, num_colors, num_colors) + seen_twocolors = 0 + # count of edges in each subgraph + edge_count = Int[] + for k in 1:length(g.edges) + u, v = g.edges[k] + i = min(color[u], color[v]) + j = max(color[u], color[v]) + if twocolorindex[i, j] == 0 + seen_twocolors += 1 + twocolorindex[i, j] = seen_twocolors + push!(edge_count, 0) + end + idx = twocolorindex[i, j] + edge_count[idx] += 1 + end + # edges sorted by twocolor subgraph + sorted_edges = Array{Vector{Tuple{Int,Int}}}(undef, seen_twocolors) + for idx in 1:seen_twocolors + sorted_edges[idx] = Tuple{Int,Int}[] + sizehint!(sorted_edges[idx], edge_count[idx]) + end + for i in 
1:length(g.edges) + u, v = g.edges[i] + i = min(color[u], color[v]) + j = max(color[u], color[v]) + idx = twocolorindex[i, j] + push!(sorted_edges[idx], (u, v)) + end + # list of unique vertices in each twocolor subgraph + vertexmap = Array{Vector{Int}}(undef, seen_twocolors) + postorder = Array{Vector{Int}}(undef, seen_twocolors) + parents = Array{Vector{Int}}(undef, seen_twocolors) + # temporary lookup map from global index to subgraph index + revmap = zeros(Int, _num_vertices(g)) + adjcount = zeros(Int, _num_vertices(g)) + cmap = zeros(Int, 0) # shared storage for DFS + for idx in 1:seen_twocolors + my_edges = sorted_edges[idx] + vlist = Int[] + # build up the vertex list and adjacency count + for k in 1:length(my_edges) + u, v = my_edges[k] + # seen these vertices yet? + if revmap[u] == 0 + push!(vlist, u) + revmap[u] = length(vlist) + adjcount[u] = 0 + end + if revmap[v] == 0 + push!(vlist, v) + revmap[v] = length(vlist) + adjcount[v] = 0 + end + adjcount[u] += 1 + adjcount[v] += 1 + end + # set up offsets for adjlist + offset = Vector{Int}(undef, length(vlist) + 1) + offset[1] = 1 + for k in 1:length(vlist) + offset[k+1] = offset[k] + adjcount[vlist[k]] + adjcount[vlist[k]] = 0 + end + # adjlist for node u in twocolor idx starts at + # vec[offset[u]] + # u has global index vlist[u] + vec = Vector{Int}(undef, offset[length(vlist)+1] - 1) + # now fill in + for k in 1:length(my_edges) + u, v = my_edges[k] + u_rev = revmap[u] # indices in the subgraph + v_rev = revmap[v] + vec[offset[u_rev]+adjcount[u]] = v_rev + vec[offset[v_rev]+adjcount[v]] = u_rev + adjcount[u] += 1 + adjcount[v] += 1 + end + resize!(cmap, length(vlist)) + order, parent = + reverse_topological_sort_by_dfs(vec, offset, length(vlist), cmap) + for k in 1:length(vlist) + revmap[vlist[k]] = 0 # clear for reuse + end + postorder[idx] = order + parents[idx] = parent + vertexmap[idx] = vlist + end + return RecoveryInfo( + vertexmap, + postorder, + parents, + color, + num_colors, + _num_edges(g), + local_indices, + ) +end + +_normalize(i, j) = (j > i) ? (j, i) : (i, j) + +function _indirect_recover_structure(rinfo::RecoveryInfo) + N = length(rinfo.color) + I = zeros(Int, rinfo.nnz + N) + J = zeros(Int, rinfo.nnz + N) + k = 0 + for i in 1:N + k += 1 + I[k] = J[k] = i + end + for t in 1:length(rinfo.postorder) + vmap = rinfo.vertexmap[t] + order = rinfo.postorder[t] + parent = rinfo.parents[t] + for z in 1:length(order) + v = order[z] + p = parent[v] + if p == 0 + continue + end + k += 1 + I[k], J[k] = _normalize(vmap[v], vmap[p]) + end + end + @assert k == rinfo.nnz + N + return I, J +end + +""" + hessian_color_preprocess( + edgelist, + num_total_var, + seen_idx = IndexedSet(0), + ) + +`edgelist` contains the nonzeros in the Hessian, *including* nonzeros on the +diagonal. 
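+
+## Example
+
+A minimal sketch (the variable indices and edge set are made up for this
+example):
+
+```julia
+# Hessian nonzeros (1, 1) and (1, 2) over two variables.
+I, J, rinfo = hessian_color_preprocess(Set([(1, 1), (1, 2)]), 2)
+# I and J give the lower-triangular sparsity pattern in global indices, and
+# rinfo is later passed to `seed_matrix` and `recover_from_matmat!`.
+```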
+""" +function hessian_color_preprocess( + edgelist, + num_total_var, + seen_idx = IndexedSet(0), +) + resize!(seen_idx, num_total_var) + I, J = Int[], Int[] + for (i, j) in edgelist + push!(seen_idx, i) + push!(seen_idx, j) + push!(I, i) + push!(J, j) + end + local_indices = sort!(seen_idx.nzidx[1:seen_idx.nnz]) + empty!(seen_idx) + global_to_local_idx = seen_idx.nzidx # steal for storage + for k in 1:length(local_indices) + global_to_local_idx[local_indices[k]] = k + end + # only do the coloring on the local indices + for k in 1:length(I) + I[k] = global_to_local_idx[I[k]] + J[k] = global_to_local_idx[J[k]] + end + g = UndirectedGraph(I, J, length(local_indices)) + color, num_colors = acyclic_coloring(g) + @assert length(color) == _num_vertices(g) + rinfo = recovery_preprocess(g, color, num_colors, local_indices) + I, J = _indirect_recover_structure(rinfo) + # convert back to global indices + for k in 1:length(I) + I[k] = local_indices[I[k]] + J[k] = local_indices[J[k]] + end + return I, J, rinfo +end + +""" + seed_matrix(rinfo::RecoveryInfo) + +Allocate a seed matrix for the Coloring. +""" +function seed_matrix(rinfo::RecoveryInfo) + return Array{Float64}(undef, length(rinfo.local_indices), rinfo.num_colors) +end + +""" + prepare_seed_matrix!(R, rinfo::RecoveryInfo) +""" +function prepare_seed_matrix!(R, rinfo::RecoveryInfo) + N = length(rinfo.color) + @assert N == size(R, 1) == length(rinfo.local_indices) + @assert size(R, 2) == rinfo.num_colors + fill!(R, 0.0) + for i in 1:N + R[i, rinfo.color[i]] = 1 + end + return +end + +""" + recover_from_matmat!( + V::AbstractVector{T}, + R::AbstractMatrix{T}, + rinfo::RecoveryInfo, + stored_values::AbstractVector{T}, + ) where {T} + +recover the hessian values from the hessian-matrix solution +stored_values is a temporary vector of length >= length(rinfo.local_indices) +""" +function recover_from_matmat!( + V::AbstractVector{T}, + R::AbstractMatrix{T}, + rinfo::RecoveryInfo, + stored_values::AbstractVector{T}, +) where {T} + N = length(rinfo.color) # number of local variables + @assert length(V) == rinfo.nnz + N + @assert length(stored_values) >= length(rinfo.local_indices) + k = 0 + for i in 1:N + k += 1 + V[k] = R[i, rinfo.color[i]] + end + for t in 1:length(rinfo.vertexmap) + vmap = rinfo.vertexmap[t] + order = rinfo.postorder[t] + parent = rinfo.parents[t] + for z in 1:length(order) + @inbounds stored_values[z] = 0.0 + end + @inbounds for z in 1:length(order) + v = order[z] + p = parent[v] + if p == 0 + continue + end + i, j = vmap[v], vmap[p] + value = R[i, rinfo.color[j]] - stored_values[v] + stored_values[p] += value + k += 1 + V[k] = value + end + end + @assert k == rinfo.nnz + N + return +end + +end # module diff --git a/src/Nonlinear/ReverseAD/Coloring/topological_sort.jl b/src/Nonlinear/ReverseAD/Coloring/topological_sort.jl new file mode 100644 index 0000000000..0b8d0ca2eb --- /dev/null +++ b/src/Nonlinear/ReverseAD/Coloring/topological_sort.jl @@ -0,0 +1,106 @@ +# Modified from Graphs.jl. +# Workaround for edge_index not being O(1) on SimpleGraph. +# edge_index was only needed to test for cycles, so +# this implementation skips that check. + +# Graphs.jl is licensed under the MIT License: +# +# Copyright (c) 2012: John Myles White and other contributors. 
+# +# Permission is hereby granted, free of charge, to any person obtaining +# a copy of this software and associated documentation files (the +# "Software"), to deal in the Software without restriction, including +# without limitation the rights to use, copy, modify, merge, publish, +# distribute, sublicense, and/or sell copies of the Software, and to +# permit persons to whom the Software is furnished to do so, subject to +# the following conditions: +# +# The above copyright notice and this permission notice shall be +# included in all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF +# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE +# LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION +# OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION +# WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + +mutable struct _TopologicalSortVisitor + vertices::Vector{Int} + parents::Vector{Int} + + function _TopologicalSortVisitor(n::Int) + vs = Int[] + sizehint!(vs, n) + return new(vs, zeros(Int, n)) + end +end + +function _traverse_graph( + adjlist, + offsets, + s, + visitor::_TopologicalSortVisitor, + vertexcolormap, + vertex_stack, + index_stack, +) + vertexcolormap[s] = 1 + resize!(vertex_stack, 1) + vertex_stack[1] = s + resize!(index_stack, 1) + index_stack[1] = 1 + while !isempty(vertex_stack) + u = pop!(vertex_stack) + out_idx = pop!(index_stack) + len_uegs = offsets[u+1] - offsets[u] + found_new_vertex = false + while out_idx <= len_uegs && !found_new_vertex + v = adjlist[offsets[u]+out_idx-1] + out_idx += 1 + v_color = vertexcolormap[v] + if v_color == 0 + found_new_vertex = true + vertexcolormap[v] = 1 + push!(vertex_stack, u) + push!(index_stack, out_idx) + visitor.parents[v] = u + push!(vertex_stack, v) + push!(index_stack, 1) + end + end + if !found_new_vertex + push!(visitor.vertices, u) + vertexcolormap[u] = 2 + end + end + return +end + +function reverse_topological_sort_by_dfs( + adjlist, + offsets, + num_vertices, + cmap::Vector{Int}, + vertex_stack::Vector{Int} = Int[], + index_stack::Vector{Int} = Int[], +) + @assert length(cmap) == num_vertices + fill!(cmap, 0) + visitor = _TopologicalSortVisitor(num_vertices) + for s in 1:num_vertices + if cmap[s] == 0 + _traverse_graph( + adjlist, + offsets, + s, + visitor, + cmap, + vertex_stack, + index_stack, + ) + end + end + return visitor.vertices, visitor.parents +end diff --git a/src/Nonlinear/ReverseAD/ReverseAD.jl b/src/Nonlinear/ReverseAD/ReverseAD.jl new file mode 100644 index 0000000000..5d865466d0 --- /dev/null +++ b/src/Nonlinear/ReverseAD/ReverseAD.jl @@ -0,0 +1,40 @@ +module ReverseAD + +import ForwardDiff +import MathOptInterface +import ..Nonlinear +import SparseArrays + +const MOI = MathOptInterface + +# Override basic math functions to return NaN instead of throwing errors. +# This is what NLP solvers expect, and sometimes the results aren't needed +# anyway, because the code may compute derivatives wrt constants. 
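+# For example, with these overrides `log(-1.0)` evaluates to `NaN` inside this
+# module, whereas `Base.log(-1.0)` would throw a `DomainError`.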
+import NaNMath: + sin, + cos, + tan, + asin, + acos, + acosh, + atanh, + log, + log2, + log10, + lgamma, + log1p, + pow, + sqrt + +include("Coloring/Coloring.jl") +include("graph_tools.jl") +include("types.jl") +include("utils.jl") + +include("reverse_mode.jl") +include("forward_over_reverse.jl") +include("mathoptinterface_api.jl") + +include("precompile.jl") + +end # module diff --git a/src/Nonlinear/ReverseAD/forward_over_reverse.jl b/src/Nonlinear/ReverseAD/forward_over_reverse.jl new file mode 100644 index 0000000000..a272cba32d --- /dev/null +++ b/src/Nonlinear/ReverseAD/forward_over_reverse.jl @@ -0,0 +1,396 @@ +const TAG = :ReverseAD + +""" + _eval_hessian( + d::NLPEvaluator, + f::_FunctionStorage, + H::AbstractVector{Float64}, + λ::Float64, + offset::Int, + )::Int + +Evaluate the hessian matrix of the function `f` and store the result, scaled by +`λ`, in `H`, beginning at element `offset+1`. This function assumes that +`_reverse_mode(d, x)` has already been called. + +Returns the number of non-zeros in the computed Hessian, which will be used to +update the offset for the next call. +""" +function _eval_hessian( + d::NLPEvaluator, + f::_FunctionStorage, + H::AbstractVector{Float64}, + λ::Float64, + offset::Int, +)::Int + chunk = min(size(f.seed_matrix, 2), d.max_chunk) + # As a performance optimization, skip dynamic dispatch if the chunk is 1. + if chunk == 1 + return _eval_hessian_inner(d, f, H, λ, offset, Val(1)) + else + return _eval_hessian_inner(d, f, H, λ, offset, Val(chunk)) + end +end + +function _eval_hessian_inner( + d::NLPEvaluator, + ex::_FunctionStorage, + H::AbstractVector{Float64}, + scale::Float64, + nzcount::Int, + ::Val{CHUNK}, +) where {CHUNK} + if ex.linearity == LINEAR + @assert length(ex.hess_I) == 0 + return 0 + end + T = ForwardDiff.Partials{CHUNK,Float64} # This is our element type. 
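+    # Each `Partials{CHUNK,Float64}` carries CHUNK seed directions per
+    # variable, so each forward/reverse sweep below evaluates up to CHUNK
+    # Hessian-vector products at once.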
+ Coloring.prepare_seed_matrix!(ex.seed_matrix, ex.rinfo) + local_to_global_idx = ex.rinfo.local_indices + input_ϵ_raw, output_ϵ_raw = d.input_ϵ, d.output_ϵ + input_ϵ = _reinterpret_unsafe(T, input_ϵ_raw) + output_ϵ = _reinterpret_unsafe(T, output_ϵ_raw) + # Compute hessian-vector products + num_products = size(ex.seed_matrix, 2) # number of hessian-vector products + num_chunks = div(num_products, CHUNK) + @assert size(ex.seed_matrix, 1) == length(local_to_global_idx) + for k in 1:CHUNK:CHUNK*num_chunks + for r in 1:length(local_to_global_idx) + # set up directional derivatives + @inbounds idx = local_to_global_idx[r] + # load up ex.seed_matrix[r,k,k+1,...,k+CHUNK-1] into input_ϵ + for s in 1:CHUNK + input_ϵ_raw[(idx-1)*CHUNK+s] = ex.seed_matrix[r, k+s-1] + end + @inbounds output_ϵ[idx] = zero(T) + end + _hessian_slice_inner(d, ex, input_ϵ, output_ϵ, T) + # collect directional derivatives + for r in 1:length(local_to_global_idx) + idx = local_to_global_idx[r] + # load output_ϵ into ex.seed_matrix[r,k,k+1,...,k+CHUNK-1] + for s in 1:CHUNK + ex.seed_matrix[r, k+s-1] = output_ϵ_raw[(idx-1)*CHUNK+s] + end + @inbounds input_ϵ[idx] = zero(T) + end + end + # leftover chunk + remaining = num_products - CHUNK * num_chunks + if remaining > 0 + k = CHUNK * num_chunks + 1 + for r in 1:length(local_to_global_idx) + # set up directional derivatives + @inbounds idx = local_to_global_idx[r] + # load up ex.seed_matrix[r,k,k+1,...,k+remaining-1] into input_ϵ + for s in 1:remaining + # leave junk in the unused components + input_ϵ_raw[(idx-1)*CHUNK+s] = ex.seed_matrix[r, k+s-1] + end + @inbounds output_ϵ[idx] = zero(T) + end + _hessian_slice_inner(d, ex, input_ϵ, output_ϵ, T) + # collect directional derivatives + for r in 1:length(local_to_global_idx) + idx = local_to_global_idx[r] + # load output_ϵ into ex.seed_matrix[r,k,k+1,...,k+remaining-1] + for s in 1:remaining + ex.seed_matrix[r, k+s-1] = output_ϵ_raw[(idx-1)*CHUNK+s] + end + @inbounds input_ϵ[idx] = zero(T) + end + end + # TODO(odow): consider reverting to a view. 
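+    # `_UnsafeVectorView` exposes the slice of `H` starting at element
+    # `nzcount + 1` so that `recover_from_matmat!` can write the recovered
+    # Hessian entries directly into the output vector.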
+ output_slice = _UnsafeVectorView(nzcount, length(ex.hess_I), pointer(H)) + Coloring.recover_from_matmat!( + output_slice, + ex.seed_matrix, + ex.rinfo, + d.output_ϵ, + ) + for i in 1:length(output_slice) + output_slice[i] *= scale + end + return length(ex.hess_I) +end + +function _hessian_slice_inner(d, ex, input_ϵ, output_ϵ, ::Type{T}) where {T} + subexpr_forward_values_ϵ = + _reinterpret_unsafe(T, d.subexpression_forward_values_ϵ) + for i in ex.dependent_subexpressions + subexpr = d.subexpressions[i] + subexpr_forward_values_ϵ[i] = _forward_eval_ϵ( + subexpr, + _reinterpret_unsafe(T, subexpr.forward_storage_ϵ), + _reinterpret_unsafe(T, subexpr.partials_storage_ϵ), + input_ϵ, + subexpr_forward_values_ϵ, + d.data.operators, + ) + end + _forward_eval_ϵ( + ex, + _reinterpret_unsafe(T, d.forward_storage_ϵ), + _reinterpret_unsafe(T, d.partials_storage_ϵ), + input_ϵ, + subexpr_forward_values_ϵ, + d.data.operators, + ) + # do a reverse pass + subexpr_reverse_values_ϵ = + _reinterpret_unsafe(T, d.subexpression_reverse_values_ϵ) + for i in ex.dependent_subexpressions + subexpr_reverse_values_ϵ[i] = zero(T) + d.subexpression_reverse_values[i] = 0.0 + end + _reverse_eval_ϵ( + output_ϵ, + ex, + _reinterpret_unsafe(T, d.reverse_storage_ϵ), + _reinterpret_unsafe(T, d.partials_storage_ϵ), + d.subexpression_reverse_values, + subexpr_reverse_values_ϵ, + 1.0, + zero(T), + ) + for i in length(ex.dependent_subexpressions):-1:1 + j = ex.dependent_subexpressions[i] + subexpr = d.subexpressions[j] + _reverse_eval_ϵ( + output_ϵ, + subexpr, + _reinterpret_unsafe(T, subexpr.reverse_storage_ϵ), + _reinterpret_unsafe(T, subexpr.partials_storage_ϵ), + d.subexpression_reverse_values, + subexpr_reverse_values_ϵ, + d.subexpression_reverse_values[j], + subexpr_reverse_values_ϵ[j], + ) + end + return +end + +""" + _forward_eval_ϵ( + ex::Union{_FunctionStorage,_SubexpressionStorage}, + storage_ϵ::AbstractVector{ForwardDiff.Partials{N,T}}, + partials_storage_ϵ::AbstractVector{ForwardDiff.Partials{N,T}}, + x_values_ϵ, + subexpression_values_ϵ, + user_operators::Nonlinear.OperatorRegistry, + ) where {N,T} + +Evaluate the directional derivatives of the expression tree in `ex`. + +This is equivalent to evaluating the expression tree using DualNumbers or +ForwardDiff, but instead we keep the `ex.forward_storage` for the epsilon +components separate so that we don't need to recompute the real components. + +This assumes that `_reverse_model(d, x)` has already been called. 
+""" +function _forward_eval_ϵ( + ex::Union{_FunctionStorage,_SubexpressionStorage}, + storage_ϵ::AbstractVector{ForwardDiff.Partials{N,T}}, + partials_storage_ϵ::AbstractVector{ForwardDiff.Partials{N,T}}, + x_values_ϵ, + subexpression_values_ϵ, + user_operators::Nonlinear.OperatorRegistry, +) where {N,T} + @assert length(storage_ϵ) >= length(ex.nodes) + @assert length(partials_storage_ϵ) >= length(ex.nodes) + zero_ϵ = zero(ForwardDiff.Partials{N,T}) + # ex.nodes is already in order such that parents always appear before children + # so a backwards pass through ex.nodes is a forward pass through the tree + children_arr = SparseArrays.rowvals(ex.adj) + for k in length(ex.nodes):-1:1 + node = ex.nodes[k] + partials_storage_ϵ[k] = zero_ϵ + if node.type == Nonlinear.NODE_VARIABLE + storage_ϵ[k] = x_values_ϵ[node.index] + elseif node.type == Nonlinear.NODE_VALUE + storage_ϵ[k] = zero_ϵ + elseif node.type == Nonlinear.NODE_SUBEXPRESSION + storage_ϵ[k] = subexpression_values_ϵ[node.index] + elseif node.type == Nonlinear.NODE_PARAMETER + storage_ϵ[k] = zero_ϵ + else + @assert node.type != Nonlinear.NODE_MOI_VARIABLE + ϵtmp = zero_ϵ + @inbounds children_idx = SparseArrays.nzrange(ex.adj, k) + for c_idx in children_idx + @inbounds ix = children_arr[c_idx] + @inbounds partial = ex.partials_storage[ix] + @inbounds storage_val = storage_ϵ[ix] + # TODO: This "if" statement can take 8% of the hessian + # evaluation time! Find a more efficient way. + if !isfinite(partial) && storage_val == zero_ϵ + continue + end + ϵtmp += storage_val * ex.partials_storage[ix] + end + storage_ϵ[k] = ϵtmp + if node.type == Nonlinear.NODE_CALL_MULTIVARIATE + # TODO(odow): consider how to refactor this into Nonlinear. + op = node.index + n_children = length(children_idx) + if op == 3 # :* + # Lazy approach for now. 
+ anyzero = false + tmp_prod = one(ForwardDiff.Dual{TAG,T,N}) + for c_idx in children_idx + ix = children_arr[c_idx] + sval = ex.forward_storage[ix] + gnum = ForwardDiff.Dual{TAG}(sval, storage_ϵ[ix]) + tmp_prod *= gnum + anyzero = ifelse(sval * sval == zero(T), true, anyzero) + end + # By a quirk of floating-point numbers, we can have + # anyzero == true && ForwardDiff.value(tmp_prod) != zero(T) + if anyzero || n_children <= 2 + for c_idx in children_idx + prod_others = one(ForwardDiff.Dual{TAG,T,N}) + for c_idx2 in children_idx + (c_idx == c_idx2) && continue + ix = children_arr[c_idx2] + gnum = ForwardDiff.Dual{TAG}( + ex.forward_storage[ix], + storage_ϵ[ix], + ) + prod_others *= gnum + end + partials_storage_ϵ[children_arr[c_idx]] = + ForwardDiff.partials(prod_others) + end + else + for c_idx in children_idx + ix = children_arr[c_idx] + prod_others = + tmp_prod / ForwardDiff.Dual{TAG}( + ex.forward_storage[ix], + storage_ϵ[ix], + ) + partials_storage_ϵ[ix] = + ForwardDiff.partials(prod_others) + end + end + elseif op == 4 # :^ + @assert n_children == 2 + idx1 = first(children_idx) + idx2 = last(children_idx) + @inbounds ix1 = children_arr[idx1] + @inbounds ix2 = children_arr[idx2] + @inbounds base = ex.forward_storage[ix1] + @inbounds base_ϵ = storage_ϵ[ix1] + @inbounds exponent = ex.forward_storage[ix2] + @inbounds exponent_ϵ = storage_ϵ[ix2] + base_gnum = ForwardDiff.Dual{TAG}(base, base_ϵ) + exponent_gnum = ForwardDiff.Dual{TAG}(exponent, exponent_ϵ) + if exponent == 2 + partials_storage_ϵ[ix1] = 2 * base_ϵ + elseif exponent == 1 + partials_storage_ϵ[ix1] = zero_ϵ + else + partials_storage_ϵ[ix1] = ForwardDiff.partials( + exponent_gnum * base_gnum^(exponent_gnum - 1), + ) + end + result_gnum = ForwardDiff.Dual{TAG}( + ex.forward_storage[k], + storage_ϵ[k], + ) + # TODO(odow): fix me to use NaNMath.jl instead + log_base_gnum = base_gnum < 0 ? NaN : log(base_gnum) + partials_storage_ϵ[ix2] = + ForwardDiff.partials(result_gnum * log_base_gnum) + elseif op == 5 # :/ + @assert n_children == 2 + idx1 = first(children_idx) + idx2 = last(children_idx) + @inbounds ix1 = children_arr[idx1] + @inbounds ix2 = children_arr[idx2] + @inbounds numerator = ex.forward_storage[ix1] + @inbounds numerator_ϵ = storage_ϵ[ix1] + @inbounds denominator = ex.forward_storage[ix2] + @inbounds denominator_ϵ = storage_ϵ[ix2] + recip_denominator = + 1 / ForwardDiff.Dual{TAG}(denominator, denominator_ϵ) + partials_storage_ϵ[ix1] = + ForwardDiff.partials(recip_denominator) + partials_storage_ϵ[ix2] = ForwardDiff.partials( + -ForwardDiff.Dual{TAG}(numerator, numerator_ϵ) * + recip_denominator * + recip_denominator, + ) + elseif op > 6 + error( + "User-defined operators not supported for hessian " * + "computations", + ) + end + elseif node.type == Nonlinear.NODE_CALL_UNIVARIATE + @inbounds child_idx = children_arr[ex.adj.colptr[k]] + f′′ = Nonlinear.eval_univariate_hessian( + user_operators, + user_operators.univariate_operators[node.index], + ex.forward_storage[child_idx], + ) + partials_storage_ϵ[child_idx] = f′′ * storage_ϵ[child_idx] + end + end + end + return storage_ϵ[1] +end + +# Compute directional derivatives of the reverse pass, goes with _forward_eval_ϵ +# to compute hessian-vector products. 
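+# At each node, `ForwardDiff._mul_partials` combines the parent's adjoint and
+# its epsilon components with the local partial derivative; this is the product
+# rule applied to adjoint(node) = adjoint(parent) * ∂parent/∂node.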
+function _reverse_eval_ϵ( + output_ϵ::AbstractVector{ForwardDiff.Partials{N,T}}, + ex::Union{_FunctionStorage,_SubexpressionStorage}, + reverse_storage_ϵ, + partials_storage_ϵ, + subexpression_output, + subexpression_output_ϵ, + scale::T, + scale_ϵ::ForwardDiff.Partials{N,T}, +) where {N,T} + @assert length(reverse_storage_ϵ) >= length(ex.nodes) + @assert length(partials_storage_ϵ) >= length(ex.nodes) + if ex.nodes[1].type == Nonlinear.NODE_VARIABLE + @inbounds output_ϵ[ex.nodes[1].index] += scale_ϵ + return + elseif ex.nodes[1].type == Nonlinear.NODE_SUBEXPRESSION + @inbounds subexpression_output[ex.nodes[1].index] += + scale * ex.reverse_storage[1] + @inbounds subexpression_output_ϵ[ex.nodes[1].index] += scale_ϵ + return + end + reverse_storage_ϵ[1] = scale_ϵ + for k in 2:length(ex.nodes) + @inbounds node = ex.nodes[k] + if node.type == Nonlinear.NODE_VALUE || + node.type == Nonlinear.NODE_LOGIC || + node.type == Nonlinear.NODE_COMPARISON || + node.type == Nonlinear.NODE_PARAMETER + continue + end + parent_value = scale * ex.reverse_storage[node.parent] + if !isfinite(ex.partials_storage[k]) && iszero(parent_value) + reverse_storage_ϵ[k] = zero(ForwardDiff.Partials{N,T}) + else + reverse_storage_ϵ[k] = ForwardDiff._mul_partials( + partials_storage_ϵ[k], + reverse_storage_ϵ[node.parent], + parent_value, + ex.partials_storage[k], + ) + end + if node.type == Nonlinear.NODE_VARIABLE + @inbounds output_ϵ[node.index] += reverse_storage_ϵ[k] + elseif node.type == Nonlinear.NODE_SUBEXPRESSION + @inbounds subexpression_output[node.index] += + scale * ex.reverse_storage[k] + @inbounds subexpression_output_ϵ[node.index] += reverse_storage_ϵ[k] + end + end + return +end diff --git a/src/Nonlinear/ReverseAD/graph_tools.jl b/src/Nonlinear/ReverseAD/graph_tools.jl new file mode 100644 index 0000000000..b3c06f9ecf --- /dev/null +++ b/src/Nonlinear/ReverseAD/graph_tools.jl @@ -0,0 +1,486 @@ +""" + _replace_moi_variables( + nodes::Vector{Nonlinear.Node}, + moi_index_to_consecutive_index::Dict{MOI.VariableIndex,Int}, + ) + +Return a new `Vector{Nonlinear.Node}` where all occurances of +`NODE_MOI_VARIABLE` are replaced by `NODE_VARIABLE` that is 1-indexed and +ordered. +""" +function _replace_moi_variables( + nodes::Vector{Nonlinear.Node}, + moi_index_to_consecutive_index::Dict{MOI.VariableIndex,Int}, +) + new_nodes = Vector{Nonlinear.Node}(undef, length(nodes)) + for (i, node) in enumerate(nodes) + if node.type == Nonlinear.NODE_MOI_VARIABLE + new_nodes[i] = Nonlinear.Node( + Nonlinear.NODE_VARIABLE, + moi_index_to_consecutive_index[MOI.VariableIndex(node.index)], + node.parent, + ) + else + new_nodes[i] = node + end + end + return new_nodes +end + +@enum(Linearity, CONSTANT, LINEAR, PIECEWISE_LINEAR, NONLINEAR) + +""" + _classify_linearity( + nodes::Vector{Nonlinear.Node}, + adj::SparseArrays.SparseMatrixCSC, + subexpression_linearity::Vector{Linearity}, + ) + +Classify the nodes in a tree as constant, linear, or nonlinear with respect to +the input. 
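+
+For example (an illustrative case): in the expression `2 * x + y^2`, the
+`2 * x` node is classified as `LINEAR`, the `y^2` node as `NONLINEAR`, and
+therefore the parent `+` node as `NONLINEAR`.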
+""" +function _classify_linearity( + nodes::Vector{Nonlinear.Node}, + adj::SparseArrays.SparseMatrixCSC, + subexpression_linearity::Vector{Linearity}, +) + linearity = Array{Linearity}(undef, length(nodes)) + children_arr = SparseArrays.rowvals(adj) + for k in length(nodes):-1:1 + node = nodes[k] + if node.type == Nonlinear.NODE_VARIABLE + linearity[k] = LINEAR + continue + elseif node.type == Nonlinear.NODE_VALUE + linearity[k] = CONSTANT + continue + elseif node.type == Nonlinear.NODE_PARAMETER + linearity[k] = CONSTANT + continue + elseif node.type == Nonlinear.NODE_SUBEXPRESSION + linearity[k] = subexpression_linearity[node.index] + continue + end + children_idx = SparseArrays.nzrange(adj, k) + num_constant_children, any_nonlinear = 0, false + for r in children_idx + if linearity[children_arr[r]] == NONLINEAR + any_nonlinear = true + break + elseif linearity[children_arr[r]] == CONSTANT + num_constant_children += 1 + end + end + if any_nonlinear + # If any children are nonlinear, then we're nonlinear... + linearity[k] = NONLINEAR + # ...except in the case of ifelse. If the operands are linear then + # we're piecewise linear. + op = get( + Nonlinear.DEFAULT_MULTIVARIATE_OPERATORS, + node.index, + nothing, + ) + if ( + node.type == Nonlinear.NODE_CALL_MULTIVARIATE && + op == :ifelse && + linearity[children_arr[children_idx[2]]] == LINEAR && + linearity[children_arr[children_idx[3]]] == LINEAR + ) + linearity[k] = PIECEWISE_LINEAR + end + continue + elseif num_constant_children == length(children_idx) + # If all children are constant, then we're constant. + linearity[k] = CONSTANT + continue + end + # By this point, some children are constant and some are linear, so if + # the operator is nonlinear, then we're nonlinear. + if node.type == Nonlinear.NODE_CALL_UNIVARIATE + op = + get(Nonlinear.DEFAULT_UNIVARIATE_OPERATORS, node.index, nothing) + if op == :+ || op == :- + linearity[k] = LINEAR + else + linearity[k] = NONLINEAR + end + elseif node.type == Nonlinear.NODE_CALL_MULTIVARIATE + op = get( + Nonlinear.DEFAULT_MULTIVARIATE_OPERATORS, + node.index, + nothing, + ) + if op == :+ + linearity[k] = LINEAR + elseif op == :- + linearity[k] = LINEAR + elseif op == :* + # Multiplication is linear if there is one non-constant term. + one_op = num_constant_children == length(children_idx) - 1 + linearity[k] = one_op ? LINEAR : NONLINEAR + elseif op == :^ + linearity[k] = NONLINEAR + elseif op == :/ + if linearity[children_arr[children_idx[2]]] == CONSTANT + # If the denominator is constant, we're linear. + linearity[k] = LINEAR + else + linearity[k] = NONLINEAR + end + elseif op == :ifelse + linearity[k] = NONLINEAR + else # User-defined functions + linearity[k] = NONLINEAR + end + elseif node.type == Nonlinear.NODE_LOGIC + linearity[k] = NONLINEAR + else + @assert node.type == Nonlinear.NODE_COMPARISON + linearity[k] = NONLINEAR + end + end + return linearity +end + +""" + _compute_gradient_sparsity!( + indices::Coloring.IndexedSet, + nodes::Vector{Nonlinear.Node}, + ) + +Compute the sparsity pattern of the gradient of an expression (i.e., a list of +which variable indices are present). 
+""" +function _compute_gradient_sparsity!( + indices::Coloring.IndexedSet, + nodes::Vector{Nonlinear.Node}, +) + for node in nodes + if node.type == Nonlinear.NODE_VARIABLE + push!(indices, node.index) + elseif node.type == Nonlinear.NODE_MOI_VARIABLE + error( + "Internal error: Invalid to compute sparsity if Nonlinear.NODE_MOI_VARIABLE " * + "nodes are present.", + ) + end + end + return +end + +""" + _compute_hessian_sparsity( + nodes::Vector{Nonlinear.Node}, + adj, + input_linearity::Vector{Linearity}, + indexedset::Coloring.IndexedSet, + subexpression_edgelist::Vector{Set{Tuple{Int,Int}}}, + subexpression_variables::Vector{Vector{Int}}, + ) + +Compute the sparsity pattern the Hessian of an expression. + + * `input_linearity` is the linearity with respect to the input, computed by + `_classify_linearity` + * `subexpression_edgelist` is the edge_list of each subexpression + * `subexpression_variables` is the list of all variables which appear in a + subexpression (including recursively). + +Idea: consider the (non)linearity of a node *with respect to the output*. The +children of any node which is nonlinear with respect to the output should have +nonlinear interactions, hence nonzeros in the hessian. This is not true in +general, but holds for everything we consider. + +A counter example is `f(x, y, z) = x + y * z`, but we don't have any functions +like that. By "nonlinear with respect to the output", we mean that the output +depends nonlinearly on the value of the node, regardless of how the node itself +depends on the input. +""" +function _compute_hessian_sparsity( + nodes::Vector{Nonlinear.Node}, + adj, + input_linearity::Vector{Linearity}, + indexedset::Coloring.IndexedSet, + subexpression_edgelist::Vector{Set{Tuple{Int,Int}}}, + subexpression_variables::Vector{Vector{Int}}, +) + # So start at the root of the tree and classify the linearity wrt the output. + # For each nonlinear node, do a mini DFS and collect the list of children. + # Add a nonlinear interaction between all children of a nonlinear node. 
+ edge_list = Set{Tuple{Int,Int}}() + nonlinear_wrt_output = fill(false, length(nodes)) + children_arr = SparseArrays.rowvals(adj) + stack = Int[] + stack_ignore = Bool[] + nonlinear_group = indexedset + if length(nodes) == 1 && nodes[1].type == Nonlinear.NODE_SUBEXPRESSION + # Subexpression comes in linearly, so append edge_list + for ij in subexpression_edgelist[nodes[1].index] + push!(edge_list, ij) + end + end + for k in 2:length(nodes) + nod = nodes[k] + if nod.type == Nonlinear.NODE_MOI_VARIABLE + error( + "Internal error: Invalid to compute sparsity if Nonlinear.NODE_MOI_VARIABLE " * + "nodes are present.", + ) + end + if nonlinear_wrt_output[k] + continue # already seen this node one way or another + elseif input_linearity[k] == CONSTANT + continue # definitely not nonlinear + end + @assert !nonlinear_wrt_output[nod.parent] + # check if the parent depends nonlinearly on the value of this node + par = nodes[nod.parent] + if par.type == Nonlinear.NODE_CALL_UNIVARIATE + op = get(Nonlinear.DEFAULT_UNIVARIATE_OPERATORS, par.index, nothing) + if op === nothing || (op != :+ && op != :-) + nonlinear_wrt_output[k] = true + end + elseif par.type == Nonlinear.NODE_CALL_MULTIVARIATE + op = get( + Nonlinear.DEFAULT_MULTIVARIATE_OPERATORS, + par.index, + nothing, + ) + if op === nothing + nonlinear_wrt_output[k] = true + elseif op in (:+, :-, :ifelse) + # pass + elseif op == :* + # check if all siblings are constant + sibling_idx = SparseArrays.nzrange(adj, nod.parent) + if !all( + i -> + input_linearity[children_arr[i]] == CONSTANT || + children_arr[i] == k, + sibling_idx, + ) + # at least one sibling isn't constant + nonlinear_wrt_output[k] = true + end + elseif op == :/ + # check if denominator is nonconstant + sibling_idx = SparseArrays.nzrange(adj, nod.parent) + if input_linearity[children_arr[last(sibling_idx)]] != CONSTANT + nonlinear_wrt_output[k] = true + end + else + nonlinear_wrt_output[k] = true + end + end + if nod.type == Nonlinear.NODE_SUBEXPRESSION && !nonlinear_wrt_output[k] + # subexpression comes in linearly, so append edge_list + for ij in subexpression_edgelist[nod.index] + push!(edge_list, ij) + end + end + if !nonlinear_wrt_output[k] + continue + end + # do a DFS from here, including all children + @assert isempty(stack) + @assert isempty(stack_ignore) + sibling_idx = SparseArrays.nzrange(adj, nod.parent) + for sidx in sibling_idx + push!(stack, children_arr[sidx]) + push!(stack_ignore, false) + end + empty!(nonlinear_group) + while length(stack) > 0 + r = pop!(stack) + should_ignore = pop!(stack_ignore) + nonlinear_wrt_output[r] = true + if nodes[r].type == Nonlinear.NODE_LOGIC || + nodes[r].type == Nonlinear.NODE_COMPARISON + # don't count the nonlinear interactions inside + # logical conditions or comparisons + should_ignore = true + end + children_idx = SparseArrays.nzrange(adj, r) + for cidx in children_idx + push!(stack, children_arr[cidx]) + push!(stack_ignore, should_ignore) + end + if should_ignore + continue + end + if nodes[r].type == Nonlinear.NODE_VARIABLE + push!(nonlinear_group, nodes[r].index) + elseif nodes[r].type == Nonlinear.NODE_SUBEXPRESSION + # append all variables in subexpression + union!(nonlinear_group, subexpression_variables[nodes[r].index]) + end + end + for i_ in 1:nonlinear_group.nnz + i = nonlinear_group.nzidx[i_] + for j_ in 1:nonlinear_group.nnz + j = nonlinear_group.nzidx[j_] + if j > i + continue # Only lower triangle. 
+ end + push!(edge_list, (i, j)) + end + end + end + return edge_list +end + +""" + _list_subexpressions(nodes::Vector{Nonlinear.Node}) + +Returns the list of subexpressions which a given tape depends on directly +""" +function _list_subexpressions(nodes::Vector{Nonlinear.Node}) + indices = Set{Int}( + n.index for n in nodes if n.type == Nonlinear.NODE_SUBEXPRESSION + ) + return sort(collect(indices)) +end + +""" + _topological_sort( + starts::Vector{Int}, + subexpressions::Vector{Vector{Nonlinear.Node}}, + subexpression_dependency_graph::Vector{Vector{Int}} = + Vector{Vector{Int}}(undef, length(subexpressions)), + ) + +Return a topologically sorted list of the integer subexpresssion indices that +need to be computed to evaluate `subexpressions[s]` for all `s in starts`. + +`starts` should be ordered, and not contain duplicates. + +`subexpression_dependency_graph[i]` is a lazily-computed list of "out" edges +from node `i`, in terms of the integer-valued subexpression index (i.e., +`node.index`). This list should be unique and ordered. + +If calling `_topological_sort` a single time, you may omit the +`subexpression_dependency_graph` argument. + +However, if calling `_topological_sort` multiple times on the _same_ vector of +subexpresssions, you should create `subexpression_dependency_graph` once (either +as the uninitialized vector, or by explicitly computing the full +`subexpression_dependency_graph`), and pass it in. + +## Notes + +* It is important to not use recursion here, because expressions may have + arbitrary levels of nesting! +* This function assumes `subexpressions` is acyclic. +""" +function _topological_sort( + starts, + subexpressions::Vector{Vector{Nonlinear.Node}}, + subexpression_dependency_graph::Vector{Vector{Int}} = Vector{Vector{Int}}( + undef, + length(subexpressions), + ), +) + ordered = Int[] + in_order = fill(false, length(subexpressions)) + stack = Tuple{Int,Bool}[] + for s in starts + if in_order[s] + continue # s is already in `ordered`. + end + push!(stack, (s, true)) + while !isempty(stack) + node, needs_checking = pop!(stack) + if !needs_checking + # We must be returning to this node for a second time, and we + # have already checked all of the children. Therefore, we can + # add it to the set of ordered nodes. + push!(ordered, node) + in_order[node] = true + continue + elseif in_order[node] + continue # This node has already been added to `ordered`. + end + # Re-add the node to the stack, but set the `false` flag this time + # so next time we visit it, it will go on the `ordered` list + # instead. + push!(stack, (node, false)) + if !isassigned(subexpression_dependency_graph, node) + subexpression_dependency_graph[node] = + _list_subexpressions(subexpressions[node]) + end + for child in subexpression_dependency_graph[node] + if !in_order[child] + push!(stack, (child, true)) + end + end + end + end + return ordered +end + +""" + _order_subexpressions( + main_expressions::Vector{Vector{Nonlinear.Node}}, + subexpressions::Vector{Vector{Nonlinear.Node}}; + ) + +Topologically sort the subexpression needed to evaluate `main_expressions`. + +Returns two things: + + * A `Vector{Int}` containing the ordered list of subexpression-indices that + need to be evaluated to compute all `main_expressions` + * A `Vector{Vector{Int}}`, containing a list of ordered lists of + subexpression-indices that need to be evaluated to compute + `main_expressions[i]`. + +**Warning:** This doesn't handle cyclic expressions! 
But this should be fine +because we can't compute them in JuMP anyway. +""" +function _order_subexpressions( + main_expressions::Vector{Vector{Nonlinear.Node}}, + subexpressions::Vector{Vector{Nonlinear.Node}}, +) + # The graph of node dependencies. Constructed lazily. + subexpression_dependency_graph = + Vector{Vector{Int}}(undef, length(subexpressions)) + # Node dependencies of the main expressions. + starts = Set{Int}() + individual_sorts = Vector{Int}[] + for expression in main_expressions + s = _list_subexpressions(expression) + union!(starts, s) + push!( + individual_sorts, + _topological_sort( + s, + subexpressions, + subexpression_dependency_graph, + ), + ) + end + full_sort = _topological_sort( + starts, + subexpressions, + subexpression_dependency_graph, + ) + return full_sort, individual_sorts +end + +""" + _order_subexpressions( + main_expression::Vector{Nonlinear.Node}, + subexpressions::Vector{Vector{Nonlinear.Node}}, + ) + +Return a `Vector{Int}` containing the topologically ordered list of +subexpression-indices that need to be evaluated to compute `main_expression`. +""" +function _order_subexpressions( + main_expression::Vector{Nonlinear.Node}, + subexpressions::Vector{Vector{Nonlinear.Node}}, +) + s = _list_subexpressions(main_expression) + return _topological_sort(s, subexpressions) +end diff --git a/src/Nonlinear/ReverseAD/mathoptinterface_api.jl b/src/Nonlinear/ReverseAD/mathoptinterface_api.jl new file mode 100644 index 0000000000..7adee13d3f --- /dev/null +++ b/src/Nonlinear/ReverseAD/mathoptinterface_api.jl @@ -0,0 +1,355 @@ +function MOI.features_available(d::NLPEvaluator) + # Check if we have any user-defined multivariate operators, in which case we + # need to disable hessians. The result of features_available depends on this. + d.disable_2ndorder = + length(d.data.operators.registered_multivariate_operators) > 0 + if d.disable_2ndorder + return [:Grad, :Jac, :JacVec] + end + return [:Grad, :Jac, :JacVec, :Hess, :HessVec] +end + +function MOI.initialize(d::NLPEvaluator, requested_features::Vector{Symbol}) + # Check that we support the features requested by the user. 
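+    # Note that if any user-defined multivariate operators are registered,
+    # `features_available` drops `:Hess` and `:HessVec`, so requesting them
+    # here throws an error.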
+ available_features = MOI.features_available(d) + for feature in requested_features + if !(feature in available_features) + error("Unsupported feature $feature") + end + end + moi_index_to_consecutive_index = + Dict(x => i for (i, x) in enumerate(d.ordered_variables)) + N = length(moi_index_to_consecutive_index) + # + largest_user_input_dimension = 1 + for op in d.data.operators.registered_multivariate_operators + largest_user_input_dimension = max(largest_user_input_dimension, op.N) + end + d.objective = nothing + d.user_output_buffer = zeros(largest_user_input_dimension) + d.jac_storage = zeros(max(N, largest_user_input_dimension)) + d.constraints = _FunctionStorage[] + d.last_x = fill(NaN, N) + d.want_hess = :Hess in requested_features + want_hess_storage = (:HessVec in requested_features) || d.want_hess + coloring_storage = Coloring.IndexedSet(N) + max_expr_length = 0 + # + main_expressions = [c.expression.nodes for (_, c) in d.data.constraints] + if d.data.objective !== nothing + pushfirst!(main_expressions, d.data.objective.nodes) + end + d.subexpression_order, individual_order = _order_subexpressions( + main_expressions, + [expr.nodes for expr in d.data.expressions], + ) + num_subexpressions = length(d.data.expressions) + d.subexpression_linearity = Vector{Linearity}(undef, num_subexpressions) + subexpression_variables = Vector{Vector{Int}}(undef, num_subexpressions) + subexpression_edgelist = + Vector{Set{Tuple{Int,Int}}}(undef, num_subexpressions) + d.subexpressions = Vector{_SubexpressionStorage}(undef, num_subexpressions) + d.subexpression_forward_values = zeros(num_subexpressions) + d.subexpression_reverse_values = zeros(num_subexpressions) + for k in d.subexpression_order + # Only load expressions which actually are used + d.subexpression_forward_values[k] = NaN + subex = _SubexpressionStorage( + d.data.expressions[k], + d.subexpression_linearity, + moi_index_to_consecutive_index, + ) + d.subexpressions[k] = subex + d.subexpression_linearity[k] = subex.linearity + if d.want_hess + empty!(coloring_storage) + _compute_gradient_sparsity!(coloring_storage, subex.nodes) + # union with all dependent expressions + for idx in _list_subexpressions(subex.nodes) + union!(coloring_storage, subexpression_variables[idx]) + end + subexpression_variables[k] = collect(coloring_storage) + empty!(coloring_storage) + linearity = _classify_linearity( + subex.nodes, + subex.adj, + d.subexpression_linearity, + ) + edgelist = _compute_hessian_sparsity( + subex.nodes, + subex.adj, + linearity, + coloring_storage, + subexpression_edgelist, + subexpression_variables, + ) + subexpression_edgelist[k] = edgelist + end + end + max_chunk = 1 + if d.data.objective !== nothing + d.objective = _FunctionStorage( + main_expressions[1], + d.data.objective.values, + N, + coloring_storage, + d.want_hess, + d.subexpressions, + individual_order[1], + d.subexpression_linearity, + subexpression_edgelist, + subexpression_variables, + moi_index_to_consecutive_index, + ) + max_expr_length = max(max_expr_length, length(d.objective.nodes)) + max_chunk = max(max_chunk, size(d.objective.seed_matrix, 2)) + end + for (k, (_, constraint)) in enumerate(d.data.constraints) + idx = d.data.objective !== nothing ? 
k + 1 : k + push!( + d.constraints, + _FunctionStorage( + main_expressions[idx], + constraint.expression.values, + N, + coloring_storage, + d.want_hess, + d.subexpressions, + individual_order[idx], + d.subexpression_linearity, + subexpression_edgelist, + subexpression_variables, + moi_index_to_consecutive_index, + ), + ) + max_expr_length = max(max_expr_length, length(d.constraints[end].nodes)) + max_chunk = max(max_chunk, size(d.constraints[end].seed_matrix, 2)) + end + # 10 is hardcoded upper bound to avoid excess memory allocation + max_chunk = min(max_chunk, 10) + if d.want_hess || want_hess_storage + d.input_ϵ = zeros(max_chunk * N) + d.output_ϵ = zeros(max_chunk * N) + # + len = max_chunk * max_expr_length + d.forward_storage_ϵ = zeros(len) + d.partials_storage_ϵ = zeros(len) + d.reverse_storage_ϵ = zeros(len) + # + len = max_chunk * length(d.subexpressions) + d.subexpression_forward_values_ϵ = zeros(len) + d.subexpression_reverse_values_ϵ = zeros(len) + # + for k in d.subexpression_order + len = max_chunk * length(d.subexpressions[k].nodes) + d.subexpressions[k].forward_storage_ϵ = zeros(len) + d.subexpressions[k].partials_storage_ϵ = zeros(len) + d.subexpressions[k].reverse_storage_ϵ = zeros(len) + end + d.max_chunk = max_chunk + if d.want_hess + _compute_hessian_lagrangian_structure(d) + end + end + return +end + +function MOI.eval_objective(d::NLPEvaluator, x) + if d.objective === nothing + error("No nonlinear objective.") + end + _reverse_mode(d, x) + return d.objective.forward_storage[1] +end + +function MOI.eval_objective_gradient(d::NLPEvaluator, g, x) + if d.objective === nothing + error("No nonlinear objective.") + end + _reverse_mode(d, x) + fill!(g, 0.0) + _extract_reverse_pass(g, d, d.objective) + return +end + +function MOI.eval_constraint(d::NLPEvaluator, g, x) + _reverse_mode(d, x) + for i in 1:length(d.constraints) + g[i] = d.constraints[i].forward_storage[1] + end + return +end + +function MOI.jacobian_structure(d::NLPEvaluator) + J = Tuple{Int64,Int64}[] + for (row, constraint) in enumerate(d.constraints) + for col in constraint.grad_sparsity + push!(J, (row, col)) + end + end + return J +end + +function MOI.eval_constraint_jacobian(d::NLPEvaluator, J, x) + _reverse_mode(d, x) + fill!(J, 0.0) + offset = 0 + for ex in d.constraints + for i in ex.grad_sparsity + d.jac_storage[i] = 0.0 + end + _extract_reverse_pass(d.jac_storage, d, ex) + for (k, idx) in enumerate(ex.grad_sparsity) + J[offset+k] = d.jac_storage[idx] + end + offset += length(ex.grad_sparsity) + end + return +end + +function MOI.eval_constraint_jacobian_product(d::NLPEvaluator, y, x, w) + fill!(y, 0.0) + J_struct = MOI.jacobian_structure(d) + nnz = length(J_struct) + J = zeros(nnz) + MOI.eval_constraint_jacobian(d, J, x) + for (k, (i, j)) in enumerate(J_struct) + y[i] += J[k] * w[j] + end + return +end + +function MOI.eval_constraint_jacobian_transpose_product( + d::NLPEvaluator, + y::AbstractVector{Float64}, + x::AbstractVector{Float64}, + w::AbstractVector{Float64}, +) + fill!(y, 0.0) + J_struct = MOI.jacobian_structure(d) + nnz = length(J_struct) + J = zeros(nnz) + MOI.eval_constraint_jacobian(d, J, x) + for (k, (i, j)) in enumerate(J_struct) + y[j] += J[k] * w[i] + end + return y +end + +function MOI.hessian_lagrangian_structure(d::NLPEvaluator) + @assert d.want_hess + return d.hessian_sparsity +end + +function _compute_hessian_lagrangian_structure(d::NLPEvaluator) + d.hessian_sparsity = Tuple{Int64,Int64}[] + if d.objective !== nothing + append!(d.hessian_sparsity, zip(d.objective.hess_I, 
d.objective.hess_J)) + end + for c in d.constraints + append!(d.hessian_sparsity, zip(c.hess_I, c.hess_J)) + end + return +end + +function MOI.eval_hessian_lagrangian(d::NLPEvaluator, H, x, σ, μ) + @assert d.want_hess + _reverse_mode(d, x) + fill!(d.input_ϵ, 0.0) + offset = 0 + if d.objective !== nothing + offset += _eval_hessian(d, d.objective, H, σ, offset)::Int + end + for (i, ex) in enumerate(d.constraints) + offset += _eval_hessian(d, ex, H, μ[i], offset)::Int + end + return +end + +function MOI.eval_hessian_lagrangian_product(d::NLPEvaluator, h, x, v, σ, μ) + _reverse_mode(d, x) + fill!(h, 0.0) + T = ForwardDiff.Partials{1,Float64} + input_ϵ = reinterpret(T, d.input_ϵ) + output_ϵ = reinterpret(T, d.output_ϵ) + for i in 1:length(x) + input_ϵ[i] = ForwardDiff.Partials((v[i],)) + end + # forward evaluate all subexpressions once + subexpr_forward_values_ϵ = reinterpret(T, d.subexpression_forward_values_ϵ) + for i in d.subexpression_order + subexpr = d.subexpressions[i] + subexpr_forward_values_ϵ[i] = _forward_eval_ϵ( + subexpr, + reinterpret(T, subexpr.forward_storage_ϵ), + reinterpret(T, subexpr.partials_storage_ϵ), + input_ϵ, + subexpr_forward_values_ϵ, + d.data.operators, + ) + end + # we only need to do one reverse pass through the subexpressions as well + subexpr_reverse_values_ϵ = reinterpret(T, d.subexpression_reverse_values_ϵ) + fill!(subexpr_reverse_values_ϵ, zero(T)) + fill!(d.subexpression_reverse_values, 0.0) + fill!(d.reverse_storage_ϵ, 0.0) + fill!(output_ϵ, zero(T)) + if d.objective !== nothing + _forward_eval_ϵ( + d.objective, + reinterpret(T, d.forward_storage_ϵ), + reinterpret(T, d.partials_storage_ϵ), + input_ϵ, + subexpr_forward_values_ϵ, + d.data.operators, + ) + _reverse_eval_ϵ( + output_ϵ, + d.objective, + reinterpret(T, d.reverse_storage_ϵ), + reinterpret(T, d.partials_storage_ϵ), + d.subexpression_reverse_values, + subexpr_reverse_values_ϵ, + σ, + zero(T), + ) + end + for (i, con) in enumerate(d.constraints) + _forward_eval_ϵ( + con, + reinterpret(T, d.forward_storage_ϵ), + reinterpret(T, d.partials_storage_ϵ), + input_ϵ, + subexpr_forward_values_ϵ, + d.data.operators, + ) + _reverse_eval_ϵ( + output_ϵ, + con, + reinterpret(T, d.reverse_storage_ϵ), + reinterpret(T, d.partials_storage_ϵ), + d.subexpression_reverse_values, + subexpr_reverse_values_ϵ, + μ[i], + zero(T), + ) + end + for i in length(d.subexpression_order):-1:1 + j = d.subexpression_order[i] + subexpr = d.subexpressions[j] + _reverse_eval_ϵ( + output_ϵ, + subexpr, + reinterpret(T, subexpr.reverse_storage_ϵ), + reinterpret(T, subexpr.partials_storage_ϵ), + d.subexpression_reverse_values, + subexpr_reverse_values_ϵ, + d.subexpression_reverse_values[j], + subexpr_reverse_values_ϵ[j], + ) + end + for i in 1:length(x) + h[i] += output_ϵ[i].values[1] + end + return +end diff --git a/src/Nonlinear/ReverseAD/precompile.jl b/src/Nonlinear/ReverseAD/precompile.jl new file mode 100644 index 0000000000..16915364f8 --- /dev/null +++ b/src/Nonlinear/ReverseAD/precompile.jl @@ -0,0 +1,46 @@ +function _precompile_() + ccall(:jl_generating_output, Cint, ()) == 1 || return nothing + Base.precompile( + Tuple{ + typeof(MOI.eval_hessian_lagrangian), + NLPEvaluator, + SubArray{Float64,1,Vector{Float64},Tuple{UnitRange{Int64}},true}, + Vector{Float64}, + Float64, + SubArray{Float64,1,Vector{Float64},Tuple{UnitRange{Int64}},true}, + }, + ) + Base.precompile( + Tuple{ + typeof(MOI.eval_objective_gradient), + NLPEvaluator, + Vector{Float64}, + Vector{Float64}, + }, + ) + 
Base.precompile(Tuple{typeof(MOI.initialize),NLPEvaluator,Vector{Symbol}})
+    Base.precompile(Tuple{typeof(MOI.features_available),NLPEvaluator})
+    Base.precompile(
+        Tuple{
+            typeof(MOI.eval_constraint_jacobian),
+            NLPEvaluator,
+            SubArray{Float64,1,Vector{Float64},Tuple{UnitRange{Int64}},true},
+            Vector{Float64},
+        },
+    )
+    Base.precompile(
+        Tuple{
+            typeof(MOI.eval_constraint),
+            NLPEvaluator,
+            SubArray{Float64,1,Vector{Float64},Tuple{UnitRange{Int64}},true},
+            Vector{Float64},
+        },
+    )
+    Base.precompile(Tuple{typeof(MOI.jacobian_structure),NLPEvaluator})
+    Base.precompile(
+        Tuple{typeof(MOI.eval_objective),NLPEvaluator,Vector{Float64}},
+    )
+    return
+end
+
+_precompile_()
diff --git a/src/Nonlinear/ReverseAD/reverse_mode.jl b/src/Nonlinear/ReverseAD/reverse_mode.jl
new file mode 100644
index 0000000000..4c09c8b11e
--- /dev/null
+++ b/src/Nonlinear/ReverseAD/reverse_mode.jl
@@ -0,0 +1,259 @@
+"""
+    _reverse_mode(d::NLPEvaluator, x)
+
+Run reverse-mode automatic differentiation on `d` given the primal solution `x`.
+
+This function updates many of the data structures inside `d` in-place.
+
+At a high level, reverse-mode AD has two phases:
+
+In Phase I, we evaluate the problem in `d` at the primal solution `x`, and
+store the primal value of each expression in the tree and the first-order
+partial derivative information for each node with respect to its arguments.
+
+Because the nodes in our data structure are topologically sorted, we can make a
+single pass through the tree by iterating backwards through the vector of stored
+nodes.
+
+In Phase II, we propagate the partial derivative information back down the tree
+to find the derivative of each function with respect to the input.
+
+Because the nodes in our data structure are topologically sorted, we can make a
+single pass through the tree by iterating forwards through the vector of stored
+nodes.
+"""
+function _reverse_mode(d::NLPEvaluator, x)
+    if d.last_x == x
+        # Fail fast if the primal solution has not changed since last call.
+        return
+    end
+    # Phase I
+    for k in d.subexpression_order
+        d.subexpression_forward_values[k] =
+            _forward_eval(d.subexpressions[k], d, x)
+    end
+    if d.objective !== nothing
+        _forward_eval(d.objective::_FunctionStorage, d, x)
+    end
+    for con in d.constraints
+        _forward_eval(con, d, x)
+    end
+    # Phase II
+    for k in d.subexpression_order
+        _reverse_eval(d.subexpressions[k])
+    end
+    if d.objective !== nothing
+        _reverse_eval(d.objective::_FunctionStorage)
+    end
+    for con in d.constraints
+        _reverse_eval(con)
+    end
+    copyto!(d.last_x, x)
+    return
+end
+
+"""
+    _forward_eval(
+        f::Union{_FunctionStorage,_SubexpressionStorage},
+        d::NLPEvaluator,
+        x::AbstractVector{T},
+    ) where {T}
+
+Forward-mode evaluation of an expression tree given in `f`.
+
+ * This function assumes that the values of all sub-expressions have already
+   been computed and are stored in `d.subexpression_forward_values`.
+ * `f.partials_storage[k]` is the partial derivative of `nodes[k].parent` with
+   respect to the value of node `k`. It's efficient to compute this at the same
+   time as the value of the parent because we use it in reverse mode and in dual
+   forward mode. Note that `partials_storage` makes a subtle assumption that we
+   have a tree instead of a general DAG. If we have a DAG, then we need to
+   associate storage with each edge of the DAG.
+"""
+function _forward_eval(
+    # !!! warning
+    #     This Union depends upon _FunctionStorage and _SubexpressionStorage
+    #     having similarly named fields.
+ f::Union{_FunctionStorage,_SubexpressionStorage}, + d::NLPEvaluator, + x::AbstractVector{T}, +)::T where {T} + @assert length(f.forward_storage) >= length(f.nodes) + @assert length(f.partials_storage) >= length(f.nodes) + operators = d.data.operators + # f.nodes is already in order such that parents always appear before + # children, so a backwards pass through f.nodes is a forward pass through + # the tree. + children_arr = SparseArrays.rowvals(f.adj) + for k in length(f.nodes):-1:1 + node = f.nodes[k] + f.partials_storage[k] = zero(T) + if node.type == Nonlinear.NODE_VARIABLE + f.forward_storage[k] = x[node.index] + # This should never happen, because we will have replaced these by now. + # elseif node.type == Nonlinear.NODE_MOI_VARIABLE + # f.forward_storage[k] = x[node.index] + elseif node.type == Nonlinear.NODE_VALUE + f.forward_storage[k] = f.const_values[node.index] + elseif node.type == Nonlinear.NODE_SUBEXPRESSION + f.forward_storage[k] = d.subexpression_forward_values[node.index] + elseif node.type == Nonlinear.NODE_PARAMETER + f.forward_storage[k] = d.data.parameters[node.index] + elseif node.type == Nonlinear.NODE_CALL_MULTIVARIATE + children_indices = SparseArrays.nzrange(f.adj, k) + N = length(children_indices) + f_input = _UnsafeVectorView(d.jac_storage, N) + ∇f = _UnsafeVectorView(d.user_output_buffer, N) + for (r, i) in enumerate(children_indices) + f_input[r] = f.forward_storage[children_arr[i]] + ∇f[r] = 0.0 + end + f.forward_storage[k] = Nonlinear.eval_multivariate_function( + operators, + operators.multivariate_operators[node.index], + f_input, + ) + Nonlinear.eval_multivariate_gradient( + operators, + operators.multivariate_operators[node.index], + ∇f, + f_input, + ) + for (r, i) in enumerate(children_indices) + f.partials_storage[children_arr[i]] = ∇f[r] + end + elseif node.type == Nonlinear.NODE_CALL_UNIVARIATE + child_idx = children_arr[f.adj.colptr[k]] + f.forward_storage[k] = Nonlinear.eval_univariate_function( + operators, + operators.univariate_operators[node.index], + f.forward_storage[child_idx], + ) + f.partials_storage[child_idx] = Nonlinear.eval_univariate_gradient( + operators, + operators.univariate_operators[node.index], + f.forward_storage[child_idx], + ) + elseif node.type == Nonlinear.NODE_COMPARISON + children_idx = SparseArrays.nzrange(f.adj, k) + result = true + f.partials_storage[children_arr[children_idx[1]]] = zero(T) + for r in 2:length(children_idx) + lhs = children_arr[children_idx[r-1]] + rhs = children_arr[children_idx[r]] + result &= Nonlinear.eval_comparison_function( + operators, + operators.comparison_operators[node.index], + f.forward_storage[lhs], + f.forward_storage[rhs], + ) + f.partials_storage[rhs] = zero(T) + end + f.forward_storage[k] = result + else + @assert node.type == Nonlinear.NODE_LOGIC + children_idx = SparseArrays.nzrange(f.adj, k) + lhs = children_arr[children_idx[1]] + rhs = children_arr[children_idx[2]] + f.forward_storage[k] = Nonlinear.eval_logic_function( + operators, + operators.logic_operators[node.index], + f.forward_storage[lhs] == 1, + f.forward_storage[rhs] == 1, + ) + f.partials_storage[lhs] = zero(T) + f.partials_storage[rhs] = zero(T) + end + end + return f.forward_storage[1] +end + +""" + _reverse_eval(f::Union{_FunctionStorage,_SubexpressionStorage}) + +Reverse-mode evaluation of an expression tree given in `f`. + + * This function assumes `f.partials_storage` is already updated. + * This function assumes that `f.reverse_storage` has been initialized with 0.0. +""" +function _reverse_eval( + # !!! 
warning + # This Union depends upon _FunctionStorage and _SubexpressionStorage + # having similarly named fields. + f::Union{_FunctionStorage,_SubexpressionStorage}, +) + @assert length(f.reverse_storage) >= length(f.nodes) + @assert length(f.partials_storage) >= length(f.nodes) + # f.nodes is already in order such that parents always appear before + # children so a forward pass through nodes is a backwards pass through the + # tree. + f.reverse_storage[1] = one(Float64) + for k in 2:length(f.nodes) + node = f.nodes[k] + if node.type == Nonlinear.NODE_VALUE || + node.type == Nonlinear.NODE_LOGIC || + node.type == Nonlinear.NODE_COMPARISON || + node.type == Nonlinear.NODE_PARAMETER + continue + end + rev_parent = f.reverse_storage[node.parent] + partial = f.partials_storage[k] + f.reverse_storage[k] = ifelse( + rev_parent == 0.0 && !isfinite(partial), + rev_parent, + rev_parent * partial, + ) + end + return +end + +""" + _extract_reverse_pass( + g::Vector{T}, + d::NLPEvaluator, + f::Union{_FunctionStorage,_SubexpressionStorage}, + ) where {T} + +Fill the gradient vector `g` with the values from the reverse pass. Assumes you +have already called `_reverse_eval_all(d, x)`. +""" +function _extract_reverse_pass( + g::Vector{T}, + d::NLPEvaluator, + f::Union{_FunctionStorage,_SubexpressionStorage}, +) where {T} + for i in f.dependent_subexpressions + d.subexpression_reverse_values[i] = 0.0 + end + _extract_reverse_pass_inner(g, f, d.subexpression_reverse_values, 1.0) + for i in length(f.dependent_subexpressions):-1:1 + k = f.dependent_subexpressions[i] + _extract_reverse_pass_inner( + g, + d.subexpressions[k], + d.subexpression_reverse_values, + d.subexpression_reverse_values[k], + ) + end + return +end + +function _extract_reverse_pass_inner( + output::AbstractVector{T}, + # !!! warning + # This Union depends upon _FunctionStorage and _SubexpressionStorage + # having similarly named fields. 
+ f::Union{_FunctionStorage,_SubexpressionStorage}, + subexpressions::AbstractVector{T}, + scale::T, +) where {T} + @assert length(f.reverse_storage) >= length(f.nodes) + for (k, node) in enumerate(f.nodes) + if node.type == Nonlinear.NODE_VARIABLE + output[node.index] += scale * f.reverse_storage[k] + elseif node.type == Nonlinear.NODE_SUBEXPRESSION + subexpressions[node.index] += scale * f.reverse_storage[k] + end + end + return +end diff --git a/src/Nonlinear/ReverseAD/types.jl b/src/Nonlinear/ReverseAD/types.jl new file mode 100644 index 0000000000..348af23859 --- /dev/null +++ b/src/Nonlinear/ReverseAD/types.jl @@ -0,0 +1,188 @@ +mutable struct _SubexpressionStorage + nodes::Vector{Nonlinear.Node} + adj::SparseArrays.SparseMatrixCSC{Bool,Int} + const_values::Vector{Float64} + forward_storage::Vector{Float64} + partials_storage::Vector{Float64} + reverse_storage::Vector{Float64} + forward_storage_ϵ::Vector{Float64} + partials_storage_ϵ::Vector{Float64} + reverse_storage_ϵ::Vector{Float64} + linearity::Linearity + + function _SubexpressionStorage( + expr::Nonlinear.NonlinearExpression, + subexpression_linearity, + moi_index_to_consecutive_index, + ) + nodes = + _replace_moi_variables(expr.nodes, moi_index_to_consecutive_index) + adj = Nonlinear.adjacency_matrix(nodes) + N = length(nodes) + linearity = _classify_linearity(nodes, adj, subexpression_linearity) + return new( + nodes, + adj, + expr.values, + zeros(N), # forward_storage, + zeros(N), # partials_storage, + zeros(N), # reverse_storage, + Float64[], + Float64[], + Float64[], + linearity[1], + ) + end +end + +mutable struct _FunctionStorage + nodes::Vector{Nonlinear.Node} + adj::SparseArrays.SparseMatrixCSC{Bool,Int} + const_values::Vector{Float64} + forward_storage::Vector{Float64} + partials_storage::Vector{Float64} + reverse_storage::Vector{Float64} + grad_sparsity::Vector{Int} + # Nonzero pattern of Hessian matrix + hess_I::Vector{Int} + hess_J::Vector{Int} + rinfo::Coloring.RecoveryInfo # coloring info for hessians + seed_matrix::Matrix{Float64} + linearity::Linearity + # subexpressions which this function depends on, ordered for forward pass. 
+ dependent_subexpressions::Vector{Int} + + function _FunctionStorage( + nodes::Vector{Nonlinear.Node}, + const_values, + num_variables, + coloring_storage::Coloring.IndexedSet, + want_hess::Bool, + subexpressions::Vector{_SubexpressionStorage}, + dependent_subexpressions, + subexpression_linearity, + subexpression_edgelist, + subexpression_variables, + moi_index_to_consecutive_index, + ) + nodes = _replace_moi_variables(nodes, moi_index_to_consecutive_index) + adj = Nonlinear.adjacency_matrix(nodes) + N = length(nodes) + empty!(coloring_storage) + _compute_gradient_sparsity!(coloring_storage, nodes) + for k in dependent_subexpressions + _compute_gradient_sparsity!( + coloring_storage, + subexpressions[k].nodes, + ) + end + grad_sparsity = sort!(collect(coloring_storage)) + empty!(coloring_storage) + if want_hess + linearity = _classify_linearity(nodes, adj, subexpression_linearity) + edgelist = _compute_hessian_sparsity( + nodes, + adj, + linearity, + coloring_storage, + subexpression_edgelist, + subexpression_variables, + ) + hess_I, hess_J, rinfo = Coloring.hessian_color_preprocess( + edgelist, + num_variables, + coloring_storage, + ) + seed_matrix = Coloring.seed_matrix(rinfo) + return new( + nodes, + adj, + const_values, + zeros(N), # forward_storage, + zeros(N), # partials_storage, + zeros(N), # reverse_storage, + grad_sparsity, + hess_I, + hess_J, + rinfo, + seed_matrix, + linearity[1], + dependent_subexpressions, + ) + else + return new( + nodes, + adj, + const_values, + zeros(N), # forward_storage, + zeros(N), # partials_storage, + zeros(N), # reverse_storage, + grad_sparsity, + Int[], + Int[], + Coloring.RecoveryInfo(), + Array{Float64}(undef, 0, 0), + NONLINEAR, + dependent_subexpressions, + ) + end + end +end + +""" + NLPEvaluator( + data::Nonlinear.NonlinearData, + ordered_variables::Vector{MOI.VariableIndex}, + ) + +Return an `NLPEvaluator` object that implements the `MOI.AbstractNLPEvaluator` +interface. + +!!! warning + Before using, you must initialize the evaluator using `MOI.initialize`. +""" +mutable struct NLPEvaluator <: MOI.AbstractNLPEvaluator + data::Nonlinear.NonlinearData + ordered_variables::Vector{MOI.VariableIndex} + + objective::Union{Nothing,_FunctionStorage} + constraints::Vector{_FunctionStorage} + subexpressions::Vector{_SubexpressionStorage} + subexpression_order::Vector{Int} + # Storage for the subexpressions in reverse-mode automatic differentiation. + subexpression_forward_values::Vector{Float64} + subexpression_reverse_values::Vector{Float64} + subexpression_linearity::Vector{Linearity} + + # A cache of the last x. This is used to guide whether we need to re-run + # reverse-mode automatic differentiation. + last_x::Vector{Float64} + + # Temporary storage for computing jacobians. This is also used as temporary + # storage for the input of multivariate functions. 
+ jac_storage::Vector{Float64} + # Temporary storage for the gradient of multivariate functions + user_output_buffer::Vector{Float64} + + # storage for computing hessians + # these Float64 vectors are reinterpreted to hold multiple epsilon components + # so the length should be multiplied by the maximum number of epsilon components + disable_2ndorder::Bool # don't offer Hess or HessVec + want_hess::Bool + forward_storage_ϵ::Vector{Float64} # (longest expression) + partials_storage_ϵ::Vector{Float64} # (longest expression) + reverse_storage_ϵ::Vector{Float64} # (longest expression) + input_ϵ::Vector{Float64} # (number of variables) + output_ϵ::Vector{Float64}# (number of variables) + subexpression_forward_values_ϵ::Vector{Float64} # (number of subexpressions) + subexpression_reverse_values_ϵ::Vector{Float64} # (number of subexpressions) + hessian_sparsity::Vector{Tuple{Int64,Int64}} + max_chunk::Int # chunk size for which we've allocated storage + + function NLPEvaluator( + data::Nonlinear.NonlinearData, + ordered_variables::Vector{MOI.VariableIndex}, + ) + return new(data, ordered_variables) + end +end diff --git a/src/Nonlinear/ReverseAD/utils.jl b/src/Nonlinear/ReverseAD/utils.jl new file mode 100644 index 0000000000..ca3ae6432c --- /dev/null +++ b/src/Nonlinear/ReverseAD/utils.jl @@ -0,0 +1,63 @@ +""" + _UnsafeVectorView(offset, len, ptr) + +Lightweight unsafe view for vectors. + +## Motivation + +`_UnsafeVectorView` is needed as an allocation-free equivalent of `view`. Other +alternatives, like `view(x, 1:len)` or a struct like +``` +struct _SafeView{T} + x::Vector{T} + len::Int +end +``` +will allocate so that `x` can be tracked by Julia's GC. + +`_UnsafeVectorView` relies on the fact that the use-cases of `_UnsafeVectorView` +only temporarily wrap a long-lived vector like `d.jac_storage` so that we don't +have to worry about the GC removing `d.jac_storage` while `_UnsafeVectorView` +exists. This lets us use a `Ptr{T}` and create a struct that is `isbitstype` and +therefore does not allocate. + +## Example + +Instead of `view(x, 1:10)`, use `_UnsafeVectorView(0, 10, pointer(x))`. + +## Unsafe behavior + +`_UnsafeVectorView` is unsafe because it assumes that the vector `x` that the +pointer `ptr` refers to remains valid during the usage of `_UnsafeVectorView`. +""" +struct _UnsafeVectorView{T} <: DenseVector{T} + offset::Int + len::Int + ptr::Ptr{T} +end + +Base.getindex(x::_UnsafeVectorView, i) = unsafe_load(x.ptr, i + x.offset) + +function Base.setindex!(x::_UnsafeVectorView, value, i) + unsafe_store!(x.ptr, value, i + x.offset) + return value +end + +Base.length(v::_UnsafeVectorView) = v.len + +Base.size(v::_UnsafeVectorView) = (v.len,) + +function _UnsafeVectorView(x::Vector, N::Int) + if length(x) < N + resize!(x, N) + end + return _UnsafeVectorView(0, N, pointer(x)) +end + +function _reinterpret_unsafe(::Type{T}, x::Vector{R}) where {T,R} + # how many T's fit into x? 
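+    # The buffer `x` is viewed in-place as contiguous elements of type `T`
+    # (for example, `ForwardDiff.Partials`), without copying or allocating.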
+ @assert isbitstype(T) && isbitstype(R) + len = length(x) * sizeof(R) + p = reinterpret(Ptr{T}, pointer(x)) + return _UnsafeVectorView(0, div(len, sizeof(T)), p) +end diff --git a/src/Nonlinear/types.jl b/src/Nonlinear/types.jl index 1564036988..24c056a0ef 100644 --- a/src/Nonlinear/types.jl +++ b/src/Nonlinear/types.jl @@ -263,3 +263,21 @@ struct ExprGraphOnly <: AbstractAutomaticDifferentiation end function Evaluator(model::Model, ::ExprGraphOnly, ::Vector{MOI.VariableIndex}) return Evaluator(model) end + +""" + SparseReverseMode() <: AbstractAutomaticDifferentiation + +An implementation of `AbstractAutomaticDifferentiation` that uses sparse +reverse-mode automatic differentiation to compute derivatives. Supports all +features in the MOI nonlinear interface. +""" +struct SparseReverseMode <: AbstractAutomaticDifferentiation end + +function set_differentiation_backend( + data::NonlinearData, + ::SparseReverseMode, + ordered_variables::Vector{MOI.VariableIndex}, +) + data.inner = ReverseAD.NLPEvaluator(data, ordered_variables) + return +end diff --git a/test/Nonlinear/Nonlinear.jl b/test/Nonlinear/Nonlinear.jl index 842e99750e..7f2ca8b595 100644 --- a/test/Nonlinear/Nonlinear.jl +++ b/test/Nonlinear/Nonlinear.jl @@ -845,3 +845,5 @@ end end TestNonlinear.runtests() + +include("ReverseAD.jl") diff --git a/test/Nonlinear/ReverseAD.jl b/test/Nonlinear/ReverseAD.jl new file mode 100644 index 0000000000..1ce0346cc3 --- /dev/null +++ b/test/Nonlinear/ReverseAD.jl @@ -0,0 +1,803 @@ +module TestReverseAD + +using Test +import MathOptInterface + +const MOI = MathOptInterface +const Nonlinear = MOI.Nonlinear +const ReverseAD = Nonlinear.ReverseAD +const Coloring = ReverseAD.Coloring + +function runtests() + for name in names(@__MODULE__; all = true) + if startswith("$(name)", "test_") + @testset "$(name)" begin + getfield(@__MODULE__, name)() + end + end + end + return +end + +function test_objective_quadratic_univariate() + x = MOI.VariableIndex(1) + data = Nonlinear.NonlinearData() + Nonlinear.set_objective(data, :($x^2 + 1)) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + [x], + ) + MOI.initialize(data, [:Grad, :Jac, :Hess]) + @test MOI.eval_objective(data, [1.2]) == 1.2^2 + 1 + g = [NaN] + MOI.eval_objective_gradient(data, g, [1.2]) + @test g == [2.4] + @test MOI.hessian_lagrangian_structure(data) == [(1, 1)] + H = [NaN] + MOI.eval_hessian_lagrangian(data, H, [1.2], 1.5, Float64[]) + @test H == 1.5 .* [2.0] + MOI.eval_hessian_lagrangian_product(data, H, [1.2], [1.2], 1.5, Float64[]) + @test H == [1.5 * 2.0 * 1.2] + return +end + +function test_objective_quadratic_multivariate() + x = MOI.VariableIndex(1) + y = MOI.VariableIndex(2) + data = Nonlinear.NonlinearData() + Nonlinear.set_objective(data, :($x^2 + $x * $y + $y^2)) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + [x, y], + ) + MOI.initialize(data, [:Grad, :Jac, :Hess]) + @test MOI.eval_objective(data, [1.2, 2.3]) == 1.2^2 + 1.2 * 2.3 + 2.3^2 + g = [NaN, NaN] + MOI.eval_objective_gradient(data, g, [1.2, 2.3]) + @test g == [2 * 1.2 + 2.3, 1.2 + 2 * 2.3] + @test MOI.hessian_lagrangian_structure(data) == [(1, 1), (2, 2), (2, 1)] + H = [NaN, NaN, NaN] + MOI.eval_hessian_lagrangian(data, H, [1.2, 2.3], 1.5, Float64[]) + @test H == 1.5 .* [2.0, 2.0, 1.0] + v = [0.3, 0.4] + hv = [NaN, NaN] + MOI.eval_hessian_lagrangian_product(data, hv, [1.2, 2.3], v, 1.5, Float64[]) + @test hv ≈ 1.5 .* [2 1; 1 2] * v + return +end + +function 
test_objective_quadratic_multivariate_subexpressions() + x = MOI.VariableIndex(1) + y = MOI.VariableIndex(2) + data = Nonlinear.NonlinearData() + ex = Nonlinear.add_expression(data, :($x^2)) + ey = Nonlinear.add_expression(data, :($y^2)) + exy = Nonlinear.add_expression(data, :($ex + $x * $y)) + Nonlinear.set_objective(data, :($exy + $ey)) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + [x, y], + ) + MOI.initialize(data, [:Grad, :Jac, :Hess]) + @test MOI.eval_objective(data, [1.2, 2.3]) == 1.2^2 + 1.2 * 2.3 + 2.3^2 + g = [NaN, NaN] + MOI.eval_objective_gradient(data, g, [1.2, 2.3]) + @test g == [2 * 1.2 + 2.3, 1.2 + 2 * 2.3] + @test MOI.hessian_lagrangian_structure(data) == [(1, 1), (2, 2), (2, 1)] + H = [NaN, NaN, NaN] + MOI.eval_hessian_lagrangian(data, H, [1.2, 2.3], 1.5, Float64[]) + @test H == 1.5 .* [2.0, 2.0, 1.0] + v = [0.3, 0.4] + hv = [NaN, NaN] + MOI.eval_hessian_lagrangian_product(data, hv, [1.2, 2.3], v, 1.5, Float64[]) + @test hv ≈ 1.5 .* [2 1; 1 2] * v + return +end + +function test_objective_ifelse_comparison() + x = MOI.VariableIndex(1) + y = MOI.VariableIndex(2) + data = Nonlinear.NonlinearData() + Nonlinear.set_objective(data, :(ifelse(1 <= $x <= 2, $x^2, $y^2))) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + [x, y], + ) + MOI.initialize(data, [:Grad, :Jac, :Hess]) + @test MOI.eval_objective(data, [1.2, 2.3]) == 1.2^2 + @test MOI.eval_objective(data, [2.2, 2.3]) == 2.3^2 + g = [NaN, NaN] + MOI.eval_objective_gradient(data, g, [1.2, 2.3]) + @test g == [2 * 1.2, 0.0] + MOI.eval_objective_gradient(data, g, [2.2, 2.3]) + @test g == [0.0, 2 * 2.3] + return +end + +function test_objective_ifelse_logic() + x = MOI.VariableIndex(1) + y = MOI.VariableIndex(2) + data = Nonlinear.NonlinearData() + Nonlinear.set_objective(data, :(ifelse(1 <= $x && $x <= 2, $x^2, $y^2))) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + [x, y], + ) + MOI.initialize(data, [:Grad, :Jac, :Hess]) + @test MOI.eval_objective(data, [1.2, 2.3]) == 1.2^2 + @test MOI.eval_objective(data, [2.2, 2.3]) == 2.3^2 + g = [NaN, NaN] + MOI.eval_objective_gradient(data, g, [1.2, 2.3]) + @test g == [2 * 1.2, 0.0] + MOI.eval_objective_gradient(data, g, [2.2, 2.3]) + @test g == [0.0, 2 * 2.3] + return +end + +function test_objective_parameter() + x = MOI.VariableIndex(1) + data = Nonlinear.NonlinearData() + p = Nonlinear.add_parameter(data, 1.2) + Nonlinear.set_objective(data, :($p * $x)) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + [x], + ) + MOI.initialize(data, [:Grad, :Jac, :Hess]) + @test MOI.eval_objective(data, [1.3]) == 1.2 * 1.3 + g = [NaN] + MOI.eval_objective_gradient(data, g, [1.3]) + @test g == [1.2] + return +end + +function test_objective_subexpression() + data = Nonlinear.NonlinearData() + x = MOI.VariableIndex(1) + input = :($x^2 + 1) + expr = Nonlinear.add_expression(data, input) + expr_2 = Nonlinear.add_expression(data, :($expr^2)) + Nonlinear.set_objective(data, :($expr_2^2)) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + [x], + ) + MOI.initialize(data, [:Grad]) + @test MOI.eval_objective(data, [1.3]) == ((1.3^2 + 1)^2)^2 + g = [NaN] + MOI.eval_objective_gradient(data, g, [1.3]) + @test g ≈ [2 * (1.3^2 + 1)^2 * (2 * (1.3^2 + 1)) * 2 * 1.3] + return +end + +function test_constraint_quadratic_univariate() + x = MOI.VariableIndex(1) + data = Nonlinear.NonlinearData() + Nonlinear.add_constraint(data, :($x^2 <= 2.0)) + 
Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + [x], + ) + MOI.initialize(data, [:Grad, :Jac, :Hess]) + g = [NaN] + x_val = [1.2] + MOI.eval_constraint(data, g, x_val) + @test g == x_val .^ 2 .- 2 + @test MOI.jacobian_structure(data) == [(1, 1)] + J = [NaN] + MOI.eval_constraint_jacobian(data, J, x_val) + @test J == 2 .* x_val + @test MOI.hessian_lagrangian_structure(data) == [(1, 1)] + H = [NaN] + MOI.eval_hessian_lagrangian(data, H, x_val, 0.0, [1.5]) + @test H == 1.5 .* [2.0] + return +end + +function test_constraint_quadratic_multivariate() + x = MOI.VariableIndex(1) + y = MOI.VariableIndex(2) + data = Nonlinear.NonlinearData() + Nonlinear.add_constraint(data, :($x^2 + $x * $y + $y^2 <= 2.0)) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + [x, y], + ) + MOI.initialize(data, [:Grad, :Jac, :Hess]) + g = [NaN] + x_val = [1.2, 2.3] + MOI.eval_constraint(data, g, x_val) + @test g == [x_val[1]^2 + x_val[1] * x_val[2] + x_val[2]^2] .- 2 + @test MOI.jacobian_structure(data) == [(1, 1), (1, 2)] + J = [NaN, NaN] + MOI.eval_constraint_jacobian(data, J, x_val) + @test J == [2 * x_val[1] + x_val[2], x_val[1] + 2 * x_val[2]] + @test MOI.hessian_lagrangian_structure(data) == [(1, 1), (2, 2), (2, 1)] + H = [NaN, NaN, NaN] + MOI.eval_hessian_lagrangian(data, H, x_val, 0.0, [1.5]) + @test H == 1.5 .* [2.0, 2.0, 1.0] + return +end + +function test_constraint_quadratic_multivariate_subexpressions() + x = MOI.VariableIndex(1) + y = MOI.VariableIndex(2) + data = Nonlinear.NonlinearData() + ex = Nonlinear.add_expression(data, :($x^2)) + ey = Nonlinear.add_expression(data, :($y^2)) + exy = Nonlinear.add_expression(data, :($ex + $x * $y)) + Nonlinear.add_constraint(data, :($exy + $ey <= 2.0)) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + [x, y], + ) + MOI.initialize(data, [:Grad, :Jac, :Hess]) + g = [NaN] + x_val = [1.2, 2.3] + MOI.eval_constraint(data, g, x_val) + @test g ≈ [x_val[1]^2 + x_val[1] * x_val[2] + x_val[2]^2] .- 2 + # Jacobian + @test MOI.jacobian_structure(data) == [(1, 1), (1, 2)] + J = [NaN, NaN] + MOI.eval_constraint_jacobian(data, J, x_val) + @test J ≈ [2 * x_val[1] + x_val[2], x_val[1] + 2 * x_val[2]] + # Jv + y, w = [NaN], [1.1, 2.2] + MOI.eval_constraint_jacobian_product(data, y, x_val, w) + @test y ≈ [(2 * x_val[1] + x_val[2]) (x_val[1] + 2 * x_val[2])] * w + # v'J + y, w = [NaN, NaN], [1.1] + MOI.eval_constraint_jacobian_transpose_product(data, y, x_val, w) + wJ = w' * [(2 * x_val[1] + x_val[2]) (x_val[1] + 2 * x_val[2])] + @test y ≈ wJ[:] + # Hessian-lagrangian + @test MOI.hessian_lagrangian_structure(data) == [(1, 1), (2, 2), (2, 1)] + H = [NaN, NaN, NaN] + MOI.eval_hessian_lagrangian(data, H, x_val, 0.0, [1.5]) + @test H ≈ 1.5 .* [2.0, 2.0, 1.0] + return +end + +function test_hessian_sparsity_registered_function() + x = MOI.VariableIndex(1) + y = MOI.VariableIndex(2) + z = MOI.VariableIndex(3) + f(x, y) = x^2 + y^2 + function ∇f(g, x...) + g[1] = 2x[1] + g[2] = 2x[2] + return + end + function ∇²f(H, x...) 
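+        # The Hessian of f(x, y) = x^2 + y^2 is diagonal, so only the two
+        # diagonal entries need to be filled.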
+ H[1, 1] = 2 + H[2, 2] = 2 + return + end + data = Nonlinear.NonlinearData() + Nonlinear.register_operator(data, :f, 2, f, ∇f, ∇²f) + Nonlinear.set_objective(data, :(f($x, $z) + $y^2)) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + [x, y, z], + ) + @test_broken :Hess in MOI.features_available(data) + # MOI.initialize(data, [:Grad, :Jac, :Hess]) + # @test MOI.hessian_lagrangian_structure(data) == + # [(1, 1), (2, 2), (3, 3), (3, 1)] + # H = fill(NaN, 4) + # MOI.eval_hessian_lagrangian(data, H, rand(3), 1.5, Float64[]) + # @test H == 1.5 .* [2.0, 2.0, 2.0, 0.0] + return +end + +function test_hessian_sparsity_registered_rosenbrock() + x = MOI.VariableIndex(1) + y = MOI.VariableIndex(2) + f(x...) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2 + function ∇f(g, x...) + g[1] = 400 * x[1]^3 - 400 * x[1] * x[2] + 2 * x[1] - 2 + g[2] = 200 * (x[2] - x[1]^2) + return + end + function ∇²f(H, x...) + H[1, 1] = 1200 * x[1]^2 - 400 * x[2] + 2 + H[2, 1] = -400 * x[1] + H[2, 2] = 200.0 + return + end + data = Nonlinear.NonlinearData() + Nonlinear.register_operator(data, :rosenbrock, 2, f, ∇f, ∇²f) + Nonlinear.set_objective(data, :(rosenbrock($x, $y))) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + [x, y], + ) + @test_broken :Hess in MOI.features_available(data) + # MOI.initialize(data, [:Grad, :Jac, :Hess]) + # @test MOI.hessian_lagrangian_structure(data) == [(1, 1), (2, 2), (2, 1)] + # H = fill(NaN, 3) + # MOI.eval_hessian_lagrangian(data, H, [1.0, 1.0], 1.5, Float64[]) + # @test H == 1.5 .* [802, 200, -400] + return +end + +struct _ColoringGraph + num_vertices::Int + edges::Vector{Tuple{Int,Int}} +end + +function to_adjlist(graph::_ColoringGraph) + I = [i for (i, _) in graph.edges] + J = [j for (_, j) in graph.edges] + return Coloring.UndirectedGraph(I, J, graph.num_vertices) +end + +function test_coloring_edge_free_graph() + graph = _ColoringGraph(10, []) + _, numcolors = Coloring.acyclic_coloring(to_adjlist(graph)) + @test numcolors == 1 + return +end + +function test_coloring_one_edge_graph() + graph = _ColoringGraph(10, [(2, 4)]) + color, numcolors = Coloring.acyclic_coloring(to_adjlist(graph)) + @test numcolors == 2 + @test color[2] != color[4] + return +end + +function test_coloring_two_edge_graph() + graph = _ColoringGraph(10, [(2, 4), (2, 3)]) + color, numcolors = Coloring.acyclic_coloring(to_adjlist(graph)) + @test numcolors == 2 + @test color[3] == color[4] + return +end + +function test_coloring_three_edge_graph() + graph = _ColoringGraph(10, [(2, 4), (2, 3), (3, 4)]) + color, numcolors = Coloring.acyclic_coloring(to_adjlist(graph)) + @test numcolors == 3 + # TODO: What is this testing? 
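+    # At a minimum, it checks that `recovery_preprocess` runs without error on
+    # a triangle.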
+ Coloring.recovery_preprocess(to_adjlist(graph), color, numcolors, Int[]) + return +end + +function test_coloring_two_edge_three_vertex_graph() + graph = _ColoringGraph(3, [(1, 3), (2, 3)]) + _, numcolors = Coloring.acyclic_coloring(to_adjlist(graph)) + @test numcolors == 2 + return +end + +function test_coloring_four_edge_four_vertex_graph() + graph = _ColoringGraph(4, [(1, 2), (2, 3), (3, 4), (4, 1)]) + _, numcolors = Coloring.acyclic_coloring(to_adjlist(graph)) + @test numcolors == 3 + return +end + +function test_coloring_topological_sort() + # graph = _ColoringGraph(6, [(1, 2), (1, 3), (1, 6), (2, 4), (2, 5)]) + vec = [3, 6, 2, 1, 4, 5, 1, 2, 2, 1] + offset = [1, 4, 7, 8, 9, 10, 11] + v = Coloring.reverse_topological_sort_by_dfs(vec, offset, 6, zeros(Int, 6)) + @test v[1] == [3, 6, 4, 5, 2, 1] + @test v[2] == [0, 1, 1, 2, 2, 1] + return +end + +function test_coloring_end_to_end_hessian_coloring_and_recovery() + I, J, rinfo = Coloring.hessian_color_preprocess(Set([(1, 2)]), 2) + R = Coloring.seed_matrix(rinfo) + Coloring.prepare_seed_matrix!(R, rinfo) + @test I == [1, 2, 2] + @test J == [1, 2, 1] + @test R == [1.0 0.0; 0.0 1.0] + hess = [3.4 2.1; 2.1 1.3] + matmat = hess * R + V = zeros(3) + Coloring.recover_from_matmat!(V, matmat, rinfo, zeros(3)) + @test V == [3.4, 1.3, 2.1] + return +end + +function test_derivatives() + a = MOI.VariableIndex(1) + b = MOI.VariableIndex(2) + data = Nonlinear.NonlinearData() + Nonlinear.set_objective(data, :(sin($a^2) + cos($b * 4) / 5 - 2.0)) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + [a, b], + ) + MOI.initialize(data, [:Grad, :Jac, :Hess]) + x = [2.0, 3.0] + @test MOI.eval_objective(data, x) == sin(x[1]^2) + cos(x[2] * 4) / 5 - 2.0 + g = [NaN, NaN] + MOI.eval_objective_gradient(data, g, x) + @test g ≈ [2 * x[1] * cos(x[1]^2), -4 * sin(x[2] * 4) / 5] + @test MOI.hessian_lagrangian_structure(data) == [(1, 1), (2, 2)] + H = [NaN, NaN] + MOI.eval_hessian_lagrangian(data, H, x, 1.5, Float64[]) + H_exact = [ + -4 * x[1]^2 * sin(x[1]^2) + 2 * cos(x[1]^2), + -4 / 5 * 4 * cos(x[2] * 4), + ] + @test H == 1.5 .* H_exact + return +end + +function test_NLPBlockData() + data = Nonlinear.NonlinearData() + x = MOI.VariableIndex(1) + Nonlinear.add_constraint(data, :($x <= 1)) + Nonlinear.add_constraint(data, :($x >= 2)) + Nonlinear.add_constraint(data, :($x == 3)) + Nonlinear.add_constraint(data, :(4 <= $x <= 5)) + block = MOI.NLPBlockData(data, [x]) + @test block.constraint_bounds == [ + MOI.NLPBoundsPair(-Inf, 0.0), + MOI.NLPBoundsPair(0.0, Inf), + MOI.NLPBoundsPair(0.0, 0.0), + MOI.NLPBoundsPair(4.0, 5.0), + ] + @test block.has_objective == false + return +end + +function test_linearity() + x = MOI.VariableIndex(1) + y = MOI.VariableIndex(2) + z = MOI.VariableIndex(3) + variables = Dict(x => 1, y => 2, z => 3) + function _test_linearity(input, test_value, IJ = [], indices = []) + data = Nonlinear.NonlinearData() + ex = Nonlinear.add_expression(data, input) + expr = data[ex] + adj = Nonlinear.adjacency_matrix(expr.nodes) + nodes = ReverseAD._replace_moi_variables(expr.nodes, variables) + ret = ReverseAD._classify_linearity(nodes, adj, ReverseAD.Linearity[]) + @test ret[1] == test_value + indexed_set = Coloring.IndexedSet(100) + edge_list = ReverseAD._compute_hessian_sparsity( + nodes, + adj, + ret, + indexed_set, + Set{Tuple{Int,Int}}[], + Vector{Int}[], + ) + if ret[1] != ReverseAD.NONLINEAR + @test length(edge_list) == 0 + elseif length(IJ) > 0 + @test IJ == edge_list + end + if length(indices) > 0 + empty!(indexed_set) + 
ReverseAD._compute_gradient_sparsity!(indexed_set, nodes) + ix = sort(collect(indexed_set)) + @test indices == ix + end + return + end + _test_linearity( + :(sin($x^2) + cos($y * 4) - 2.0), + ReverseAD.NONLINEAR, + Set([(2, 2), (1, 1)]), + [1, 2], + ) + _test_linearity(:(3 * 4 * ($x + $y)), ReverseAD.LINEAR) + _test_linearity( + :($z * $y), + ReverseAD.NONLINEAR, + Set([(3, 2), (3, 3), (2, 2)]), + [2, 3], + ) + _test_linearity(:(3 + 4), ReverseAD.CONSTANT) + _test_linearity(:(sin(3) + $x), ReverseAD.LINEAR) + _test_linearity( + :(cos($z) * sin(3) + $x), + ReverseAD.NONLINEAR, + Set([(3, 3)]), + [1, 3], + ) + _test_linearity(:($x - 3 * $y), ReverseAD.LINEAR) + _test_linearity(:(-$x), ReverseAD.LINEAR) + _test_linearity(:(+$x), ReverseAD.LINEAR) + _test_linearity( + :($x^$y), + ReverseAD.NONLINEAR, + Set([(2, 2), (1, 1), (2, 1)]), + [1, 2], + ) + _test_linearity(:($x / 3 + $y), ReverseAD.LINEAR) + _test_linearity( + :(3 / ($x * $y)), + ReverseAD.NONLINEAR, + Set([(2, 2), (1, 1), (2, 1)]), + [1, 2], + ) + _test_linearity(:(1 / ($x + 3)), ReverseAD.NONLINEAR, Set([(1, 1)]), [1]) + _test_linearity( + :(ifelse($x <= 1, $x, $y)), + ReverseAD.PIECEWISE_LINEAR, + Set([]), + [], + ) + _test_linearity( + :(ifelse($x <= 1, $x^2, $y)), + ReverseAD.NONLINEAR, + Set([(1, 1)]), + [1, 2], + ) + _test_linearity(:(ifelse(1 <= 1, 2, 3)), ReverseAD.CONSTANT) + _test_linearity( + :(1 / ifelse($x < 1, $x, 0)), + ReverseAD.NONLINEAR, + Set([(1, 1)]), + [1], + ) + return +end + +function test_dual_forward() + x = MOI.VariableIndex(1) + y = MOI.VariableIndex(2) + function _test_dual_forward(input, x_input, test_value) + data = Nonlinear.NonlinearData() + Nonlinear.set_objective(data, input) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + MOI.VariableIndex[x, y], + ) + MOI.initialize(data, [:Grad]) + ∇f = fill(NaN, 2) + MOI.eval_objective_gradient(data, ∇f, x_input) + @test isapprox(∇f, test_value, atol = 1e-10) + return + end + _test_dual_forward( + :(sin($x^2) + cos($y * 4) / 5 - 2.0), + [1.0, 2.0], + [1.0806046117362795, -0.7914865972987055], + ) + _test_dual_forward( + :(sin($x^$y) + cos($y * 4) / 5 - 2.0), + [1.0, 2.0], + [1.0806046117362795, -0.7914865972987055], + ) + _test_dual_forward( + :(sin($x^3) + cos($x * $y * 4) / 5 - 2.0), + [1.0, 0.0], + [1.6209069176044193, 0.0], + ) + _test_dual_forward( + :($x * $y), + [3.427139283036299e-206, 1.0], + [1.0, 3.427139283036299e-206], + ) + return +end + +function test_gradient_registered_function() + x = MOI.VariableIndex(1) + y = MOI.VariableIndex(2) + z = MOI.VariableIndex(3) + data = Nonlinear.NonlinearData() + f(x, y) = (1 / 3) * y^3 - 2x^2 + function ∇f(g, x, y) + g[1] = -4x + g[2] = y^2 + return + end + Nonlinear.register_operator(data, :Φ, 2, f, ∇f) + Nonlinear.register_operator(data, :c, 1, cos, x -> -sin(x), x -> -cos(x)) + Nonlinear.set_objective(data, :(Φ($y, $x - 1) * c($z))) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + MOI.VariableIndex[x, y, z], + ) + MOI.initialize(data, [:Grad]) + ∇f = fill(NaN, 3) + x_input = [2.0, 3.0, 4.0] + @test MOI.eval_objective(data, x_input) == f(3.0, 2.0 - 1) * cos(4.0) + MOI.eval_objective_gradient(data, ∇f, [2.0, 3.0, 4.0]) + @test ∇f ≈ [ + cos(x_input[3]) * (x_input[1] - 1)^2, + -4cos(x_input[3]) * x_input[2], + -sin(x_input[3]) * f(x_input[2], x_input[1] - 1), + ] + return +end + +function test_gradient_jump_855() + x = MOI.VariableIndex(1) + data = Nonlinear.NonlinearData() + Nonlinear.set_objective( + data, + :(ifelse($x <= 3.0, ($x - 2.0)^2, 
2 * log($x - 2.0) + 1.0)), + ) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + MOI.VariableIndex[x], + ) + MOI.initialize(data, [:Grad]) + ∇f = fill(NaN, 1) + MOI.eval_objective_gradient(data, ∇f, [-1.0]) + @test ∇f ≈ [-6.0] + MOI.eval_objective_gradient(data, ∇f, [2.0]) + @test ∇f ≈ [0.0] + return +end + +function test_gradient_abs() + x = MOI.VariableIndex(1) + data = Nonlinear.NonlinearData() + Nonlinear.set_objective(data, :(abs($x))) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + MOI.VariableIndex[x], + ) + MOI.initialize(data, [:Grad]) + ∇f = fill(NaN, 1) + MOI.eval_objective_gradient(data, ∇f, [2.0]) + @test ∇f ≈ [1.0] + MOI.eval_objective_gradient(data, ∇f, [-2.0]) + @test ∇f ≈ [-1.0] + return +end + +function test_gradient_trig() + x = MOI.VariableIndex(1) + y = MOI.VariableIndex(2) + data = Nonlinear.NonlinearData() + Nonlinear.set_objective(data, :(sin($x^2) + cos($y * 4) / 5 - 2.0)) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + MOI.VariableIndex[x, y], + ) + MOI.initialize(data, [:Grad]) + ∇f = fill(NaN, 2) + MOI.eval_objective_gradient(data, ∇f, [2.0, 3.0]) + @test ∇f ≈ [2 * 2.0 * cos(2.0^2), -4 * sin(3.0 * 4) / 5] + return +end + +function test_gradient_logical() + x = MOI.VariableIndex(1) + data = Nonlinear.NonlinearData() + Nonlinear.set_objective(data, :($x > 0.5 && $x < 0.9)) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + MOI.VariableIndex[x], + ) + MOI.initialize(data, [:Grad]) + @test MOI.eval_objective(data, [1.5]) == 0.0 + ∇f = fill(NaN, 1) + MOI.eval_objective_gradient(data, ∇f, [1.5]) + @test iszero(∇f) + return +end + +function test_gradient_ifelse() + x = MOI.VariableIndex(1) + data = Nonlinear.NonlinearData() + Nonlinear.set_objective(data, :(ifelse($x >= 0.5 || $x < 0.1, $x, 5))) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + MOI.VariableIndex[x], + ) + MOI.initialize(data, [:Grad]) + @test MOI.eval_objective(data, [1.5]) == 1.5 + ∇f = fill(NaN, 1) + MOI.eval_objective_gradient(data, ∇f, [1.5]) + @test ∇f ≈ [1.0] + @test MOI.eval_objective(data, [-0.1]) == -0.1 + MOI.eval_objective_gradient(data, ∇f, [-0.1]) + @test ∇f ≈ [1.0] + @test MOI.eval_objective(data, [0.2]) == 5 + MOI.eval_objective_gradient(data, ∇f, [0.2]) + @test ∇f ≈ [0.0] + return +end + +function test_gradient_sqrt_nan() + x = MOI.VariableIndex(1) + data = Nonlinear.NonlinearData() + Nonlinear.set_objective(data, :(sqrt($x))) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + MOI.VariableIndex[x], + ) + MOI.initialize(data, [:Grad]) + @test isnan(MOI.eval_objective(data, [-1.5])) + ∇f = fill(Inf, 1) + MOI.eval_objective_gradient(data, ∇f, [-1.5]) + @test isnan(∇f[1]) + return +end + +function test_gradient_variable_power() + x = MOI.VariableIndex(1) + y = MOI.VariableIndex(2) + z = MOI.VariableIndex(3) + data = Nonlinear.NonlinearData() + Nonlinear.set_objective(data, :((1 / $x)^$y - $z)) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + MOI.VariableIndex[x, y, z], + ) + MOI.initialize(data, [:Grad]) + x_input = [2.5, 3.5, 1.0] + @test MOI.eval_objective(data, x_input) == (1 / 2.5)^3.5 - 1.0 + ∇f = fill(Inf, 3) + MOI.eval_objective_gradient(data, ∇f, x_input) + @test ∇f ≈ [ + -x_input[2] * x_input[1]^(-x_input[2] - 1), + -((1 / x_input[1])^x_input[2]) * log(x_input[1]), + -1, + ] + return +end + +function test_single_parameter() + x = 
MOI.VariableIndex(1) + data = Nonlinear.NonlinearData() + p = Nonlinear.add_parameter(data, 105.2) + Nonlinear.set_objective(data, :($p)) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + MOI.VariableIndex[x], + ) + MOI.initialize(data, [:Grad]) + @test MOI.eval_objective(data, [-0.1]) == 105.2 + return +end + +function test_gradient_nested_subexpressions() + x = MOI.VariableIndex(1) + y = MOI.VariableIndex(2) + data = Nonlinear.NonlinearData() + ex1 = Nonlinear.add_expression(data, :(sin($x^2) + cos($y * 4) / 5 - 2.0)) + ex2 = Nonlinear.add_expression(data, :($ex1)) + Nonlinear.set_objective(data, ex2) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + MOI.VariableIndex[x, y], + ) + MOI.initialize(data, [:Grad]) + ∇f = fill(NaN, 2) + MOI.eval_objective_gradient(data, ∇f, [2.0, 3.0]) + @test ∇f ≈ [2 * 2.0 * cos(2.0^2), -4 * sin(3.0 * 4) / 5] + return +end + +end # module + +TestReverseAD.runtests() From e7b4f4b94e4c89f49aa91513c73eef1abd1a4090 Mon Sep 17 00:00:00 2001 From: odow Date: Wed, 20 Apr 2022 12:44:33 +1200 Subject: [PATCH 02/16] More test coverage --- test/Nonlinear/ReverseAD.jl | 45 +++++++++++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+) diff --git a/test/Nonlinear/ReverseAD.jl b/test/Nonlinear/ReverseAD.jl index 1ce0346cc3..3b8cc58e08 100644 --- a/test/Nonlinear/ReverseAD.jl +++ b/test/Nonlinear/ReverseAD.jl @@ -1,7 +1,9 @@ module TestReverseAD using Test +import LinearAlgebra import MathOptInterface +import SparseArrays const MOI = MathOptInterface const Nonlinear = MOI.Nonlinear @@ -798,6 +800,49 @@ function test_gradient_nested_subexpressions() return end +function _dense_hessian(hessian_sparsity, V, n) + I = [i for (i, _) in hessian_sparsity] + J = [j for (_, j) in hessian_sparsity] + raw = SparseArrays.sparse(I, J, V, n, n) + return Matrix( + raw + raw' - + SparseArrays.sparse(LinearAlgebra.diagm(0 => LinearAlgebra.diag(raw))), + ) +end + +# This covers the code that computes Hessians in odd chunks of Hess-vec +# products. 
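+# The objective `*(x[1], ..., x[N])` has a dense Hessian, so increasing `N`
+# increases the number of seed-matrix columns required by the coloring. This
+# exercises every chunk size up to the hard-coded limit of 10 in
+# `MOI.initialize`, as well as the multi-pass case once `N` exceeds that limit.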
+function test_odd_chunks_Hessian_products() + for i in 1:18 + _test_odd_chunks_Hessian_products(i) + end + return +end + +function _test_odd_chunks_Hessian_products(N) + data = Nonlinear.NonlinearData() + x = MOI.VariableIndex.(1:N) + Nonlinear.set_objective(data, Expr(:call, :*, x...)) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + x, + ) + MOI.initialize(data, [:Hess]) + hessian_sparsity = MOI.hessian_lagrangian_structure(data) + V = zeros(length(hessian_sparsity)) + values = ones(N) + MOI.eval_hessian_lagrangian(data, V, values, 1.0, Float64[]) + H = _dense_hessian(hessian_sparsity, V, N) + @test H ≈ (ones(N, N) - LinearAlgebra.diagm(0 => ones(N))) + values[1] = 0.5 + MOI.eval_hessian_lagrangian(data, V, values, 1.0, Float64[]) + H = _dense_hessian(hessian_sparsity, V, N) + H_22 = (ones(N - 1, N - 1) - LinearAlgebra.diagm(0 => ones(N - 1))) / 2 + @test H ≈ [0 ones(N - 1)'; ones(N - 1) H_22] + return +end + end # module TestReverseAD.runtests() From b20e285296aa56888bac4c49a8d623e751228ea6 Mon Sep 17 00:00:00 2001 From: odow Date: Wed, 20 Apr 2022 12:48:04 +1200 Subject: [PATCH 03/16] Add DataStructures --- Project.toml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Project.toml b/Project.toml index 9f86aa8552..9aad182876 100644 --- a/Project.toml +++ b/Project.toml @@ -6,6 +6,7 @@ version = "1.2.0" BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf" CodecBzip2 = "523fee87-0ab8-5b00-afb7-3ecf72e48cfd" CodecZlib = "944b1d66-785c-5afd-91f1-9de20f533193" +DataStructures = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8" ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210" JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6" LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e" @@ -22,6 +23,7 @@ Unicode = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5" BenchmarkTools = "1" CodecBzip2 = "~0.6, 0.7" CodecZlib = "~0.6, 0.7" +DataStructures = "0.18" ForwardDiff = "0.5, 0.6, 0.7, 0.8, 0.9, 0.10" JSON = "~0.21" JSONSchema = "1" From 020a92f2b16056f9415684a485955bec6f6d8e8b Mon Sep 17 00:00:00 2001 From: odow Date: Thu, 21 Apr 2022 14:10:24 +1200 Subject: [PATCH 04/16] Fix rebase --- test/Nonlinear/ReverseAD.jl | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/test/Nonlinear/ReverseAD.jl b/test/Nonlinear/ReverseAD.jl index 3b8cc58e08..1e4366865d 100644 --- a/test/Nonlinear/ReverseAD.jl +++ b/test/Nonlinear/ReverseAD.jl @@ -451,7 +451,12 @@ function test_NLPBlockData() Nonlinear.add_constraint(data, :($x >= 2)) Nonlinear.add_constraint(data, :($x == 3)) Nonlinear.add_constraint(data, :(4 <= $x <= 5)) - block = MOI.NLPBlockData(data, [x]) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + [x], + ) + block = MOI.NLPBlockData(data) @test block.constraint_bounds == [ MOI.NLPBoundsPair(-Inf, 0.0), MOI.NLPBoundsPair(0.0, Inf), From f0ce71f1ddbb20eb8337793b0d2de484813840b1 Mon Sep 17 00:00:00 2001 From: odow Date: Thu, 21 Apr 2022 14:30:49 +1200 Subject: [PATCH 05/16] Fix docs --- docs/src/submodules/Nonlinear/overview.md | 54 +++++++++++------------ 1 file changed, 26 insertions(+), 28 deletions(-) diff --git a/docs/src/submodules/Nonlinear/overview.md b/docs/src/submodules/Nonlinear/overview.md index 51e2fa2428..85ee2c4d03 100644 --- a/docs/src/submodules/Nonlinear/overview.md +++ b/docs/src/submodules/Nonlinear/overview.md @@ -302,35 +302,17 @@ However, we cannot call gradient terms such as [`eval_objective_gradient`](@ref) because [`Nonlinear.ExprGraphOnly`](@ref) does not have the capability to differentiate a nonlinear 
expression. -Instead of passing [`Nonlinear.Evaluator`](@ref) directly to solvers, -solvers query the [`NLPBlock`](@ref) attribute, which returns an -[`NLPBlockData`](@ref). This object wraps an [`Nonlinear.Evaluator`](@ref) and -includes other information such as constraint bounds and whether the evaluator -has a nonlinear objective. Create and set [`NLPBlockData`](@ref) as follows: +If, instead, we pass [`Nonlinear.SparseReverseMode`](@ref), then we get access +to `:Grad`, the gradient of the objective function, `:Jac`, the jacobian matrix +of the constraints, `:JacVec`, the ability to compute Jacobian-vector products, +and `:ExprGraph`. ```jldoctest nonlinear_developer -julia> block = MOI.NLPBlockData(evaluator); - -julia> model = MOI.Utilities.UniversalFallback(MOI.Utilities.Model{Float64}()); - -julia> MOI.set(model, MOI.NLPBlock(), block); -``` -!!! warning - Only call [`NLPBlockData`](@ref) once you have finished modifying the - problem in `model`. - -If, instead, we set [`Nonlinear.SparseReverseMode`](@ref), then we get access to -`:Grad`, the gradient of the objective function, `:Jac`, the jacobian matrix of -the constraints, `:JacVec`, the ability to compute Jacobian-vector products, and -`:ExprGraph`. -```jldoctest nonlinear_developer -julia> Nonlinear.set_differentiation_backend( - data, +julia> evaluator = Nonlinear.Evaluator( + model, Nonlinear.SparseReverseMode(), [x], ) - -julia> data -NonlinearData with available features: +Nonlinear.Evaluator with available features: * :Grad * :Jac * :JacVec @@ -339,7 +321,7 @@ NonlinearData with available features: However, before calling anything, we need to call [`initialize`](@ref): ```jldoctest nonlinear_developer -julia> MOI.initialize(data, [:Grad, :Jac, :JacVec, :ExprGraph]) +julia> MOI.initialize(evaluator, [:Grad, :Jac, :JacVec, :ExprGraph]) ``` Now we can call methods like [`eval_objective`](@ref): @@ -348,7 +330,7 @@ julia> x = [1.0] 1-element Vector{Float64}: 1.0 -julia> MOI.eval_objective(data, x) +julia> MOI.eval_objective(evaluator, x) 7.268073418273571 ``` and [`eval_objective_gradient`](@ref): @@ -357,13 +339,29 @@ julia> grad = [NaN] 1-element Vector{Float64}: NaN -julia> MOI.eval_objective_gradient(data, grad, x) +julia> MOI.eval_objective_gradient(evaluator, grad, x) julia> grad 1-element Vector{Float64}: 1.909297426825682 ``` +Instead of passing [`Nonlinear.Evaluator`](@ref) directly to solvers, +solvers query the [`NLPBlock`](@ref) attribute, which returns an +[`NLPBlockData`](@ref). This object wraps an [`Nonlinear.Evaluator`](@ref) and +includes other information such as constraint bounds and whether the evaluator +has a nonlinear objective. Create and set [`NLPBlockData`](@ref) as follows: +```jldoctest nonlinear_developer +julia> block = MOI.NLPBlockData(evaluator); + +julia> model = MOI.Utilities.UniversalFallback(MOI.Utilities.Model{Float64}()); + +julia> MOI.set(model, MOI.NLPBlock(), block); +``` +!!! warning + Only call [`NLPBlockData`](@ref) once you have finished modifying the + problem in `model`. 
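+
+A solver that receives this `block` can retrieve the evaluator from
+`block.evaluator`, call [`initialize`](@ref) with the features it requires, and
+then evaluate derivatives at a point. The following is a sketch rather than a
+doctest; it assumes the model contains at least one nonlinear constraint so
+that the Jacobian is non-empty:
+```julia
+evaluator = block.evaluator
+MOI.initialize(evaluator, [:Grad, :Jac])
+# The sparsity pattern is a vector of (row, column) pairs:
+sparsity = MOI.jacobian_structure(evaluator)
+# Allocate storage for the structural non-zeros, then fill it in-place:
+J = zeros(length(sparsity))
+MOI.eval_constraint_jacobian(evaluator, J, x)
+```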
+ ## Expression-graph representation [`Nonlinear.Model`](@ref) stores nonlinear expressions in From 27a4696ec8e6940874e3266e4845472ef7311492 Mon Sep 17 00:00:00 2001 From: odow Date: Thu, 21 Apr 2022 14:56:16 +1200 Subject: [PATCH 06/16] More docs --- docs/src/submodules/Nonlinear/overview.md | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/docs/src/submodules/Nonlinear/overview.md b/docs/src/submodules/Nonlinear/overview.md index 85ee2c4d03..506905e978 100644 --- a/docs/src/submodules/Nonlinear/overview.md +++ b/docs/src/submodules/Nonlinear/overview.md @@ -362,6 +362,28 @@ julia> MOI.set(model, MOI.NLPBlock(), block); Only call [`NLPBlockData`](@ref) once you have finished modifying the problem in `model`. +Putting everything together, you can create a nonlinear optimization problem in +MathOptInterface as follows: +```@example +import MathOptInterface +const MOI = MathOptInterface + +function build_model(model; backend) + x = MOI.add_variable(model) + y = MOI.add_variable(model) + MOI.set(model, MOI.ObjectiveSense(), MOI.MIN_SENSE) + data = MOI.Nonlinear.NonlinearData() + MOI.Nonlinear.set_objective(data, :($x^2 + $y^2)) + MOI.Nonlinear.set_differentiation_backend(data, backend, [x, y]) + MOI.set(model, MOI.NLPBlock(), MOI.NLPBlockData(data)) + return +end + +# Replace `model` and `backend` with your optimizer and backend of choice. +model = MOI.Utilities.UniversalFallback(MOI.Utilities.Model{Float64}()) +build_model(model; backend = MOI.Nonlinear.ExprGraphOnly()) +``` + ## Expression-graph representation [`Nonlinear.Model`](@ref) stores nonlinear expressions in From 5babfa79f3325aa7003325482499400d58105597 Mon Sep 17 00:00:00 2001 From: odow Date: Thu, 21 Apr 2022 14:56:58 +1200 Subject: [PATCH 07/16] Another tweak to the docs --- docs/src/submodules/Nonlinear/overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/src/submodules/Nonlinear/overview.md b/docs/src/submodules/Nonlinear/overview.md index 506905e978..55302bb469 100644 --- a/docs/src/submodules/Nonlinear/overview.md +++ b/docs/src/submodules/Nonlinear/overview.md @@ -381,7 +381,7 @@ end # Replace `model` and `backend` with your optimizer and backend of choice. 
model = MOI.Utilities.UniversalFallback(MOI.Utilities.Model{Float64}()) -build_model(model; backend = MOI.Nonlinear.ExprGraphOnly()) +build_model(model; backend = MOI.Nonlinear.SparseReverseMode()) ``` ## Expression-graph representation From 36c96ab73105208debd720551d025e9a33ad9e94 Mon Sep 17 00:00:00 2001 From: odow Date: Fri, 22 Apr 2022 14:19:12 +1200 Subject: [PATCH 08/16] Updates from nonlinear-api changes --- docs/src/submodules/Nonlinear/overview.md | 32 +- src/Nonlinear/ReverseAD/types.jl | 8 +- src/Nonlinear/types.jl | 7 +- test/Nonlinear/ReverseAD.jl | 527 +++++++++++----------- 4 files changed, 273 insertions(+), 301 deletions(-) diff --git a/docs/src/submodules/Nonlinear/overview.md b/docs/src/submodules/Nonlinear/overview.md index 55302bb469..24d699a6ae 100644 --- a/docs/src/submodules/Nonlinear/overview.md +++ b/docs/src/submodules/Nonlinear/overview.md @@ -372,10 +372,10 @@ function build_model(model; backend) x = MOI.add_variable(model) y = MOI.add_variable(model) MOI.set(model, MOI.ObjectiveSense(), MOI.MIN_SENSE) - data = MOI.Nonlinear.NonlinearData() - MOI.Nonlinear.set_objective(data, :($x^2 + $y^2)) - MOI.Nonlinear.set_differentiation_backend(data, backend, [x, y]) - MOI.set(model, MOI.NLPBlock(), MOI.NLPBlockData(data)) + nl_model = MOI.Nonlinear.Model() + MOI.Nonlinear.set_objective(nl_model, :($x^2 + $y^2)) + evaluator = MOI.Nonlinear.Evaluator(nl_model, backend, [x, y]) + MOI.set(model, MOI.NLPBlock(), MOI.NLPBlockData(evaluator)) return end @@ -388,7 +388,7 @@ build_model(model; backend = MOI.Nonlinear.SparseReverseMode()) [`Nonlinear.Model`](@ref) stores nonlinear expressions in [`Nonlinear.Expression`](@ref)s. This section explains the design of -the expression graph datastructure in [`Nonlinear.Expression`](@ref). +the expression graph data structure in [`Nonlinear.Expression`](@ref). Given a nonlinear function like `f(x) = sin(x)^2 + x`, a conceptual aid for thinking about the graph representation of the expression is to convert it into @@ -411,7 +411,7 @@ julia> struct ExprNode julia> expr = ExprNode(:+, [ExprNode(:^, [ExprNode(:sin, [x]), 2.0]), x]); ``` -This datastructure follows our Polish prefix notation very closely, and we can +This data structure follows our Polish prefix notation very closely, and we can easily identify the arguments to an operator. However, it has a significant draw-back: each node in the graph requires a `Vector`, which is heap-allocated and tracked by Julia's garbage collector (GC). For large models, we can expect @@ -447,7 +447,7 @@ challenges: ### Sketch of the design in Nonlinear -`Nonlinear` overcomes these problems by decomposing the datastructure into a +`Nonlinear` overcomes these problems by decomposing the data structure into a number of different concrete-typed vectors. First, we create vectors of the supported uni- and multivariate operators. @@ -510,7 +510,7 @@ julia> expr = Expression( ); ``` -This is less readable than the other options, but does this datastructure meet +This is less readable than the other options, but does this data structure meet our design goals? Instead of a heap-allocated object for each node, we only have two `Vector`s for @@ -547,9 +547,8 @@ subexpressions in the model. ## ReverseAD -`Nonlinear.ReverseAD` is a submodule for computing derivatives of the problem -inside [`Nonlinear.NonlinearData`](@ref) using sparse reverse-mode automatic -differentiation (AD). 
+`Nonlinear.ReverseAD` is a submodule for computing derivatives of a nonlinear +optimization problem using sparse reverse-mode automatic differentiation (AD). This section does not attempt to explain how sparse reverse-mode AD works, but instead explains why MOI contains it's own implementation, and highlights @@ -557,8 +556,7 @@ notable differences from similar packages. !!! warning You should not interact with `ReverseAD` directly. Instead, you should - create a [`Nonlinear.NonlinearData`](@ref) object, call - [`Nonlinear.set_differentiation_backend`](@ref) with + create a [`Nonlinear.Evaluator`](@ref) object with [`Nonlinear.SparseReverseMode`](@ref), and then query the MOI API methods. ### Why another AD package? @@ -582,10 +580,10 @@ Here are four reasons: input and then trace an expression graph using operator overloading. This means they must deal (or detect and ignore) with control flow, I/O, and other vagaries of Julia. In contrast, `ReverseAD` only accepts functions in the - form of [`Nonlinear.NonlinearExpression`](@ref), which greatly limits the - range of syntax that it must deal with. By reducing the scope of what we - accept as input to functions relevant for mathematical optimization, we can - provide a simpler implementation with various performance optimizations. + form of [`Nonlinear.Expression`](@ref), which greatly limits the range of + syntax that it must deal with. By reducing the scope of what we accept as + input to functions relevant for mathematical optimization, we can provide a + simpler implementation with various performance optimizations. * **Historical:** `ReverseAD` started life as [ReverseDiffSparse.jl](https://github.com/mlubin/ReverseDiffSparse.jl), development of which begain in early 2014(!). This was well before the other packages started development. Because we had a well-tested, working AD in diff --git a/src/Nonlinear/ReverseAD/types.jl b/src/Nonlinear/ReverseAD/types.jl index 348af23859..3c0f1042cb 100644 --- a/src/Nonlinear/ReverseAD/types.jl +++ b/src/Nonlinear/ReverseAD/types.jl @@ -11,7 +11,7 @@ mutable struct _SubexpressionStorage linearity::Linearity function _SubexpressionStorage( - expr::Nonlinear.NonlinearExpression, + expr::Nonlinear.Expression, subexpression_linearity, moi_index_to_consecutive_index, ) @@ -131,7 +131,7 @@ end """ NLPEvaluator( - data::Nonlinear.NonlinearData, + model::Nonlinear.Model, ordered_variables::Vector{MOI.VariableIndex}, ) @@ -142,7 +142,7 @@ interface. Before using, you must initialize the evaluator using `MOI.initialize`. """ mutable struct NLPEvaluator <: MOI.AbstractNLPEvaluator - data::Nonlinear.NonlinearData + data::Nonlinear.Model ordered_variables::Vector{MOI.VariableIndex} objective::Union{Nothing,_FunctionStorage} @@ -180,7 +180,7 @@ mutable struct NLPEvaluator <: MOI.AbstractNLPEvaluator max_chunk::Int # chunk size for which we've allocated storage function NLPEvaluator( - data::Nonlinear.NonlinearData, + data::Nonlinear.Model, ordered_variables::Vector{MOI.VariableIndex}, ) return new(data, ordered_variables) diff --git a/src/Nonlinear/types.jl b/src/Nonlinear/types.jl index 24c056a0ef..8a386e3abf 100644 --- a/src/Nonlinear/types.jl +++ b/src/Nonlinear/types.jl @@ -273,11 +273,10 @@ features in the MOI nonlinear interface. 
""" struct SparseReverseMode <: AbstractAutomaticDifferentiation end -function set_differentiation_backend( - data::NonlinearData, +function Evaluator( + model::Model, ::SparseReverseMode, ordered_variables::Vector{MOI.VariableIndex}, ) - data.inner = ReverseAD.NLPEvaluator(data, ordered_variables) - return + return Evaluator(model, ReverseAD.NLPEvaluator(model, ordered_variables)) end diff --git a/test/Nonlinear/ReverseAD.jl b/test/Nonlinear/ReverseAD.jl index 1e4366865d..96112d3680 100644 --- a/test/Nonlinear/ReverseAD.jl +++ b/test/Nonlinear/ReverseAD.jl @@ -23,23 +23,26 @@ end function test_objective_quadratic_univariate() x = MOI.VariableIndex(1) - data = Nonlinear.NonlinearData() - Nonlinear.set_objective(data, :($x^2 + 1)) - Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - [x], - ) - MOI.initialize(data, [:Grad, :Jac, :Hess]) - @test MOI.eval_objective(data, [1.2]) == 1.2^2 + 1 + model = Nonlinear.Model() + Nonlinear.set_objective(model, :($x^2 + 1)) + evaluator = Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x]) + MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) + @test MOI.eval_objective(evaluator, [1.2]) == 1.2^2 + 1 g = [NaN] - MOI.eval_objective_gradient(data, g, [1.2]) + MOI.eval_objective_gradient(evaluator, g, [1.2]) @test g == [2.4] - @test MOI.hessian_lagrangian_structure(data) == [(1, 1)] + @test MOI.hessian_lagrangian_structure(evaluator) == [(1, 1)] H = [NaN] - MOI.eval_hessian_lagrangian(data, H, [1.2], 1.5, Float64[]) + MOI.eval_hessian_lagrangian(evaluator, H, [1.2], 1.5, Float64[]) @test H == 1.5 .* [2.0] - MOI.eval_hessian_lagrangian_product(data, H, [1.2], [1.2], 1.5, Float64[]) + MOI.eval_hessian_lagrangian_product( + evaluator, + H, + [1.2], + [1.2], + 1.5, + Float64[], + ) @test H == [1.5 * 2.0 * 1.2] return end @@ -47,25 +50,30 @@ end function test_objective_quadratic_multivariate() x = MOI.VariableIndex(1) y = MOI.VariableIndex(2) - data = Nonlinear.NonlinearData() - Nonlinear.set_objective(data, :($x^2 + $x * $y + $y^2)) - Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - [x, y], - ) - MOI.initialize(data, [:Grad, :Jac, :Hess]) - @test MOI.eval_objective(data, [1.2, 2.3]) == 1.2^2 + 1.2 * 2.3 + 2.3^2 + model = Nonlinear.Model() + Nonlinear.set_objective(model, :($x^2 + $x * $y + $y^2)) + evaluator = + Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x, y]) + MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) + @test MOI.eval_objective(evaluator, [1.2, 2.3]) == 1.2^2 + 1.2 * 2.3 + 2.3^2 g = [NaN, NaN] - MOI.eval_objective_gradient(data, g, [1.2, 2.3]) + MOI.eval_objective_gradient(evaluator, g, [1.2, 2.3]) @test g == [2 * 1.2 + 2.3, 1.2 + 2 * 2.3] - @test MOI.hessian_lagrangian_structure(data) == [(1, 1), (2, 2), (2, 1)] + @test MOI.hessian_lagrangian_structure(evaluator) == + [(1, 1), (2, 2), (2, 1)] H = [NaN, NaN, NaN] - MOI.eval_hessian_lagrangian(data, H, [1.2, 2.3], 1.5, Float64[]) + MOI.eval_hessian_lagrangian(evaluator, H, [1.2, 2.3], 1.5, Float64[]) @test H == 1.5 .* [2.0, 2.0, 1.0] v = [0.3, 0.4] hv = [NaN, NaN] - MOI.eval_hessian_lagrangian_product(data, hv, [1.2, 2.3], v, 1.5, Float64[]) + MOI.eval_hessian_lagrangian_product( + evaluator, + hv, + [1.2, 2.3], + v, + 1.5, + Float64[], + ) @test hv ≈ 1.5 .* [2 1; 1 2] * v return end @@ -73,28 +81,33 @@ end function test_objective_quadratic_multivariate_subexpressions() x = MOI.VariableIndex(1) y = MOI.VariableIndex(2) - data = Nonlinear.NonlinearData() - ex = Nonlinear.add_expression(data, :($x^2)) - ey = 
Nonlinear.add_expression(data, :($y^2)) - exy = Nonlinear.add_expression(data, :($ex + $x * $y)) - Nonlinear.set_objective(data, :($exy + $ey)) - Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - [x, y], - ) - MOI.initialize(data, [:Grad, :Jac, :Hess]) - @test MOI.eval_objective(data, [1.2, 2.3]) == 1.2^2 + 1.2 * 2.3 + 2.3^2 + model = Nonlinear.Model() + ex = Nonlinear.add_expression(model, :($x^2)) + ey = Nonlinear.add_expression(model, :($y^2)) + exy = Nonlinear.add_expression(model, :($ex + $x * $y)) + Nonlinear.set_objective(model, :($exy + $ey)) + evaluator = + Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x, y]) + MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) + @test MOI.eval_objective(evaluator, [1.2, 2.3]) == 1.2^2 + 1.2 * 2.3 + 2.3^2 g = [NaN, NaN] - MOI.eval_objective_gradient(data, g, [1.2, 2.3]) + MOI.eval_objective_gradient(evaluator, g, [1.2, 2.3]) @test g == [2 * 1.2 + 2.3, 1.2 + 2 * 2.3] - @test MOI.hessian_lagrangian_structure(data) == [(1, 1), (2, 2), (2, 1)] + @test MOI.hessian_lagrangian_structure(evaluator) == + [(1, 1), (2, 2), (2, 1)] H = [NaN, NaN, NaN] - MOI.eval_hessian_lagrangian(data, H, [1.2, 2.3], 1.5, Float64[]) + MOI.eval_hessian_lagrangian(evaluator, H, [1.2, 2.3], 1.5, Float64[]) @test H == 1.5 .* [2.0, 2.0, 1.0] v = [0.3, 0.4] hv = [NaN, NaN] - MOI.eval_hessian_lagrangian_product(data, hv, [1.2, 2.3], v, 1.5, Float64[]) + MOI.eval_hessian_lagrangian_product( + evaluator, + hv, + [1.2, 2.3], + v, + 1.5, + Float64[], + ) @test hv ≈ 1.5 .* [2 1; 1 2] * v return end @@ -102,20 +115,17 @@ end function test_objective_ifelse_comparison() x = MOI.VariableIndex(1) y = MOI.VariableIndex(2) - data = Nonlinear.NonlinearData() - Nonlinear.set_objective(data, :(ifelse(1 <= $x <= 2, $x^2, $y^2))) - Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - [x, y], - ) - MOI.initialize(data, [:Grad, :Jac, :Hess]) - @test MOI.eval_objective(data, [1.2, 2.3]) == 1.2^2 - @test MOI.eval_objective(data, [2.2, 2.3]) == 2.3^2 + model = Nonlinear.Model() + Nonlinear.set_objective(model, :(ifelse(1 <= $x <= 2, $x^2, $y^2))) + evaluator = + Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x, y]) + MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) + @test MOI.eval_objective(evaluator, [1.2, 2.3]) == 1.2^2 + @test MOI.eval_objective(evaluator, [2.2, 2.3]) == 2.3^2 g = [NaN, NaN] - MOI.eval_objective_gradient(data, g, [1.2, 2.3]) + MOI.eval_objective_gradient(evaluator, g, [1.2, 2.3]) @test g == [2 * 1.2, 0.0] - MOI.eval_objective_gradient(data, g, [2.2, 2.3]) + MOI.eval_objective_gradient(evaluator, g, [2.2, 2.3]) @test g == [0.0, 2 * 2.3] return end @@ -123,83 +133,68 @@ end function test_objective_ifelse_logic() x = MOI.VariableIndex(1) y = MOI.VariableIndex(2) - data = Nonlinear.NonlinearData() - Nonlinear.set_objective(data, :(ifelse(1 <= $x && $x <= 2, $x^2, $y^2))) - Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - [x, y], - ) - MOI.initialize(data, [:Grad, :Jac, :Hess]) - @test MOI.eval_objective(data, [1.2, 2.3]) == 1.2^2 - @test MOI.eval_objective(data, [2.2, 2.3]) == 2.3^2 + model = Nonlinear.Model() + Nonlinear.set_objective(model, :(ifelse(1 <= $x && $x <= 2, $x^2, $y^2))) + evaluator = + Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x, y]) + MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) + @test MOI.eval_objective(evaluator, [1.2, 2.3]) == 1.2^2 + @test MOI.eval_objective(evaluator, [2.2, 2.3]) == 2.3^2 g = [NaN, NaN] - 
MOI.eval_objective_gradient(data, g, [1.2, 2.3]) + MOI.eval_objective_gradient(evaluator, g, [1.2, 2.3]) @test g == [2 * 1.2, 0.0] - MOI.eval_objective_gradient(data, g, [2.2, 2.3]) + MOI.eval_objective_gradient(evaluator, g, [2.2, 2.3]) @test g == [0.0, 2 * 2.3] return end function test_objective_parameter() x = MOI.VariableIndex(1) - data = Nonlinear.NonlinearData() - p = Nonlinear.add_parameter(data, 1.2) - Nonlinear.set_objective(data, :($p * $x)) - Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - [x], - ) - MOI.initialize(data, [:Grad, :Jac, :Hess]) - @test MOI.eval_objective(data, [1.3]) == 1.2 * 1.3 + model = Nonlinear.Model() + p = Nonlinear.add_parameter(model, 1.2) + Nonlinear.set_objective(model, :($p * $x)) + evaluator = Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x]) + MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) + @test MOI.eval_objective(evaluator, [1.3]) == 1.2 * 1.3 g = [NaN] - MOI.eval_objective_gradient(data, g, [1.3]) + MOI.eval_objective_gradient(evaluator, g, [1.3]) @test g == [1.2] return end function test_objective_subexpression() - data = Nonlinear.NonlinearData() + model = Nonlinear.Model() x = MOI.VariableIndex(1) input = :($x^2 + 1) - expr = Nonlinear.add_expression(data, input) - expr_2 = Nonlinear.add_expression(data, :($expr^2)) - Nonlinear.set_objective(data, :($expr_2^2)) - Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - [x], - ) - MOI.initialize(data, [:Grad]) - @test MOI.eval_objective(data, [1.3]) == ((1.3^2 + 1)^2)^2 + expr = Nonlinear.add_expression(model, input) + expr_2 = Nonlinear.add_expression(model, :($expr^2)) + Nonlinear.set_objective(model, :($expr_2^2)) + evaluator = Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x]) + MOI.initialize(evaluator, [:Grad]) + @test MOI.eval_objective(evaluator, [1.3]) == ((1.3^2 + 1)^2)^2 g = [NaN] - MOI.eval_objective_gradient(data, g, [1.3]) + MOI.eval_objective_gradient(evaluator, g, [1.3]) @test g ≈ [2 * (1.3^2 + 1)^2 * (2 * (1.3^2 + 1)) * 2 * 1.3] return end function test_constraint_quadratic_univariate() x = MOI.VariableIndex(1) - data = Nonlinear.NonlinearData() - Nonlinear.add_constraint(data, :($x^2 <= 2.0)) - Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - [x], - ) - MOI.initialize(data, [:Grad, :Jac, :Hess]) + model = Nonlinear.Model() + Nonlinear.add_constraint(model, :($x^2 <= 2.0)) + evaluator = Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x]) + MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) g = [NaN] x_val = [1.2] - MOI.eval_constraint(data, g, x_val) + MOI.eval_constraint(evaluator, g, x_val) @test g == x_val .^ 2 .- 2 - @test MOI.jacobian_structure(data) == [(1, 1)] + @test MOI.jacobian_structure(evaluator) == [(1, 1)] J = [NaN] - MOI.eval_constraint_jacobian(data, J, x_val) + MOI.eval_constraint_jacobian(evaluator, J, x_val) @test J == 2 .* x_val - @test MOI.hessian_lagrangian_structure(data) == [(1, 1)] + @test MOI.hessian_lagrangian_structure(evaluator) == [(1, 1)] H = [NaN] - MOI.eval_hessian_lagrangian(data, H, x_val, 0.0, [1.5]) + MOI.eval_hessian_lagrangian(evaluator, H, x_val, 0.0, [1.5]) @test H == 1.5 .* [2.0] return end @@ -207,25 +202,23 @@ end function test_constraint_quadratic_multivariate() x = MOI.VariableIndex(1) y = MOI.VariableIndex(2) - data = Nonlinear.NonlinearData() - Nonlinear.add_constraint(data, :($x^2 + $x * $y + $y^2 <= 2.0)) - Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - [x, y], - ) - 
MOI.initialize(data, [:Grad, :Jac, :Hess]) + model = Nonlinear.Model() + Nonlinear.add_constraint(model, :($x^2 + $x * $y + $y^2 <= 2.0)) + evaluator = + Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x, y]) + MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) g = [NaN] x_val = [1.2, 2.3] - MOI.eval_constraint(data, g, x_val) + MOI.eval_constraint(evaluator, g, x_val) @test g == [x_val[1]^2 + x_val[1] * x_val[2] + x_val[2]^2] .- 2 - @test MOI.jacobian_structure(data) == [(1, 1), (1, 2)] + @test MOI.jacobian_structure(evaluator) == [(1, 1), (1, 2)] J = [NaN, NaN] - MOI.eval_constraint_jacobian(data, J, x_val) + MOI.eval_constraint_jacobian(evaluator, J, x_val) @test J == [2 * x_val[1] + x_val[2], x_val[1] + 2 * x_val[2]] - @test MOI.hessian_lagrangian_structure(data) == [(1, 1), (2, 2), (2, 1)] + @test MOI.hessian_lagrangian_structure(evaluator) == + [(1, 1), (2, 2), (2, 1)] H = [NaN, NaN, NaN] - MOI.eval_hessian_lagrangian(data, H, x_val, 0.0, [1.5]) + MOI.eval_hessian_lagrangian(evaluator, H, x_val, 0.0, [1.5]) @test H == 1.5 .* [2.0, 2.0, 1.0] return end @@ -233,39 +226,37 @@ end function test_constraint_quadratic_multivariate_subexpressions() x = MOI.VariableIndex(1) y = MOI.VariableIndex(2) - data = Nonlinear.NonlinearData() - ex = Nonlinear.add_expression(data, :($x^2)) - ey = Nonlinear.add_expression(data, :($y^2)) - exy = Nonlinear.add_expression(data, :($ex + $x * $y)) - Nonlinear.add_constraint(data, :($exy + $ey <= 2.0)) - Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - [x, y], - ) - MOI.initialize(data, [:Grad, :Jac, :Hess]) + model = Nonlinear.Model() + ex = Nonlinear.add_expression(model, :($x^2)) + ey = Nonlinear.add_expression(model, :($y^2)) + exy = Nonlinear.add_expression(model, :($ex + $x * $y)) + Nonlinear.add_constraint(model, :($exy + $ey <= 2.0)) + evaluator = + Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x, y]) + MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) g = [NaN] x_val = [1.2, 2.3] - MOI.eval_constraint(data, g, x_val) + MOI.eval_constraint(evaluator, g, x_val) @test g ≈ [x_val[1]^2 + x_val[1] * x_val[2] + x_val[2]^2] .- 2 # Jacobian - @test MOI.jacobian_structure(data) == [(1, 1), (1, 2)] + @test MOI.jacobian_structure(evaluator) == [(1, 1), (1, 2)] J = [NaN, NaN] - MOI.eval_constraint_jacobian(data, J, x_val) + MOI.eval_constraint_jacobian(evaluator, J, x_val) @test J ≈ [2 * x_val[1] + x_val[2], x_val[1] + 2 * x_val[2]] # Jv y, w = [NaN], [1.1, 2.2] - MOI.eval_constraint_jacobian_product(data, y, x_val, w) + MOI.eval_constraint_jacobian_product(evaluator, y, x_val, w) @test y ≈ [(2 * x_val[1] + x_val[2]) (x_val[1] + 2 * x_val[2])] * w # v'J y, w = [NaN, NaN], [1.1] - MOI.eval_constraint_jacobian_transpose_product(data, y, x_val, w) + MOI.eval_constraint_jacobian_transpose_product(evaluator, y, x_val, w) wJ = w' * [(2 * x_val[1] + x_val[2]) (x_val[1] + 2 * x_val[2])] @test y ≈ wJ[:] # Hessian-lagrangian - @test MOI.hessian_lagrangian_structure(data) == [(1, 1), (2, 2), (2, 1)] + @test MOI.hessian_lagrangian_structure(evaluator) == + [(1, 1), (2, 2), (2, 1)] H = [NaN, NaN, NaN] - MOI.eval_hessian_lagrangian(data, H, x_val, 0.0, [1.5]) + MOI.eval_hessian_lagrangian(evaluator, H, x_val, 0.0, [1.5]) @test H ≈ 1.5 .* [2.0, 2.0, 1.0] return end @@ -285,20 +276,17 @@ function test_hessian_sparsity_registered_function() H[2, 2] = 2 return end - data = Nonlinear.NonlinearData() - Nonlinear.register_operator(data, :f, 2, f, ∇f, ∇²f) - Nonlinear.set_objective(data, :(f($x, $z) + $y^2)) - 
Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - [x, y, z], - ) - @test_broken :Hess in MOI.features_available(data) - # MOI.initialize(data, [:Grad, :Jac, :Hess]) - # @test MOI.hessian_lagrangian_structure(data) == + model = Nonlinear.Model() + Nonlinear.register_operator(model, :f, 2, f, ∇f, ∇²f) + Nonlinear.set_objective(model, :(f($x, $z) + $y^2)) + evaluator = + Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x, y, z]) + @test_broken :Hess in MOI.features_available(evaluator) + # MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) + # @test MOI.hessian_lagrangian_structure(evaluator) == # [(1, 1), (2, 2), (3, 3), (3, 1)] # H = fill(NaN, 4) - # MOI.eval_hessian_lagrangian(data, H, rand(3), 1.5, Float64[]) + # MOI.eval_hessian_lagrangian(evaluator, H, rand(3), 1.5, Float64[]) # @test H == 1.5 .* [2.0, 2.0, 2.0, 0.0] return end @@ -318,19 +306,16 @@ function test_hessian_sparsity_registered_rosenbrock() H[2, 2] = 200.0 return end - data = Nonlinear.NonlinearData() - Nonlinear.register_operator(data, :rosenbrock, 2, f, ∇f, ∇²f) - Nonlinear.set_objective(data, :(rosenbrock($x, $y))) - Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - [x, y], - ) - @test_broken :Hess in MOI.features_available(data) - # MOI.initialize(data, [:Grad, :Jac, :Hess]) - # @test MOI.hessian_lagrangian_structure(data) == [(1, 1), (2, 2), (2, 1)] + model = Nonlinear.Model() + Nonlinear.register_operator(model, :rosenbrock, 2, f, ∇f, ∇²f) + Nonlinear.set_objective(model, :(rosenbrock($x, $y))) + evaluator = + Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x, y]) + @test_broken :Hess in MOI.features_available(evaluator) + # MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) + # @test MOI.hessian_lagrangian_structure(evaluator) == [(1, 1), (2, 2), (2, 1)] # H = fill(NaN, 3) - # MOI.eval_hessian_lagrangian(data, H, [1.0, 1.0], 1.5, Float64[]) + # MOI.eval_hessian_lagrangian(evaluator, H, [1.0, 1.0], 1.5, Float64[]) # @test H == 1.5 .* [802, 200, -400] return end @@ -420,22 +405,20 @@ end function test_derivatives() a = MOI.VariableIndex(1) b = MOI.VariableIndex(2) - data = Nonlinear.NonlinearData() - Nonlinear.set_objective(data, :(sin($a^2) + cos($b * 4) / 5 - 2.0)) - Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - [a, b], - ) - MOI.initialize(data, [:Grad, :Jac, :Hess]) + model = Nonlinear.Model() + Nonlinear.set_objective(model, :(sin($a^2) + cos($b * 4) / 5 - 2.0)) + evaluator = + Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [a, b]) + MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) x = [2.0, 3.0] - @test MOI.eval_objective(data, x) == sin(x[1]^2) + cos(x[2] * 4) / 5 - 2.0 + @test MOI.eval_objective(evaluator, x) == + sin(x[1]^2) + cos(x[2] * 4) / 5 - 2.0 g = [NaN, NaN] - MOI.eval_objective_gradient(data, g, x) + MOI.eval_objective_gradient(evaluator, g, x) @test g ≈ [2 * x[1] * cos(x[1]^2), -4 * sin(x[2] * 4) / 5] - @test MOI.hessian_lagrangian_structure(data) == [(1, 1), (2, 2)] + @test MOI.hessian_lagrangian_structure(evaluator) == [(1, 1), (2, 2)] H = [NaN, NaN] - MOI.eval_hessian_lagrangian(data, H, x, 1.5, Float64[]) + MOI.eval_hessian_lagrangian(evaluator, H, x, 1.5, Float64[]) H_exact = [ -4 * x[1]^2 * sin(x[1]^2) + 2 * cos(x[1]^2), -4 / 5 * 4 * cos(x[2] * 4), @@ -445,18 +428,14 @@ function test_derivatives() end function test_NLPBlockData() - data = Nonlinear.NonlinearData() + model = Nonlinear.Model() x = MOI.VariableIndex(1) - Nonlinear.add_constraint(data, :($x <= 1)) - 
Nonlinear.add_constraint(data, :($x >= 2)) - Nonlinear.add_constraint(data, :($x == 3)) - Nonlinear.add_constraint(data, :(4 <= $x <= 5)) - Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - [x], - ) - block = MOI.NLPBlockData(data) + Nonlinear.add_constraint(model, :($x <= 1)) + Nonlinear.add_constraint(model, :($x >= 2)) + Nonlinear.add_constraint(model, :($x == 3)) + Nonlinear.add_constraint(model, :(4 <= $x <= 5)) + evaluator = Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x]) + block = MOI.NLPBlockData(evaluator) @test block.constraint_bounds == [ MOI.NLPBoundsPair(-Inf, 0.0), MOI.NLPBoundsPair(0.0, Inf), @@ -473,9 +452,9 @@ function test_linearity() z = MOI.VariableIndex(3) variables = Dict(x => 1, y => 2, z => 3) function _test_linearity(input, test_value, IJ = [], indices = []) - data = Nonlinear.NonlinearData() - ex = Nonlinear.add_expression(data, input) - expr = data[ex] + model = Nonlinear.Model() + ex = Nonlinear.add_expression(model, input) + expr = model[ex] adj = Nonlinear.adjacency_matrix(expr.nodes) nodes = ReverseAD._replace_moi_variables(expr.nodes, variables) ret = ReverseAD._classify_linearity(nodes, adj, ReverseAD.Linearity[]) @@ -566,16 +545,16 @@ function test_dual_forward() x = MOI.VariableIndex(1) y = MOI.VariableIndex(2) function _test_dual_forward(input, x_input, test_value) - data = Nonlinear.NonlinearData() - Nonlinear.set_objective(data, input) - Nonlinear.set_differentiation_backend( - data, + model = Nonlinear.Model() + Nonlinear.set_objective(model, input) + evaluator = Nonlinear.Evaluator( + model, Nonlinear.SparseReverseMode(), MOI.VariableIndex[x, y], ) - MOI.initialize(data, [:Grad]) + MOI.initialize(evaluator, [:Grad]) ∇f = fill(NaN, 2) - MOI.eval_objective_gradient(data, ∇f, x_input) + MOI.eval_objective_gradient(evaluator, ∇f, x_input) @test isapprox(∇f, test_value, atol = 1e-10) return end @@ -606,26 +585,26 @@ function test_gradient_registered_function() x = MOI.VariableIndex(1) y = MOI.VariableIndex(2) z = MOI.VariableIndex(3) - data = Nonlinear.NonlinearData() + model = Nonlinear.Model() f(x, y) = (1 / 3) * y^3 - 2x^2 function ∇f(g, x, y) g[1] = -4x g[2] = y^2 return end - Nonlinear.register_operator(data, :Φ, 2, f, ∇f) - Nonlinear.register_operator(data, :c, 1, cos, x -> -sin(x), x -> -cos(x)) - Nonlinear.set_objective(data, :(Φ($y, $x - 1) * c($z))) - Nonlinear.set_differentiation_backend( - data, + Nonlinear.register_operator(model, :Φ, 2, f, ∇f) + Nonlinear.register_operator(model, :c, 1, cos, x -> -sin(x), x -> -cos(x)) + Nonlinear.set_objective(model, :(Φ($y, $x - 1) * c($z))) + evaluator = Nonlinear.Evaluator( + model, Nonlinear.SparseReverseMode(), MOI.VariableIndex[x, y, z], ) - MOI.initialize(data, [:Grad]) + MOI.initialize(evaluator, [:Grad]) ∇f = fill(NaN, 3) x_input = [2.0, 3.0, 4.0] - @test MOI.eval_objective(data, x_input) == f(3.0, 2.0 - 1) * cos(4.0) - MOI.eval_objective_gradient(data, ∇f, [2.0, 3.0, 4.0]) + @test MOI.eval_objective(evaluator, x_input) == f(3.0, 2.0 - 1) * cos(4.0) + MOI.eval_objective_gradient(evaluator, ∇f, [2.0, 3.0, 4.0]) @test ∇f ≈ [ cos(x_input[3]) * (x_input[1] - 1)^2, -4cos(x_input[3]) * x_input[2], @@ -636,39 +615,39 @@ end function test_gradient_jump_855() x = MOI.VariableIndex(1) - data = Nonlinear.NonlinearData() + model = Nonlinear.Model() Nonlinear.set_objective( - data, + model, :(ifelse($x <= 3.0, ($x - 2.0)^2, 2 * log($x - 2.0) + 1.0)), ) - Nonlinear.set_differentiation_backend( - data, + evaluator = Nonlinear.Evaluator( + model, 
Nonlinear.SparseReverseMode(), MOI.VariableIndex[x], ) - MOI.initialize(data, [:Grad]) + MOI.initialize(evaluator, [:Grad]) ∇f = fill(NaN, 1) - MOI.eval_objective_gradient(data, ∇f, [-1.0]) + MOI.eval_objective_gradient(evaluator, ∇f, [-1.0]) @test ∇f ≈ [-6.0] - MOI.eval_objective_gradient(data, ∇f, [2.0]) + MOI.eval_objective_gradient(evaluator, ∇f, [2.0]) @test ∇f ≈ [0.0] return end function test_gradient_abs() x = MOI.VariableIndex(1) - data = Nonlinear.NonlinearData() - Nonlinear.set_objective(data, :(abs($x))) - Nonlinear.set_differentiation_backend( - data, + model = Nonlinear.Model() + Nonlinear.set_objective(model, :(abs($x))) + evaluator = Nonlinear.Evaluator( + model, Nonlinear.SparseReverseMode(), MOI.VariableIndex[x], ) - MOI.initialize(data, [:Grad]) + MOI.initialize(evaluator, [:Grad]) ∇f = fill(NaN, 1) - MOI.eval_objective_gradient(data, ∇f, [2.0]) + MOI.eval_objective_gradient(evaluator, ∇f, [2.0]) @test ∇f ≈ [1.0] - MOI.eval_objective_gradient(data, ∇f, [-2.0]) + MOI.eval_objective_gradient(evaluator, ∇f, [-2.0]) @test ∇f ≈ [-1.0] return end @@ -676,73 +655,73 @@ end function test_gradient_trig() x = MOI.VariableIndex(1) y = MOI.VariableIndex(2) - data = Nonlinear.NonlinearData() - Nonlinear.set_objective(data, :(sin($x^2) + cos($y * 4) / 5 - 2.0)) - Nonlinear.set_differentiation_backend( - data, + model = Nonlinear.Model() + Nonlinear.set_objective(model, :(sin($x^2) + cos($y * 4) / 5 - 2.0)) + evaluator = Nonlinear.Evaluator( + model, Nonlinear.SparseReverseMode(), MOI.VariableIndex[x, y], ) - MOI.initialize(data, [:Grad]) + MOI.initialize(evaluator, [:Grad]) ∇f = fill(NaN, 2) - MOI.eval_objective_gradient(data, ∇f, [2.0, 3.0]) + MOI.eval_objective_gradient(evaluator, ∇f, [2.0, 3.0]) @test ∇f ≈ [2 * 2.0 * cos(2.0^2), -4 * sin(3.0 * 4) / 5] return end function test_gradient_logical() x = MOI.VariableIndex(1) - data = Nonlinear.NonlinearData() - Nonlinear.set_objective(data, :($x > 0.5 && $x < 0.9)) - Nonlinear.set_differentiation_backend( - data, + model = Nonlinear.Model() + Nonlinear.set_objective(model, :($x > 0.5 && $x < 0.9)) + evaluator = Nonlinear.Evaluator( + model, Nonlinear.SparseReverseMode(), MOI.VariableIndex[x], ) - MOI.initialize(data, [:Grad]) - @test MOI.eval_objective(data, [1.5]) == 0.0 + MOI.initialize(evaluator, [:Grad]) + @test MOI.eval_objective(evaluator, [1.5]) == 0.0 ∇f = fill(NaN, 1) - MOI.eval_objective_gradient(data, ∇f, [1.5]) + MOI.eval_objective_gradient(evaluator, ∇f, [1.5]) @test iszero(∇f) return end function test_gradient_ifelse() x = MOI.VariableIndex(1) - data = Nonlinear.NonlinearData() - Nonlinear.set_objective(data, :(ifelse($x >= 0.5 || $x < 0.1, $x, 5))) - Nonlinear.set_differentiation_backend( - data, + model = Nonlinear.Model() + Nonlinear.set_objective(model, :(ifelse($x >= 0.5 || $x < 0.1, $x, 5))) + evaluator = Nonlinear.Evaluator( + model, Nonlinear.SparseReverseMode(), MOI.VariableIndex[x], ) - MOI.initialize(data, [:Grad]) - @test MOI.eval_objective(data, [1.5]) == 1.5 + MOI.initialize(evaluator, [:Grad]) + @test MOI.eval_objective(evaluator, [1.5]) == 1.5 ∇f = fill(NaN, 1) - MOI.eval_objective_gradient(data, ∇f, [1.5]) + MOI.eval_objective_gradient(evaluator, ∇f, [1.5]) @test ∇f ≈ [1.0] - @test MOI.eval_objective(data, [-0.1]) == -0.1 - MOI.eval_objective_gradient(data, ∇f, [-0.1]) + @test MOI.eval_objective(evaluator, [-0.1]) == -0.1 + MOI.eval_objective_gradient(evaluator, ∇f, [-0.1]) @test ∇f ≈ [1.0] - @test MOI.eval_objective(data, [0.2]) == 5 - MOI.eval_objective_gradient(data, ∇f, [0.2]) + @test 
MOI.eval_objective(evaluator, [0.2]) == 5 + MOI.eval_objective_gradient(evaluator, ∇f, [0.2]) @test ∇f ≈ [0.0] return end function test_gradient_sqrt_nan() x = MOI.VariableIndex(1) - data = Nonlinear.NonlinearData() - Nonlinear.set_objective(data, :(sqrt($x))) - Nonlinear.set_differentiation_backend( - data, + model = Nonlinear.Model() + Nonlinear.set_objective(model, :(sqrt($x))) + evaluator = Nonlinear.Evaluator( + model, Nonlinear.SparseReverseMode(), MOI.VariableIndex[x], ) - MOI.initialize(data, [:Grad]) - @test isnan(MOI.eval_objective(data, [-1.5])) + MOI.initialize(evaluator, [:Grad]) + @test isnan(MOI.eval_objective(evaluator, [-1.5])) ∇f = fill(Inf, 1) - MOI.eval_objective_gradient(data, ∇f, [-1.5]) + MOI.eval_objective_gradient(evaluator, ∇f, [-1.5]) @test isnan(∇f[1]) return end @@ -751,18 +730,18 @@ function test_gradient_variable_power() x = MOI.VariableIndex(1) y = MOI.VariableIndex(2) z = MOI.VariableIndex(3) - data = Nonlinear.NonlinearData() - Nonlinear.set_objective(data, :((1 / $x)^$y - $z)) - Nonlinear.set_differentiation_backend( - data, + model = Nonlinear.Model() + Nonlinear.set_objective(model, :((1 / $x)^$y - $z)) + evaluator = Nonlinear.Evaluator( + model, Nonlinear.SparseReverseMode(), MOI.VariableIndex[x, y, z], ) - MOI.initialize(data, [:Grad]) + MOI.initialize(evaluator, [:Grad]) x_input = [2.5, 3.5, 1.0] - @test MOI.eval_objective(data, x_input) == (1 / 2.5)^3.5 - 1.0 + @test MOI.eval_objective(evaluator, x_input) == (1 / 2.5)^3.5 - 1.0 ∇f = fill(Inf, 3) - MOI.eval_objective_gradient(data, ∇f, x_input) + MOI.eval_objective_gradient(evaluator, ∇f, x_input) @test ∇f ≈ [ -x_input[2] * x_input[1]^(-x_input[2] - 1), -((1 / x_input[1])^x_input[2]) * log(x_input[1]), @@ -773,34 +752,34 @@ end function test_single_parameter() x = MOI.VariableIndex(1) - data = Nonlinear.NonlinearData() - p = Nonlinear.add_parameter(data, 105.2) - Nonlinear.set_objective(data, :($p)) - Nonlinear.set_differentiation_backend( - data, + model = Nonlinear.Model() + p = Nonlinear.add_parameter(model, 105.2) + Nonlinear.set_objective(model, :($p)) + evaluator = Nonlinear.Evaluator( + model, Nonlinear.SparseReverseMode(), MOI.VariableIndex[x], ) - MOI.initialize(data, [:Grad]) - @test MOI.eval_objective(data, [-0.1]) == 105.2 + MOI.initialize(evaluator, [:Grad]) + @test MOI.eval_objective(evaluator, [-0.1]) == 105.2 return end function test_gradient_nested_subexpressions() x = MOI.VariableIndex(1) y = MOI.VariableIndex(2) - data = Nonlinear.NonlinearData() - ex1 = Nonlinear.add_expression(data, :(sin($x^2) + cos($y * 4) / 5 - 2.0)) - ex2 = Nonlinear.add_expression(data, :($ex1)) - Nonlinear.set_objective(data, ex2) - Nonlinear.set_differentiation_backend( - data, + model = Nonlinear.Model() + ex1 = Nonlinear.add_expression(model, :(sin($x^2) + cos($y * 4) / 5 - 2.0)) + ex2 = Nonlinear.add_expression(model, :($ex1)) + Nonlinear.set_objective(model, ex2) + evaluator = Nonlinear.Evaluator( + model, Nonlinear.SparseReverseMode(), MOI.VariableIndex[x, y], ) - MOI.initialize(data, [:Grad]) + MOI.initialize(evaluator, [:Grad]) ∇f = fill(NaN, 2) - MOI.eval_objective_gradient(data, ∇f, [2.0, 3.0]) + MOI.eval_objective_gradient(evaluator, ∇f, [2.0, 3.0]) @test ∇f ≈ [2 * 2.0 * cos(2.0^2), -4 * sin(3.0 * 4) / 5] return end @@ -825,23 +804,19 @@ function test_odd_chunks_Hessian_products() end function _test_odd_chunks_Hessian_products(N) - data = Nonlinear.NonlinearData() + model = Nonlinear.Model() x = MOI.VariableIndex.(1:N) - Nonlinear.set_objective(data, Expr(:call, :*, x...)) - 
Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - x, - ) - MOI.initialize(data, [:Hess]) - hessian_sparsity = MOI.hessian_lagrangian_structure(data) + Nonlinear.set_objective(model, Expr(:call, :*, x...)) + evaluator = Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), x) + MOI.initialize(evaluator, [:Hess]) + hessian_sparsity = MOI.hessian_lagrangian_structure(evaluator) V = zeros(length(hessian_sparsity)) values = ones(N) - MOI.eval_hessian_lagrangian(data, V, values, 1.0, Float64[]) + MOI.eval_hessian_lagrangian(evaluator, V, values, 1.0, Float64[]) H = _dense_hessian(hessian_sparsity, V, N) @test H ≈ (ones(N, N) - LinearAlgebra.diagm(0 => ones(N))) values[1] = 0.5 - MOI.eval_hessian_lagrangian(data, V, values, 1.0, Float64[]) + MOI.eval_hessian_lagrangian(evaluator, V, values, 1.0, Float64[]) H = _dense_hessian(hessian_sparsity, V, N) H_22 = (ones(N - 1, N - 1) - LinearAlgebra.diagm(0 => ones(N - 1))) / 2 @test H ≈ [0 ones(N - 1)'; ones(N - 1) H_22] From aa3d326db2c160f94cb237f0b3c07d846917ac9d Mon Sep 17 00:00:00 2001 From: odow Date: Mon, 25 Apr 2022 12:17:26 +1200 Subject: [PATCH 09/16] Add license header --- src/Nonlinear/ReverseAD/Coloring/Coloring.jl | 6 ++++++ src/Nonlinear/ReverseAD/ReverseAD.jl | 6 ++++++ src/Nonlinear/ReverseAD/forward_over_reverse.jl | 6 ++++++ src/Nonlinear/ReverseAD/graph_tools.jl | 6 ++++++ src/Nonlinear/ReverseAD/mathoptinterface_api.jl | 6 ++++++ src/Nonlinear/ReverseAD/precompile.jl | 6 ++++++ src/Nonlinear/ReverseAD/reverse_mode.jl | 6 ++++++ src/Nonlinear/ReverseAD/types.jl | 6 ++++++ src/Nonlinear/ReverseAD/utils.jl | 6 ++++++ test/Nonlinear/ReverseAD.jl | 6 ++++++ 10 files changed, 60 insertions(+) diff --git a/src/Nonlinear/ReverseAD/Coloring/Coloring.jl b/src/Nonlinear/ReverseAD/Coloring/Coloring.jl index 6a2b5f3eb5..8a4ee89e3c 100644 --- a/src/Nonlinear/ReverseAD/Coloring/Coloring.jl +++ b/src/Nonlinear/ReverseAD/Coloring/Coloring.jl @@ -1,3 +1,9 @@ +# Copyright (c) 2017: Miles Lubin and contributors +# Copyright (c) 2017: Google Inc. +# +# Use of this source code is governed by an MIT-style license that can be found +# in the LICENSE.md file or at https://opensource.org/licenses/MIT. + module Coloring import DataStructures diff --git a/src/Nonlinear/ReverseAD/ReverseAD.jl b/src/Nonlinear/ReverseAD/ReverseAD.jl index 5d865466d0..34bd00bd80 100644 --- a/src/Nonlinear/ReverseAD/ReverseAD.jl +++ b/src/Nonlinear/ReverseAD/ReverseAD.jl @@ -1,3 +1,9 @@ +# Copyright (c) 2017: Miles Lubin and contributors +# Copyright (c) 2017: Google Inc. +# +# Use of this source code is governed by an MIT-style license that can be found +# in the LICENSE.md file or at https://opensource.org/licenses/MIT. + module ReverseAD import ForwardDiff diff --git a/src/Nonlinear/ReverseAD/forward_over_reverse.jl b/src/Nonlinear/ReverseAD/forward_over_reverse.jl index a272cba32d..a9d07f8ef5 100644 --- a/src/Nonlinear/ReverseAD/forward_over_reverse.jl +++ b/src/Nonlinear/ReverseAD/forward_over_reverse.jl @@ -1,3 +1,9 @@ +# Copyright (c) 2017: Miles Lubin and contributors +# Copyright (c) 2017: Google Inc. +# +# Use of this source code is governed by an MIT-style license that can be found +# in the LICENSE.md file or at https://opensource.org/licenses/MIT. 
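+#
+# This file implements the forward-over-reverse scheme that the evaluator uses
+# for Hessian and Hessian-vector computations: a forward (dual-number) pass is
+# layered on top of the reverse-mode pass implemented in reverse_mode.jl.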
+ const TAG = :ReverseAD """ diff --git a/src/Nonlinear/ReverseAD/graph_tools.jl b/src/Nonlinear/ReverseAD/graph_tools.jl index b3c06f9ecf..80d8fdc555 100644 --- a/src/Nonlinear/ReverseAD/graph_tools.jl +++ b/src/Nonlinear/ReverseAD/graph_tools.jl @@ -1,3 +1,9 @@ +# Copyright (c) 2017: Miles Lubin and contributors +# Copyright (c) 2017: Google Inc. +# +# Use of this source code is governed by an MIT-style license that can be found +# in the LICENSE.md file or at https://opensource.org/licenses/MIT. + """ _replace_moi_variables( nodes::Vector{Nonlinear.Node}, diff --git a/src/Nonlinear/ReverseAD/mathoptinterface_api.jl b/src/Nonlinear/ReverseAD/mathoptinterface_api.jl index 7adee13d3f..e9739f773c 100644 --- a/src/Nonlinear/ReverseAD/mathoptinterface_api.jl +++ b/src/Nonlinear/ReverseAD/mathoptinterface_api.jl @@ -1,3 +1,9 @@ +# Copyright (c) 2017: Miles Lubin and contributors +# Copyright (c) 2017: Google Inc. +# +# Use of this source code is governed by an MIT-style license that can be found +# in the LICENSE.md file or at https://opensource.org/licenses/MIT. + function MOI.features_available(d::NLPEvaluator) # Check if we have any user-defined multivariate operators, in which case we # need to disable hessians. The result of features_available depends on this. diff --git a/src/Nonlinear/ReverseAD/precompile.jl b/src/Nonlinear/ReverseAD/precompile.jl index 16915364f8..b987ea42e8 100644 --- a/src/Nonlinear/ReverseAD/precompile.jl +++ b/src/Nonlinear/ReverseAD/precompile.jl @@ -1,3 +1,9 @@ +# Copyright (c) 2017: Miles Lubin and contributors +# Copyright (c) 2017: Google Inc. +# +# Use of this source code is governed by an MIT-style license that can be found +# in the LICENSE.md file or at https://opensource.org/licenses/MIT. + function _precompile_() ccall(:jl_generating_output, Cint, ()) == 1 || return nothing Base.precompile( diff --git a/src/Nonlinear/ReverseAD/reverse_mode.jl b/src/Nonlinear/ReverseAD/reverse_mode.jl index 4c09c8b11e..a6758aa05c 100644 --- a/src/Nonlinear/ReverseAD/reverse_mode.jl +++ b/src/Nonlinear/ReverseAD/reverse_mode.jl @@ -1,3 +1,9 @@ +# Copyright (c) 2017: Miles Lubin and contributors +# Copyright (c) 2017: Google Inc. +# +# Use of this source code is governed by an MIT-style license that can be found +# in the LICENSE.md file or at https://opensource.org/licenses/MIT. + """ _reverse_mode(d::NLPEvaluator, x) diff --git a/src/Nonlinear/ReverseAD/types.jl b/src/Nonlinear/ReverseAD/types.jl index 3c0f1042cb..6f2163708f 100644 --- a/src/Nonlinear/ReverseAD/types.jl +++ b/src/Nonlinear/ReverseAD/types.jl @@ -1,3 +1,9 @@ +# Copyright (c) 2017: Miles Lubin and contributors +# Copyright (c) 2017: Google Inc. +# +# Use of this source code is governed by an MIT-style license that can be found +# in the LICENSE.md file or at https://opensource.org/licenses/MIT. + mutable struct _SubexpressionStorage nodes::Vector{Nonlinear.Node} adj::SparseArrays.SparseMatrixCSC{Bool,Int} diff --git a/src/Nonlinear/ReverseAD/utils.jl b/src/Nonlinear/ReverseAD/utils.jl index ca3ae6432c..e67f7f1346 100644 --- a/src/Nonlinear/ReverseAD/utils.jl +++ b/src/Nonlinear/ReverseAD/utils.jl @@ -1,3 +1,9 @@ +# Copyright (c) 2017: Miles Lubin and contributors +# Copyright (c) 2017: Google Inc. +# +# Use of this source code is governed by an MIT-style license that can be found +# in the LICENSE.md file or at https://opensource.org/licenses/MIT. 
+ """ _UnsafeVectorView(offset, len, ptr) diff --git a/test/Nonlinear/ReverseAD.jl b/test/Nonlinear/ReverseAD.jl index 96112d3680..556fea7cc3 100644 --- a/test/Nonlinear/ReverseAD.jl +++ b/test/Nonlinear/ReverseAD.jl @@ -1,3 +1,9 @@ +# Copyright (c) 2017: Miles Lubin and contributors +# Copyright (c) 2017: Google Inc. +# +# Use of this source code is governed by an MIT-style license that can be found +# in the LICENSE.md file or at https://opensource.org/licenses/MIT. + module TestReverseAD using Test From f125e164523ad2ea492a24ac427492601ce411f7 Mon Sep 17 00:00:00 2001 From: odow Date: Tue, 26 Apr 2022 06:17:53 +1200 Subject: [PATCH 10/16] Update for latest changes --- test/Nonlinear/ReverseAD.jl | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/test/Nonlinear/ReverseAD.jl b/test/Nonlinear/ReverseAD.jl index 556fea7cc3..52ddea52be 100644 --- a/test/Nonlinear/ReverseAD.jl +++ b/test/Nonlinear/ReverseAD.jl @@ -187,13 +187,13 @@ end function test_constraint_quadratic_univariate() x = MOI.VariableIndex(1) model = Nonlinear.Model() - Nonlinear.add_constraint(model, :($x^2 <= 2.0)) + Nonlinear.add_constraint(model, :($x^2), MOI.LessThan(2.0)) evaluator = Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x]) MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) g = [NaN] x_val = [1.2] MOI.eval_constraint(evaluator, g, x_val) - @test g == x_val .^ 2 .- 2 + @test g == x_val .^ 2 @test MOI.jacobian_structure(evaluator) == [(1, 1)] J = [NaN] MOI.eval_constraint_jacobian(evaluator, J, x_val) @@ -209,14 +209,14 @@ function test_constraint_quadratic_multivariate() x = MOI.VariableIndex(1) y = MOI.VariableIndex(2) model = Nonlinear.Model() - Nonlinear.add_constraint(model, :($x^2 + $x * $y + $y^2 <= 2.0)) + Nonlinear.add_constraint(model, :($x^2 + $x * $y + $y^2), MOI.LessThan(2.0)) evaluator = Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x, y]) MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) g = [NaN] x_val = [1.2, 2.3] MOI.eval_constraint(evaluator, g, x_val) - @test g == [x_val[1]^2 + x_val[1] * x_val[2] + x_val[2]^2] .- 2 + @test g == [x_val[1]^2 + x_val[1] * x_val[2] + x_val[2]^2] @test MOI.jacobian_structure(evaluator) == [(1, 1), (1, 2)] J = [NaN, NaN] MOI.eval_constraint_jacobian(evaluator, J, x_val) @@ -236,14 +236,14 @@ function test_constraint_quadratic_multivariate_subexpressions() ex = Nonlinear.add_expression(model, :($x^2)) ey = Nonlinear.add_expression(model, :($y^2)) exy = Nonlinear.add_expression(model, :($ex + $x * $y)) - Nonlinear.add_constraint(model, :($exy + $ey <= 2.0)) + Nonlinear.add_constraint(model, :($exy + $ey), MOI.LessThan(2.0)) evaluator = Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x, y]) MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) g = [NaN] x_val = [1.2, 2.3] MOI.eval_constraint(evaluator, g, x_val) - @test g ≈ [x_val[1]^2 + x_val[1] * x_val[2] + x_val[2]^2] .- 2 + @test g ≈ [x_val[1]^2 + x_val[1] * x_val[2] + x_val[2]^2] # Jacobian @test MOI.jacobian_structure(evaluator) == [(1, 1), (1, 2)] J = [NaN, NaN] @@ -436,10 +436,10 @@ end function test_NLPBlockData() model = Nonlinear.Model() x = MOI.VariableIndex(1) - Nonlinear.add_constraint(model, :($x <= 1)) - Nonlinear.add_constraint(model, :($x >= 2)) - Nonlinear.add_constraint(model, :($x == 3)) - Nonlinear.add_constraint(model, :(4 <= $x <= 5)) + Nonlinear.add_constraint(model, :($x - 1), MOI.LessThan(0.0)) + Nonlinear.add_constraint(model, :($x - 2), MOI.GreaterThan(0.0)) + Nonlinear.add_constraint(model, :($x - 3), MOI.EqualTo(0.0)) + 
Nonlinear.add_constraint(model, :($x), MOI.Interval(4.0, 5.0)) evaluator = Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x]) block = MOI.NLPBlockData(evaluator) @test block.constraint_bounds == [ From db384d94b5c84d47c7af3a63003daa484317998e Mon Sep 17 00:00:00 2001 From: odow Date: Tue, 3 May 2022 08:56:11 +1200 Subject: [PATCH 11/16] Re-add function evaluations --- src/Nonlinear/ReverseAD/Coloring/Coloring.jl | 12 +- src/Nonlinear/ReverseAD/reverse_mode.jl | 135 ++++++++++++++++--- 2 files changed, 122 insertions(+), 25 deletions(-) diff --git a/src/Nonlinear/ReverseAD/Coloring/Coloring.jl b/src/Nonlinear/ReverseAD/Coloring/Coloring.jl index 8a4ee89e3c..9a3e295c93 100644 --- a/src/Nonlinear/ReverseAD/Coloring/Coloring.jl +++ b/src/Nonlinear/ReverseAD/Coloring/Coloring.jl @@ -94,7 +94,7 @@ function UndirectedGraph(I, J, nel) offsets[k+1] = offsets[k] + adjcount[k] end fill!(adjcount, 0) - edges = Array{Tuple{Int,Int}}(undef, n_edges) + edges = Vector{Tuple{Int,Int}}(undef, n_edges) adjlist = Vector{Int}(undef, offsets[nel+1] - 1) edgeindex = Vector{Int}(undef, length(adjlist)) edge_count = 0 @@ -321,7 +321,7 @@ function recovery_preprocess( edge_count[idx] += 1 end # edges sorted by twocolor subgraph - sorted_edges = Array{Vector{Tuple{Int,Int}}}(undef, seen_twocolors) + sorted_edges = Vector{Vector{Tuple{Int,Int}}}(undef, seen_twocolors) for idx in 1:seen_twocolors sorted_edges[idx] = Tuple{Int,Int}[] sizehint!(sorted_edges[idx], edge_count[idx]) @@ -334,9 +334,9 @@ function recovery_preprocess( push!(sorted_edges[idx], (u, v)) end # list of unique vertices in each twocolor subgraph - vertexmap = Array{Vector{Int}}(undef, seen_twocolors) - postorder = Array{Vector{Int}}(undef, seen_twocolors) - parents = Array{Vector{Int}}(undef, seen_twocolors) + vertexmap = Vector{Vector{Int}}(undef, seen_twocolors) + postorder = Vector{Vector{Int}}(undef, seen_twocolors) + parents = Vector{Vector{Int}}(undef, seen_twocolors) # temporary lookup map from global index to subgraph index revmap = zeros(Int, _num_vertices(g)) adjcount = zeros(Int, _num_vertices(g)) @@ -485,7 +485,7 @@ end Allocate a seed matrix for the Coloring. """ function seed_matrix(rinfo::RecoveryInfo) - return Array{Float64}(undef, length(rinfo.local_indices), rinfo.num_colors) + return Matrix{Float64}(undef, length(rinfo.local_indices), rinfo.num_colors) end """ diff --git a/src/Nonlinear/ReverseAD/reverse_mode.jl b/src/Nonlinear/ReverseAD/reverse_mode.jl index a6758aa05c..136e04232c 100644 --- a/src/Nonlinear/ReverseAD/reverse_mode.jl +++ b/src/Nonlinear/ReverseAD/reverse_mode.jl @@ -108,25 +108,122 @@ function _forward_eval( elseif node.type == Nonlinear.NODE_CALL_MULTIVARIATE children_indices = SparseArrays.nzrange(f.adj, k) N = length(children_indices) - f_input = _UnsafeVectorView(d.jac_storage, N) - ∇f = _UnsafeVectorView(d.user_output_buffer, N) - for (r, i) in enumerate(children_indices) - f_input[r] = f.forward_storage[children_arr[i]] - ∇f[r] = 0.0 - end - f.forward_storage[k] = Nonlinear.eval_multivariate_function( - operators, - operators.multivariate_operators[node.index], - f_input, - ) - Nonlinear.eval_multivariate_gradient( - operators, - operators.multivariate_operators[node.index], - ∇f, - f_input, - ) - for (r, i) in enumerate(children_indices) - f.partials_storage[children_arr[i]] = ∇f[r] + # TODO(odow); + # With appropriate benchmarking, the special-cased if-statements can + # be removed in favor of the generic user-defined function case. 
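+            # The hard-coded indices below rely on the ordering of the default
+            # multivariate operators: 1 = :+, 2 = :-, 3 = :*, 4 = :^, 5 = :/,
+            # and 6 = :ifelse. Each branch evaluates the operator and fills in
+            # the partials of its children directly, without dispatching
+            # through the generic operator registry.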
+ if node.index == 1 # :+ + tmp_sum = zero(T) + for c_idx in children_indices + @inbounds ix = children_arr[c_idx] + @inbounds f.partials_storage[ix] = one(T) + @inbounds tmp_sum += f.forward_storage[ix] + end + f.forward_storage[k] = tmp_sum + elseif node.index == 2 # :- + @assert N == 2 + child1 = first(children_indices) + @inbounds ix1 = children_arr[child1] + @inbounds ix2 = children_arr[child1+1] + @inbounds tmp_sub = f.forward_storage[ix1] + @inbounds tmp_sub -= f.forward_storage[ix2] + @inbounds f.partials_storage[ix1] = one(T) + @inbounds f.partials_storage[ix2] = -one(T) + f.forward_storage[k] = tmp_sub + elseif node.index == 3 # :* + tmp_prod = one(T) + for c_idx in children_indices + @inbounds tmp_prod *= f.forward_storage[children_arr[c_idx]] + end + if tmp_prod == zero(T) || N <= 2 + # This is inefficient if there are a lot of children. + # 2 is chosen as a limit because (x*y)/y does not always + # equal x for floating-point numbers. This can produce + # unexpected error in partials. There's still an error when + # multiplying three or more terms, but users are less likely + # to complain about it. + for c_idx in children_indices + prod_others = one(T) + for c_idx2 in children_indices + (c_idx == c_idx2) && continue + ix = children_arr[c_idx2] + prod_others *= f.forward_storage[ix] + end + f.partials_storage[children_arr[c_idx]] = prod_others + end + else + # Compute all-minus-one partial derivatives by dividing from + # the total product. + for c_idx in children_indices + ix = children_arr[c_idx] + f.partials_storage[ix] = + tmp_prod / f.forward_storage[ix] + end + end + @inbounds f.forward_storage[k] = tmp_prod + elseif node.index == 4 # :^ + @assert N == 2 + idx1 = first(children_indices) + idx2 = last(children_indices) + @inbounds ix1 = children_arr[idx1] + @inbounds ix2 = children_arr[idx2] + @inbounds base = f.forward_storage[ix1] + @inbounds exponent = f.forward_storage[ix2] + if exponent == 2 + @inbounds f.forward_storage[k] = base * base + @inbounds f.partials_storage[ix1] = 2 * base + elseif exponent == 1 + @inbounds f.forward_storage[k] = base + @inbounds f.partials_storage[ix1] = 1.0 + else + f.forward_storage[k] = pow(base, exponent) + f.partials_storage[ix1] = exponent * pow(base, exponent - 1) + end + f.partials_storage[ix2] = f.forward_storage[k] * log(base) + elseif node.index == 5 # :/ + @assert N == 2 + idx1 = first(children_indices) + idx2 = last(children_indices) + @inbounds ix1 = children_arr[idx1] + @inbounds ix2 = children_arr[idx2] + @inbounds numerator = f.forward_storage[ix1] + @inbounds denominator = f.forward_storage[ix2] + recip_denominator = 1 / denominator + @inbounds f.partials_storage[ix1] = recip_denominator + f.partials_storage[ix2] = + -numerator * recip_denominator * recip_denominator + f.forward_storage[k] = numerator * recip_denominator + elseif node.index == 6 # ifelse + @assert N == 3 + idx1 = first(children_indices) + @inbounds condition = f.forward_storage[children_arr[idx1]] + @inbounds lhs = f.forward_storage[children_arr[idx1+1]] + @inbounds rhs = f.forward_storage[children_arr[idx1+2]] + @inbounds f.partials_storage[children_arr[idx1+1]] = + condition == 1 + @inbounds f.partials_storage[children_arr[idx1+2]] = + !(condition == 1) + f.forward_storage[k] = ifelse(condition == 1, lhs, rhs) + else + f_input = _UnsafeVectorView(d.jac_storage, N) + ∇f = _UnsafeVectorView(d.user_output_buffer, N) + for (r, i) in enumerate(children_indices) + f_input[r] = f.forward_storage[children_arr[i]] + ∇f[r] = 0.0 + end + f.forward_storage[k] = 
Nonlinear.eval_multivariate_function( + operators, + operators.multivariate_operators[node.index], + f_input, + ) + Nonlinear.eval_multivariate_gradient( + operators, + operators.multivariate_operators[node.index], + ∇f, + f_input, + ) + for (r, i) in enumerate(children_indices) + f.partials_storage[children_arr[i]] = ∇f[r] + end end elseif node.type == Nonlinear.NODE_CALL_UNIVARIATE child_idx = children_arr[f.adj.colptr[k]] From 8420fb722f0085f60e29f8b07037d9f56d8ef884 Mon Sep 17 00:00:00 2001 From: odow Date: Tue, 3 May 2022 09:12:17 +1200 Subject: [PATCH 12/16] Address mlubins comments --- docs/src/submodules/Nonlinear/overview.md | 101 ++++++++++++---------- test/Nonlinear/ReverseAD.jl | 2 + 2 files changed, 56 insertions(+), 47 deletions(-) diff --git a/docs/src/submodules/Nonlinear/overview.md b/docs/src/submodules/Nonlinear/overview.md index 24d699a6ae..a5c42f9240 100644 --- a/docs/src/submodules/Nonlinear/overview.md +++ b/docs/src/submodules/Nonlinear/overview.md @@ -303,7 +303,7 @@ However, we cannot call gradient terms such as not have the capability to differentiate a nonlinear expression. If, instead, we pass [`Nonlinear.SparseReverseMode`](@ref), then we get access -to `:Grad`, the gradient of the objective function, `:Jac`, the jacobian matrix +to `:Grad`, the gradient of the objective function, `:Jac`, the Jacobian matrix of the constraints, `:JacVec`, the ability to compute Jacobian-vector products, and `:ExprGraph`. ```jldoctest nonlinear_developer @@ -319,7 +319,7 @@ Nonlinear.Evaluator with available features: * :ExprGraph ``` -However, before calling anything, we need to call [`initialize`](@ref): +However, before using the evaluator, we need to call [`initialize`](@ref): ```jldoctest nonlinear_developer julia> MOI.initialize(evaluator, [:Grad, :Jac, :JacVec, :ExprGraph]) ``` @@ -335,9 +335,9 @@ julia> MOI.eval_objective(evaluator, x) ``` and [`eval_objective_gradient`](@ref): ```jldoctest nonlinear_developer -julia> grad = [NaN] +julia> grad = [0.0] 1-element Vector{Float64}: - NaN + 0.0 julia> MOI.eval_objective_gradient(evaluator, grad, x) @@ -551,52 +551,59 @@ subexpressions in the model. optimization problem using sparse reverse-mode automatic differentiation (AD). This section does not attempt to explain how sparse reverse-mode AD works, but -instead explains why MOI contains it's own implementation, and highlights +instead explains why MOI contains its own implementation, and highlights notable differences from similar packages. !!! warning - You should not interact with `ReverseAD` directly. Instead, you should - create a [`Nonlinear.Evaluator`](@ref) object with - [`Nonlinear.SparseReverseMode`](@ref), and then query the MOI API methods. + Don't use the API in `ReverseAD` to compute derivatives. Instead, create a + [`Nonlinear.Evaluator`](@ref) object with [`Nonlinear.SparseReverseMode`](@ref) + as the backend, and then query the MOI API methods. -### Why another AD package? +### Design goals The JuliaDiff organization maintains a [list of packages](https://juliadiff.org) -for doing AD in Julia. At last count, there were at least ten packages–not -including `ReverseAD`–for reverse-mode AD in Julia. Given this multitude, why -does MOI maintain another implementation instead of re-using existing tooling? - -Here are four reasons: - - * **Scale and Sparsity:** the types of functions that MOI computes derivatives - of have two key characteristics: they can be very large scale (10^5 or more - functions across 10^5 or more variables) and they are very sparse. 
For large - problems, it is common for the hessian to have `O(n)` non-zero elements - instead of `O(n^2)` if it was dense. To the best of our knowledge, - `ReverseAD` is the only reverse-mode AD system in Julia that handles sparsity - by default. The lack of sparsity support is _the_ main reason why we do no - use a generic package. - * **Limited scope:** most other AD packages accept arbitrary Julia functions as - input and then trace an expression graph using operator overloading. This - means they must deal (or detect and ignore) with control flow, I/O, and other - vagaries of Julia. In contrast, `ReverseAD` only accepts functions in the - form of [`Nonlinear.Expression`](@ref), which greatly limits the range of - syntax that it must deal with. By reducing the scope of what we accept as - input to functions relevant for mathematical optimization, we can provide a - simpler implementation with various performance optimizations. - * **Historical:** `ReverseAD` started life as [ReverseDiffSparse.jl](https://github.com/mlubin/ReverseDiffSparse.jl), - development of which begain in early 2014(!). This was well before the other - packages started development. Because we had a well-tested, working AD in - JuMP, there was less motivation to contribute to and explore other AD - packages. The lack of historical interaction also meant that other packages - were not optimized for the types of problems that JuMP is built for (i.e., - large-scale sparse problems). When we first created MathOptInterface, we kept - the AD in JuMP to simplify the transition, and post-poned the development of - a first-class nonlinear interface in MathOptInterface. - * **Technical debt** Prior to the introduction of `Nonlinear`, JuMP's nonlinear - implementation was a confusing mix of functions and types spread across the - code base and in the private `_Derivatives` submodule. This made it hard to - swap the AD system for another. The main motivation for refactoring JuMP to - create the `Nonlinear` submodule in MathOptInterface was to abstract the - interface between JuMP and the AD system, allowing us to swap-in and test new - AD systems in the future. +for doing AD in Julia. At last count, there were at least ten packages–-not +including `ReverseAD`-–for reverse-mode AD in Julia. `ReverseAD` exists because +it has a different set of design goals. + + * **Goal: handle scale and sparsity** + The types of functions that MOI computes derivatives of have two key + characteristics: they can be very large scale (10^5 or more functions across + 10^5 or more variables) and they are very sparse. For large problems, it is + common for the hessian to have `O(n)` non-zero elements instead of `O(n^2)` + if it was dense. To the best of our knowledge, `ReverseAD` is the only + reverse-mode AD system in Julia that handles sparsity by default. + * **Goal: limit the scope to improve robustness** + Most other AD packages accept arbitrary Julia functions as input and then + trace an expression graph using operator overloading. This means they must + deal (or detect and ignore) with control flow, I/O, and other vagaries of + Julia. In contrast, `ReverseAD` only accepts functions in the form of + [`Nonlinear.Expression`](@ref), which greatly limits the range of syntax that + it must deal with. By reducing the scope of what we accept as input to + functions relevant for mathematical optimization, we can provide a simpler + implementation with various performance optimizations. 
+  * **Goal: provide outputs which match what solvers expect**
+    Other AD packages focus on differentiating individual Julia functions. In
+    contrast, `ReverseAD` has a very specific use case: to generate outputs
+    needed by the MOI nonlinear API. This means it needs to efficiently compute
+    sparse Hessians, and it needs subexpression handling to avoid recomputing
+    subexpressions that are shared between functions.
+
+### History
+
+`ReverseAD` started life as [ReverseDiffSparse.jl](https://github.com/mlubin/ReverseDiffSparse.jl),
+development of which began in early 2014(!). This was well before the other
+packages started development. Because we had a well-tested, working AD in JuMP,
+there was less motivation to contribute to and explore other AD packages. The
+lack of historical interaction also meant that other packages were not optimized
+for the types of problems that JuMP is built for (i.e., large-scale sparse
+problems). When we first created MathOptInterface, we kept the AD in JuMP to
+simplify the transition, and postponed the development of a first-class
+nonlinear interface in MathOptInterface.
+
+Prior to the introduction of `Nonlinear`, JuMP's nonlinear implementation was a
+confusing mix of functions and types spread across the code base and in the
+private `_Derivatives` submodule. This made it hard to swap the AD system for
+another. The main motivation for refactoring JuMP to create the `Nonlinear`
+submodule in MathOptInterface was to abstract the interface between JuMP and the
+AD system, allowing us to swap in and test new AD systems in the future.
diff --git a/test/Nonlinear/ReverseAD.jl b/test/Nonlinear/ReverseAD.jl
index 52ddea52be..67786e2d57 100644
--- a/test/Nonlinear/ReverseAD.jl
+++ b/test/Nonlinear/ReverseAD.jl
@@ -288,6 +288,7 @@ function test_hessian_sparsity_registered_function()
     evaluator =
         Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x, y, z])
     @test_broken :Hess in MOI.features_available(evaluator)
+    # TODO(odow): re-enable these tests when user-defined hessians are supported
     # MOI.initialize(evaluator, [:Grad, :Jac, :Hess])
     # @test MOI.hessian_lagrangian_structure(evaluator) ==
     #       [(1, 1), (2, 2), (3, 3), (3, 1)]
@@ -318,6 +319,7 @@ function test_hessian_sparsity_registered_rosenbrock()
     evaluator =
         Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x, y])
     @test_broken :Hess in MOI.features_available(evaluator)
+    # TODO(odow): re-enable these tests when user-defined hessians are supported
     # MOI.initialize(evaluator, [:Grad, :Jac, :Hess])
     # @test MOI.hessian_lagrangian_structure(evaluator) == [(1, 1), (2, 2), (2, 1)]
     # H = fill(NaN, 3)

From b0b85f460189d39419689192c196e57412d341dd Mon Sep 17 00:00:00 2001
From: Oscar Dowson
Date: Tue, 3 May 2022 19:40:35 +1200
Subject: [PATCH 13/16] Update src/Nonlinear/ReverseAD/Coloring/Coloring.jl
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Mathieu Besançon
---
 src/Nonlinear/ReverseAD/Coloring/Coloring.jl | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/src/Nonlinear/ReverseAD/Coloring/Coloring.jl b/src/Nonlinear/ReverseAD/Coloring/Coloring.jl
index 9a3e295c93..e0d89ee35e 100644
--- a/src/Nonlinear/ReverseAD/Coloring/Coloring.jl
+++ b/src/Nonlinear/ReverseAD/Coloring/Coloring.jl
@@ -28,10 +28,8 @@ function Base.push!(v::IndexedSet, i::Integer)
 end

 function Base.empty!(v::IndexedSet)
-    nzidx = v.nzidx
-    empty = v.empty
     for i in 1:v.nnz
-        empty[nzidx[i]] = true
+        v.empty[v.nzidx[i]] = true
     end
     v.nnz = 0
     return v

From 
bbd366527c71c82ce36f58d966ca3f534704f17a Mon Sep 17 00:00:00 2001 From: Oscar Dowson Date: Wed, 4 May 2022 09:22:30 +1200 Subject: [PATCH 14/16] Update docs/src/submodules/Nonlinear/overview.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Mathieu Besançon --- docs/src/submodules/Nonlinear/overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/src/submodules/Nonlinear/overview.md b/docs/src/submodules/Nonlinear/overview.md index a5c42f9240..98b743e669 100644 --- a/docs/src/submodules/Nonlinear/overview.md +++ b/docs/src/submodules/Nonlinear/overview.md @@ -368,7 +368,7 @@ MathOptInterface as follows: import MathOptInterface const MOI = MathOptInterface -function build_model(model; backend) +function build_model(model; backend::Nonlinear.AbstractAutomaticDifferentiation) x = MOI.add_variable(model) y = MOI.add_variable(model) MOI.set(model, MOI.ObjectiveSense(), MOI.MIN_SENSE) From 72f12185ebe4f55e8a9d60f5acdc61cd4cae2808 Mon Sep 17 00:00:00 2001 From: Oscar Dowson Date: Wed, 4 May 2022 10:07:46 +1200 Subject: [PATCH 15/16] Update docs/src/submodules/Nonlinear/overview.md --- docs/src/submodules/Nonlinear/overview.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/docs/src/submodules/Nonlinear/overview.md b/docs/src/submodules/Nonlinear/overview.md index 98b743e669..1ab9cba736 100644 --- a/docs/src/submodules/Nonlinear/overview.md +++ b/docs/src/submodules/Nonlinear/overview.md @@ -368,7 +368,10 @@ MathOptInterface as follows: import MathOptInterface const MOI = MathOptInterface -function build_model(model; backend::Nonlinear.AbstractAutomaticDifferentiation) +function build_model( + model::MOI.ModelLike; + backend::MOI.Nonlinear.AbstractAutomaticDifferentiation, +) x = MOI.add_variable(model) y = MOI.add_variable(model) MOI.set(model, MOI.ObjectiveSense(), MOI.MIN_SENSE) From 3d8e547b141f103d542ce703728bb19f46f86026 Mon Sep 17 00:00:00 2001 From: odow Date: Fri, 6 May 2022 08:56:23 +1200 Subject: [PATCH 16/16] Tweak scale and sparsity goal --- docs/src/submodules/Nonlinear/overview.md | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/docs/src/submodules/Nonlinear/overview.md b/docs/src/submodules/Nonlinear/overview.md index 1ab9cba736..a69e28d142 100644 --- a/docs/src/submodules/Nonlinear/overview.md +++ b/docs/src/submodules/Nonlinear/overview.md @@ -570,12 +570,11 @@ including `ReverseAD`-–for reverse-mode AD in Julia. `ReverseAD` exists becaus it has a different set of design goals. * **Goal: handle scale and sparsity** - The types of functions that MOI computes derivatives of have two key - characteristics: they can be very large scale (10^5 or more functions across - 10^5 or more variables) and they are very sparse. For large problems, it is - common for the hessian to have `O(n)` non-zero elements instead of `O(n^2)` - if it was dense. To the best of our knowledge, `ReverseAD` is the only - reverse-mode AD system in Julia that handles sparsity by default. + The types of nonlinear optimization problems that MOI represents can be large + scale (10^5 or more functions across 10^5 or more variables) with very sparse + derivatives. The ability to compute a sparse Hessian matrix is essential. To + the best of our knowledge, `ReverseAD` is the only reverse-mode AD system in + Julia that handles sparsity by default. 
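
To make the sparse-Hessian goal concrete, here is a minimal, hypothetical sketch (not part of the patch) of how a solver queries the Hessian of the Lagrangian through the standard MOI evaluator API once `:Hess` is initialized. The model, evaluation point, and multipliers are illustrative assumptions; `:Hess` is available here only because the model uses built-in operators, since (as the test changes above note) user-defined Hessians are not yet supported.

```julia
import MathOptInterface
const MOI = MathOptInterface
const Nonlinear = MOI.Nonlinear

# Illustrative model: minimize x^4 subject to x^2 <= 1.
model = Nonlinear.Model()
x = MOI.VariableIndex(1)
Nonlinear.set_objective(model, :($x^4))
Nonlinear.add_constraint(model, :($x^2), MOI.LessThan(1.0))

evaluator = Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x])
MOI.initialize(evaluator, [:Grad, :Jac, :Hess])

# Only structural non-zeros of the symmetric Hessian are reported, as
# (row, column) pairs; `H` holds one value per reported entry.
pattern = MOI.hessian_lagrangian_structure(evaluator)
H = zeros(length(pattern))

# Hessian of σ * f(x) + μ' * g(x) at x = [0.5], with σ = 1.0 and μ = [2.0].
MOI.eval_hessian_lagrangian(evaluator, H, [0.5], 1.0, [2.0])
```

Storing only the structural non-zeros is what "handles sparsity by default" means in practice: the output has the sparse triplet layout that interior-point and other nonlinear solvers consume directly.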
* **Goal: limit the scope to improve robustness** Most other AD packages accept arbitrary Julia functions as input and then trace an expression graph using operator overloading. This means they must