From 907080abf7454d6908f8c98d0f70abd1d4a33a86 Mon Sep 17 00:00:00 2001
From: odow
Date: Tue, 19 Apr 2022 06:20:05 +1200
Subject: [PATCH 01/16] [Nonlinear] add the Nonlinear.ReverseAD submodule

The majority of this development was carried out in the JuMP PRs:
* https://github.com/jump-dev/JuMP.jl/pull/2939
* https://github.com/jump-dev/JuMP.jl/pull/2942
* https://github.com/jump-dev/JuMP.jl/pull/2943

Nonlinear.ReverseAD is a minor refactoring of code that previously existed in
JuMP under the _Derivatives submodule, and prior to that the
ReverseDiffSparse.jl package.
---
 docs/src/submodules/Nonlinear/overview.md    | 105 +++
 docs/src/submodules/Nonlinear/reference.md   |   1 +
 src/Nonlinear/Nonlinear.jl                   |   2 +
 src/Nonlinear/ReverseAD/Coloring/Coloring.jl | 548 ++++++++++++
 .../ReverseAD/Coloring/topological_sort.jl   | 106 +++
 src/Nonlinear/ReverseAD/ReverseAD.jl         |  40 +
 .../ReverseAD/forward_over_reverse.jl        | 396 +++++++++
 src/Nonlinear/ReverseAD/graph_tools.jl       | 486 +++++++++++
 .../ReverseAD/mathoptinterface_api.jl        | 355 ++++++++
 src/Nonlinear/ReverseAD/precompile.jl        |  46 +
 src/Nonlinear/ReverseAD/reverse_mode.jl      | 259 ++++++
 src/Nonlinear/ReverseAD/types.jl             | 188 ++++
 src/Nonlinear/ReverseAD/utils.jl             |  63 ++
 src/Nonlinear/types.jl                       |  18 +
 test/Nonlinear/Nonlinear.jl                  |   2 +
 test/Nonlinear/ReverseAD.jl                  | 803 ++++++++++++++++++
 16 files changed, 3418 insertions(+)
 create mode 100644 src/Nonlinear/ReverseAD/Coloring/Coloring.jl
 create mode 100644 src/Nonlinear/ReverseAD/Coloring/topological_sort.jl
 create mode 100644 src/Nonlinear/ReverseAD/ReverseAD.jl
 create mode 100644 src/Nonlinear/ReverseAD/forward_over_reverse.jl
 create mode 100644 src/Nonlinear/ReverseAD/graph_tools.jl
 create mode 100644 src/Nonlinear/ReverseAD/mathoptinterface_api.jl
 create mode 100644 src/Nonlinear/ReverseAD/precompile.jl
 create mode 100644 src/Nonlinear/ReverseAD/reverse_mode.jl
 create mode 100644 src/Nonlinear/ReverseAD/types.jl
 create mode 100644 src/Nonlinear/ReverseAD/utils.jl
 create mode 100644 test/Nonlinear/ReverseAD.jl

diff --git a/docs/src/submodules/Nonlinear/overview.md b/docs/src/submodules/Nonlinear/overview.md
index d0bd4463cb..51e2fa2428 100644
--- a/docs/src/submodules/Nonlinear/overview.md
+++ b/docs/src/submodules/Nonlinear/overview.md
@@ -285,6 +285,7 @@ The following backends are available to choose from within MOI, although
 other packages may add more options by sub-typing
 [`Nonlinear.AbstractAutomaticDifferentiation`](@ref):
  * [`Nonlinear.ExprGraphOnly`](@ref)
+ * [`Nonlinear.SparseReverseMode`](@ref).
 
 ```jldoctest nonlinear_developer
 julia> evaluator = Nonlinear.Evaluator(model, Nonlinear.ExprGraphOnly(), [x])
@@ -317,6 +318,52 @@ julia> MOI.set(model, MOI.NLPBlock(), block);
 Only call [`NLPBlockData`](@ref) once you have finished modifying the problem
 in `model`.
 
+If, instead, we set [`Nonlinear.SparseReverseMode`](@ref), then we get access to
+`:Grad`, the gradient of the objective function, `:Jac`, the Jacobian matrix of
+the constraints, `:JacVec`, the ability to compute Jacobian-vector products, and
+`:ExprGraph`.
+```jldoctest nonlinear_developer
+julia> Nonlinear.set_differentiation_backend(
+           data,
+           Nonlinear.SparseReverseMode(),
+           [x],
+       )
+
+julia> data
+NonlinearData with available features:
+  * :Grad
+  * :Jac
+  * :JacVec
+  * :ExprGraph
+```
+
+However, before calling anything, we need to call [`initialize`](@ref):
+```jldoctest nonlinear_developer
+julia> MOI.initialize(data, [:Grad, :Jac, :JacVec, :ExprGraph])
+```
+
+Now we can call methods like [`eval_objective`](@ref):
+```jldoctest nonlinear_developer
+julia> x = [1.0]
+1-element Vector{Float64}:
+ 1.0
+
+julia> MOI.eval_objective(data, x)
+7.268073418273571
+```
+and [`eval_objective_gradient`](@ref):
+```jldoctest nonlinear_developer
+julia> grad = [NaN]
+1-element Vector{Float64}:
+ NaN
+
+julia> MOI.eval_objective_gradient(data, grad, x)
+
+julia> grad
+1-element Vector{Float64}:
+ 1.909297426825682
+```
+
 ## Expression-graph representation
 
 [`Nonlinear.Model`](@ref) stores nonlinear expressions in
@@ -477,3 +524,61 @@ user-defined functions using [`Nonlinear.register_operator`](@ref).
 [`Nonlinear.Model`](@ref) is a struct that stores the
 [`Nonlinear.OperatorRegistry`](@ref), as well as a list of parameters and
 subexpressions in the model.
+
+## ReverseAD
+
+`Nonlinear.ReverseAD` is a submodule for computing derivatives of the problem
+inside [`Nonlinear.NonlinearData`](@ref) using sparse reverse-mode automatic
+differentiation (AD).
+
+This section does not attempt to explain how sparse reverse-mode AD works, but
+instead explains why MOI contains its own implementation, and highlights
+notable differences from similar packages.
+
+!!! warning
+    You should not interact with `ReverseAD` directly. Instead, you should
+    create a [`Nonlinear.NonlinearData`](@ref) object, call
+    [`Nonlinear.set_differentiation_backend`](@ref) with
+    [`Nonlinear.SparseReverseMode`](@ref), and then query the MOI API methods.
+
+### Why another AD package?
+
+The JuliaDiff organization maintains a [list of packages](https://juliadiff.org)
+for doing AD in Julia. At last count, there were at least ten packages–not
+including `ReverseAD`–for reverse-mode AD in Julia. Given this multitude, why
+does MOI maintain another implementation instead of re-using existing tooling?
+
+Here are four reasons:
+
+ * **Scale and Sparsity:** the types of functions that MOI computes derivatives
+   of have two key characteristics: they can be very large scale (10^5 or more
+   functions across 10^5 or more variables) and they are very sparse. For large
+   problems, it is common for the Hessian to have `O(n)` non-zero elements
+   instead of `O(n^2)` if it were dense. To the best of our knowledge,
+   `ReverseAD` is the only reverse-mode AD system in Julia that handles sparsity
+   by default. The lack of sparsity support is _the_ main reason why we do not
+   use a generic package.
+ * **Limited scope:** most other AD packages accept arbitrary Julia functions as
+   input and then trace an expression graph using operator overloading. This
+   means they must deal with (or detect and ignore) control flow, I/O, and other
+   vagaries of Julia. In contrast, `ReverseAD` only accepts functions in the
+   form of [`Nonlinear.NonlinearExpression`](@ref), which greatly limits the
+   range of syntax that it must deal with. By reducing the scope of what we
+   accept as input to functions relevant for mathematical optimization, we can
+   provide a simpler implementation with various performance optimizations.
+ * **Historical:** `ReverseAD` started life as [ReverseDiffSparse.jl](https://github.com/mlubin/ReverseDiffSparse.jl),
+   development of which began in early 2014(!). This was well before the other
+   packages started development. Because we had a well-tested, working AD in
+   JuMP, there was less motivation to contribute to and explore other AD
+   packages. The lack of historical interaction also meant that other packages
+   were not optimized for the types of problems that JuMP is built for (i.e.,
+   large-scale sparse problems). When we first created MathOptInterface, we kept
+   the AD in JuMP to simplify the transition, and postponed the development of
+   a first-class nonlinear interface in MathOptInterface.
+ * **Technical debt:** Prior to the introduction of `Nonlinear`, JuMP's nonlinear
+   implementation was a confusing mix of functions and types spread across the
+   code base and in the private `_Derivatives` submodule. This made it hard to
+   swap the AD system for another. The main motivation for refactoring JuMP to
+   create the `Nonlinear` submodule in MathOptInterface was to abstract the
+   interface between JuMP and the AD system, allowing us to swap in and test new
+   AD systems in the future.
diff --git a/docs/src/submodules/Nonlinear/reference.md b/docs/src/submodules/Nonlinear/reference.md
index f1d0124624..f73c3f773a 100644
--- a/docs/src/submodules/Nonlinear/reference.md
+++ b/docs/src/submodules/Nonlinear/reference.md
@@ -70,6 +70,7 @@ Nonlinear.eval_comparison_function
 Nonlinear.Evaluator
 Nonlinear.AbstractAutomaticDifferentiation
 Nonlinear.ExprGraphOnly
+Nonlinear.SparseReverseMode
 ```
 
 ## Data-structure
diff --git a/src/Nonlinear/Nonlinear.jl b/src/Nonlinear/Nonlinear.jl
index b45f4523f5..cb54617d8b 100644
--- a/src/Nonlinear/Nonlinear.jl
+++ b/src/Nonlinear/Nonlinear.jl
@@ -50,4 +50,6 @@ include("parse.jl")
 include("model.jl")
 include("evaluator.jl")
 
+include("ReverseAD/ReverseAD.jl")
+
 end # module
diff --git a/src/Nonlinear/ReverseAD/Coloring/Coloring.jl b/src/Nonlinear/ReverseAD/Coloring/Coloring.jl
new file mode 100644
index 0000000000..6a2b5f3eb5
--- /dev/null
+++ b/src/Nonlinear/ReverseAD/Coloring/Coloring.jl
@@ -0,0 +1,548 @@
+module Coloring
+
+import DataStructures
+
+include("topological_sort.jl")
+
+# indexed sparse set of integers
+mutable struct IndexedSet
+    nzidx::Vector{Int}
+    empty::BitVector
+    nnz::Int
+end
+
+IndexedSet(n::Integer) = IndexedSet(zeros(Int, n), trues(n), 0)
+
+function Base.push!(v::IndexedSet, i::Integer)
+    if v.empty[i] # new index
+        v.nzidx[v.nnz+=1] = i
+        v.empty[i] = false
+    end
+    return
+end
+
+function Base.empty!(v::IndexedSet)
+    nzidx = v.nzidx
+    empty = v.empty
+    for i in 1:v.nnz
+        empty[nzidx[i]] = true
+    end
+    v.nnz = 0
+    return v
+end
+
+Base.length(v::IndexedSet) = length(v.nzidx)
+
+function Base.resize!(v::IndexedSet, n::Integer)
+    if n > length(v)
+        @assert v.nnz == 0 # only resize empty vector
+        resize!(v.nzidx, n)
+        resize!(v.empty, n)
+        fill!(v.empty, true)
+    end
+    return
+end
+
+Base.collect(v::IndexedSet) = v.nzidx[1:v.nnz]
+
+function Base.union!(v::IndexedSet, s)
+    for x in s
+        push!(v, x)
+    end
+    return
+end
+
+# compact storage for an undirected graph
+# neighbors of vertex i start at adjlist[offsets[i]]
+struct UndirectedGraph
+    adjlist::Vector{Int}
+    edgeindex::Vector{Int} # corresponding edge number, indexed as adjlist
+    offsets::Vector{Int}
+    edges::Vector{Tuple{Int,Int}}
+end
+
+_num_vertices(g::UndirectedGraph) = length(g.offsets) - 1
+
+_num_edges(g::UndirectedGraph) = length(g.edges)
+
+_num_neighbors(i::Int,
g::UndirectedGraph) = g.offsets[i+1] - g.offsets[i] + +_start_neighbors(i::Int, g::UndirectedGraph) = g.offsets[i] + +function UndirectedGraph(I, J, nel) + adjcount = zeros(Int, nel) + n_edges = 0 + for k in 1:length(I) + i = I[k] + j = J[k] + if i == j + continue + end + n_edges += 1 + adjcount[i] += 1 + adjcount[j] += 1 + end + offsets = Vector{Int}(undef, nel + 1) + offsets[1] = 1 + for k in 1:nel + offsets[k+1] = offsets[k] + adjcount[k] + end + fill!(adjcount, 0) + edges = Array{Tuple{Int,Int}}(undef, n_edges) + adjlist = Vector{Int}(undef, offsets[nel+1] - 1) + edgeindex = Vector{Int}(undef, length(adjlist)) + edge_count = 0 + for k in 1:length(I) + i = I[k] + j = J[k] + if i == j + continue + end + edge_count += 1 + adjlist[offsets[i]+adjcount[i]] = j + edgeindex[offsets[i]+adjcount[i]] = edge_count + adjcount[i] += 1 + adjlist[offsets[j]+adjcount[j]] = i + edgeindex[offsets[j]+adjcount[j]] = edge_count + adjcount[j] += 1 + edges[edge_count] = (i, j) + end + @assert edge_count == n_edges + return UndirectedGraph(adjlist, edgeindex, offsets, edges) +end + +struct _Edge + index::Int + source::Int + target::Int +end + +function _prevent_cycle( + v, + w, + x, + e_idx1, + e_idx2, + S, + firstVisitToTree, + forbiddenColors, + color, +) + er = DataStructures.find_root!(S, e_idx2) + @inbounds first = firstVisitToTree[er] + p = first.source # but this depends on the order? + q = first.target + @inbounds if p != v + firstVisitToTree[er] = _Edge(e_idx1, v, w) + elseif q != w + forbiddenColors[color[x]] = v + end + return +end + +function _grow_star(v, w, e_idx, firstNeighbor, color, S) + @inbounds e = firstNeighbor[color[w]] + p = e.source + @inbounds if p != v + firstNeighbor[color[w]] = _Edge(e_idx, v, w) + else + union!(S, e_idx, e.index) + end + return +end + +function _merge_trees(eg, eg1, S) + e1 = DataStructures.find_root!(S, eg) + e2 = DataStructures.find_root!(S, eg1) + if e1 != e2 + union!(S, eg, eg1) + end + return +end + +""" + acyclic_coloring(g::UndirectedGraph) + +Implement the acyclic coloring algorithm of Gebremdehin, Tarafdar, Manne, and +Pothen, "New Acyclic and Star Coloring Algorithms with Application to Computing +Hessians." SIAM J. Sci. Comput. 2007. + +Returns `Tuple{Vector{Int},Int}` giving the color index of each node in `g`, as +well as the total number of colors used. 
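+
+## Example
+
+An illustrative sketch (the graph below is made up for this example, and the
+exact colors assigned depend on the traversal order):
+
+```julia
+# A path graph 1 - 2 - 3, given as edge endpoint lists over 3 vertices.
+g = UndirectedGraph([1, 2], [2, 3], 3)
+color, num_colors = acyclic_coloring(g)
+# For a path graph two colors suffice, e.g. color == [1, 2, 1], num_colors == 2.
+```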
+""" +function acyclic_coloring(g::UndirectedGraph) + if _num_edges(g) == 0 + return fill(1, _num_vertices(g)), 1 + end + num_colors = 0 + forbiddenColors = Int[] + firstNeighbor = _Edge[] + firstVisitToTree = fill(_Edge(0, 0, 0), _num_edges(g)) + color = fill(0, _num_vertices(g)) + # disjoint set forest of edges in the graph + S = DataStructures.IntDisjointSets(_num_edges(g)) + @inbounds for v in 1:_num_vertices(g) + n_neighbor = _num_neighbors(v, g) + start_neighbor = _start_neighbors(v, g) + for k in 0:(n_neighbor-1) + w = g.adjlist[start_neighbor+k] + if color[w] == 0 + continue + end + forbiddenColors[color[w]] = v + end + for k in 0:(n_neighbor-1) + w = g.adjlist[start_neighbor+k] + e_idx = g.edgeindex[start_neighbor+k] + if color[w] == 0 + continue + end + n_neighbor_w = _num_neighbors(w, g) + start_neighbor_w = _start_neighbors(w, g) + for k2 in 0:(n_neighbor_w-1) + x = g.adjlist[start_neighbor_w+k2] + e2_idx = g.edgeindex[start_neighbor_w+k2] + if color[x] == 0 + continue + end + if forbiddenColors[color[x]] != v + _prevent_cycle( + v, + w, + x, + e_idx, + e2_idx, + S, + firstVisitToTree, + forbiddenColors, + color, + ) + end + end + end + # find feasible color + found = false + for k in 1:num_colors + if forbiddenColors[k] != v + color[v] = k + found = true + break + end + end + if !found + num_colors += 1 + push!(forbiddenColors, 0) + push!(firstNeighbor, _Edge(0, 0, 0)) + color[v] = num_colors + end + for k in 0:(n_neighbor-1) + w = g.adjlist[start_neighbor+k] + e_idx = g.edgeindex[start_neighbor+k] + if color[w] == 0 + continue + end + _grow_star(v, w, e_idx, firstNeighbor, color, S) + end + for k in 0:(n_neighbor-1) + w = g.adjlist[start_neighbor+k] + e_idx = g.edgeindex[start_neighbor+k] + if color[w] == 0 + continue + end + n_neighbor_w = _num_neighbors(w, g) + start_neighbor_w = _start_neighbors(w, g) + for k2 in 0:(n_neighbor_w-1) + x = g.adjlist[start_neighbor_w+k2] + e2_idx = g.edgeindex[start_neighbor_w+k2] + if color[x] == 0 || x == v + continue + end + if color[x] == color[v] + _merge_trees(e_idx, e2_idx, S) + end + end + end + end + return color, num_colors +end + +struct RecoveryInfo + vertexmap::Vector{Vector{Int}} + postorder::Vector{Vector{Int}} + parents::Vector{Vector{Int}} + color::Vector{Int} + num_colors::Int + nnz::Int # number of off-diagonal nonzeros + local_indices::Vector{Int} # map back to global indices +end + +function RecoveryInfo() + return RecoveryInfo( + Vector{Vector{Int}}(undef, 0), + Vector{Vector{Int}}(undef, 0), + Vector{Vector{Int}}(undef, 0), + Vector{Int}(undef, 0), + 0, + 0, + Vector{Int}(undef, 0), + ) +end + +function recovery_preprocess( + g::UndirectedGraph, + color, + num_colors, + local_indices, +) + # represent two-color subgraph as: + # list of vertices (with map to global indices) + # adjacency list in a single vector (with list of offsets) + # linear index of pair of colors + twocolorindex = zeros(Int32, num_colors, num_colors) + seen_twocolors = 0 + # count of edges in each subgraph + edge_count = Int[] + for k in 1:length(g.edges) + u, v = g.edges[k] + i = min(color[u], color[v]) + j = max(color[u], color[v]) + if twocolorindex[i, j] == 0 + seen_twocolors += 1 + twocolorindex[i, j] = seen_twocolors + push!(edge_count, 0) + end + idx = twocolorindex[i, j] + edge_count[idx] += 1 + end + # edges sorted by twocolor subgraph + sorted_edges = Array{Vector{Tuple{Int,Int}}}(undef, seen_twocolors) + for idx in 1:seen_twocolors + sorted_edges[idx] = Tuple{Int,Int}[] + sizehint!(sorted_edges[idx], edge_count[idx]) + end + for i in 
1:length(g.edges) + u, v = g.edges[i] + i = min(color[u], color[v]) + j = max(color[u], color[v]) + idx = twocolorindex[i, j] + push!(sorted_edges[idx], (u, v)) + end + # list of unique vertices in each twocolor subgraph + vertexmap = Array{Vector{Int}}(undef, seen_twocolors) + postorder = Array{Vector{Int}}(undef, seen_twocolors) + parents = Array{Vector{Int}}(undef, seen_twocolors) + # temporary lookup map from global index to subgraph index + revmap = zeros(Int, _num_vertices(g)) + adjcount = zeros(Int, _num_vertices(g)) + cmap = zeros(Int, 0) # shared storage for DFS + for idx in 1:seen_twocolors + my_edges = sorted_edges[idx] + vlist = Int[] + # build up the vertex list and adjacency count + for k in 1:length(my_edges) + u, v = my_edges[k] + # seen these vertices yet? + if revmap[u] == 0 + push!(vlist, u) + revmap[u] = length(vlist) + adjcount[u] = 0 + end + if revmap[v] == 0 + push!(vlist, v) + revmap[v] = length(vlist) + adjcount[v] = 0 + end + adjcount[u] += 1 + adjcount[v] += 1 + end + # set up offsets for adjlist + offset = Vector{Int}(undef, length(vlist) + 1) + offset[1] = 1 + for k in 1:length(vlist) + offset[k+1] = offset[k] + adjcount[vlist[k]] + adjcount[vlist[k]] = 0 + end + # adjlist for node u in twocolor idx starts at + # vec[offset[u]] + # u has global index vlist[u] + vec = Vector{Int}(undef, offset[length(vlist)+1] - 1) + # now fill in + for k in 1:length(my_edges) + u, v = my_edges[k] + u_rev = revmap[u] # indices in the subgraph + v_rev = revmap[v] + vec[offset[u_rev]+adjcount[u]] = v_rev + vec[offset[v_rev]+adjcount[v]] = u_rev + adjcount[u] += 1 + adjcount[v] += 1 + end + resize!(cmap, length(vlist)) + order, parent = + reverse_topological_sort_by_dfs(vec, offset, length(vlist), cmap) + for k in 1:length(vlist) + revmap[vlist[k]] = 0 # clear for reuse + end + postorder[idx] = order + parents[idx] = parent + vertexmap[idx] = vlist + end + return RecoveryInfo( + vertexmap, + postorder, + parents, + color, + num_colors, + _num_edges(g), + local_indices, + ) +end + +_normalize(i, j) = (j > i) ? (j, i) : (i, j) + +function _indirect_recover_structure(rinfo::RecoveryInfo) + N = length(rinfo.color) + I = zeros(Int, rinfo.nnz + N) + J = zeros(Int, rinfo.nnz + N) + k = 0 + for i in 1:N + k += 1 + I[k] = J[k] = i + end + for t in 1:length(rinfo.postorder) + vmap = rinfo.vertexmap[t] + order = rinfo.postorder[t] + parent = rinfo.parents[t] + for z in 1:length(order) + v = order[z] + p = parent[v] + if p == 0 + continue + end + k += 1 + I[k], J[k] = _normalize(vmap[v], vmap[p]) + end + end + @assert k == rinfo.nnz + N + return I, J +end + +""" + hessian_color_preprocess( + edgelist, + num_total_var, + seen_idx = IndexedSet(0), + ) + +`edgelist` contains the nonzeros in the Hessian, *including* nonzeros on the +diagonal. 
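+
+## Example
+
+A minimal sketch (the variable indices and edge set are made up for this
+example):
+
+```julia
+# Hessian nonzeros (1, 1) and (1, 2) over two variables.
+I, J, rinfo = hessian_color_preprocess(Set([(1, 1), (1, 2)]), 2)
+# I and J give the lower-triangular sparsity pattern in global indices, and
+# rinfo is later passed to `seed_matrix` and `recover_from_matmat!`.
+```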
+""" +function hessian_color_preprocess( + edgelist, + num_total_var, + seen_idx = IndexedSet(0), +) + resize!(seen_idx, num_total_var) + I, J = Int[], Int[] + for (i, j) in edgelist + push!(seen_idx, i) + push!(seen_idx, j) + push!(I, i) + push!(J, j) + end + local_indices = sort!(seen_idx.nzidx[1:seen_idx.nnz]) + empty!(seen_idx) + global_to_local_idx = seen_idx.nzidx # steal for storage + for k in 1:length(local_indices) + global_to_local_idx[local_indices[k]] = k + end + # only do the coloring on the local indices + for k in 1:length(I) + I[k] = global_to_local_idx[I[k]] + J[k] = global_to_local_idx[J[k]] + end + g = UndirectedGraph(I, J, length(local_indices)) + color, num_colors = acyclic_coloring(g) + @assert length(color) == _num_vertices(g) + rinfo = recovery_preprocess(g, color, num_colors, local_indices) + I, J = _indirect_recover_structure(rinfo) + # convert back to global indices + for k in 1:length(I) + I[k] = local_indices[I[k]] + J[k] = local_indices[J[k]] + end + return I, J, rinfo +end + +""" + seed_matrix(rinfo::RecoveryInfo) + +Allocate a seed matrix for the Coloring. +""" +function seed_matrix(rinfo::RecoveryInfo) + return Array{Float64}(undef, length(rinfo.local_indices), rinfo.num_colors) +end + +""" + prepare_seed_matrix!(R, rinfo::RecoveryInfo) +""" +function prepare_seed_matrix!(R, rinfo::RecoveryInfo) + N = length(rinfo.color) + @assert N == size(R, 1) == length(rinfo.local_indices) + @assert size(R, 2) == rinfo.num_colors + fill!(R, 0.0) + for i in 1:N + R[i, rinfo.color[i]] = 1 + end + return +end + +""" + recover_from_matmat!( + V::AbstractVector{T}, + R::AbstractMatrix{T}, + rinfo::RecoveryInfo, + stored_values::AbstractVector{T}, + ) where {T} + +recover the hessian values from the hessian-matrix solution +stored_values is a temporary vector of length >= length(rinfo.local_indices) +""" +function recover_from_matmat!( + V::AbstractVector{T}, + R::AbstractMatrix{T}, + rinfo::RecoveryInfo, + stored_values::AbstractVector{T}, +) where {T} + N = length(rinfo.color) # number of local variables + @assert length(V) == rinfo.nnz + N + @assert length(stored_values) >= length(rinfo.local_indices) + k = 0 + for i in 1:N + k += 1 + V[k] = R[i, rinfo.color[i]] + end + for t in 1:length(rinfo.vertexmap) + vmap = rinfo.vertexmap[t] + order = rinfo.postorder[t] + parent = rinfo.parents[t] + for z in 1:length(order) + @inbounds stored_values[z] = 0.0 + end + @inbounds for z in 1:length(order) + v = order[z] + p = parent[v] + if p == 0 + continue + end + i, j = vmap[v], vmap[p] + value = R[i, rinfo.color[j]] - stored_values[v] + stored_values[p] += value + k += 1 + V[k] = value + end + end + @assert k == rinfo.nnz + N + return +end + +end # module diff --git a/src/Nonlinear/ReverseAD/Coloring/topological_sort.jl b/src/Nonlinear/ReverseAD/Coloring/topological_sort.jl new file mode 100644 index 0000000000..0b8d0ca2eb --- /dev/null +++ b/src/Nonlinear/ReverseAD/Coloring/topological_sort.jl @@ -0,0 +1,106 @@ +# Modified from Graphs.jl. +# Workaround for edge_index not being O(1) on SimpleGraph. +# edge_index was only needed to test for cycles, so +# this implementation skips that check. + +# Graphs.jl is licensed under the MIT License: +# +# Copyright (c) 2012: John Myles White and other contributors. 
+# +# Permission is hereby granted, free of charge, to any person obtaining +# a copy of this software and associated documentation files (the +# "Software"), to deal in the Software without restriction, including +# without limitation the rights to use, copy, modify, merge, publish, +# distribute, sublicense, and/or sell copies of the Software, and to +# permit persons to whom the Software is furnished to do so, subject to +# the following conditions: +# +# The above copyright notice and this permission notice shall be +# included in all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF +# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE +# LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION +# OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION +# WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + +mutable struct _TopologicalSortVisitor + vertices::Vector{Int} + parents::Vector{Int} + + function _TopologicalSortVisitor(n::Int) + vs = Int[] + sizehint!(vs, n) + return new(vs, zeros(Int, n)) + end +end + +function _traverse_graph( + adjlist, + offsets, + s, + visitor::_TopologicalSortVisitor, + vertexcolormap, + vertex_stack, + index_stack, +) + vertexcolormap[s] = 1 + resize!(vertex_stack, 1) + vertex_stack[1] = s + resize!(index_stack, 1) + index_stack[1] = 1 + while !isempty(vertex_stack) + u = pop!(vertex_stack) + out_idx = pop!(index_stack) + len_uegs = offsets[u+1] - offsets[u] + found_new_vertex = false + while out_idx <= len_uegs && !found_new_vertex + v = adjlist[offsets[u]+out_idx-1] + out_idx += 1 + v_color = vertexcolormap[v] + if v_color == 0 + found_new_vertex = true + vertexcolormap[v] = 1 + push!(vertex_stack, u) + push!(index_stack, out_idx) + visitor.parents[v] = u + push!(vertex_stack, v) + push!(index_stack, 1) + end + end + if !found_new_vertex + push!(visitor.vertices, u) + vertexcolormap[u] = 2 + end + end + return +end + +function reverse_topological_sort_by_dfs( + adjlist, + offsets, + num_vertices, + cmap::Vector{Int}, + vertex_stack::Vector{Int} = Int[], + index_stack::Vector{Int} = Int[], +) + @assert length(cmap) == num_vertices + fill!(cmap, 0) + visitor = _TopologicalSortVisitor(num_vertices) + for s in 1:num_vertices + if cmap[s] == 0 + _traverse_graph( + adjlist, + offsets, + s, + visitor, + cmap, + vertex_stack, + index_stack, + ) + end + end + return visitor.vertices, visitor.parents +end diff --git a/src/Nonlinear/ReverseAD/ReverseAD.jl b/src/Nonlinear/ReverseAD/ReverseAD.jl new file mode 100644 index 0000000000..5d865466d0 --- /dev/null +++ b/src/Nonlinear/ReverseAD/ReverseAD.jl @@ -0,0 +1,40 @@ +module ReverseAD + +import ForwardDiff +import MathOptInterface +import ..Nonlinear +import SparseArrays + +const MOI = MathOptInterface + +# Override basic math functions to return NaN instead of throwing errors. +# This is what NLP solvers expect, and sometimes the results aren't needed +# anyway, because the code may compute derivatives wrt constants. 
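+# For example, with these overrides `log(-1.0)` evaluates to `NaN` inside this
+# module, whereas `Base.log(-1.0)` would throw a `DomainError`.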
+import NaNMath: + sin, + cos, + tan, + asin, + acos, + acosh, + atanh, + log, + log2, + log10, + lgamma, + log1p, + pow, + sqrt + +include("Coloring/Coloring.jl") +include("graph_tools.jl") +include("types.jl") +include("utils.jl") + +include("reverse_mode.jl") +include("forward_over_reverse.jl") +include("mathoptinterface_api.jl") + +include("precompile.jl") + +end # module diff --git a/src/Nonlinear/ReverseAD/forward_over_reverse.jl b/src/Nonlinear/ReverseAD/forward_over_reverse.jl new file mode 100644 index 0000000000..a272cba32d --- /dev/null +++ b/src/Nonlinear/ReverseAD/forward_over_reverse.jl @@ -0,0 +1,396 @@ +const TAG = :ReverseAD + +""" + _eval_hessian( + d::NLPEvaluator, + f::_FunctionStorage, + H::AbstractVector{Float64}, + λ::Float64, + offset::Int, + )::Int + +Evaluate the hessian matrix of the function `f` and store the result, scaled by +`λ`, in `H`, beginning at element `offset+1`. This function assumes that +`_reverse_mode(d, x)` has already been called. + +Returns the number of non-zeros in the computed Hessian, which will be used to +update the offset for the next call. +""" +function _eval_hessian( + d::NLPEvaluator, + f::_FunctionStorage, + H::AbstractVector{Float64}, + λ::Float64, + offset::Int, +)::Int + chunk = min(size(f.seed_matrix, 2), d.max_chunk) + # As a performance optimization, skip dynamic dispatch if the chunk is 1. + if chunk == 1 + return _eval_hessian_inner(d, f, H, λ, offset, Val(1)) + else + return _eval_hessian_inner(d, f, H, λ, offset, Val(chunk)) + end +end + +function _eval_hessian_inner( + d::NLPEvaluator, + ex::_FunctionStorage, + H::AbstractVector{Float64}, + scale::Float64, + nzcount::Int, + ::Val{CHUNK}, +) where {CHUNK} + if ex.linearity == LINEAR + @assert length(ex.hess_I) == 0 + return 0 + end + T = ForwardDiff.Partials{CHUNK,Float64} # This is our element type. 
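+    # Each `Partials{CHUNK,Float64}` carries CHUNK seed directions per
+    # variable, so each forward/reverse sweep below evaluates up to CHUNK
+    # Hessian-vector products at once.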
+ Coloring.prepare_seed_matrix!(ex.seed_matrix, ex.rinfo) + local_to_global_idx = ex.rinfo.local_indices + input_ϵ_raw, output_ϵ_raw = d.input_ϵ, d.output_ϵ + input_ϵ = _reinterpret_unsafe(T, input_ϵ_raw) + output_ϵ = _reinterpret_unsafe(T, output_ϵ_raw) + # Compute hessian-vector products + num_products = size(ex.seed_matrix, 2) # number of hessian-vector products + num_chunks = div(num_products, CHUNK) + @assert size(ex.seed_matrix, 1) == length(local_to_global_idx) + for k in 1:CHUNK:CHUNK*num_chunks + for r in 1:length(local_to_global_idx) + # set up directional derivatives + @inbounds idx = local_to_global_idx[r] + # load up ex.seed_matrix[r,k,k+1,...,k+CHUNK-1] into input_ϵ + for s in 1:CHUNK + input_ϵ_raw[(idx-1)*CHUNK+s] = ex.seed_matrix[r, k+s-1] + end + @inbounds output_ϵ[idx] = zero(T) + end + _hessian_slice_inner(d, ex, input_ϵ, output_ϵ, T) + # collect directional derivatives + for r in 1:length(local_to_global_idx) + idx = local_to_global_idx[r] + # load output_ϵ into ex.seed_matrix[r,k,k+1,...,k+CHUNK-1] + for s in 1:CHUNK + ex.seed_matrix[r, k+s-1] = output_ϵ_raw[(idx-1)*CHUNK+s] + end + @inbounds input_ϵ[idx] = zero(T) + end + end + # leftover chunk + remaining = num_products - CHUNK * num_chunks + if remaining > 0 + k = CHUNK * num_chunks + 1 + for r in 1:length(local_to_global_idx) + # set up directional derivatives + @inbounds idx = local_to_global_idx[r] + # load up ex.seed_matrix[r,k,k+1,...,k+remaining-1] into input_ϵ + for s in 1:remaining + # leave junk in the unused components + input_ϵ_raw[(idx-1)*CHUNK+s] = ex.seed_matrix[r, k+s-1] + end + @inbounds output_ϵ[idx] = zero(T) + end + _hessian_slice_inner(d, ex, input_ϵ, output_ϵ, T) + # collect directional derivatives + for r in 1:length(local_to_global_idx) + idx = local_to_global_idx[r] + # load output_ϵ into ex.seed_matrix[r,k,k+1,...,k+remaining-1] + for s in 1:remaining + ex.seed_matrix[r, k+s-1] = output_ϵ_raw[(idx-1)*CHUNK+s] + end + @inbounds input_ϵ[idx] = zero(T) + end + end + # TODO(odow): consider reverting to a view. 
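+    # `_UnsafeVectorView` exposes the slice of `H` starting at element
+    # `nzcount + 1` so that `recover_from_matmat!` can write the recovered
+    # Hessian entries directly into the output vector.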
+ output_slice = _UnsafeVectorView(nzcount, length(ex.hess_I), pointer(H)) + Coloring.recover_from_matmat!( + output_slice, + ex.seed_matrix, + ex.rinfo, + d.output_ϵ, + ) + for i in 1:length(output_slice) + output_slice[i] *= scale + end + return length(ex.hess_I) +end + +function _hessian_slice_inner(d, ex, input_ϵ, output_ϵ, ::Type{T}) where {T} + subexpr_forward_values_ϵ = + _reinterpret_unsafe(T, d.subexpression_forward_values_ϵ) + for i in ex.dependent_subexpressions + subexpr = d.subexpressions[i] + subexpr_forward_values_ϵ[i] = _forward_eval_ϵ( + subexpr, + _reinterpret_unsafe(T, subexpr.forward_storage_ϵ), + _reinterpret_unsafe(T, subexpr.partials_storage_ϵ), + input_ϵ, + subexpr_forward_values_ϵ, + d.data.operators, + ) + end + _forward_eval_ϵ( + ex, + _reinterpret_unsafe(T, d.forward_storage_ϵ), + _reinterpret_unsafe(T, d.partials_storage_ϵ), + input_ϵ, + subexpr_forward_values_ϵ, + d.data.operators, + ) + # do a reverse pass + subexpr_reverse_values_ϵ = + _reinterpret_unsafe(T, d.subexpression_reverse_values_ϵ) + for i in ex.dependent_subexpressions + subexpr_reverse_values_ϵ[i] = zero(T) + d.subexpression_reverse_values[i] = 0.0 + end + _reverse_eval_ϵ( + output_ϵ, + ex, + _reinterpret_unsafe(T, d.reverse_storage_ϵ), + _reinterpret_unsafe(T, d.partials_storage_ϵ), + d.subexpression_reverse_values, + subexpr_reverse_values_ϵ, + 1.0, + zero(T), + ) + for i in length(ex.dependent_subexpressions):-1:1 + j = ex.dependent_subexpressions[i] + subexpr = d.subexpressions[j] + _reverse_eval_ϵ( + output_ϵ, + subexpr, + _reinterpret_unsafe(T, subexpr.reverse_storage_ϵ), + _reinterpret_unsafe(T, subexpr.partials_storage_ϵ), + d.subexpression_reverse_values, + subexpr_reverse_values_ϵ, + d.subexpression_reverse_values[j], + subexpr_reverse_values_ϵ[j], + ) + end + return +end + +""" + _forward_eval_ϵ( + ex::Union{_FunctionStorage,_SubexpressionStorage}, + storage_ϵ::AbstractVector{ForwardDiff.Partials{N,T}}, + partials_storage_ϵ::AbstractVector{ForwardDiff.Partials{N,T}}, + x_values_ϵ, + subexpression_values_ϵ, + user_operators::Nonlinear.OperatorRegistry, + ) where {N,T} + +Evaluate the directional derivatives of the expression tree in `ex`. + +This is equivalent to evaluating the expression tree using DualNumbers or +ForwardDiff, but instead we keep the `ex.forward_storage` for the epsilon +components separate so that we don't need to recompute the real components. + +This assumes that `_reverse_model(d, x)` has already been called. 
+""" +function _forward_eval_ϵ( + ex::Union{_FunctionStorage,_SubexpressionStorage}, + storage_ϵ::AbstractVector{ForwardDiff.Partials{N,T}}, + partials_storage_ϵ::AbstractVector{ForwardDiff.Partials{N,T}}, + x_values_ϵ, + subexpression_values_ϵ, + user_operators::Nonlinear.OperatorRegistry, +) where {N,T} + @assert length(storage_ϵ) >= length(ex.nodes) + @assert length(partials_storage_ϵ) >= length(ex.nodes) + zero_ϵ = zero(ForwardDiff.Partials{N,T}) + # ex.nodes is already in order such that parents always appear before children + # so a backwards pass through ex.nodes is a forward pass through the tree + children_arr = SparseArrays.rowvals(ex.adj) + for k in length(ex.nodes):-1:1 + node = ex.nodes[k] + partials_storage_ϵ[k] = zero_ϵ + if node.type == Nonlinear.NODE_VARIABLE + storage_ϵ[k] = x_values_ϵ[node.index] + elseif node.type == Nonlinear.NODE_VALUE + storage_ϵ[k] = zero_ϵ + elseif node.type == Nonlinear.NODE_SUBEXPRESSION + storage_ϵ[k] = subexpression_values_ϵ[node.index] + elseif node.type == Nonlinear.NODE_PARAMETER + storage_ϵ[k] = zero_ϵ + else + @assert node.type != Nonlinear.NODE_MOI_VARIABLE + ϵtmp = zero_ϵ + @inbounds children_idx = SparseArrays.nzrange(ex.adj, k) + for c_idx in children_idx + @inbounds ix = children_arr[c_idx] + @inbounds partial = ex.partials_storage[ix] + @inbounds storage_val = storage_ϵ[ix] + # TODO: This "if" statement can take 8% of the hessian + # evaluation time! Find a more efficient way. + if !isfinite(partial) && storage_val == zero_ϵ + continue + end + ϵtmp += storage_val * ex.partials_storage[ix] + end + storage_ϵ[k] = ϵtmp + if node.type == Nonlinear.NODE_CALL_MULTIVARIATE + # TODO(odow): consider how to refactor this into Nonlinear. + op = node.index + n_children = length(children_idx) + if op == 3 # :* + # Lazy approach for now. 
+ anyzero = false + tmp_prod = one(ForwardDiff.Dual{TAG,T,N}) + for c_idx in children_idx + ix = children_arr[c_idx] + sval = ex.forward_storage[ix] + gnum = ForwardDiff.Dual{TAG}(sval, storage_ϵ[ix]) + tmp_prod *= gnum + anyzero = ifelse(sval * sval == zero(T), true, anyzero) + end + # By a quirk of floating-point numbers, we can have + # anyzero == true && ForwardDiff.value(tmp_prod) != zero(T) + if anyzero || n_children <= 2 + for c_idx in children_idx + prod_others = one(ForwardDiff.Dual{TAG,T,N}) + for c_idx2 in children_idx + (c_idx == c_idx2) && continue + ix = children_arr[c_idx2] + gnum = ForwardDiff.Dual{TAG}( + ex.forward_storage[ix], + storage_ϵ[ix], + ) + prod_others *= gnum + end + partials_storage_ϵ[children_arr[c_idx]] = + ForwardDiff.partials(prod_others) + end + else + for c_idx in children_idx + ix = children_arr[c_idx] + prod_others = + tmp_prod / ForwardDiff.Dual{TAG}( + ex.forward_storage[ix], + storage_ϵ[ix], + ) + partials_storage_ϵ[ix] = + ForwardDiff.partials(prod_others) + end + end + elseif op == 4 # :^ + @assert n_children == 2 + idx1 = first(children_idx) + idx2 = last(children_idx) + @inbounds ix1 = children_arr[idx1] + @inbounds ix2 = children_arr[idx2] + @inbounds base = ex.forward_storage[ix1] + @inbounds base_ϵ = storage_ϵ[ix1] + @inbounds exponent = ex.forward_storage[ix2] + @inbounds exponent_ϵ = storage_ϵ[ix2] + base_gnum = ForwardDiff.Dual{TAG}(base, base_ϵ) + exponent_gnum = ForwardDiff.Dual{TAG}(exponent, exponent_ϵ) + if exponent == 2 + partials_storage_ϵ[ix1] = 2 * base_ϵ + elseif exponent == 1 + partials_storage_ϵ[ix1] = zero_ϵ + else + partials_storage_ϵ[ix1] = ForwardDiff.partials( + exponent_gnum * base_gnum^(exponent_gnum - 1), + ) + end + result_gnum = ForwardDiff.Dual{TAG}( + ex.forward_storage[k], + storage_ϵ[k], + ) + # TODO(odow): fix me to use NaNMath.jl instead + log_base_gnum = base_gnum < 0 ? NaN : log(base_gnum) + partials_storage_ϵ[ix2] = + ForwardDiff.partials(result_gnum * log_base_gnum) + elseif op == 5 # :/ + @assert n_children == 2 + idx1 = first(children_idx) + idx2 = last(children_idx) + @inbounds ix1 = children_arr[idx1] + @inbounds ix2 = children_arr[idx2] + @inbounds numerator = ex.forward_storage[ix1] + @inbounds numerator_ϵ = storage_ϵ[ix1] + @inbounds denominator = ex.forward_storage[ix2] + @inbounds denominator_ϵ = storage_ϵ[ix2] + recip_denominator = + 1 / ForwardDiff.Dual{TAG}(denominator, denominator_ϵ) + partials_storage_ϵ[ix1] = + ForwardDiff.partials(recip_denominator) + partials_storage_ϵ[ix2] = ForwardDiff.partials( + -ForwardDiff.Dual{TAG}(numerator, numerator_ϵ) * + recip_denominator * + recip_denominator, + ) + elseif op > 6 + error( + "User-defined operators not supported for hessian " * + "computations", + ) + end + elseif node.type == Nonlinear.NODE_CALL_UNIVARIATE + @inbounds child_idx = children_arr[ex.adj.colptr[k]] + f′′ = Nonlinear.eval_univariate_hessian( + user_operators, + user_operators.univariate_operators[node.index], + ex.forward_storage[child_idx], + ) + partials_storage_ϵ[child_idx] = f′′ * storage_ϵ[child_idx] + end + end + end + return storage_ϵ[1] +end + +# Compute directional derivatives of the reverse pass, goes with _forward_eval_ϵ +# to compute hessian-vector products. 
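+# At each node, `ForwardDiff._mul_partials` combines the parent's adjoint and
+# its epsilon components with the local partial derivative; this is the product
+# rule applied to adjoint(node) = adjoint(parent) * ∂parent/∂node.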
+function _reverse_eval_ϵ( + output_ϵ::AbstractVector{ForwardDiff.Partials{N,T}}, + ex::Union{_FunctionStorage,_SubexpressionStorage}, + reverse_storage_ϵ, + partials_storage_ϵ, + subexpression_output, + subexpression_output_ϵ, + scale::T, + scale_ϵ::ForwardDiff.Partials{N,T}, +) where {N,T} + @assert length(reverse_storage_ϵ) >= length(ex.nodes) + @assert length(partials_storage_ϵ) >= length(ex.nodes) + if ex.nodes[1].type == Nonlinear.NODE_VARIABLE + @inbounds output_ϵ[ex.nodes[1].index] += scale_ϵ + return + elseif ex.nodes[1].type == Nonlinear.NODE_SUBEXPRESSION + @inbounds subexpression_output[ex.nodes[1].index] += + scale * ex.reverse_storage[1] + @inbounds subexpression_output_ϵ[ex.nodes[1].index] += scale_ϵ + return + end + reverse_storage_ϵ[1] = scale_ϵ + for k in 2:length(ex.nodes) + @inbounds node = ex.nodes[k] + if node.type == Nonlinear.NODE_VALUE || + node.type == Nonlinear.NODE_LOGIC || + node.type == Nonlinear.NODE_COMPARISON || + node.type == Nonlinear.NODE_PARAMETER + continue + end + parent_value = scale * ex.reverse_storage[node.parent] + if !isfinite(ex.partials_storage[k]) && iszero(parent_value) + reverse_storage_ϵ[k] = zero(ForwardDiff.Partials{N,T}) + else + reverse_storage_ϵ[k] = ForwardDiff._mul_partials( + partials_storage_ϵ[k], + reverse_storage_ϵ[node.parent], + parent_value, + ex.partials_storage[k], + ) + end + if node.type == Nonlinear.NODE_VARIABLE + @inbounds output_ϵ[node.index] += reverse_storage_ϵ[k] + elseif node.type == Nonlinear.NODE_SUBEXPRESSION + @inbounds subexpression_output[node.index] += + scale * ex.reverse_storage[k] + @inbounds subexpression_output_ϵ[node.index] += reverse_storage_ϵ[k] + end + end + return +end diff --git a/src/Nonlinear/ReverseAD/graph_tools.jl b/src/Nonlinear/ReverseAD/graph_tools.jl new file mode 100644 index 0000000000..b3c06f9ecf --- /dev/null +++ b/src/Nonlinear/ReverseAD/graph_tools.jl @@ -0,0 +1,486 @@ +""" + _replace_moi_variables( + nodes::Vector{Nonlinear.Node}, + moi_index_to_consecutive_index::Dict{MOI.VariableIndex,Int}, + ) + +Return a new `Vector{Nonlinear.Node}` where all occurances of +`NODE_MOI_VARIABLE` are replaced by `NODE_VARIABLE` that is 1-indexed and +ordered. +""" +function _replace_moi_variables( + nodes::Vector{Nonlinear.Node}, + moi_index_to_consecutive_index::Dict{MOI.VariableIndex,Int}, +) + new_nodes = Vector{Nonlinear.Node}(undef, length(nodes)) + for (i, node) in enumerate(nodes) + if node.type == Nonlinear.NODE_MOI_VARIABLE + new_nodes[i] = Nonlinear.Node( + Nonlinear.NODE_VARIABLE, + moi_index_to_consecutive_index[MOI.VariableIndex(node.index)], + node.parent, + ) + else + new_nodes[i] = node + end + end + return new_nodes +end + +@enum(Linearity, CONSTANT, LINEAR, PIECEWISE_LINEAR, NONLINEAR) + +""" + _classify_linearity( + nodes::Vector{Nonlinear.Node}, + adj::SparseArrays.SparseMatrixCSC, + subexpression_linearity::Vector{Linearity}, + ) + +Classify the nodes in a tree as constant, linear, or nonlinear with respect to +the input. 
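+
+For example (an illustrative case): in the expression `2 * x + y^2`, the
+`2 * x` node is classified as `LINEAR`, the `y^2` node as `NONLINEAR`, and
+therefore the parent `+` node as `NONLINEAR`.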
+""" +function _classify_linearity( + nodes::Vector{Nonlinear.Node}, + adj::SparseArrays.SparseMatrixCSC, + subexpression_linearity::Vector{Linearity}, +) + linearity = Array{Linearity}(undef, length(nodes)) + children_arr = SparseArrays.rowvals(adj) + for k in length(nodes):-1:1 + node = nodes[k] + if node.type == Nonlinear.NODE_VARIABLE + linearity[k] = LINEAR + continue + elseif node.type == Nonlinear.NODE_VALUE + linearity[k] = CONSTANT + continue + elseif node.type == Nonlinear.NODE_PARAMETER + linearity[k] = CONSTANT + continue + elseif node.type == Nonlinear.NODE_SUBEXPRESSION + linearity[k] = subexpression_linearity[node.index] + continue + end + children_idx = SparseArrays.nzrange(adj, k) + num_constant_children, any_nonlinear = 0, false + for r in children_idx + if linearity[children_arr[r]] == NONLINEAR + any_nonlinear = true + break + elseif linearity[children_arr[r]] == CONSTANT + num_constant_children += 1 + end + end + if any_nonlinear + # If any children are nonlinear, then we're nonlinear... + linearity[k] = NONLINEAR + # ...except in the case of ifelse. If the operands are linear then + # we're piecewise linear. + op = get( + Nonlinear.DEFAULT_MULTIVARIATE_OPERATORS, + node.index, + nothing, + ) + if ( + node.type == Nonlinear.NODE_CALL_MULTIVARIATE && + op == :ifelse && + linearity[children_arr[children_idx[2]]] == LINEAR && + linearity[children_arr[children_idx[3]]] == LINEAR + ) + linearity[k] = PIECEWISE_LINEAR + end + continue + elseif num_constant_children == length(children_idx) + # If all children are constant, then we're constant. + linearity[k] = CONSTANT + continue + end + # By this point, some children are constant and some are linear, so if + # the operator is nonlinear, then we're nonlinear. + if node.type == Nonlinear.NODE_CALL_UNIVARIATE + op = + get(Nonlinear.DEFAULT_UNIVARIATE_OPERATORS, node.index, nothing) + if op == :+ || op == :- + linearity[k] = LINEAR + else + linearity[k] = NONLINEAR + end + elseif node.type == Nonlinear.NODE_CALL_MULTIVARIATE + op = get( + Nonlinear.DEFAULT_MULTIVARIATE_OPERATORS, + node.index, + nothing, + ) + if op == :+ + linearity[k] = LINEAR + elseif op == :- + linearity[k] = LINEAR + elseif op == :* + # Multiplication is linear if there is one non-constant term. + one_op = num_constant_children == length(children_idx) - 1 + linearity[k] = one_op ? LINEAR : NONLINEAR + elseif op == :^ + linearity[k] = NONLINEAR + elseif op == :/ + if linearity[children_arr[children_idx[2]]] == CONSTANT + # If the denominator is constant, we're linear. + linearity[k] = LINEAR + else + linearity[k] = NONLINEAR + end + elseif op == :ifelse + linearity[k] = NONLINEAR + else # User-defined functions + linearity[k] = NONLINEAR + end + elseif node.type == Nonlinear.NODE_LOGIC + linearity[k] = NONLINEAR + else + @assert node.type == Nonlinear.NODE_COMPARISON + linearity[k] = NONLINEAR + end + end + return linearity +end + +""" + _compute_gradient_sparsity!( + indices::Coloring.IndexedSet, + nodes::Vector{Nonlinear.Node}, + ) + +Compute the sparsity pattern of the gradient of an expression (i.e., a list of +which variable indices are present). 
+""" +function _compute_gradient_sparsity!( + indices::Coloring.IndexedSet, + nodes::Vector{Nonlinear.Node}, +) + for node in nodes + if node.type == Nonlinear.NODE_VARIABLE + push!(indices, node.index) + elseif node.type == Nonlinear.NODE_MOI_VARIABLE + error( + "Internal error: Invalid to compute sparsity if Nonlinear.NODE_MOI_VARIABLE " * + "nodes are present.", + ) + end + end + return +end + +""" + _compute_hessian_sparsity( + nodes::Vector{Nonlinear.Node}, + adj, + input_linearity::Vector{Linearity}, + indexedset::Coloring.IndexedSet, + subexpression_edgelist::Vector{Set{Tuple{Int,Int}}}, + subexpression_variables::Vector{Vector{Int}}, + ) + +Compute the sparsity pattern the Hessian of an expression. + + * `input_linearity` is the linearity with respect to the input, computed by + `_classify_linearity` + * `subexpression_edgelist` is the edge_list of each subexpression + * `subexpression_variables` is the list of all variables which appear in a + subexpression (including recursively). + +Idea: consider the (non)linearity of a node *with respect to the output*. The +children of any node which is nonlinear with respect to the output should have +nonlinear interactions, hence nonzeros in the hessian. This is not true in +general, but holds for everything we consider. + +A counter example is `f(x, y, z) = x + y * z`, but we don't have any functions +like that. By "nonlinear with respect to the output", we mean that the output +depends nonlinearly on the value of the node, regardless of how the node itself +depends on the input. +""" +function _compute_hessian_sparsity( + nodes::Vector{Nonlinear.Node}, + adj, + input_linearity::Vector{Linearity}, + indexedset::Coloring.IndexedSet, + subexpression_edgelist::Vector{Set{Tuple{Int,Int}}}, + subexpression_variables::Vector{Vector{Int}}, +) + # So start at the root of the tree and classify the linearity wrt the output. + # For each nonlinear node, do a mini DFS and collect the list of children. + # Add a nonlinear interaction between all children of a nonlinear node. 
+ edge_list = Set{Tuple{Int,Int}}() + nonlinear_wrt_output = fill(false, length(nodes)) + children_arr = SparseArrays.rowvals(adj) + stack = Int[] + stack_ignore = Bool[] + nonlinear_group = indexedset + if length(nodes) == 1 && nodes[1].type == Nonlinear.NODE_SUBEXPRESSION + # Subexpression comes in linearly, so append edge_list + for ij in subexpression_edgelist[nodes[1].index] + push!(edge_list, ij) + end + end + for k in 2:length(nodes) + nod = nodes[k] + if nod.type == Nonlinear.NODE_MOI_VARIABLE + error( + "Internal error: Invalid to compute sparsity if Nonlinear.NODE_MOI_VARIABLE " * + "nodes are present.", + ) + end + if nonlinear_wrt_output[k] + continue # already seen this node one way or another + elseif input_linearity[k] == CONSTANT + continue # definitely not nonlinear + end + @assert !nonlinear_wrt_output[nod.parent] + # check if the parent depends nonlinearly on the value of this node + par = nodes[nod.parent] + if par.type == Nonlinear.NODE_CALL_UNIVARIATE + op = get(Nonlinear.DEFAULT_UNIVARIATE_OPERATORS, par.index, nothing) + if op === nothing || (op != :+ && op != :-) + nonlinear_wrt_output[k] = true + end + elseif par.type == Nonlinear.NODE_CALL_MULTIVARIATE + op = get( + Nonlinear.DEFAULT_MULTIVARIATE_OPERATORS, + par.index, + nothing, + ) + if op === nothing + nonlinear_wrt_output[k] = true + elseif op in (:+, :-, :ifelse) + # pass + elseif op == :* + # check if all siblings are constant + sibling_idx = SparseArrays.nzrange(adj, nod.parent) + if !all( + i -> + input_linearity[children_arr[i]] == CONSTANT || + children_arr[i] == k, + sibling_idx, + ) + # at least one sibling isn't constant + nonlinear_wrt_output[k] = true + end + elseif op == :/ + # check if denominator is nonconstant + sibling_idx = SparseArrays.nzrange(adj, nod.parent) + if input_linearity[children_arr[last(sibling_idx)]] != CONSTANT + nonlinear_wrt_output[k] = true + end + else + nonlinear_wrt_output[k] = true + end + end + if nod.type == Nonlinear.NODE_SUBEXPRESSION && !nonlinear_wrt_output[k] + # subexpression comes in linearly, so append edge_list + for ij in subexpression_edgelist[nod.index] + push!(edge_list, ij) + end + end + if !nonlinear_wrt_output[k] + continue + end + # do a DFS from here, including all children + @assert isempty(stack) + @assert isempty(stack_ignore) + sibling_idx = SparseArrays.nzrange(adj, nod.parent) + for sidx in sibling_idx + push!(stack, children_arr[sidx]) + push!(stack_ignore, false) + end + empty!(nonlinear_group) + while length(stack) > 0 + r = pop!(stack) + should_ignore = pop!(stack_ignore) + nonlinear_wrt_output[r] = true + if nodes[r].type == Nonlinear.NODE_LOGIC || + nodes[r].type == Nonlinear.NODE_COMPARISON + # don't count the nonlinear interactions inside + # logical conditions or comparisons + should_ignore = true + end + children_idx = SparseArrays.nzrange(adj, r) + for cidx in children_idx + push!(stack, children_arr[cidx]) + push!(stack_ignore, should_ignore) + end + if should_ignore + continue + end + if nodes[r].type == Nonlinear.NODE_VARIABLE + push!(nonlinear_group, nodes[r].index) + elseif nodes[r].type == Nonlinear.NODE_SUBEXPRESSION + # append all variables in subexpression + union!(nonlinear_group, subexpression_variables[nodes[r].index]) + end + end + for i_ in 1:nonlinear_group.nnz + i = nonlinear_group.nzidx[i_] + for j_ in 1:nonlinear_group.nnz + j = nonlinear_group.nzidx[j_] + if j > i + continue # Only lower triangle. 
+ end + push!(edge_list, (i, j)) + end + end + end + return edge_list +end + +""" + _list_subexpressions(nodes::Vector{Nonlinear.Node}) + +Returns the list of subexpressions which a given tape depends on directly +""" +function _list_subexpressions(nodes::Vector{Nonlinear.Node}) + indices = Set{Int}( + n.index for n in nodes if n.type == Nonlinear.NODE_SUBEXPRESSION + ) + return sort(collect(indices)) +end + +""" + _topological_sort( + starts::Vector{Int}, + subexpressions::Vector{Vector{Nonlinear.Node}}, + subexpression_dependency_graph::Vector{Vector{Int}} = + Vector{Vector{Int}}(undef, length(subexpressions)), + ) + +Return a topologically sorted list of the integer subexpresssion indices that +need to be computed to evaluate `subexpressions[s]` for all `s in starts`. + +`starts` should be ordered, and not contain duplicates. + +`subexpression_dependency_graph[i]` is a lazily-computed list of "out" edges +from node `i`, in terms of the integer-valued subexpression index (i.e., +`node.index`). This list should be unique and ordered. + +If calling `_topological_sort` a single time, you may omit the +`subexpression_dependency_graph` argument. + +However, if calling `_topological_sort` multiple times on the _same_ vector of +subexpresssions, you should create `subexpression_dependency_graph` once (either +as the uninitialized vector, or by explicitly computing the full +`subexpression_dependency_graph`), and pass it in. + +## Notes + +* It is important to not use recursion here, because expressions may have + arbitrary levels of nesting! +* This function assumes `subexpressions` is acyclic. +""" +function _topological_sort( + starts, + subexpressions::Vector{Vector{Nonlinear.Node}}, + subexpression_dependency_graph::Vector{Vector{Int}} = Vector{Vector{Int}}( + undef, + length(subexpressions), + ), +) + ordered = Int[] + in_order = fill(false, length(subexpressions)) + stack = Tuple{Int,Bool}[] + for s in starts + if in_order[s] + continue # s is already in `ordered`. + end + push!(stack, (s, true)) + while !isempty(stack) + node, needs_checking = pop!(stack) + if !needs_checking + # We must be returning to this node for a second time, and we + # have already checked all of the children. Therefore, we can + # add it to the set of ordered nodes. + push!(ordered, node) + in_order[node] = true + continue + elseif in_order[node] + continue # This node has already been added to `ordered`. + end + # Re-add the node to the stack, but set the `false` flag this time + # so next time we visit it, it will go on the `ordered` list + # instead. + push!(stack, (node, false)) + if !isassigned(subexpression_dependency_graph, node) + subexpression_dependency_graph[node] = + _list_subexpressions(subexpressions[node]) + end + for child in subexpression_dependency_graph[node] + if !in_order[child] + push!(stack, (child, true)) + end + end + end + end + return ordered +end + +""" + _order_subexpressions( + main_expressions::Vector{Vector{Nonlinear.Node}}, + subexpressions::Vector{Vector{Nonlinear.Node}}; + ) + +Topologically sort the subexpression needed to evaluate `main_expressions`. + +Returns two things: + + * A `Vector{Int}` containing the ordered list of subexpression-indices that + need to be evaluated to compute all `main_expressions` + * A `Vector{Vector{Int}}`, containing a list of ordered lists of + subexpression-indices that need to be evaluated to compute + `main_expressions[i]`. + +**Warning:** This doesn't handle cyclic expressions! 
But this should be fine +because we can't compute them in JuMP anyway. +""" +function _order_subexpressions( + main_expressions::Vector{Vector{Nonlinear.Node}}, + subexpressions::Vector{Vector{Nonlinear.Node}}, +) + # The graph of node dependencies. Constructed lazily. + subexpression_dependency_graph = + Vector{Vector{Int}}(undef, length(subexpressions)) + # Node dependencies of the main expressions. + starts = Set{Int}() + individual_sorts = Vector{Int}[] + for expression in main_expressions + s = _list_subexpressions(expression) + union!(starts, s) + push!( + individual_sorts, + _topological_sort( + s, + subexpressions, + subexpression_dependency_graph, + ), + ) + end + full_sort = _topological_sort( + starts, + subexpressions, + subexpression_dependency_graph, + ) + return full_sort, individual_sorts +end + +""" + _order_subexpressions( + main_expression::Vector{Nonlinear.Node}, + subexpressions::Vector{Vector{Nonlinear.Node}}, + ) + +Return a `Vector{Int}` containing the topologically ordered list of +subexpression-indices that need to be evaluated to compute `main_expression`. +""" +function _order_subexpressions( + main_expression::Vector{Nonlinear.Node}, + subexpressions::Vector{Vector{Nonlinear.Node}}, +) + s = _list_subexpressions(main_expression) + return _topological_sort(s, subexpressions) +end diff --git a/src/Nonlinear/ReverseAD/mathoptinterface_api.jl b/src/Nonlinear/ReverseAD/mathoptinterface_api.jl new file mode 100644 index 0000000000..7adee13d3f --- /dev/null +++ b/src/Nonlinear/ReverseAD/mathoptinterface_api.jl @@ -0,0 +1,355 @@ +function MOI.features_available(d::NLPEvaluator) + # Check if we have any user-defined multivariate operators, in which case we + # need to disable hessians. The result of features_available depends on this. + d.disable_2ndorder = + length(d.data.operators.registered_multivariate_operators) > 0 + if d.disable_2ndorder + return [:Grad, :Jac, :JacVec] + end + return [:Grad, :Jac, :JacVec, :Hess, :HessVec] +end + +function MOI.initialize(d::NLPEvaluator, requested_features::Vector{Symbol}) + # Check that we support the features requested by the user. 
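+    # Note that if any user-defined multivariate operators are registered,
+    # `features_available` drops `:Hess` and `:HessVec`, so requesting them
+    # here throws an error.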
+ available_features = MOI.features_available(d) + for feature in requested_features + if !(feature in available_features) + error("Unsupported feature $feature") + end + end + moi_index_to_consecutive_index = + Dict(x => i for (i, x) in enumerate(d.ordered_variables)) + N = length(moi_index_to_consecutive_index) + # + largest_user_input_dimension = 1 + for op in d.data.operators.registered_multivariate_operators + largest_user_input_dimension = max(largest_user_input_dimension, op.N) + end + d.objective = nothing + d.user_output_buffer = zeros(largest_user_input_dimension) + d.jac_storage = zeros(max(N, largest_user_input_dimension)) + d.constraints = _FunctionStorage[] + d.last_x = fill(NaN, N) + d.want_hess = :Hess in requested_features + want_hess_storage = (:HessVec in requested_features) || d.want_hess + coloring_storage = Coloring.IndexedSet(N) + max_expr_length = 0 + # + main_expressions = [c.expression.nodes for (_, c) in d.data.constraints] + if d.data.objective !== nothing + pushfirst!(main_expressions, d.data.objective.nodes) + end + d.subexpression_order, individual_order = _order_subexpressions( + main_expressions, + [expr.nodes for expr in d.data.expressions], + ) + num_subexpressions = length(d.data.expressions) + d.subexpression_linearity = Vector{Linearity}(undef, num_subexpressions) + subexpression_variables = Vector{Vector{Int}}(undef, num_subexpressions) + subexpression_edgelist = + Vector{Set{Tuple{Int,Int}}}(undef, num_subexpressions) + d.subexpressions = Vector{_SubexpressionStorage}(undef, num_subexpressions) + d.subexpression_forward_values = zeros(num_subexpressions) + d.subexpression_reverse_values = zeros(num_subexpressions) + for k in d.subexpression_order + # Only load expressions which actually are used + d.subexpression_forward_values[k] = NaN + subex = _SubexpressionStorage( + d.data.expressions[k], + d.subexpression_linearity, + moi_index_to_consecutive_index, + ) + d.subexpressions[k] = subex + d.subexpression_linearity[k] = subex.linearity + if d.want_hess + empty!(coloring_storage) + _compute_gradient_sparsity!(coloring_storage, subex.nodes) + # union with all dependent expressions + for idx in _list_subexpressions(subex.nodes) + union!(coloring_storage, subexpression_variables[idx]) + end + subexpression_variables[k] = collect(coloring_storage) + empty!(coloring_storage) + linearity = _classify_linearity( + subex.nodes, + subex.adj, + d.subexpression_linearity, + ) + edgelist = _compute_hessian_sparsity( + subex.nodes, + subex.adj, + linearity, + coloring_storage, + subexpression_edgelist, + subexpression_variables, + ) + subexpression_edgelist[k] = edgelist + end + end + max_chunk = 1 + if d.data.objective !== nothing + d.objective = _FunctionStorage( + main_expressions[1], + d.data.objective.values, + N, + coloring_storage, + d.want_hess, + d.subexpressions, + individual_order[1], + d.subexpression_linearity, + subexpression_edgelist, + subexpression_variables, + moi_index_to_consecutive_index, + ) + max_expr_length = max(max_expr_length, length(d.objective.nodes)) + max_chunk = max(max_chunk, size(d.objective.seed_matrix, 2)) + end + for (k, (_, constraint)) in enumerate(d.data.constraints) + idx = d.data.objective !== nothing ? 
k + 1 : k + push!( + d.constraints, + _FunctionStorage( + main_expressions[idx], + constraint.expression.values, + N, + coloring_storage, + d.want_hess, + d.subexpressions, + individual_order[idx], + d.subexpression_linearity, + subexpression_edgelist, + subexpression_variables, + moi_index_to_consecutive_index, + ), + ) + max_expr_length = max(max_expr_length, length(d.constraints[end].nodes)) + max_chunk = max(max_chunk, size(d.constraints[end].seed_matrix, 2)) + end + # 10 is hardcoded upper bound to avoid excess memory allocation + max_chunk = min(max_chunk, 10) + if d.want_hess || want_hess_storage + d.input_ϵ = zeros(max_chunk * N) + d.output_ϵ = zeros(max_chunk * N) + # + len = max_chunk * max_expr_length + d.forward_storage_ϵ = zeros(len) + d.partials_storage_ϵ = zeros(len) + d.reverse_storage_ϵ = zeros(len) + # + len = max_chunk * length(d.subexpressions) + d.subexpression_forward_values_ϵ = zeros(len) + d.subexpression_reverse_values_ϵ = zeros(len) + # + for k in d.subexpression_order + len = max_chunk * length(d.subexpressions[k].nodes) + d.subexpressions[k].forward_storage_ϵ = zeros(len) + d.subexpressions[k].partials_storage_ϵ = zeros(len) + d.subexpressions[k].reverse_storage_ϵ = zeros(len) + end + d.max_chunk = max_chunk + if d.want_hess + _compute_hessian_lagrangian_structure(d) + end + end + return +end + +function MOI.eval_objective(d::NLPEvaluator, x) + if d.objective === nothing + error("No nonlinear objective.") + end + _reverse_mode(d, x) + return d.objective.forward_storage[1] +end + +function MOI.eval_objective_gradient(d::NLPEvaluator, g, x) + if d.objective === nothing + error("No nonlinear objective.") + end + _reverse_mode(d, x) + fill!(g, 0.0) + _extract_reverse_pass(g, d, d.objective) + return +end + +function MOI.eval_constraint(d::NLPEvaluator, g, x) + _reverse_mode(d, x) + for i in 1:length(d.constraints) + g[i] = d.constraints[i].forward_storage[1] + end + return +end + +function MOI.jacobian_structure(d::NLPEvaluator) + J = Tuple{Int64,Int64}[] + for (row, constraint) in enumerate(d.constraints) + for col in constraint.grad_sparsity + push!(J, (row, col)) + end + end + return J +end + +function MOI.eval_constraint_jacobian(d::NLPEvaluator, J, x) + _reverse_mode(d, x) + fill!(J, 0.0) + offset = 0 + for ex in d.constraints + for i in ex.grad_sparsity + d.jac_storage[i] = 0.0 + end + _extract_reverse_pass(d.jac_storage, d, ex) + for (k, idx) in enumerate(ex.grad_sparsity) + J[offset+k] = d.jac_storage[idx] + end + offset += length(ex.grad_sparsity) + end + return +end + +function MOI.eval_constraint_jacobian_product(d::NLPEvaluator, y, x, w) + fill!(y, 0.0) + J_struct = MOI.jacobian_structure(d) + nnz = length(J_struct) + J = zeros(nnz) + MOI.eval_constraint_jacobian(d, J, x) + for (k, (i, j)) in enumerate(J_struct) + y[i] += J[k] * w[j] + end + return +end + +function MOI.eval_constraint_jacobian_transpose_product( + d::NLPEvaluator, + y::AbstractVector{Float64}, + x::AbstractVector{Float64}, + w::AbstractVector{Float64}, +) + fill!(y, 0.0) + J_struct = MOI.jacobian_structure(d) + nnz = length(J_struct) + J = zeros(nnz) + MOI.eval_constraint_jacobian(d, J, x) + for (k, (i, j)) in enumerate(J_struct) + y[j] += J[k] * w[i] + end + return y +end + +function MOI.hessian_lagrangian_structure(d::NLPEvaluator) + @assert d.want_hess + return d.hessian_sparsity +end + +function _compute_hessian_lagrangian_structure(d::NLPEvaluator) + d.hessian_sparsity = Tuple{Int64,Int64}[] + if d.objective !== nothing + append!(d.hessian_sparsity, zip(d.objective.hess_I, 
d.objective.hess_J)) + end + for c in d.constraints + append!(d.hessian_sparsity, zip(c.hess_I, c.hess_J)) + end + return +end + +function MOI.eval_hessian_lagrangian(d::NLPEvaluator, H, x, σ, μ) + @assert d.want_hess + _reverse_mode(d, x) + fill!(d.input_ϵ, 0.0) + offset = 0 + if d.objective !== nothing + offset += _eval_hessian(d, d.objective, H, σ, offset)::Int + end + for (i, ex) in enumerate(d.constraints) + offset += _eval_hessian(d, ex, H, μ[i], offset)::Int + end + return +end + +function MOI.eval_hessian_lagrangian_product(d::NLPEvaluator, h, x, v, σ, μ) + _reverse_mode(d, x) + fill!(h, 0.0) + T = ForwardDiff.Partials{1,Float64} + input_ϵ = reinterpret(T, d.input_ϵ) + output_ϵ = reinterpret(T, d.output_ϵ) + for i in 1:length(x) + input_ϵ[i] = ForwardDiff.Partials((v[i],)) + end + # forward evaluate all subexpressions once + subexpr_forward_values_ϵ = reinterpret(T, d.subexpression_forward_values_ϵ) + for i in d.subexpression_order + subexpr = d.subexpressions[i] + subexpr_forward_values_ϵ[i] = _forward_eval_ϵ( + subexpr, + reinterpret(T, subexpr.forward_storage_ϵ), + reinterpret(T, subexpr.partials_storage_ϵ), + input_ϵ, + subexpr_forward_values_ϵ, + d.data.operators, + ) + end + # we only need to do one reverse pass through the subexpressions as well + subexpr_reverse_values_ϵ = reinterpret(T, d.subexpression_reverse_values_ϵ) + fill!(subexpr_reverse_values_ϵ, zero(T)) + fill!(d.subexpression_reverse_values, 0.0) + fill!(d.reverse_storage_ϵ, 0.0) + fill!(output_ϵ, zero(T)) + if d.objective !== nothing + _forward_eval_ϵ( + d.objective, + reinterpret(T, d.forward_storage_ϵ), + reinterpret(T, d.partials_storage_ϵ), + input_ϵ, + subexpr_forward_values_ϵ, + d.data.operators, + ) + _reverse_eval_ϵ( + output_ϵ, + d.objective, + reinterpret(T, d.reverse_storage_ϵ), + reinterpret(T, d.partials_storage_ϵ), + d.subexpression_reverse_values, + subexpr_reverse_values_ϵ, + σ, + zero(T), + ) + end + for (i, con) in enumerate(d.constraints) + _forward_eval_ϵ( + con, + reinterpret(T, d.forward_storage_ϵ), + reinterpret(T, d.partials_storage_ϵ), + input_ϵ, + subexpr_forward_values_ϵ, + d.data.operators, + ) + _reverse_eval_ϵ( + output_ϵ, + con, + reinterpret(T, d.reverse_storage_ϵ), + reinterpret(T, d.partials_storage_ϵ), + d.subexpression_reverse_values, + subexpr_reverse_values_ϵ, + μ[i], + zero(T), + ) + end + for i in length(d.subexpression_order):-1:1 + j = d.subexpression_order[i] + subexpr = d.subexpressions[j] + _reverse_eval_ϵ( + output_ϵ, + subexpr, + reinterpret(T, subexpr.reverse_storage_ϵ), + reinterpret(T, subexpr.partials_storage_ϵ), + d.subexpression_reverse_values, + subexpr_reverse_values_ϵ, + d.subexpression_reverse_values[j], + subexpr_reverse_values_ϵ[j], + ) + end + for i in 1:length(x) + h[i] += output_ϵ[i].values[1] + end + return +end diff --git a/src/Nonlinear/ReverseAD/precompile.jl b/src/Nonlinear/ReverseAD/precompile.jl new file mode 100644 index 0000000000..16915364f8 --- /dev/null +++ b/src/Nonlinear/ReverseAD/precompile.jl @@ -0,0 +1,46 @@ +function _precompile_() + ccall(:jl_generating_output, Cint, ()) == 1 || return nothing + Base.precompile( + Tuple{ + typeof(MOI.eval_hessian_lagrangian), + NLPEvaluator, + SubArray{Float64,1,Vector{Float64},Tuple{UnitRange{Int64}},true}, + Vector{Float64}, + Float64, + SubArray{Float64,1,Vector{Float64},Tuple{UnitRange{Int64}},true}, + }, + ) + Base.precompile( + Tuple{ + typeof(MOI.eval_objective_gradient), + NLPEvaluator, + Vector{Float64}, + Vector{Float64}, + }, + ) + 
Base.precompile(Tuple{typeof(MOI.initialize),NLPEvaluator,Vector{Symbol}})
+    Base.precompile(Tuple{typeof(MOI.features_available),NLPEvaluator})
+    Base.precompile(
+        Tuple{
+            typeof(MOI.eval_constraint_jacobian),
+            NLPEvaluator,
+            SubArray{Float64,1,Vector{Float64},Tuple{UnitRange{Int64}},true},
+            Vector{Float64},
+        },
+    )
+    Base.precompile(
+        Tuple{
+            typeof(MOI.eval_constraint),
+            NLPEvaluator,
+            SubArray{Float64,1,Vector{Float64},Tuple{UnitRange{Int64}},true},
+            Vector{Float64},
+        },
+    )
+    Base.precompile(Tuple{typeof(MOI.jacobian_structure),NLPEvaluator})
+    Base.precompile(
+        Tuple{typeof(MOI.eval_objective),NLPEvaluator,Vector{Float64}},
+    )
+    return
+end
+
+_precompile_()
diff --git a/src/Nonlinear/ReverseAD/reverse_mode.jl b/src/Nonlinear/ReverseAD/reverse_mode.jl
new file mode 100644
index 0000000000..4c09c8b11e
--- /dev/null
+++ b/src/Nonlinear/ReverseAD/reverse_mode.jl
@@ -0,0 +1,259 @@
+"""
+    _reverse_mode(d::NLPEvaluator, x)
+
+Run reverse-mode automatic differentiation on `d` given the primal solution `x`.
+
+This function updates many of the data structures inside `d` in-place.
+
+At a high level, reverse-mode AD has two phases:
+
+In Phase I, we evaluate the problem in `d` at the primal solution `x`, and
+store the primal value of each expression in the tree and the first-order
+partial derivative information for each node with respect to its arguments.
+
+Because the nodes in our data structure are topologically sorted, we can make a
+single pass through the tree by iterating backwards through the vector of stored
+nodes.
+
+In Phase II, we propagate the partial derivative information back down the tree
+to find the derivative of each function with respect to the input.
+
+Because the nodes in our data structure are topologically sorted, we can make a
+single pass through the tree by iterating forwards through the vector of stored
+nodes.
+"""
+function _reverse_mode(d::NLPEvaluator, x)
+    if d.last_x == x
+        # Fail fast if the primal solution has not changed since last call.
+        return
+    end
+    # Phase I
+    for k in d.subexpression_order
+        d.subexpression_forward_values[k] =
+            _forward_eval(d.subexpressions[k], d, x)
+    end
+    if d.objective !== nothing
+        _forward_eval(d.objective::_FunctionStorage, d, x)
+    end
+    for con in d.constraints
+        _forward_eval(con, d, x)
+    end
+    # Phase II
+    for k in d.subexpression_order
+        _reverse_eval(d.subexpressions[k])
+    end
+    if d.objective !== nothing
+        _reverse_eval(d.objective::_FunctionStorage)
+    end
+    for con in d.constraints
+        _reverse_eval(con)
+    end
+    copyto!(d.last_x, x)
+    return
+end
+
+"""
+    _forward_eval(
+        f::Union{_FunctionStorage,_SubexpressionStorage},
+        d::NLPEvaluator,
+        x::AbstractVector{T},
+    ) where {T}
+
+Forward-mode evaluation of an expression tree given in `f`.
+
+ * This function assumes that the values of all sub-expressions have already
+   been computed and are stored in `d.subexpression_forward_values`.
+ * `f.partials_storage[k]` is the partial derivative of `nodes[k].parent` with
+   respect to the value of node `k`. It's efficient to compute this at the same
+   time as the value of the parent because we use it in reverse mode and in dual
+   forward mode. Note that `partials_storage` makes a subtle assumption that we
+   have a tree instead of a general DAG. If we have a DAG, then we need to
+   associate storage with each edge of the DAG.
+"""
+function _forward_eval(
+    # !!! warning
+    #     This Union depends upon _FunctionStorage and _SubexpressionStorage
+    #     having similarly named fields.
+ f::Union{_FunctionStorage,_SubexpressionStorage}, + d::NLPEvaluator, + x::AbstractVector{T}, +)::T where {T} + @assert length(f.forward_storage) >= length(f.nodes) + @assert length(f.partials_storage) >= length(f.nodes) + operators = d.data.operators + # f.nodes is already in order such that parents always appear before + # children, so a backwards pass through f.nodes is a forward pass through + # the tree. + children_arr = SparseArrays.rowvals(f.adj) + for k in length(f.nodes):-1:1 + node = f.nodes[k] + f.partials_storage[k] = zero(T) + if node.type == Nonlinear.NODE_VARIABLE + f.forward_storage[k] = x[node.index] + # This should never happen, because we will have replaced these by now. + # elseif node.type == Nonlinear.NODE_MOI_VARIABLE + # f.forward_storage[k] = x[node.index] + elseif node.type == Nonlinear.NODE_VALUE + f.forward_storage[k] = f.const_values[node.index] + elseif node.type == Nonlinear.NODE_SUBEXPRESSION + f.forward_storage[k] = d.subexpression_forward_values[node.index] + elseif node.type == Nonlinear.NODE_PARAMETER + f.forward_storage[k] = d.data.parameters[node.index] + elseif node.type == Nonlinear.NODE_CALL_MULTIVARIATE + children_indices = SparseArrays.nzrange(f.adj, k) + N = length(children_indices) + f_input = _UnsafeVectorView(d.jac_storage, N) + ∇f = _UnsafeVectorView(d.user_output_buffer, N) + for (r, i) in enumerate(children_indices) + f_input[r] = f.forward_storage[children_arr[i]] + ∇f[r] = 0.0 + end + f.forward_storage[k] = Nonlinear.eval_multivariate_function( + operators, + operators.multivariate_operators[node.index], + f_input, + ) + Nonlinear.eval_multivariate_gradient( + operators, + operators.multivariate_operators[node.index], + ∇f, + f_input, + ) + for (r, i) in enumerate(children_indices) + f.partials_storage[children_arr[i]] = ∇f[r] + end + elseif node.type == Nonlinear.NODE_CALL_UNIVARIATE + child_idx = children_arr[f.adj.colptr[k]] + f.forward_storage[k] = Nonlinear.eval_univariate_function( + operators, + operators.univariate_operators[node.index], + f.forward_storage[child_idx], + ) + f.partials_storage[child_idx] = Nonlinear.eval_univariate_gradient( + operators, + operators.univariate_operators[node.index], + f.forward_storage[child_idx], + ) + elseif node.type == Nonlinear.NODE_COMPARISON + children_idx = SparseArrays.nzrange(f.adj, k) + result = true + f.partials_storage[children_arr[children_idx[1]]] = zero(T) + for r in 2:length(children_idx) + lhs = children_arr[children_idx[r-1]] + rhs = children_arr[children_idx[r]] + result &= Nonlinear.eval_comparison_function( + operators, + operators.comparison_operators[node.index], + f.forward_storage[lhs], + f.forward_storage[rhs], + ) + f.partials_storage[rhs] = zero(T) + end + f.forward_storage[k] = result + else + @assert node.type == Nonlinear.NODE_LOGIC + children_idx = SparseArrays.nzrange(f.adj, k) + lhs = children_arr[children_idx[1]] + rhs = children_arr[children_idx[2]] + f.forward_storage[k] = Nonlinear.eval_logic_function( + operators, + operators.logic_operators[node.index], + f.forward_storage[lhs] == 1, + f.forward_storage[rhs] == 1, + ) + f.partials_storage[lhs] = zero(T) + f.partials_storage[rhs] = zero(T) + end + end + return f.forward_storage[1] +end + +""" + _reverse_eval(f::Union{_FunctionStorage,_SubexpressionStorage}) + +Reverse-mode evaluation of an expression tree given in `f`. + + * This function assumes `f.partials_storage` is already updated. + * This function assumes that `f.reverse_storage` has been initialized with 0.0. +""" +function _reverse_eval( + # !!! 
warning + # This Union depends upon _FunctionStorage and _SubexpressionStorage + # having similarly named fields. + f::Union{_FunctionStorage,_SubexpressionStorage}, +) + @assert length(f.reverse_storage) >= length(f.nodes) + @assert length(f.partials_storage) >= length(f.nodes) + # f.nodes is already in order such that parents always appear before + # children so a forward pass through nodes is a backwards pass through the + # tree. + f.reverse_storage[1] = one(Float64) + for k in 2:length(f.nodes) + node = f.nodes[k] + if node.type == Nonlinear.NODE_VALUE || + node.type == Nonlinear.NODE_LOGIC || + node.type == Nonlinear.NODE_COMPARISON || + node.type == Nonlinear.NODE_PARAMETER + continue + end + rev_parent = f.reverse_storage[node.parent] + partial = f.partials_storage[k] + f.reverse_storage[k] = ifelse( + rev_parent == 0.0 && !isfinite(partial), + rev_parent, + rev_parent * partial, + ) + end + return +end + +""" + _extract_reverse_pass( + g::Vector{T}, + d::NLPEvaluator, + f::Union{_FunctionStorage,_SubexpressionStorage}, + ) where {T} + +Fill the gradient vector `g` with the values from the reverse pass. Assumes you +have already called `_reverse_eval_all(d, x)`. +""" +function _extract_reverse_pass( + g::Vector{T}, + d::NLPEvaluator, + f::Union{_FunctionStorage,_SubexpressionStorage}, +) where {T} + for i in f.dependent_subexpressions + d.subexpression_reverse_values[i] = 0.0 + end + _extract_reverse_pass_inner(g, f, d.subexpression_reverse_values, 1.0) + for i in length(f.dependent_subexpressions):-1:1 + k = f.dependent_subexpressions[i] + _extract_reverse_pass_inner( + g, + d.subexpressions[k], + d.subexpression_reverse_values, + d.subexpression_reverse_values[k], + ) + end + return +end + +function _extract_reverse_pass_inner( + output::AbstractVector{T}, + # !!! warning + # This Union depends upon _FunctionStorage and _SubexpressionStorage + # having similarly named fields. 
+ f::Union{_FunctionStorage,_SubexpressionStorage}, + subexpressions::AbstractVector{T}, + scale::T, +) where {T} + @assert length(f.reverse_storage) >= length(f.nodes) + for (k, node) in enumerate(f.nodes) + if node.type == Nonlinear.NODE_VARIABLE + output[node.index] += scale * f.reverse_storage[k] + elseif node.type == Nonlinear.NODE_SUBEXPRESSION + subexpressions[node.index] += scale * f.reverse_storage[k] + end + end + return +end diff --git a/src/Nonlinear/ReverseAD/types.jl b/src/Nonlinear/ReverseAD/types.jl new file mode 100644 index 0000000000..348af23859 --- /dev/null +++ b/src/Nonlinear/ReverseAD/types.jl @@ -0,0 +1,188 @@ +mutable struct _SubexpressionStorage + nodes::Vector{Nonlinear.Node} + adj::SparseArrays.SparseMatrixCSC{Bool,Int} + const_values::Vector{Float64} + forward_storage::Vector{Float64} + partials_storage::Vector{Float64} + reverse_storage::Vector{Float64} + forward_storage_ϵ::Vector{Float64} + partials_storage_ϵ::Vector{Float64} + reverse_storage_ϵ::Vector{Float64} + linearity::Linearity + + function _SubexpressionStorage( + expr::Nonlinear.NonlinearExpression, + subexpression_linearity, + moi_index_to_consecutive_index, + ) + nodes = + _replace_moi_variables(expr.nodes, moi_index_to_consecutive_index) + adj = Nonlinear.adjacency_matrix(nodes) + N = length(nodes) + linearity = _classify_linearity(nodes, adj, subexpression_linearity) + return new( + nodes, + adj, + expr.values, + zeros(N), # forward_storage, + zeros(N), # partials_storage, + zeros(N), # reverse_storage, + Float64[], + Float64[], + Float64[], + linearity[1], + ) + end +end + +mutable struct _FunctionStorage + nodes::Vector{Nonlinear.Node} + adj::SparseArrays.SparseMatrixCSC{Bool,Int} + const_values::Vector{Float64} + forward_storage::Vector{Float64} + partials_storage::Vector{Float64} + reverse_storage::Vector{Float64} + grad_sparsity::Vector{Int} + # Nonzero pattern of Hessian matrix + hess_I::Vector{Int} + hess_J::Vector{Int} + rinfo::Coloring.RecoveryInfo # coloring info for hessians + seed_matrix::Matrix{Float64} + linearity::Linearity + # subexpressions which this function depends on, ordered for forward pass. 
+ dependent_subexpressions::Vector{Int} + + function _FunctionStorage( + nodes::Vector{Nonlinear.Node}, + const_values, + num_variables, + coloring_storage::Coloring.IndexedSet, + want_hess::Bool, + subexpressions::Vector{_SubexpressionStorage}, + dependent_subexpressions, + subexpression_linearity, + subexpression_edgelist, + subexpression_variables, + moi_index_to_consecutive_index, + ) + nodes = _replace_moi_variables(nodes, moi_index_to_consecutive_index) + adj = Nonlinear.adjacency_matrix(nodes) + N = length(nodes) + empty!(coloring_storage) + _compute_gradient_sparsity!(coloring_storage, nodes) + for k in dependent_subexpressions + _compute_gradient_sparsity!( + coloring_storage, + subexpressions[k].nodes, + ) + end + grad_sparsity = sort!(collect(coloring_storage)) + empty!(coloring_storage) + if want_hess + linearity = _classify_linearity(nodes, adj, subexpression_linearity) + edgelist = _compute_hessian_sparsity( + nodes, + adj, + linearity, + coloring_storage, + subexpression_edgelist, + subexpression_variables, + ) + hess_I, hess_J, rinfo = Coloring.hessian_color_preprocess( + edgelist, + num_variables, + coloring_storage, + ) + seed_matrix = Coloring.seed_matrix(rinfo) + return new( + nodes, + adj, + const_values, + zeros(N), # forward_storage, + zeros(N), # partials_storage, + zeros(N), # reverse_storage, + grad_sparsity, + hess_I, + hess_J, + rinfo, + seed_matrix, + linearity[1], + dependent_subexpressions, + ) + else + return new( + nodes, + adj, + const_values, + zeros(N), # forward_storage, + zeros(N), # partials_storage, + zeros(N), # reverse_storage, + grad_sparsity, + Int[], + Int[], + Coloring.RecoveryInfo(), + Array{Float64}(undef, 0, 0), + NONLINEAR, + dependent_subexpressions, + ) + end + end +end + +""" + NLPEvaluator( + data::Nonlinear.NonlinearData, + ordered_variables::Vector{MOI.VariableIndex}, + ) + +Return an `NLPEvaluator` object that implements the `MOI.AbstractNLPEvaluator` +interface. + +!!! warning + Before using, you must initialize the evaluator using `MOI.initialize`. +""" +mutable struct NLPEvaluator <: MOI.AbstractNLPEvaluator + data::Nonlinear.NonlinearData + ordered_variables::Vector{MOI.VariableIndex} + + objective::Union{Nothing,_FunctionStorage} + constraints::Vector{_FunctionStorage} + subexpressions::Vector{_SubexpressionStorage} + subexpression_order::Vector{Int} + # Storage for the subexpressions in reverse-mode automatic differentiation. + subexpression_forward_values::Vector{Float64} + subexpression_reverse_values::Vector{Float64} + subexpression_linearity::Vector{Linearity} + + # A cache of the last x. This is used to guide whether we need to re-run + # reverse-mode automatic differentiation. + last_x::Vector{Float64} + + # Temporary storage for computing jacobians. This is also used as temporary + # storage for the input of multivariate functions. 
+ jac_storage::Vector{Float64} + # Temporary storage for the gradient of multivariate functions + user_output_buffer::Vector{Float64} + + # storage for computing hessians + # these Float64 vectors are reinterpreted to hold multiple epsilon components + # so the length should be multiplied by the maximum number of epsilon components + disable_2ndorder::Bool # don't offer Hess or HessVec + want_hess::Bool + forward_storage_ϵ::Vector{Float64} # (longest expression) + partials_storage_ϵ::Vector{Float64} # (longest expression) + reverse_storage_ϵ::Vector{Float64} # (longest expression) + input_ϵ::Vector{Float64} # (number of variables) + output_ϵ::Vector{Float64}# (number of variables) + subexpression_forward_values_ϵ::Vector{Float64} # (number of subexpressions) + subexpression_reverse_values_ϵ::Vector{Float64} # (number of subexpressions) + hessian_sparsity::Vector{Tuple{Int64,Int64}} + max_chunk::Int # chunk size for which we've allocated storage + + function NLPEvaluator( + data::Nonlinear.NonlinearData, + ordered_variables::Vector{MOI.VariableIndex}, + ) + return new(data, ordered_variables) + end +end diff --git a/src/Nonlinear/ReverseAD/utils.jl b/src/Nonlinear/ReverseAD/utils.jl new file mode 100644 index 0000000000..ca3ae6432c --- /dev/null +++ b/src/Nonlinear/ReverseAD/utils.jl @@ -0,0 +1,63 @@ +""" + _UnsafeVectorView(offset, len, ptr) + +Lightweight unsafe view for vectors. + +## Motivation + +`_UnsafeVectorView` is needed as an allocation-free equivalent of `view`. Other +alternatives, like `view(x, 1:len)` or a struct like +``` +struct _SafeView{T} + x::Vector{T} + len::Int +end +``` +will allocate so that `x` can be tracked by Julia's GC. + +`_UnsafeVectorView` relies on the fact that the use-cases of `_UnsafeVectorView` +only temporarily wrap a long-lived vector like `d.jac_storage` so that we don't +have to worry about the GC removing `d.jac_storage` while `_UnsafeVectorView` +exists. This lets us use a `Ptr{T}` and create a struct that is `isbitstype` and +therefore does not allocate. + +## Example + +Instead of `view(x, 1:10)`, use `_UnsafeVectorView(0, 10, pointer(x))`. + +## Unsafe behavior + +`_UnsafeVectorView` is unsafe because it assumes that the vector `x` that the +pointer `ptr` refers to remains valid during the usage of `_UnsafeVectorView`. +""" +struct _UnsafeVectorView{T} <: DenseVector{T} + offset::Int + len::Int + ptr::Ptr{T} +end + +Base.getindex(x::_UnsafeVectorView, i) = unsafe_load(x.ptr, i + x.offset) + +function Base.setindex!(x::_UnsafeVectorView, value, i) + unsafe_store!(x.ptr, value, i + x.offset) + return value +end + +Base.length(v::_UnsafeVectorView) = v.len + +Base.size(v::_UnsafeVectorView) = (v.len,) + +function _UnsafeVectorView(x::Vector, N::Int) + if length(x) < N + resize!(x, N) + end + return _UnsafeVectorView(0, N, pointer(x)) +end + +function _reinterpret_unsafe(::Type{T}, x::Vector{R}) where {T,R} + # how many T's fit into x? 
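+    # The buffer `x` is viewed in-place as contiguous elements of type `T`
+    # (for example, `ForwardDiff.Partials`), without copying or allocating.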
+ @assert isbitstype(T) && isbitstype(R) + len = length(x) * sizeof(R) + p = reinterpret(Ptr{T}, pointer(x)) + return _UnsafeVectorView(0, div(len, sizeof(T)), p) +end diff --git a/src/Nonlinear/types.jl b/src/Nonlinear/types.jl index 1564036988..24c056a0ef 100644 --- a/src/Nonlinear/types.jl +++ b/src/Nonlinear/types.jl @@ -263,3 +263,21 @@ struct ExprGraphOnly <: AbstractAutomaticDifferentiation end function Evaluator(model::Model, ::ExprGraphOnly, ::Vector{MOI.VariableIndex}) return Evaluator(model) end + +""" + SparseReverseMode() <: AbstractAutomaticDifferentiation + +An implementation of `AbstractAutomaticDifferentiation` that uses sparse +reverse-mode automatic differentiation to compute derivatives. Supports all +features in the MOI nonlinear interface. +""" +struct SparseReverseMode <: AbstractAutomaticDifferentiation end + +function set_differentiation_backend( + data::NonlinearData, + ::SparseReverseMode, + ordered_variables::Vector{MOI.VariableIndex}, +) + data.inner = ReverseAD.NLPEvaluator(data, ordered_variables) + return +end diff --git a/test/Nonlinear/Nonlinear.jl b/test/Nonlinear/Nonlinear.jl index 842e99750e..7f2ca8b595 100644 --- a/test/Nonlinear/Nonlinear.jl +++ b/test/Nonlinear/Nonlinear.jl @@ -845,3 +845,5 @@ end end TestNonlinear.runtests() + +include("ReverseAD.jl") diff --git a/test/Nonlinear/ReverseAD.jl b/test/Nonlinear/ReverseAD.jl new file mode 100644 index 0000000000..1ce0346cc3 --- /dev/null +++ b/test/Nonlinear/ReverseAD.jl @@ -0,0 +1,803 @@ +module TestReverseAD + +using Test +import MathOptInterface + +const MOI = MathOptInterface +const Nonlinear = MOI.Nonlinear +const ReverseAD = Nonlinear.ReverseAD +const Coloring = ReverseAD.Coloring + +function runtests() + for name in names(@__MODULE__; all = true) + if startswith("$(name)", "test_") + @testset "$(name)" begin + getfield(@__MODULE__, name)() + end + end + end + return +end + +function test_objective_quadratic_univariate() + x = MOI.VariableIndex(1) + data = Nonlinear.NonlinearData() + Nonlinear.set_objective(data, :($x^2 + 1)) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + [x], + ) + MOI.initialize(data, [:Grad, :Jac, :Hess]) + @test MOI.eval_objective(data, [1.2]) == 1.2^2 + 1 + g = [NaN] + MOI.eval_objective_gradient(data, g, [1.2]) + @test g == [2.4] + @test MOI.hessian_lagrangian_structure(data) == [(1, 1)] + H = [NaN] + MOI.eval_hessian_lagrangian(data, H, [1.2], 1.5, Float64[]) + @test H == 1.5 .* [2.0] + MOI.eval_hessian_lagrangian_product(data, H, [1.2], [1.2], 1.5, Float64[]) + @test H == [1.5 * 2.0 * 1.2] + return +end + +function test_objective_quadratic_multivariate() + x = MOI.VariableIndex(1) + y = MOI.VariableIndex(2) + data = Nonlinear.NonlinearData() + Nonlinear.set_objective(data, :($x^2 + $x * $y + $y^2)) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + [x, y], + ) + MOI.initialize(data, [:Grad, :Jac, :Hess]) + @test MOI.eval_objective(data, [1.2, 2.3]) == 1.2^2 + 1.2 * 2.3 + 2.3^2 + g = [NaN, NaN] + MOI.eval_objective_gradient(data, g, [1.2, 2.3]) + @test g == [2 * 1.2 + 2.3, 1.2 + 2 * 2.3] + @test MOI.hessian_lagrangian_structure(data) == [(1, 1), (2, 2), (2, 1)] + H = [NaN, NaN, NaN] + MOI.eval_hessian_lagrangian(data, H, [1.2, 2.3], 1.5, Float64[]) + @test H == 1.5 .* [2.0, 2.0, 1.0] + v = [0.3, 0.4] + hv = [NaN, NaN] + MOI.eval_hessian_lagrangian_product(data, hv, [1.2, 2.3], v, 1.5, Float64[]) + @test hv ≈ 1.5 .* [2 1; 1 2] * v + return +end + +function 
test_objective_quadratic_multivariate_subexpressions() + x = MOI.VariableIndex(1) + y = MOI.VariableIndex(2) + data = Nonlinear.NonlinearData() + ex = Nonlinear.add_expression(data, :($x^2)) + ey = Nonlinear.add_expression(data, :($y^2)) + exy = Nonlinear.add_expression(data, :($ex + $x * $y)) + Nonlinear.set_objective(data, :($exy + $ey)) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + [x, y], + ) + MOI.initialize(data, [:Grad, :Jac, :Hess]) + @test MOI.eval_objective(data, [1.2, 2.3]) == 1.2^2 + 1.2 * 2.3 + 2.3^2 + g = [NaN, NaN] + MOI.eval_objective_gradient(data, g, [1.2, 2.3]) + @test g == [2 * 1.2 + 2.3, 1.2 + 2 * 2.3] + @test MOI.hessian_lagrangian_structure(data) == [(1, 1), (2, 2), (2, 1)] + H = [NaN, NaN, NaN] + MOI.eval_hessian_lagrangian(data, H, [1.2, 2.3], 1.5, Float64[]) + @test H == 1.5 .* [2.0, 2.0, 1.0] + v = [0.3, 0.4] + hv = [NaN, NaN] + MOI.eval_hessian_lagrangian_product(data, hv, [1.2, 2.3], v, 1.5, Float64[]) + @test hv ≈ 1.5 .* [2 1; 1 2] * v + return +end + +function test_objective_ifelse_comparison() + x = MOI.VariableIndex(1) + y = MOI.VariableIndex(2) + data = Nonlinear.NonlinearData() + Nonlinear.set_objective(data, :(ifelse(1 <= $x <= 2, $x^2, $y^2))) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + [x, y], + ) + MOI.initialize(data, [:Grad, :Jac, :Hess]) + @test MOI.eval_objective(data, [1.2, 2.3]) == 1.2^2 + @test MOI.eval_objective(data, [2.2, 2.3]) == 2.3^2 + g = [NaN, NaN] + MOI.eval_objective_gradient(data, g, [1.2, 2.3]) + @test g == [2 * 1.2, 0.0] + MOI.eval_objective_gradient(data, g, [2.2, 2.3]) + @test g == [0.0, 2 * 2.3] + return +end + +function test_objective_ifelse_logic() + x = MOI.VariableIndex(1) + y = MOI.VariableIndex(2) + data = Nonlinear.NonlinearData() + Nonlinear.set_objective(data, :(ifelse(1 <= $x && $x <= 2, $x^2, $y^2))) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + [x, y], + ) + MOI.initialize(data, [:Grad, :Jac, :Hess]) + @test MOI.eval_objective(data, [1.2, 2.3]) == 1.2^2 + @test MOI.eval_objective(data, [2.2, 2.3]) == 2.3^2 + g = [NaN, NaN] + MOI.eval_objective_gradient(data, g, [1.2, 2.3]) + @test g == [2 * 1.2, 0.0] + MOI.eval_objective_gradient(data, g, [2.2, 2.3]) + @test g == [0.0, 2 * 2.3] + return +end + +function test_objective_parameter() + x = MOI.VariableIndex(1) + data = Nonlinear.NonlinearData() + p = Nonlinear.add_parameter(data, 1.2) + Nonlinear.set_objective(data, :($p * $x)) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + [x], + ) + MOI.initialize(data, [:Grad, :Jac, :Hess]) + @test MOI.eval_objective(data, [1.3]) == 1.2 * 1.3 + g = [NaN] + MOI.eval_objective_gradient(data, g, [1.3]) + @test g == [1.2] + return +end + +function test_objective_subexpression() + data = Nonlinear.NonlinearData() + x = MOI.VariableIndex(1) + input = :($x^2 + 1) + expr = Nonlinear.add_expression(data, input) + expr_2 = Nonlinear.add_expression(data, :($expr^2)) + Nonlinear.set_objective(data, :($expr_2^2)) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + [x], + ) + MOI.initialize(data, [:Grad]) + @test MOI.eval_objective(data, [1.3]) == ((1.3^2 + 1)^2)^2 + g = [NaN] + MOI.eval_objective_gradient(data, g, [1.3]) + @test g ≈ [2 * (1.3^2 + 1)^2 * (2 * (1.3^2 + 1)) * 2 * 1.3] + return +end + +function test_constraint_quadratic_univariate() + x = MOI.VariableIndex(1) + data = Nonlinear.NonlinearData() + Nonlinear.add_constraint(data, :($x^2 <= 2.0)) + 
Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + [x], + ) + MOI.initialize(data, [:Grad, :Jac, :Hess]) + g = [NaN] + x_val = [1.2] + MOI.eval_constraint(data, g, x_val) + @test g == x_val .^ 2 .- 2 + @test MOI.jacobian_structure(data) == [(1, 1)] + J = [NaN] + MOI.eval_constraint_jacobian(data, J, x_val) + @test J == 2 .* x_val + @test MOI.hessian_lagrangian_structure(data) == [(1, 1)] + H = [NaN] + MOI.eval_hessian_lagrangian(data, H, x_val, 0.0, [1.5]) + @test H == 1.5 .* [2.0] + return +end + +function test_constraint_quadratic_multivariate() + x = MOI.VariableIndex(1) + y = MOI.VariableIndex(2) + data = Nonlinear.NonlinearData() + Nonlinear.add_constraint(data, :($x^2 + $x * $y + $y^2 <= 2.0)) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + [x, y], + ) + MOI.initialize(data, [:Grad, :Jac, :Hess]) + g = [NaN] + x_val = [1.2, 2.3] + MOI.eval_constraint(data, g, x_val) + @test g == [x_val[1]^2 + x_val[1] * x_val[2] + x_val[2]^2] .- 2 + @test MOI.jacobian_structure(data) == [(1, 1), (1, 2)] + J = [NaN, NaN] + MOI.eval_constraint_jacobian(data, J, x_val) + @test J == [2 * x_val[1] + x_val[2], x_val[1] + 2 * x_val[2]] + @test MOI.hessian_lagrangian_structure(data) == [(1, 1), (2, 2), (2, 1)] + H = [NaN, NaN, NaN] + MOI.eval_hessian_lagrangian(data, H, x_val, 0.0, [1.5]) + @test H == 1.5 .* [2.0, 2.0, 1.0] + return +end + +function test_constraint_quadratic_multivariate_subexpressions() + x = MOI.VariableIndex(1) + y = MOI.VariableIndex(2) + data = Nonlinear.NonlinearData() + ex = Nonlinear.add_expression(data, :($x^2)) + ey = Nonlinear.add_expression(data, :($y^2)) + exy = Nonlinear.add_expression(data, :($ex + $x * $y)) + Nonlinear.add_constraint(data, :($exy + $ey <= 2.0)) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + [x, y], + ) + MOI.initialize(data, [:Grad, :Jac, :Hess]) + g = [NaN] + x_val = [1.2, 2.3] + MOI.eval_constraint(data, g, x_val) + @test g ≈ [x_val[1]^2 + x_val[1] * x_val[2] + x_val[2]^2] .- 2 + # Jacobian + @test MOI.jacobian_structure(data) == [(1, 1), (1, 2)] + J = [NaN, NaN] + MOI.eval_constraint_jacobian(data, J, x_val) + @test J ≈ [2 * x_val[1] + x_val[2], x_val[1] + 2 * x_val[2]] + # Jv + y, w = [NaN], [1.1, 2.2] + MOI.eval_constraint_jacobian_product(data, y, x_val, w) + @test y ≈ [(2 * x_val[1] + x_val[2]) (x_val[1] + 2 * x_val[2])] * w + # v'J + y, w = [NaN, NaN], [1.1] + MOI.eval_constraint_jacobian_transpose_product(data, y, x_val, w) + wJ = w' * [(2 * x_val[1] + x_val[2]) (x_val[1] + 2 * x_val[2])] + @test y ≈ wJ[:] + # Hessian-lagrangian + @test MOI.hessian_lagrangian_structure(data) == [(1, 1), (2, 2), (2, 1)] + H = [NaN, NaN, NaN] + MOI.eval_hessian_lagrangian(data, H, x_val, 0.0, [1.5]) + @test H ≈ 1.5 .* [2.0, 2.0, 1.0] + return +end + +function test_hessian_sparsity_registered_function() + x = MOI.VariableIndex(1) + y = MOI.VariableIndex(2) + z = MOI.VariableIndex(3) + f(x, y) = x^2 + y^2 + function ∇f(g, x...) + g[1] = 2x[1] + g[2] = 2x[2] + return + end + function ∇²f(H, x...) 
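+        # The Hessian of f(x, y) = x^2 + y^2 is diagonal, so only the two
+        # diagonal entries need to be filled.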
+ H[1, 1] = 2 + H[2, 2] = 2 + return + end + data = Nonlinear.NonlinearData() + Nonlinear.register_operator(data, :f, 2, f, ∇f, ∇²f) + Nonlinear.set_objective(data, :(f($x, $z) + $y^2)) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + [x, y, z], + ) + @test_broken :Hess in MOI.features_available(data) + # MOI.initialize(data, [:Grad, :Jac, :Hess]) + # @test MOI.hessian_lagrangian_structure(data) == + # [(1, 1), (2, 2), (3, 3), (3, 1)] + # H = fill(NaN, 4) + # MOI.eval_hessian_lagrangian(data, H, rand(3), 1.5, Float64[]) + # @test H == 1.5 .* [2.0, 2.0, 2.0, 0.0] + return +end + +function test_hessian_sparsity_registered_rosenbrock() + x = MOI.VariableIndex(1) + y = MOI.VariableIndex(2) + f(x...) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2 + function ∇f(g, x...) + g[1] = 400 * x[1]^3 - 400 * x[1] * x[2] + 2 * x[1] - 2 + g[2] = 200 * (x[2] - x[1]^2) + return + end + function ∇²f(H, x...) + H[1, 1] = 1200 * x[1]^2 - 400 * x[2] + 2 + H[2, 1] = -400 * x[1] + H[2, 2] = 200.0 + return + end + data = Nonlinear.NonlinearData() + Nonlinear.register_operator(data, :rosenbrock, 2, f, ∇f, ∇²f) + Nonlinear.set_objective(data, :(rosenbrock($x, $y))) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + [x, y], + ) + @test_broken :Hess in MOI.features_available(data) + # MOI.initialize(data, [:Grad, :Jac, :Hess]) + # @test MOI.hessian_lagrangian_structure(data) == [(1, 1), (2, 2), (2, 1)] + # H = fill(NaN, 3) + # MOI.eval_hessian_lagrangian(data, H, [1.0, 1.0], 1.5, Float64[]) + # @test H == 1.5 .* [802, 200, -400] + return +end + +struct _ColoringGraph + num_vertices::Int + edges::Vector{Tuple{Int,Int}} +end + +function to_adjlist(graph::_ColoringGraph) + I = [i for (i, _) in graph.edges] + J = [j for (_, j) in graph.edges] + return Coloring.UndirectedGraph(I, J, graph.num_vertices) +end + +function test_coloring_edge_free_graph() + graph = _ColoringGraph(10, []) + _, numcolors = Coloring.acyclic_coloring(to_adjlist(graph)) + @test numcolors == 1 + return +end + +function test_coloring_one_edge_graph() + graph = _ColoringGraph(10, [(2, 4)]) + color, numcolors = Coloring.acyclic_coloring(to_adjlist(graph)) + @test numcolors == 2 + @test color[2] != color[4] + return +end + +function test_coloring_two_edge_graph() + graph = _ColoringGraph(10, [(2, 4), (2, 3)]) + color, numcolors = Coloring.acyclic_coloring(to_adjlist(graph)) + @test numcolors == 2 + @test color[3] == color[4] + return +end + +function test_coloring_three_edge_graph() + graph = _ColoringGraph(10, [(2, 4), (2, 3), (3, 4)]) + color, numcolors = Coloring.acyclic_coloring(to_adjlist(graph)) + @test numcolors == 3 + # TODO: What is this testing? 
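+    # At a minimum, it checks that `recovery_preprocess` runs without error on
+    # a triangle.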
+ Coloring.recovery_preprocess(to_adjlist(graph), color, numcolors, Int[]) + return +end + +function test_coloring_two_edge_three_vertex_graph() + graph = _ColoringGraph(3, [(1, 3), (2, 3)]) + _, numcolors = Coloring.acyclic_coloring(to_adjlist(graph)) + @test numcolors == 2 + return +end + +function test_coloring_four_edge_four_vertex_graph() + graph = _ColoringGraph(4, [(1, 2), (2, 3), (3, 4), (4, 1)]) + _, numcolors = Coloring.acyclic_coloring(to_adjlist(graph)) + @test numcolors == 3 + return +end + +function test_coloring_topological_sort() + # graph = _ColoringGraph(6, [(1, 2), (1, 3), (1, 6), (2, 4), (2, 5)]) + vec = [3, 6, 2, 1, 4, 5, 1, 2, 2, 1] + offset = [1, 4, 7, 8, 9, 10, 11] + v = Coloring.reverse_topological_sort_by_dfs(vec, offset, 6, zeros(Int, 6)) + @test v[1] == [3, 6, 4, 5, 2, 1] + @test v[2] == [0, 1, 1, 2, 2, 1] + return +end + +function test_coloring_end_to_end_hessian_coloring_and_recovery() + I, J, rinfo = Coloring.hessian_color_preprocess(Set([(1, 2)]), 2) + R = Coloring.seed_matrix(rinfo) + Coloring.prepare_seed_matrix!(R, rinfo) + @test I == [1, 2, 2] + @test J == [1, 2, 1] + @test R == [1.0 0.0; 0.0 1.0] + hess = [3.4 2.1; 2.1 1.3] + matmat = hess * R + V = zeros(3) + Coloring.recover_from_matmat!(V, matmat, rinfo, zeros(3)) + @test V == [3.4, 1.3, 2.1] + return +end + +function test_derivatives() + a = MOI.VariableIndex(1) + b = MOI.VariableIndex(2) + data = Nonlinear.NonlinearData() + Nonlinear.set_objective(data, :(sin($a^2) + cos($b * 4) / 5 - 2.0)) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + [a, b], + ) + MOI.initialize(data, [:Grad, :Jac, :Hess]) + x = [2.0, 3.0] + @test MOI.eval_objective(data, x) == sin(x[1]^2) + cos(x[2] * 4) / 5 - 2.0 + g = [NaN, NaN] + MOI.eval_objective_gradient(data, g, x) + @test g ≈ [2 * x[1] * cos(x[1]^2), -4 * sin(x[2] * 4) / 5] + @test MOI.hessian_lagrangian_structure(data) == [(1, 1), (2, 2)] + H = [NaN, NaN] + MOI.eval_hessian_lagrangian(data, H, x, 1.5, Float64[]) + H_exact = [ + -4 * x[1]^2 * sin(x[1]^2) + 2 * cos(x[1]^2), + -4 / 5 * 4 * cos(x[2] * 4), + ] + @test H == 1.5 .* H_exact + return +end + +function test_NLPBlockData() + data = Nonlinear.NonlinearData() + x = MOI.VariableIndex(1) + Nonlinear.add_constraint(data, :($x <= 1)) + Nonlinear.add_constraint(data, :($x >= 2)) + Nonlinear.add_constraint(data, :($x == 3)) + Nonlinear.add_constraint(data, :(4 <= $x <= 5)) + block = MOI.NLPBlockData(data, [x]) + @test block.constraint_bounds == [ + MOI.NLPBoundsPair(-Inf, 0.0), + MOI.NLPBoundsPair(0.0, Inf), + MOI.NLPBoundsPair(0.0, 0.0), + MOI.NLPBoundsPair(4.0, 5.0), + ] + @test block.has_objective == false + return +end + +function test_linearity() + x = MOI.VariableIndex(1) + y = MOI.VariableIndex(2) + z = MOI.VariableIndex(3) + variables = Dict(x => 1, y => 2, z => 3) + function _test_linearity(input, test_value, IJ = [], indices = []) + data = Nonlinear.NonlinearData() + ex = Nonlinear.add_expression(data, input) + expr = data[ex] + adj = Nonlinear.adjacency_matrix(expr.nodes) + nodes = ReverseAD._replace_moi_variables(expr.nodes, variables) + ret = ReverseAD._classify_linearity(nodes, adj, ReverseAD.Linearity[]) + @test ret[1] == test_value + indexed_set = Coloring.IndexedSet(100) + edge_list = ReverseAD._compute_hessian_sparsity( + nodes, + adj, + ret, + indexed_set, + Set{Tuple{Int,Int}}[], + Vector{Int}[], + ) + if ret[1] != ReverseAD.NONLINEAR + @test length(edge_list) == 0 + elseif length(IJ) > 0 + @test IJ == edge_list + end + if length(indices) > 0 + empty!(indexed_set) + 
ReverseAD._compute_gradient_sparsity!(indexed_set, nodes) + ix = sort(collect(indexed_set)) + @test indices == ix + end + return + end + _test_linearity( + :(sin($x^2) + cos($y * 4) - 2.0), + ReverseAD.NONLINEAR, + Set([(2, 2), (1, 1)]), + [1, 2], + ) + _test_linearity(:(3 * 4 * ($x + $y)), ReverseAD.LINEAR) + _test_linearity( + :($z * $y), + ReverseAD.NONLINEAR, + Set([(3, 2), (3, 3), (2, 2)]), + [2, 3], + ) + _test_linearity(:(3 + 4), ReverseAD.CONSTANT) + _test_linearity(:(sin(3) + $x), ReverseAD.LINEAR) + _test_linearity( + :(cos($z) * sin(3) + $x), + ReverseAD.NONLINEAR, + Set([(3, 3)]), + [1, 3], + ) + _test_linearity(:($x - 3 * $y), ReverseAD.LINEAR) + _test_linearity(:(-$x), ReverseAD.LINEAR) + _test_linearity(:(+$x), ReverseAD.LINEAR) + _test_linearity( + :($x^$y), + ReverseAD.NONLINEAR, + Set([(2, 2), (1, 1), (2, 1)]), + [1, 2], + ) + _test_linearity(:($x / 3 + $y), ReverseAD.LINEAR) + _test_linearity( + :(3 / ($x * $y)), + ReverseAD.NONLINEAR, + Set([(2, 2), (1, 1), (2, 1)]), + [1, 2], + ) + _test_linearity(:(1 / ($x + 3)), ReverseAD.NONLINEAR, Set([(1, 1)]), [1]) + _test_linearity( + :(ifelse($x <= 1, $x, $y)), + ReverseAD.PIECEWISE_LINEAR, + Set([]), + [], + ) + _test_linearity( + :(ifelse($x <= 1, $x^2, $y)), + ReverseAD.NONLINEAR, + Set([(1, 1)]), + [1, 2], + ) + _test_linearity(:(ifelse(1 <= 1, 2, 3)), ReverseAD.CONSTANT) + _test_linearity( + :(1 / ifelse($x < 1, $x, 0)), + ReverseAD.NONLINEAR, + Set([(1, 1)]), + [1], + ) + return +end + +function test_dual_forward() + x = MOI.VariableIndex(1) + y = MOI.VariableIndex(2) + function _test_dual_forward(input, x_input, test_value) + data = Nonlinear.NonlinearData() + Nonlinear.set_objective(data, input) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + MOI.VariableIndex[x, y], + ) + MOI.initialize(data, [:Grad]) + ∇f = fill(NaN, 2) + MOI.eval_objective_gradient(data, ∇f, x_input) + @test isapprox(∇f, test_value, atol = 1e-10) + return + end + _test_dual_forward( + :(sin($x^2) + cos($y * 4) / 5 - 2.0), + [1.0, 2.0], + [1.0806046117362795, -0.7914865972987055], + ) + _test_dual_forward( + :(sin($x^$y) + cos($y * 4) / 5 - 2.0), + [1.0, 2.0], + [1.0806046117362795, -0.7914865972987055], + ) + _test_dual_forward( + :(sin($x^3) + cos($x * $y * 4) / 5 - 2.0), + [1.0, 0.0], + [1.6209069176044193, 0.0], + ) + _test_dual_forward( + :($x * $y), + [3.427139283036299e-206, 1.0], + [1.0, 3.427139283036299e-206], + ) + return +end + +function test_gradient_registered_function() + x = MOI.VariableIndex(1) + y = MOI.VariableIndex(2) + z = MOI.VariableIndex(3) + data = Nonlinear.NonlinearData() + f(x, y) = (1 / 3) * y^3 - 2x^2 + function ∇f(g, x, y) + g[1] = -4x + g[2] = y^2 + return + end + Nonlinear.register_operator(data, :Φ, 2, f, ∇f) + Nonlinear.register_operator(data, :c, 1, cos, x -> -sin(x), x -> -cos(x)) + Nonlinear.set_objective(data, :(Φ($y, $x - 1) * c($z))) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + MOI.VariableIndex[x, y, z], + ) + MOI.initialize(data, [:Grad]) + ∇f = fill(NaN, 3) + x_input = [2.0, 3.0, 4.0] + @test MOI.eval_objective(data, x_input) == f(3.0, 2.0 - 1) * cos(4.0) + MOI.eval_objective_gradient(data, ∇f, [2.0, 3.0, 4.0]) + @test ∇f ≈ [ + cos(x_input[3]) * (x_input[1] - 1)^2, + -4cos(x_input[3]) * x_input[2], + -sin(x_input[3]) * f(x_input[2], x_input[1] - 1), + ] + return +end + +function test_gradient_jump_855() + x = MOI.VariableIndex(1) + data = Nonlinear.NonlinearData() + Nonlinear.set_objective( + data, + :(ifelse($x <= 3.0, ($x - 2.0)^2, 
2 * log($x - 2.0) + 1.0)), + ) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + MOI.VariableIndex[x], + ) + MOI.initialize(data, [:Grad]) + ∇f = fill(NaN, 1) + MOI.eval_objective_gradient(data, ∇f, [-1.0]) + @test ∇f ≈ [-6.0] + MOI.eval_objective_gradient(data, ∇f, [2.0]) + @test ∇f ≈ [0.0] + return +end + +function test_gradient_abs() + x = MOI.VariableIndex(1) + data = Nonlinear.NonlinearData() + Nonlinear.set_objective(data, :(abs($x))) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + MOI.VariableIndex[x], + ) + MOI.initialize(data, [:Grad]) + ∇f = fill(NaN, 1) + MOI.eval_objective_gradient(data, ∇f, [2.0]) + @test ∇f ≈ [1.0] + MOI.eval_objective_gradient(data, ∇f, [-2.0]) + @test ∇f ≈ [-1.0] + return +end + +function test_gradient_trig() + x = MOI.VariableIndex(1) + y = MOI.VariableIndex(2) + data = Nonlinear.NonlinearData() + Nonlinear.set_objective(data, :(sin($x^2) + cos($y * 4) / 5 - 2.0)) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + MOI.VariableIndex[x, y], + ) + MOI.initialize(data, [:Grad]) + ∇f = fill(NaN, 2) + MOI.eval_objective_gradient(data, ∇f, [2.0, 3.0]) + @test ∇f ≈ [2 * 2.0 * cos(2.0^2), -4 * sin(3.0 * 4) / 5] + return +end + +function test_gradient_logical() + x = MOI.VariableIndex(1) + data = Nonlinear.NonlinearData() + Nonlinear.set_objective(data, :($x > 0.5 && $x < 0.9)) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + MOI.VariableIndex[x], + ) + MOI.initialize(data, [:Grad]) + @test MOI.eval_objective(data, [1.5]) == 0.0 + ∇f = fill(NaN, 1) + MOI.eval_objective_gradient(data, ∇f, [1.5]) + @test iszero(∇f) + return +end + +function test_gradient_ifelse() + x = MOI.VariableIndex(1) + data = Nonlinear.NonlinearData() + Nonlinear.set_objective(data, :(ifelse($x >= 0.5 || $x < 0.1, $x, 5))) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + MOI.VariableIndex[x], + ) + MOI.initialize(data, [:Grad]) + @test MOI.eval_objective(data, [1.5]) == 1.5 + ∇f = fill(NaN, 1) + MOI.eval_objective_gradient(data, ∇f, [1.5]) + @test ∇f ≈ [1.0] + @test MOI.eval_objective(data, [-0.1]) == -0.1 + MOI.eval_objective_gradient(data, ∇f, [-0.1]) + @test ∇f ≈ [1.0] + @test MOI.eval_objective(data, [0.2]) == 5 + MOI.eval_objective_gradient(data, ∇f, [0.2]) + @test ∇f ≈ [0.0] + return +end + +function test_gradient_sqrt_nan() + x = MOI.VariableIndex(1) + data = Nonlinear.NonlinearData() + Nonlinear.set_objective(data, :(sqrt($x))) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + MOI.VariableIndex[x], + ) + MOI.initialize(data, [:Grad]) + @test isnan(MOI.eval_objective(data, [-1.5])) + ∇f = fill(Inf, 1) + MOI.eval_objective_gradient(data, ∇f, [-1.5]) + @test isnan(∇f[1]) + return +end + +function test_gradient_variable_power() + x = MOI.VariableIndex(1) + y = MOI.VariableIndex(2) + z = MOI.VariableIndex(3) + data = Nonlinear.NonlinearData() + Nonlinear.set_objective(data, :((1 / $x)^$y - $z)) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + MOI.VariableIndex[x, y, z], + ) + MOI.initialize(data, [:Grad]) + x_input = [2.5, 3.5, 1.0] + @test MOI.eval_objective(data, x_input) == (1 / 2.5)^3.5 - 1.0 + ∇f = fill(Inf, 3) + MOI.eval_objective_gradient(data, ∇f, x_input) + @test ∇f ≈ [ + -x_input[2] * x_input[1]^(-x_input[2] - 1), + -((1 / x_input[1])^x_input[2]) * log(x_input[1]), + -1, + ] + return +end + +function test_single_parameter() + x = 
MOI.VariableIndex(1) + data = Nonlinear.NonlinearData() + p = Nonlinear.add_parameter(data, 105.2) + Nonlinear.set_objective(data, :($p)) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + MOI.VariableIndex[x], + ) + MOI.initialize(data, [:Grad]) + @test MOI.eval_objective(data, [-0.1]) == 105.2 + return +end + +function test_gradient_nested_subexpressions() + x = MOI.VariableIndex(1) + y = MOI.VariableIndex(2) + data = Nonlinear.NonlinearData() + ex1 = Nonlinear.add_expression(data, :(sin($x^2) + cos($y * 4) / 5 - 2.0)) + ex2 = Nonlinear.add_expression(data, :($ex1)) + Nonlinear.set_objective(data, ex2) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + MOI.VariableIndex[x, y], + ) + MOI.initialize(data, [:Grad]) + ∇f = fill(NaN, 2) + MOI.eval_objective_gradient(data, ∇f, [2.0, 3.0]) + @test ∇f ≈ [2 * 2.0 * cos(2.0^2), -4 * sin(3.0 * 4) / 5] + return +end + +end # module + +TestReverseAD.runtests() From e7b4f4b94e4c89f49aa91513c73eef1abd1a4090 Mon Sep 17 00:00:00 2001 From: odow Date: Wed, 20 Apr 2022 12:44:33 +1200 Subject: [PATCH 02/16] More test coverage --- test/Nonlinear/ReverseAD.jl | 45 +++++++++++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+) diff --git a/test/Nonlinear/ReverseAD.jl b/test/Nonlinear/ReverseAD.jl index 1ce0346cc3..3b8cc58e08 100644 --- a/test/Nonlinear/ReverseAD.jl +++ b/test/Nonlinear/ReverseAD.jl @@ -1,7 +1,9 @@ module TestReverseAD using Test +import LinearAlgebra import MathOptInterface +import SparseArrays const MOI = MathOptInterface const Nonlinear = MOI.Nonlinear @@ -798,6 +800,49 @@ function test_gradient_nested_subexpressions() return end +function _dense_hessian(hessian_sparsity, V, n) + I = [i for (i, _) in hessian_sparsity] + J = [j for (_, j) in hessian_sparsity] + raw = SparseArrays.sparse(I, J, V, n, n) + return Matrix( + raw + raw' - + SparseArrays.sparse(LinearAlgebra.diagm(0 => LinearAlgebra.diag(raw))), + ) +end + +# This covers the code that computes Hessians in odd chunks of Hess-vec +# products. 
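+# The objective `*(x[1], ..., x[N])` has a dense Hessian, so increasing `N`
+# increases the number of seed-matrix columns required by the coloring. This
+# exercises every chunk size up to the hard-coded limit of 10 in
+# `MOI.initialize`, as well as the multi-pass case once `N` exceeds that limit.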
+function test_odd_chunks_Hessian_products() + for i in 1:18 + _test_odd_chunks_Hessian_products(i) + end + return +end + +function _test_odd_chunks_Hessian_products(N) + data = Nonlinear.NonlinearData() + x = MOI.VariableIndex.(1:N) + Nonlinear.set_objective(data, Expr(:call, :*, x...)) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + x, + ) + MOI.initialize(data, [:Hess]) + hessian_sparsity = MOI.hessian_lagrangian_structure(data) + V = zeros(length(hessian_sparsity)) + values = ones(N) + MOI.eval_hessian_lagrangian(data, V, values, 1.0, Float64[]) + H = _dense_hessian(hessian_sparsity, V, N) + @test H ≈ (ones(N, N) - LinearAlgebra.diagm(0 => ones(N))) + values[1] = 0.5 + MOI.eval_hessian_lagrangian(data, V, values, 1.0, Float64[]) + H = _dense_hessian(hessian_sparsity, V, N) + H_22 = (ones(N - 1, N - 1) - LinearAlgebra.diagm(0 => ones(N - 1))) / 2 + @test H ≈ [0 ones(N - 1)'; ones(N - 1) H_22] + return +end + end # module TestReverseAD.runtests() From b20e285296aa56888bac4c49a8d623e751228ea6 Mon Sep 17 00:00:00 2001 From: odow Date: Wed, 20 Apr 2022 12:48:04 +1200 Subject: [PATCH 03/16] Add DataStructures --- Project.toml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Project.toml b/Project.toml index 9f86aa8552..9aad182876 100644 --- a/Project.toml +++ b/Project.toml @@ -6,6 +6,7 @@ version = "1.2.0" BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf" CodecBzip2 = "523fee87-0ab8-5b00-afb7-3ecf72e48cfd" CodecZlib = "944b1d66-785c-5afd-91f1-9de20f533193" +DataStructures = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8" ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210" JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6" LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e" @@ -22,6 +23,7 @@ Unicode = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5" BenchmarkTools = "1" CodecBzip2 = "~0.6, 0.7" CodecZlib = "~0.6, 0.7" +DataStructures = "0.18" ForwardDiff = "0.5, 0.6, 0.7, 0.8, 0.9, 0.10" JSON = "~0.21" JSONSchema = "1" From 020a92f2b16056f9415684a485955bec6f6d8e8b Mon Sep 17 00:00:00 2001 From: odow Date: Thu, 21 Apr 2022 14:10:24 +1200 Subject: [PATCH 04/16] Fix rebase --- test/Nonlinear/ReverseAD.jl | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/test/Nonlinear/ReverseAD.jl b/test/Nonlinear/ReverseAD.jl index 3b8cc58e08..1e4366865d 100644 --- a/test/Nonlinear/ReverseAD.jl +++ b/test/Nonlinear/ReverseAD.jl @@ -451,7 +451,12 @@ function test_NLPBlockData() Nonlinear.add_constraint(data, :($x >= 2)) Nonlinear.add_constraint(data, :($x == 3)) Nonlinear.add_constraint(data, :(4 <= $x <= 5)) - block = MOI.NLPBlockData(data, [x]) + Nonlinear.set_differentiation_backend( + data, + Nonlinear.SparseReverseMode(), + [x], + ) + block = MOI.NLPBlockData(data) @test block.constraint_bounds == [ MOI.NLPBoundsPair(-Inf, 0.0), MOI.NLPBoundsPair(0.0, Inf), From f0ce71f1ddbb20eb8337793b0d2de484813840b1 Mon Sep 17 00:00:00 2001 From: odow Date: Thu, 21 Apr 2022 14:30:49 +1200 Subject: [PATCH 05/16] Fix docs --- docs/src/submodules/Nonlinear/overview.md | 54 +++++++++++------------ 1 file changed, 26 insertions(+), 28 deletions(-) diff --git a/docs/src/submodules/Nonlinear/overview.md b/docs/src/submodules/Nonlinear/overview.md index 51e2fa2428..85ee2c4d03 100644 --- a/docs/src/submodules/Nonlinear/overview.md +++ b/docs/src/submodules/Nonlinear/overview.md @@ -302,35 +302,17 @@ However, we cannot call gradient terms such as [`eval_objective_gradient`](@ref) because [`Nonlinear.ExprGraphOnly`](@ref) does not have the capability to differentiate a nonlinear 
expression. -Instead of passing [`Nonlinear.Evaluator`](@ref) directly to solvers, -solvers query the [`NLPBlock`](@ref) attribute, which returns an -[`NLPBlockData`](@ref). This object wraps an [`Nonlinear.Evaluator`](@ref) and -includes other information such as constraint bounds and whether the evaluator -has a nonlinear objective. Create and set [`NLPBlockData`](@ref) as follows: +If, instead, we pass [`Nonlinear.SparseReverseMode`](@ref), then we get access +to `:Grad`, the gradient of the objective function, `:Jac`, the jacobian matrix +of the constraints, `:JacVec`, the ability to compute Jacobian-vector products, +and `:ExprGraph`. ```jldoctest nonlinear_developer -julia> block = MOI.NLPBlockData(evaluator); - -julia> model = MOI.Utilities.UniversalFallback(MOI.Utilities.Model{Float64}()); - -julia> MOI.set(model, MOI.NLPBlock(), block); -``` -!!! warning - Only call [`NLPBlockData`](@ref) once you have finished modifying the - problem in `model`. - -If, instead, we set [`Nonlinear.SparseReverseMode`](@ref), then we get access to -`:Grad`, the gradient of the objective function, `:Jac`, the jacobian matrix of -the constraints, `:JacVec`, the ability to compute Jacobian-vector products, and -`:ExprGraph`. -```jldoctest nonlinear_developer -julia> Nonlinear.set_differentiation_backend( - data, +julia> evaluator = Nonlinear.Evaluator( + model, Nonlinear.SparseReverseMode(), [x], ) - -julia> data -NonlinearData with available features: +Nonlinear.Evaluator with available features: * :Grad * :Jac * :JacVec @@ -339,7 +321,7 @@ NonlinearData with available features: However, before calling anything, we need to call [`initialize`](@ref): ```jldoctest nonlinear_developer -julia> MOI.initialize(data, [:Grad, :Jac, :JacVec, :ExprGraph]) +julia> MOI.initialize(evaluator, [:Grad, :Jac, :JacVec, :ExprGraph]) ``` Now we can call methods like [`eval_objective`](@ref): @@ -348,7 +330,7 @@ julia> x = [1.0] 1-element Vector{Float64}: 1.0 -julia> MOI.eval_objective(data, x) +julia> MOI.eval_objective(evaluator, x) 7.268073418273571 ``` and [`eval_objective_gradient`](@ref): @@ -357,13 +339,29 @@ julia> grad = [NaN] 1-element Vector{Float64}: NaN -julia> MOI.eval_objective_gradient(data, grad, x) +julia> MOI.eval_objective_gradient(evaluator, grad, x) julia> grad 1-element Vector{Float64}: 1.909297426825682 ``` +Instead of passing [`Nonlinear.Evaluator`](@ref) directly to solvers, +solvers query the [`NLPBlock`](@ref) attribute, which returns an +[`NLPBlockData`](@ref). This object wraps an [`Nonlinear.Evaluator`](@ref) and +includes other information such as constraint bounds and whether the evaluator +has a nonlinear objective. Create and set [`NLPBlockData`](@ref) as follows: +```jldoctest nonlinear_developer +julia> block = MOI.NLPBlockData(evaluator); + +julia> model = MOI.Utilities.UniversalFallback(MOI.Utilities.Model{Float64}()); + +julia> MOI.set(model, MOI.NLPBlock(), block); +``` +!!! warning + Only call [`NLPBlockData`](@ref) once you have finished modifying the + problem in `model`. 
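+
+A solver that receives this `block` can retrieve the evaluator from
+`block.evaluator`, call [`initialize`](@ref) with the features it requires, and
+then evaluate derivatives at a point. The following is a sketch rather than a
+doctest; it assumes the model contains at least one nonlinear constraint so
+that the Jacobian is non-empty:
+```julia
+evaluator = block.evaluator
+MOI.initialize(evaluator, [:Grad, :Jac])
+# The sparsity pattern is a vector of (row, column) pairs:
+sparsity = MOI.jacobian_structure(evaluator)
+# Allocate storage for the structural non-zeros, then fill it in-place:
+J = zeros(length(sparsity))
+MOI.eval_constraint_jacobian(evaluator, J, x)
+```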
+ ## Expression-graph representation [`Nonlinear.Model`](@ref) stores nonlinear expressions in From 27a4696ec8e6940874e3266e4845472ef7311492 Mon Sep 17 00:00:00 2001 From: odow Date: Thu, 21 Apr 2022 14:56:16 +1200 Subject: [PATCH 06/16] More docs --- docs/src/submodules/Nonlinear/overview.md | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/docs/src/submodules/Nonlinear/overview.md b/docs/src/submodules/Nonlinear/overview.md index 85ee2c4d03..506905e978 100644 --- a/docs/src/submodules/Nonlinear/overview.md +++ b/docs/src/submodules/Nonlinear/overview.md @@ -362,6 +362,28 @@ julia> MOI.set(model, MOI.NLPBlock(), block); Only call [`NLPBlockData`](@ref) once you have finished modifying the problem in `model`. +Putting everything together, you can create a nonlinear optimization problem in +MathOptInterface as follows: +```@example +import MathOptInterface +const MOI = MathOptInterface + +function build_model(model; backend) + x = MOI.add_variable(model) + y = MOI.add_variable(model) + MOI.set(model, MOI.ObjectiveSense(), MOI.MIN_SENSE) + data = MOI.Nonlinear.NonlinearData() + MOI.Nonlinear.set_objective(data, :($x^2 + $y^2)) + MOI.Nonlinear.set_differentiation_backend(data, backend, [x, y]) + MOI.set(model, MOI.NLPBlock(), MOI.NLPBlockData(data)) + return +end + +# Replace `model` and `backend` with your optimizer and backend of choice. +model = MOI.Utilities.UniversalFallback(MOI.Utilities.Model{Float64}()) +build_model(model; backend = MOI.Nonlinear.ExprGraphOnly()) +``` + ## Expression-graph representation [`Nonlinear.Model`](@ref) stores nonlinear expressions in From 5babfa79f3325aa7003325482499400d58105597 Mon Sep 17 00:00:00 2001 From: odow Date: Thu, 21 Apr 2022 14:56:58 +1200 Subject: [PATCH 07/16] Another tweak to the docs --- docs/src/submodules/Nonlinear/overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/src/submodules/Nonlinear/overview.md b/docs/src/submodules/Nonlinear/overview.md index 506905e978..55302bb469 100644 --- a/docs/src/submodules/Nonlinear/overview.md +++ b/docs/src/submodules/Nonlinear/overview.md @@ -381,7 +381,7 @@ end # Replace `model` and `backend` with your optimizer and backend of choice. 
model = MOI.Utilities.UniversalFallback(MOI.Utilities.Model{Float64}()) -build_model(model; backend = MOI.Nonlinear.ExprGraphOnly()) +build_model(model; backend = MOI.Nonlinear.SparseReverseMode()) ``` ## Expression-graph representation From 36c96ab73105208debd720551d025e9a33ad9e94 Mon Sep 17 00:00:00 2001 From: odow Date: Fri, 22 Apr 2022 14:19:12 +1200 Subject: [PATCH 08/16] Updates from nonlinear-api changes --- docs/src/submodules/Nonlinear/overview.md | 32 +- src/Nonlinear/ReverseAD/types.jl | 8 +- src/Nonlinear/types.jl | 7 +- test/Nonlinear/ReverseAD.jl | 527 +++++++++++----------- 4 files changed, 273 insertions(+), 301 deletions(-) diff --git a/docs/src/submodules/Nonlinear/overview.md b/docs/src/submodules/Nonlinear/overview.md index 55302bb469..24d699a6ae 100644 --- a/docs/src/submodules/Nonlinear/overview.md +++ b/docs/src/submodules/Nonlinear/overview.md @@ -372,10 +372,10 @@ function build_model(model; backend) x = MOI.add_variable(model) y = MOI.add_variable(model) MOI.set(model, MOI.ObjectiveSense(), MOI.MIN_SENSE) - data = MOI.Nonlinear.NonlinearData() - MOI.Nonlinear.set_objective(data, :($x^2 + $y^2)) - MOI.Nonlinear.set_differentiation_backend(data, backend, [x, y]) - MOI.set(model, MOI.NLPBlock(), MOI.NLPBlockData(data)) + nl_model = MOI.Nonlinear.Model() + MOI.Nonlinear.set_objective(nl_model, :($x^2 + $y^2)) + evaluator = MOI.Nonlinear.Evaluator(nl_model, backend, [x, y]) + MOI.set(model, MOI.NLPBlock(), MOI.NLPBlockData(evaluator)) return end @@ -388,7 +388,7 @@ build_model(model; backend = MOI.Nonlinear.SparseReverseMode()) [`Nonlinear.Model`](@ref) stores nonlinear expressions in [`Nonlinear.Expression`](@ref)s. This section explains the design of -the expression graph datastructure in [`Nonlinear.Expression`](@ref). +the expression graph data structure in [`Nonlinear.Expression`](@ref). Given a nonlinear function like `f(x) = sin(x)^2 + x`, a conceptual aid for thinking about the graph representation of the expression is to convert it into @@ -411,7 +411,7 @@ julia> struct ExprNode julia> expr = ExprNode(:+, [ExprNode(:^, [ExprNode(:sin, [x]), 2.0]), x]); ``` -This datastructure follows our Polish prefix notation very closely, and we can +This data structure follows our Polish prefix notation very closely, and we can easily identify the arguments to an operator. However, it has a significant draw-back: each node in the graph requires a `Vector`, which is heap-allocated and tracked by Julia's garbage collector (GC). For large models, we can expect @@ -447,7 +447,7 @@ challenges: ### Sketch of the design in Nonlinear -`Nonlinear` overcomes these problems by decomposing the datastructure into a +`Nonlinear` overcomes these problems by decomposing the data structure into a number of different concrete-typed vectors. First, we create vectors of the supported uni- and multivariate operators. @@ -510,7 +510,7 @@ julia> expr = Expression( ); ``` -This is less readable than the other options, but does this datastructure meet +This is less readable than the other options, but does this data structure meet our design goals? Instead of a heap-allocated object for each node, we only have two `Vector`s for @@ -547,9 +547,8 @@ subexpressions in the model. ## ReverseAD -`Nonlinear.ReverseAD` is a submodule for computing derivatives of the problem -inside [`Nonlinear.NonlinearData`](@ref) using sparse reverse-mode automatic -differentiation (AD). 
+`Nonlinear.ReverseAD` is a submodule for computing derivatives of a nonlinear +optimization problem using sparse reverse-mode automatic differentiation (AD). This section does not attempt to explain how sparse reverse-mode AD works, but instead explains why MOI contains it's own implementation, and highlights @@ -557,8 +556,7 @@ notable differences from similar packages. !!! warning You should not interact with `ReverseAD` directly. Instead, you should - create a [`Nonlinear.NonlinearData`](@ref) object, call - [`Nonlinear.set_differentiation_backend`](@ref) with + create a [`Nonlinear.Evaluator`](@ref) object with [`Nonlinear.SparseReverseMode`](@ref), and then query the MOI API methods. ### Why another AD package? @@ -582,10 +580,10 @@ Here are four reasons: input and then trace an expression graph using operator overloading. This means they must deal (or detect and ignore) with control flow, I/O, and other vagaries of Julia. In contrast, `ReverseAD` only accepts functions in the - form of [`Nonlinear.NonlinearExpression`](@ref), which greatly limits the - range of syntax that it must deal with. By reducing the scope of what we - accept as input to functions relevant for mathematical optimization, we can - provide a simpler implementation with various performance optimizations. + form of [`Nonlinear.Expression`](@ref), which greatly limits the range of + syntax that it must deal with. By reducing the scope of what we accept as + input to functions relevant for mathematical optimization, we can provide a + simpler implementation with various performance optimizations. * **Historical:** `ReverseAD` started life as [ReverseDiffSparse.jl](https://github.com/mlubin/ReverseDiffSparse.jl), development of which begain in early 2014(!). This was well before the other packages started development. Because we had a well-tested, working AD in diff --git a/src/Nonlinear/ReverseAD/types.jl b/src/Nonlinear/ReverseAD/types.jl index 348af23859..3c0f1042cb 100644 --- a/src/Nonlinear/ReverseAD/types.jl +++ b/src/Nonlinear/ReverseAD/types.jl @@ -11,7 +11,7 @@ mutable struct _SubexpressionStorage linearity::Linearity function _SubexpressionStorage( - expr::Nonlinear.NonlinearExpression, + expr::Nonlinear.Expression, subexpression_linearity, moi_index_to_consecutive_index, ) @@ -131,7 +131,7 @@ end """ NLPEvaluator( - data::Nonlinear.NonlinearData, + model::Nonlinear.Model, ordered_variables::Vector{MOI.VariableIndex}, ) @@ -142,7 +142,7 @@ interface. Before using, you must initialize the evaluator using `MOI.initialize`. """ mutable struct NLPEvaluator <: MOI.AbstractNLPEvaluator - data::Nonlinear.NonlinearData + data::Nonlinear.Model ordered_variables::Vector{MOI.VariableIndex} objective::Union{Nothing,_FunctionStorage} @@ -180,7 +180,7 @@ mutable struct NLPEvaluator <: MOI.AbstractNLPEvaluator max_chunk::Int # chunk size for which we've allocated storage function NLPEvaluator( - data::Nonlinear.NonlinearData, + data::Nonlinear.Model, ordered_variables::Vector{MOI.VariableIndex}, ) return new(data, ordered_variables) diff --git a/src/Nonlinear/types.jl b/src/Nonlinear/types.jl index 24c056a0ef..8a386e3abf 100644 --- a/src/Nonlinear/types.jl +++ b/src/Nonlinear/types.jl @@ -273,11 +273,10 @@ features in the MOI nonlinear interface. 
""" struct SparseReverseMode <: AbstractAutomaticDifferentiation end -function set_differentiation_backend( - data::NonlinearData, +function Evaluator( + model::Model, ::SparseReverseMode, ordered_variables::Vector{MOI.VariableIndex}, ) - data.inner = ReverseAD.NLPEvaluator(data, ordered_variables) - return + return Evaluator(model, ReverseAD.NLPEvaluator(model, ordered_variables)) end diff --git a/test/Nonlinear/ReverseAD.jl b/test/Nonlinear/ReverseAD.jl index 1e4366865d..96112d3680 100644 --- a/test/Nonlinear/ReverseAD.jl +++ b/test/Nonlinear/ReverseAD.jl @@ -23,23 +23,26 @@ end function test_objective_quadratic_univariate() x = MOI.VariableIndex(1) - data = Nonlinear.NonlinearData() - Nonlinear.set_objective(data, :($x^2 + 1)) - Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - [x], - ) - MOI.initialize(data, [:Grad, :Jac, :Hess]) - @test MOI.eval_objective(data, [1.2]) == 1.2^2 + 1 + model = Nonlinear.Model() + Nonlinear.set_objective(model, :($x^2 + 1)) + evaluator = Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x]) + MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) + @test MOI.eval_objective(evaluator, [1.2]) == 1.2^2 + 1 g = [NaN] - MOI.eval_objective_gradient(data, g, [1.2]) + MOI.eval_objective_gradient(evaluator, g, [1.2]) @test g == [2.4] - @test MOI.hessian_lagrangian_structure(data) == [(1, 1)] + @test MOI.hessian_lagrangian_structure(evaluator) == [(1, 1)] H = [NaN] - MOI.eval_hessian_lagrangian(data, H, [1.2], 1.5, Float64[]) + MOI.eval_hessian_lagrangian(evaluator, H, [1.2], 1.5, Float64[]) @test H == 1.5 .* [2.0] - MOI.eval_hessian_lagrangian_product(data, H, [1.2], [1.2], 1.5, Float64[]) + MOI.eval_hessian_lagrangian_product( + evaluator, + H, + [1.2], + [1.2], + 1.5, + Float64[], + ) @test H == [1.5 * 2.0 * 1.2] return end @@ -47,25 +50,30 @@ end function test_objective_quadratic_multivariate() x = MOI.VariableIndex(1) y = MOI.VariableIndex(2) - data = Nonlinear.NonlinearData() - Nonlinear.set_objective(data, :($x^2 + $x * $y + $y^2)) - Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - [x, y], - ) - MOI.initialize(data, [:Grad, :Jac, :Hess]) - @test MOI.eval_objective(data, [1.2, 2.3]) == 1.2^2 + 1.2 * 2.3 + 2.3^2 + model = Nonlinear.Model() + Nonlinear.set_objective(model, :($x^2 + $x * $y + $y^2)) + evaluator = + Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x, y]) + MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) + @test MOI.eval_objective(evaluator, [1.2, 2.3]) == 1.2^2 + 1.2 * 2.3 + 2.3^2 g = [NaN, NaN] - MOI.eval_objective_gradient(data, g, [1.2, 2.3]) + MOI.eval_objective_gradient(evaluator, g, [1.2, 2.3]) @test g == [2 * 1.2 + 2.3, 1.2 + 2 * 2.3] - @test MOI.hessian_lagrangian_structure(data) == [(1, 1), (2, 2), (2, 1)] + @test MOI.hessian_lagrangian_structure(evaluator) == + [(1, 1), (2, 2), (2, 1)] H = [NaN, NaN, NaN] - MOI.eval_hessian_lagrangian(data, H, [1.2, 2.3], 1.5, Float64[]) + MOI.eval_hessian_lagrangian(evaluator, H, [1.2, 2.3], 1.5, Float64[]) @test H == 1.5 .* [2.0, 2.0, 1.0] v = [0.3, 0.4] hv = [NaN, NaN] - MOI.eval_hessian_lagrangian_product(data, hv, [1.2, 2.3], v, 1.5, Float64[]) + MOI.eval_hessian_lagrangian_product( + evaluator, + hv, + [1.2, 2.3], + v, + 1.5, + Float64[], + ) @test hv ≈ 1.5 .* [2 1; 1 2] * v return end @@ -73,28 +81,33 @@ end function test_objective_quadratic_multivariate_subexpressions() x = MOI.VariableIndex(1) y = MOI.VariableIndex(2) - data = Nonlinear.NonlinearData() - ex = Nonlinear.add_expression(data, :($x^2)) - ey = 
Nonlinear.add_expression(data, :($y^2)) - exy = Nonlinear.add_expression(data, :($ex + $x * $y)) - Nonlinear.set_objective(data, :($exy + $ey)) - Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - [x, y], - ) - MOI.initialize(data, [:Grad, :Jac, :Hess]) - @test MOI.eval_objective(data, [1.2, 2.3]) == 1.2^2 + 1.2 * 2.3 + 2.3^2 + model = Nonlinear.Model() + ex = Nonlinear.add_expression(model, :($x^2)) + ey = Nonlinear.add_expression(model, :($y^2)) + exy = Nonlinear.add_expression(model, :($ex + $x * $y)) + Nonlinear.set_objective(model, :($exy + $ey)) + evaluator = + Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x, y]) + MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) + @test MOI.eval_objective(evaluator, [1.2, 2.3]) == 1.2^2 + 1.2 * 2.3 + 2.3^2 g = [NaN, NaN] - MOI.eval_objective_gradient(data, g, [1.2, 2.3]) + MOI.eval_objective_gradient(evaluator, g, [1.2, 2.3]) @test g == [2 * 1.2 + 2.3, 1.2 + 2 * 2.3] - @test MOI.hessian_lagrangian_structure(data) == [(1, 1), (2, 2), (2, 1)] + @test MOI.hessian_lagrangian_structure(evaluator) == + [(1, 1), (2, 2), (2, 1)] H = [NaN, NaN, NaN] - MOI.eval_hessian_lagrangian(data, H, [1.2, 2.3], 1.5, Float64[]) + MOI.eval_hessian_lagrangian(evaluator, H, [1.2, 2.3], 1.5, Float64[]) @test H == 1.5 .* [2.0, 2.0, 1.0] v = [0.3, 0.4] hv = [NaN, NaN] - MOI.eval_hessian_lagrangian_product(data, hv, [1.2, 2.3], v, 1.5, Float64[]) + MOI.eval_hessian_lagrangian_product( + evaluator, + hv, + [1.2, 2.3], + v, + 1.5, + Float64[], + ) @test hv ≈ 1.5 .* [2 1; 1 2] * v return end @@ -102,20 +115,17 @@ end function test_objective_ifelse_comparison() x = MOI.VariableIndex(1) y = MOI.VariableIndex(2) - data = Nonlinear.NonlinearData() - Nonlinear.set_objective(data, :(ifelse(1 <= $x <= 2, $x^2, $y^2))) - Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - [x, y], - ) - MOI.initialize(data, [:Grad, :Jac, :Hess]) - @test MOI.eval_objective(data, [1.2, 2.3]) == 1.2^2 - @test MOI.eval_objective(data, [2.2, 2.3]) == 2.3^2 + model = Nonlinear.Model() + Nonlinear.set_objective(model, :(ifelse(1 <= $x <= 2, $x^2, $y^2))) + evaluator = + Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x, y]) + MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) + @test MOI.eval_objective(evaluator, [1.2, 2.3]) == 1.2^2 + @test MOI.eval_objective(evaluator, [2.2, 2.3]) == 2.3^2 g = [NaN, NaN] - MOI.eval_objective_gradient(data, g, [1.2, 2.3]) + MOI.eval_objective_gradient(evaluator, g, [1.2, 2.3]) @test g == [2 * 1.2, 0.0] - MOI.eval_objective_gradient(data, g, [2.2, 2.3]) + MOI.eval_objective_gradient(evaluator, g, [2.2, 2.3]) @test g == [0.0, 2 * 2.3] return end @@ -123,83 +133,68 @@ end function test_objective_ifelse_logic() x = MOI.VariableIndex(1) y = MOI.VariableIndex(2) - data = Nonlinear.NonlinearData() - Nonlinear.set_objective(data, :(ifelse(1 <= $x && $x <= 2, $x^2, $y^2))) - Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - [x, y], - ) - MOI.initialize(data, [:Grad, :Jac, :Hess]) - @test MOI.eval_objective(data, [1.2, 2.3]) == 1.2^2 - @test MOI.eval_objective(data, [2.2, 2.3]) == 2.3^2 + model = Nonlinear.Model() + Nonlinear.set_objective(model, :(ifelse(1 <= $x && $x <= 2, $x^2, $y^2))) + evaluator = + Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x, y]) + MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) + @test MOI.eval_objective(evaluator, [1.2, 2.3]) == 1.2^2 + @test MOI.eval_objective(evaluator, [2.2, 2.3]) == 2.3^2 g = [NaN, NaN] - 
MOI.eval_objective_gradient(data, g, [1.2, 2.3]) + MOI.eval_objective_gradient(evaluator, g, [1.2, 2.3]) @test g == [2 * 1.2, 0.0] - MOI.eval_objective_gradient(data, g, [2.2, 2.3]) + MOI.eval_objective_gradient(evaluator, g, [2.2, 2.3]) @test g == [0.0, 2 * 2.3] return end function test_objective_parameter() x = MOI.VariableIndex(1) - data = Nonlinear.NonlinearData() - p = Nonlinear.add_parameter(data, 1.2) - Nonlinear.set_objective(data, :($p * $x)) - Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - [x], - ) - MOI.initialize(data, [:Grad, :Jac, :Hess]) - @test MOI.eval_objective(data, [1.3]) == 1.2 * 1.3 + model = Nonlinear.Model() + p = Nonlinear.add_parameter(model, 1.2) + Nonlinear.set_objective(model, :($p * $x)) + evaluator = Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x]) + MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) + @test MOI.eval_objective(evaluator, [1.3]) == 1.2 * 1.3 g = [NaN] - MOI.eval_objective_gradient(data, g, [1.3]) + MOI.eval_objective_gradient(evaluator, g, [1.3]) @test g == [1.2] return end function test_objective_subexpression() - data = Nonlinear.NonlinearData() + model = Nonlinear.Model() x = MOI.VariableIndex(1) input = :($x^2 + 1) - expr = Nonlinear.add_expression(data, input) - expr_2 = Nonlinear.add_expression(data, :($expr^2)) - Nonlinear.set_objective(data, :($expr_2^2)) - Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - [x], - ) - MOI.initialize(data, [:Grad]) - @test MOI.eval_objective(data, [1.3]) == ((1.3^2 + 1)^2)^2 + expr = Nonlinear.add_expression(model, input) + expr_2 = Nonlinear.add_expression(model, :($expr^2)) + Nonlinear.set_objective(model, :($expr_2^2)) + evaluator = Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x]) + MOI.initialize(evaluator, [:Grad]) + @test MOI.eval_objective(evaluator, [1.3]) == ((1.3^2 + 1)^2)^2 g = [NaN] - MOI.eval_objective_gradient(data, g, [1.3]) + MOI.eval_objective_gradient(evaluator, g, [1.3]) @test g ≈ [2 * (1.3^2 + 1)^2 * (2 * (1.3^2 + 1)) * 2 * 1.3] return end function test_constraint_quadratic_univariate() x = MOI.VariableIndex(1) - data = Nonlinear.NonlinearData() - Nonlinear.add_constraint(data, :($x^2 <= 2.0)) - Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - [x], - ) - MOI.initialize(data, [:Grad, :Jac, :Hess]) + model = Nonlinear.Model() + Nonlinear.add_constraint(model, :($x^2 <= 2.0)) + evaluator = Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x]) + MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) g = [NaN] x_val = [1.2] - MOI.eval_constraint(data, g, x_val) + MOI.eval_constraint(evaluator, g, x_val) @test g == x_val .^ 2 .- 2 - @test MOI.jacobian_structure(data) == [(1, 1)] + @test MOI.jacobian_structure(evaluator) == [(1, 1)] J = [NaN] - MOI.eval_constraint_jacobian(data, J, x_val) + MOI.eval_constraint_jacobian(evaluator, J, x_val) @test J == 2 .* x_val - @test MOI.hessian_lagrangian_structure(data) == [(1, 1)] + @test MOI.hessian_lagrangian_structure(evaluator) == [(1, 1)] H = [NaN] - MOI.eval_hessian_lagrangian(data, H, x_val, 0.0, [1.5]) + MOI.eval_hessian_lagrangian(evaluator, H, x_val, 0.0, [1.5]) @test H == 1.5 .* [2.0] return end @@ -207,25 +202,23 @@ end function test_constraint_quadratic_multivariate() x = MOI.VariableIndex(1) y = MOI.VariableIndex(2) - data = Nonlinear.NonlinearData() - Nonlinear.add_constraint(data, :($x^2 + $x * $y + $y^2 <= 2.0)) - Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - [x, y], - ) - 
MOI.initialize(data, [:Grad, :Jac, :Hess]) + model = Nonlinear.Model() + Nonlinear.add_constraint(model, :($x^2 + $x * $y + $y^2 <= 2.0)) + evaluator = + Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x, y]) + MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) g = [NaN] x_val = [1.2, 2.3] - MOI.eval_constraint(data, g, x_val) + MOI.eval_constraint(evaluator, g, x_val) @test g == [x_val[1]^2 + x_val[1] * x_val[2] + x_val[2]^2] .- 2 - @test MOI.jacobian_structure(data) == [(1, 1), (1, 2)] + @test MOI.jacobian_structure(evaluator) == [(1, 1), (1, 2)] J = [NaN, NaN] - MOI.eval_constraint_jacobian(data, J, x_val) + MOI.eval_constraint_jacobian(evaluator, J, x_val) @test J == [2 * x_val[1] + x_val[2], x_val[1] + 2 * x_val[2]] - @test MOI.hessian_lagrangian_structure(data) == [(1, 1), (2, 2), (2, 1)] + @test MOI.hessian_lagrangian_structure(evaluator) == + [(1, 1), (2, 2), (2, 1)] H = [NaN, NaN, NaN] - MOI.eval_hessian_lagrangian(data, H, x_val, 0.0, [1.5]) + MOI.eval_hessian_lagrangian(evaluator, H, x_val, 0.0, [1.5]) @test H == 1.5 .* [2.0, 2.0, 1.0] return end @@ -233,39 +226,37 @@ end function test_constraint_quadratic_multivariate_subexpressions() x = MOI.VariableIndex(1) y = MOI.VariableIndex(2) - data = Nonlinear.NonlinearData() - ex = Nonlinear.add_expression(data, :($x^2)) - ey = Nonlinear.add_expression(data, :($y^2)) - exy = Nonlinear.add_expression(data, :($ex + $x * $y)) - Nonlinear.add_constraint(data, :($exy + $ey <= 2.0)) - Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - [x, y], - ) - MOI.initialize(data, [:Grad, :Jac, :Hess]) + model = Nonlinear.Model() + ex = Nonlinear.add_expression(model, :($x^2)) + ey = Nonlinear.add_expression(model, :($y^2)) + exy = Nonlinear.add_expression(model, :($ex + $x * $y)) + Nonlinear.add_constraint(model, :($exy + $ey <= 2.0)) + evaluator = + Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x, y]) + MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) g = [NaN] x_val = [1.2, 2.3] - MOI.eval_constraint(data, g, x_val) + MOI.eval_constraint(evaluator, g, x_val) @test g ≈ [x_val[1]^2 + x_val[1] * x_val[2] + x_val[2]^2] .- 2 # Jacobian - @test MOI.jacobian_structure(data) == [(1, 1), (1, 2)] + @test MOI.jacobian_structure(evaluator) == [(1, 1), (1, 2)] J = [NaN, NaN] - MOI.eval_constraint_jacobian(data, J, x_val) + MOI.eval_constraint_jacobian(evaluator, J, x_val) @test J ≈ [2 * x_val[1] + x_val[2], x_val[1] + 2 * x_val[2]] # Jv y, w = [NaN], [1.1, 2.2] - MOI.eval_constraint_jacobian_product(data, y, x_val, w) + MOI.eval_constraint_jacobian_product(evaluator, y, x_val, w) @test y ≈ [(2 * x_val[1] + x_val[2]) (x_val[1] + 2 * x_val[2])] * w # v'J y, w = [NaN, NaN], [1.1] - MOI.eval_constraint_jacobian_transpose_product(data, y, x_val, w) + MOI.eval_constraint_jacobian_transpose_product(evaluator, y, x_val, w) wJ = w' * [(2 * x_val[1] + x_val[2]) (x_val[1] + 2 * x_val[2])] @test y ≈ wJ[:] # Hessian-lagrangian - @test MOI.hessian_lagrangian_structure(data) == [(1, 1), (2, 2), (2, 1)] + @test MOI.hessian_lagrangian_structure(evaluator) == + [(1, 1), (2, 2), (2, 1)] H = [NaN, NaN, NaN] - MOI.eval_hessian_lagrangian(data, H, x_val, 0.0, [1.5]) + MOI.eval_hessian_lagrangian(evaluator, H, x_val, 0.0, [1.5]) @test H ≈ 1.5 .* [2.0, 2.0, 1.0] return end @@ -285,20 +276,17 @@ function test_hessian_sparsity_registered_function() H[2, 2] = 2 return end - data = Nonlinear.NonlinearData() - Nonlinear.register_operator(data, :f, 2, f, ∇f, ∇²f) - Nonlinear.set_objective(data, :(f($x, $z) + $y^2)) - 
Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - [x, y, z], - ) - @test_broken :Hess in MOI.features_available(data) - # MOI.initialize(data, [:Grad, :Jac, :Hess]) - # @test MOI.hessian_lagrangian_structure(data) == + model = Nonlinear.Model() + Nonlinear.register_operator(model, :f, 2, f, ∇f, ∇²f) + Nonlinear.set_objective(model, :(f($x, $z) + $y^2)) + evaluator = + Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x, y, z]) + @test_broken :Hess in MOI.features_available(evaluator) + # MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) + # @test MOI.hessian_lagrangian_structure(evaluator) == # [(1, 1), (2, 2), (3, 3), (3, 1)] # H = fill(NaN, 4) - # MOI.eval_hessian_lagrangian(data, H, rand(3), 1.5, Float64[]) + # MOI.eval_hessian_lagrangian(evaluator, H, rand(3), 1.5, Float64[]) # @test H == 1.5 .* [2.0, 2.0, 2.0, 0.0] return end @@ -318,19 +306,16 @@ function test_hessian_sparsity_registered_rosenbrock() H[2, 2] = 200.0 return end - data = Nonlinear.NonlinearData() - Nonlinear.register_operator(data, :rosenbrock, 2, f, ∇f, ∇²f) - Nonlinear.set_objective(data, :(rosenbrock($x, $y))) - Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - [x, y], - ) - @test_broken :Hess in MOI.features_available(data) - # MOI.initialize(data, [:Grad, :Jac, :Hess]) - # @test MOI.hessian_lagrangian_structure(data) == [(1, 1), (2, 2), (2, 1)] + model = Nonlinear.Model() + Nonlinear.register_operator(model, :rosenbrock, 2, f, ∇f, ∇²f) + Nonlinear.set_objective(model, :(rosenbrock($x, $y))) + evaluator = + Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x, y]) + @test_broken :Hess in MOI.features_available(evaluator) + # MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) + # @test MOI.hessian_lagrangian_structure(evaluator) == [(1, 1), (2, 2), (2, 1)] # H = fill(NaN, 3) - # MOI.eval_hessian_lagrangian(data, H, [1.0, 1.0], 1.5, Float64[]) + # MOI.eval_hessian_lagrangian(evaluator, H, [1.0, 1.0], 1.5, Float64[]) # @test H == 1.5 .* [802, 200, -400] return end @@ -420,22 +405,20 @@ end function test_derivatives() a = MOI.VariableIndex(1) b = MOI.VariableIndex(2) - data = Nonlinear.NonlinearData() - Nonlinear.set_objective(data, :(sin($a^2) + cos($b * 4) / 5 - 2.0)) - Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - [a, b], - ) - MOI.initialize(data, [:Grad, :Jac, :Hess]) + model = Nonlinear.Model() + Nonlinear.set_objective(model, :(sin($a^2) + cos($b * 4) / 5 - 2.0)) + evaluator = + Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [a, b]) + MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) x = [2.0, 3.0] - @test MOI.eval_objective(data, x) == sin(x[1]^2) + cos(x[2] * 4) / 5 - 2.0 + @test MOI.eval_objective(evaluator, x) == + sin(x[1]^2) + cos(x[2] * 4) / 5 - 2.0 g = [NaN, NaN] - MOI.eval_objective_gradient(data, g, x) + MOI.eval_objective_gradient(evaluator, g, x) @test g ≈ [2 * x[1] * cos(x[1]^2), -4 * sin(x[2] * 4) / 5] - @test MOI.hessian_lagrangian_structure(data) == [(1, 1), (2, 2)] + @test MOI.hessian_lagrangian_structure(evaluator) == [(1, 1), (2, 2)] H = [NaN, NaN] - MOI.eval_hessian_lagrangian(data, H, x, 1.5, Float64[]) + MOI.eval_hessian_lagrangian(evaluator, H, x, 1.5, Float64[]) H_exact = [ -4 * x[1]^2 * sin(x[1]^2) + 2 * cos(x[1]^2), -4 / 5 * 4 * cos(x[2] * 4), @@ -445,18 +428,14 @@ function test_derivatives() end function test_NLPBlockData() - data = Nonlinear.NonlinearData() + model = Nonlinear.Model() x = MOI.VariableIndex(1) - Nonlinear.add_constraint(data, :($x <= 1)) - 
Nonlinear.add_constraint(data, :($x >= 2)) - Nonlinear.add_constraint(data, :($x == 3)) - Nonlinear.add_constraint(data, :(4 <= $x <= 5)) - Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - [x], - ) - block = MOI.NLPBlockData(data) + Nonlinear.add_constraint(model, :($x <= 1)) + Nonlinear.add_constraint(model, :($x >= 2)) + Nonlinear.add_constraint(model, :($x == 3)) + Nonlinear.add_constraint(model, :(4 <= $x <= 5)) + evaluator = Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x]) + block = MOI.NLPBlockData(evaluator) @test block.constraint_bounds == [ MOI.NLPBoundsPair(-Inf, 0.0), MOI.NLPBoundsPair(0.0, Inf), @@ -473,9 +452,9 @@ function test_linearity() z = MOI.VariableIndex(3) variables = Dict(x => 1, y => 2, z => 3) function _test_linearity(input, test_value, IJ = [], indices = []) - data = Nonlinear.NonlinearData() - ex = Nonlinear.add_expression(data, input) - expr = data[ex] + model = Nonlinear.Model() + ex = Nonlinear.add_expression(model, input) + expr = model[ex] adj = Nonlinear.adjacency_matrix(expr.nodes) nodes = ReverseAD._replace_moi_variables(expr.nodes, variables) ret = ReverseAD._classify_linearity(nodes, adj, ReverseAD.Linearity[]) @@ -566,16 +545,16 @@ function test_dual_forward() x = MOI.VariableIndex(1) y = MOI.VariableIndex(2) function _test_dual_forward(input, x_input, test_value) - data = Nonlinear.NonlinearData() - Nonlinear.set_objective(data, input) - Nonlinear.set_differentiation_backend( - data, + model = Nonlinear.Model() + Nonlinear.set_objective(model, input) + evaluator = Nonlinear.Evaluator( + model, Nonlinear.SparseReverseMode(), MOI.VariableIndex[x, y], ) - MOI.initialize(data, [:Grad]) + MOI.initialize(evaluator, [:Grad]) ∇f = fill(NaN, 2) - MOI.eval_objective_gradient(data, ∇f, x_input) + MOI.eval_objective_gradient(evaluator, ∇f, x_input) @test isapprox(∇f, test_value, atol = 1e-10) return end @@ -606,26 +585,26 @@ function test_gradient_registered_function() x = MOI.VariableIndex(1) y = MOI.VariableIndex(2) z = MOI.VariableIndex(3) - data = Nonlinear.NonlinearData() + model = Nonlinear.Model() f(x, y) = (1 / 3) * y^3 - 2x^2 function ∇f(g, x, y) g[1] = -4x g[2] = y^2 return end - Nonlinear.register_operator(data, :Φ, 2, f, ∇f) - Nonlinear.register_operator(data, :c, 1, cos, x -> -sin(x), x -> -cos(x)) - Nonlinear.set_objective(data, :(Φ($y, $x - 1) * c($z))) - Nonlinear.set_differentiation_backend( - data, + Nonlinear.register_operator(model, :Φ, 2, f, ∇f) + Nonlinear.register_operator(model, :c, 1, cos, x -> -sin(x), x -> -cos(x)) + Nonlinear.set_objective(model, :(Φ($y, $x - 1) * c($z))) + evaluator = Nonlinear.Evaluator( + model, Nonlinear.SparseReverseMode(), MOI.VariableIndex[x, y, z], ) - MOI.initialize(data, [:Grad]) + MOI.initialize(evaluator, [:Grad]) ∇f = fill(NaN, 3) x_input = [2.0, 3.0, 4.0] - @test MOI.eval_objective(data, x_input) == f(3.0, 2.0 - 1) * cos(4.0) - MOI.eval_objective_gradient(data, ∇f, [2.0, 3.0, 4.0]) + @test MOI.eval_objective(evaluator, x_input) == f(3.0, 2.0 - 1) * cos(4.0) + MOI.eval_objective_gradient(evaluator, ∇f, [2.0, 3.0, 4.0]) @test ∇f ≈ [ cos(x_input[3]) * (x_input[1] - 1)^2, -4cos(x_input[3]) * x_input[2], @@ -636,39 +615,39 @@ end function test_gradient_jump_855() x = MOI.VariableIndex(1) - data = Nonlinear.NonlinearData() + model = Nonlinear.Model() Nonlinear.set_objective( - data, + model, :(ifelse($x <= 3.0, ($x - 2.0)^2, 2 * log($x - 2.0) + 1.0)), ) - Nonlinear.set_differentiation_backend( - data, + evaluator = Nonlinear.Evaluator( + model, 
Nonlinear.SparseReverseMode(), MOI.VariableIndex[x], ) - MOI.initialize(data, [:Grad]) + MOI.initialize(evaluator, [:Grad]) ∇f = fill(NaN, 1) - MOI.eval_objective_gradient(data, ∇f, [-1.0]) + MOI.eval_objective_gradient(evaluator, ∇f, [-1.0]) @test ∇f ≈ [-6.0] - MOI.eval_objective_gradient(data, ∇f, [2.0]) + MOI.eval_objective_gradient(evaluator, ∇f, [2.0]) @test ∇f ≈ [0.0] return end function test_gradient_abs() x = MOI.VariableIndex(1) - data = Nonlinear.NonlinearData() - Nonlinear.set_objective(data, :(abs($x))) - Nonlinear.set_differentiation_backend( - data, + model = Nonlinear.Model() + Nonlinear.set_objective(model, :(abs($x))) + evaluator = Nonlinear.Evaluator( + model, Nonlinear.SparseReverseMode(), MOI.VariableIndex[x], ) - MOI.initialize(data, [:Grad]) + MOI.initialize(evaluator, [:Grad]) ∇f = fill(NaN, 1) - MOI.eval_objective_gradient(data, ∇f, [2.0]) + MOI.eval_objective_gradient(evaluator, ∇f, [2.0]) @test ∇f ≈ [1.0] - MOI.eval_objective_gradient(data, ∇f, [-2.0]) + MOI.eval_objective_gradient(evaluator, ∇f, [-2.0]) @test ∇f ≈ [-1.0] return end @@ -676,73 +655,73 @@ end function test_gradient_trig() x = MOI.VariableIndex(1) y = MOI.VariableIndex(2) - data = Nonlinear.NonlinearData() - Nonlinear.set_objective(data, :(sin($x^2) + cos($y * 4) / 5 - 2.0)) - Nonlinear.set_differentiation_backend( - data, + model = Nonlinear.Model() + Nonlinear.set_objective(model, :(sin($x^2) + cos($y * 4) / 5 - 2.0)) + evaluator = Nonlinear.Evaluator( + model, Nonlinear.SparseReverseMode(), MOI.VariableIndex[x, y], ) - MOI.initialize(data, [:Grad]) + MOI.initialize(evaluator, [:Grad]) ∇f = fill(NaN, 2) - MOI.eval_objective_gradient(data, ∇f, [2.0, 3.0]) + MOI.eval_objective_gradient(evaluator, ∇f, [2.0, 3.0]) @test ∇f ≈ [2 * 2.0 * cos(2.0^2), -4 * sin(3.0 * 4) / 5] return end function test_gradient_logical() x = MOI.VariableIndex(1) - data = Nonlinear.NonlinearData() - Nonlinear.set_objective(data, :($x > 0.5 && $x < 0.9)) - Nonlinear.set_differentiation_backend( - data, + model = Nonlinear.Model() + Nonlinear.set_objective(model, :($x > 0.5 && $x < 0.9)) + evaluator = Nonlinear.Evaluator( + model, Nonlinear.SparseReverseMode(), MOI.VariableIndex[x], ) - MOI.initialize(data, [:Grad]) - @test MOI.eval_objective(data, [1.5]) == 0.0 + MOI.initialize(evaluator, [:Grad]) + @test MOI.eval_objective(evaluator, [1.5]) == 0.0 ∇f = fill(NaN, 1) - MOI.eval_objective_gradient(data, ∇f, [1.5]) + MOI.eval_objective_gradient(evaluator, ∇f, [1.5]) @test iszero(∇f) return end function test_gradient_ifelse() x = MOI.VariableIndex(1) - data = Nonlinear.NonlinearData() - Nonlinear.set_objective(data, :(ifelse($x >= 0.5 || $x < 0.1, $x, 5))) - Nonlinear.set_differentiation_backend( - data, + model = Nonlinear.Model() + Nonlinear.set_objective(model, :(ifelse($x >= 0.5 || $x < 0.1, $x, 5))) + evaluator = Nonlinear.Evaluator( + model, Nonlinear.SparseReverseMode(), MOI.VariableIndex[x], ) - MOI.initialize(data, [:Grad]) - @test MOI.eval_objective(data, [1.5]) == 1.5 + MOI.initialize(evaluator, [:Grad]) + @test MOI.eval_objective(evaluator, [1.5]) == 1.5 ∇f = fill(NaN, 1) - MOI.eval_objective_gradient(data, ∇f, [1.5]) + MOI.eval_objective_gradient(evaluator, ∇f, [1.5]) @test ∇f ≈ [1.0] - @test MOI.eval_objective(data, [-0.1]) == -0.1 - MOI.eval_objective_gradient(data, ∇f, [-0.1]) + @test MOI.eval_objective(evaluator, [-0.1]) == -0.1 + MOI.eval_objective_gradient(evaluator, ∇f, [-0.1]) @test ∇f ≈ [1.0] - @test MOI.eval_objective(data, [0.2]) == 5 - MOI.eval_objective_gradient(data, ∇f, [0.2]) + @test 
MOI.eval_objective(evaluator, [0.2]) == 5 + MOI.eval_objective_gradient(evaluator, ∇f, [0.2]) @test ∇f ≈ [0.0] return end function test_gradient_sqrt_nan() x = MOI.VariableIndex(1) - data = Nonlinear.NonlinearData() - Nonlinear.set_objective(data, :(sqrt($x))) - Nonlinear.set_differentiation_backend( - data, + model = Nonlinear.Model() + Nonlinear.set_objective(model, :(sqrt($x))) + evaluator = Nonlinear.Evaluator( + model, Nonlinear.SparseReverseMode(), MOI.VariableIndex[x], ) - MOI.initialize(data, [:Grad]) - @test isnan(MOI.eval_objective(data, [-1.5])) + MOI.initialize(evaluator, [:Grad]) + @test isnan(MOI.eval_objective(evaluator, [-1.5])) ∇f = fill(Inf, 1) - MOI.eval_objective_gradient(data, ∇f, [-1.5]) + MOI.eval_objective_gradient(evaluator, ∇f, [-1.5]) @test isnan(∇f[1]) return end @@ -751,18 +730,18 @@ function test_gradient_variable_power() x = MOI.VariableIndex(1) y = MOI.VariableIndex(2) z = MOI.VariableIndex(3) - data = Nonlinear.NonlinearData() - Nonlinear.set_objective(data, :((1 / $x)^$y - $z)) - Nonlinear.set_differentiation_backend( - data, + model = Nonlinear.Model() + Nonlinear.set_objective(model, :((1 / $x)^$y - $z)) + evaluator = Nonlinear.Evaluator( + model, Nonlinear.SparseReverseMode(), MOI.VariableIndex[x, y, z], ) - MOI.initialize(data, [:Grad]) + MOI.initialize(evaluator, [:Grad]) x_input = [2.5, 3.5, 1.0] - @test MOI.eval_objective(data, x_input) == (1 / 2.5)^3.5 - 1.0 + @test MOI.eval_objective(evaluator, x_input) == (1 / 2.5)^3.5 - 1.0 ∇f = fill(Inf, 3) - MOI.eval_objective_gradient(data, ∇f, x_input) + MOI.eval_objective_gradient(evaluator, ∇f, x_input) @test ∇f ≈ [ -x_input[2] * x_input[1]^(-x_input[2] - 1), -((1 / x_input[1])^x_input[2]) * log(x_input[1]), @@ -773,34 +752,34 @@ end function test_single_parameter() x = MOI.VariableIndex(1) - data = Nonlinear.NonlinearData() - p = Nonlinear.add_parameter(data, 105.2) - Nonlinear.set_objective(data, :($p)) - Nonlinear.set_differentiation_backend( - data, + model = Nonlinear.Model() + p = Nonlinear.add_parameter(model, 105.2) + Nonlinear.set_objective(model, :($p)) + evaluator = Nonlinear.Evaluator( + model, Nonlinear.SparseReverseMode(), MOI.VariableIndex[x], ) - MOI.initialize(data, [:Grad]) - @test MOI.eval_objective(data, [-0.1]) == 105.2 + MOI.initialize(evaluator, [:Grad]) + @test MOI.eval_objective(evaluator, [-0.1]) == 105.2 return end function test_gradient_nested_subexpressions() x = MOI.VariableIndex(1) y = MOI.VariableIndex(2) - data = Nonlinear.NonlinearData() - ex1 = Nonlinear.add_expression(data, :(sin($x^2) + cos($y * 4) / 5 - 2.0)) - ex2 = Nonlinear.add_expression(data, :($ex1)) - Nonlinear.set_objective(data, ex2) - Nonlinear.set_differentiation_backend( - data, + model = Nonlinear.Model() + ex1 = Nonlinear.add_expression(model, :(sin($x^2) + cos($y * 4) / 5 - 2.0)) + ex2 = Nonlinear.add_expression(model, :($ex1)) + Nonlinear.set_objective(model, ex2) + evaluator = Nonlinear.Evaluator( + model, Nonlinear.SparseReverseMode(), MOI.VariableIndex[x, y], ) - MOI.initialize(data, [:Grad]) + MOI.initialize(evaluator, [:Grad]) ∇f = fill(NaN, 2) - MOI.eval_objective_gradient(data, ∇f, [2.0, 3.0]) + MOI.eval_objective_gradient(evaluator, ∇f, [2.0, 3.0]) @test ∇f ≈ [2 * 2.0 * cos(2.0^2), -4 * sin(3.0 * 4) / 5] return end @@ -825,23 +804,19 @@ function test_odd_chunks_Hessian_products() end function _test_odd_chunks_Hessian_products(N) - data = Nonlinear.NonlinearData() + model = Nonlinear.Model() x = MOI.VariableIndex.(1:N) - Nonlinear.set_objective(data, Expr(:call, :*, x...)) - 
Nonlinear.set_differentiation_backend( - data, - Nonlinear.SparseReverseMode(), - x, - ) - MOI.initialize(data, [:Hess]) - hessian_sparsity = MOI.hessian_lagrangian_structure(data) + Nonlinear.set_objective(model, Expr(:call, :*, x...)) + evaluator = Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), x) + MOI.initialize(evaluator, [:Hess]) + hessian_sparsity = MOI.hessian_lagrangian_structure(evaluator) V = zeros(length(hessian_sparsity)) values = ones(N) - MOI.eval_hessian_lagrangian(data, V, values, 1.0, Float64[]) + MOI.eval_hessian_lagrangian(evaluator, V, values, 1.0, Float64[]) H = _dense_hessian(hessian_sparsity, V, N) @test H ≈ (ones(N, N) - LinearAlgebra.diagm(0 => ones(N))) values[1] = 0.5 - MOI.eval_hessian_lagrangian(data, V, values, 1.0, Float64[]) + MOI.eval_hessian_lagrangian(evaluator, V, values, 1.0, Float64[]) H = _dense_hessian(hessian_sparsity, V, N) H_22 = (ones(N - 1, N - 1) - LinearAlgebra.diagm(0 => ones(N - 1))) / 2 @test H ≈ [0 ones(N - 1)'; ones(N - 1) H_22] From aa3d326db2c160f94cb237f0b3c07d846917ac9d Mon Sep 17 00:00:00 2001 From: odow Date: Mon, 25 Apr 2022 12:17:26 +1200 Subject: [PATCH 09/16] Add license header --- src/Nonlinear/ReverseAD/Coloring/Coloring.jl | 6 ++++++ src/Nonlinear/ReverseAD/ReverseAD.jl | 6 ++++++ src/Nonlinear/ReverseAD/forward_over_reverse.jl | 6 ++++++ src/Nonlinear/ReverseAD/graph_tools.jl | 6 ++++++ src/Nonlinear/ReverseAD/mathoptinterface_api.jl | 6 ++++++ src/Nonlinear/ReverseAD/precompile.jl | 6 ++++++ src/Nonlinear/ReverseAD/reverse_mode.jl | 6 ++++++ src/Nonlinear/ReverseAD/types.jl | 6 ++++++ src/Nonlinear/ReverseAD/utils.jl | 6 ++++++ test/Nonlinear/ReverseAD.jl | 6 ++++++ 10 files changed, 60 insertions(+) diff --git a/src/Nonlinear/ReverseAD/Coloring/Coloring.jl b/src/Nonlinear/ReverseAD/Coloring/Coloring.jl index 6a2b5f3eb5..8a4ee89e3c 100644 --- a/src/Nonlinear/ReverseAD/Coloring/Coloring.jl +++ b/src/Nonlinear/ReverseAD/Coloring/Coloring.jl @@ -1,3 +1,9 @@ +# Copyright (c) 2017: Miles Lubin and contributors +# Copyright (c) 2017: Google Inc. +# +# Use of this source code is governed by an MIT-style license that can be found +# in the LICENSE.md file or at https://opensource.org/licenses/MIT. + module Coloring import DataStructures diff --git a/src/Nonlinear/ReverseAD/ReverseAD.jl b/src/Nonlinear/ReverseAD/ReverseAD.jl index 5d865466d0..34bd00bd80 100644 --- a/src/Nonlinear/ReverseAD/ReverseAD.jl +++ b/src/Nonlinear/ReverseAD/ReverseAD.jl @@ -1,3 +1,9 @@ +# Copyright (c) 2017: Miles Lubin and contributors +# Copyright (c) 2017: Google Inc. +# +# Use of this source code is governed by an MIT-style license that can be found +# in the LICENSE.md file or at https://opensource.org/licenses/MIT. + module ReverseAD import ForwardDiff diff --git a/src/Nonlinear/ReverseAD/forward_over_reverse.jl b/src/Nonlinear/ReverseAD/forward_over_reverse.jl index a272cba32d..a9d07f8ef5 100644 --- a/src/Nonlinear/ReverseAD/forward_over_reverse.jl +++ b/src/Nonlinear/ReverseAD/forward_over_reverse.jl @@ -1,3 +1,9 @@ +# Copyright (c) 2017: Miles Lubin and contributors +# Copyright (c) 2017: Google Inc. +# +# Use of this source code is governed by an MIT-style license that can be found +# in the LICENSE.md file or at https://opensource.org/licenses/MIT. 
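+#
+# This file implements the forward-over-reverse scheme that the evaluator uses
+# for Hessian and Hessian-vector computations: a forward (dual-number) pass is
+# layered on top of the reverse-mode pass implemented in reverse_mode.jl.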
+ const TAG = :ReverseAD """ diff --git a/src/Nonlinear/ReverseAD/graph_tools.jl b/src/Nonlinear/ReverseAD/graph_tools.jl index b3c06f9ecf..80d8fdc555 100644 --- a/src/Nonlinear/ReverseAD/graph_tools.jl +++ b/src/Nonlinear/ReverseAD/graph_tools.jl @@ -1,3 +1,9 @@ +# Copyright (c) 2017: Miles Lubin and contributors +# Copyright (c) 2017: Google Inc. +# +# Use of this source code is governed by an MIT-style license that can be found +# in the LICENSE.md file or at https://opensource.org/licenses/MIT. + """ _replace_moi_variables( nodes::Vector{Nonlinear.Node}, diff --git a/src/Nonlinear/ReverseAD/mathoptinterface_api.jl b/src/Nonlinear/ReverseAD/mathoptinterface_api.jl index 7adee13d3f..e9739f773c 100644 --- a/src/Nonlinear/ReverseAD/mathoptinterface_api.jl +++ b/src/Nonlinear/ReverseAD/mathoptinterface_api.jl @@ -1,3 +1,9 @@ +# Copyright (c) 2017: Miles Lubin and contributors +# Copyright (c) 2017: Google Inc. +# +# Use of this source code is governed by an MIT-style license that can be found +# in the LICENSE.md file or at https://opensource.org/licenses/MIT. + function MOI.features_available(d::NLPEvaluator) # Check if we have any user-defined multivariate operators, in which case we # need to disable hessians. The result of features_available depends on this. diff --git a/src/Nonlinear/ReverseAD/precompile.jl b/src/Nonlinear/ReverseAD/precompile.jl index 16915364f8..b987ea42e8 100644 --- a/src/Nonlinear/ReverseAD/precompile.jl +++ b/src/Nonlinear/ReverseAD/precompile.jl @@ -1,3 +1,9 @@ +# Copyright (c) 2017: Miles Lubin and contributors +# Copyright (c) 2017: Google Inc. +# +# Use of this source code is governed by an MIT-style license that can be found +# in the LICENSE.md file or at https://opensource.org/licenses/MIT. + function _precompile_() ccall(:jl_generating_output, Cint, ()) == 1 || return nothing Base.precompile( diff --git a/src/Nonlinear/ReverseAD/reverse_mode.jl b/src/Nonlinear/ReverseAD/reverse_mode.jl index 4c09c8b11e..a6758aa05c 100644 --- a/src/Nonlinear/ReverseAD/reverse_mode.jl +++ b/src/Nonlinear/ReverseAD/reverse_mode.jl @@ -1,3 +1,9 @@ +# Copyright (c) 2017: Miles Lubin and contributors +# Copyright (c) 2017: Google Inc. +# +# Use of this source code is governed by an MIT-style license that can be found +# in the LICENSE.md file or at https://opensource.org/licenses/MIT. + """ _reverse_mode(d::NLPEvaluator, x) diff --git a/src/Nonlinear/ReverseAD/types.jl b/src/Nonlinear/ReverseAD/types.jl index 3c0f1042cb..6f2163708f 100644 --- a/src/Nonlinear/ReverseAD/types.jl +++ b/src/Nonlinear/ReverseAD/types.jl @@ -1,3 +1,9 @@ +# Copyright (c) 2017: Miles Lubin and contributors +# Copyright (c) 2017: Google Inc. +# +# Use of this source code is governed by an MIT-style license that can be found +# in the LICENSE.md file or at https://opensource.org/licenses/MIT. + mutable struct _SubexpressionStorage nodes::Vector{Nonlinear.Node} adj::SparseArrays.SparseMatrixCSC{Bool,Int} diff --git a/src/Nonlinear/ReverseAD/utils.jl b/src/Nonlinear/ReverseAD/utils.jl index ca3ae6432c..e67f7f1346 100644 --- a/src/Nonlinear/ReverseAD/utils.jl +++ b/src/Nonlinear/ReverseAD/utils.jl @@ -1,3 +1,9 @@ +# Copyright (c) 2017: Miles Lubin and contributors +# Copyright (c) 2017: Google Inc. +# +# Use of this source code is governed by an MIT-style license that can be found +# in the LICENSE.md file or at https://opensource.org/licenses/MIT. 
+ """ _UnsafeVectorView(offset, len, ptr) diff --git a/test/Nonlinear/ReverseAD.jl b/test/Nonlinear/ReverseAD.jl index 96112d3680..556fea7cc3 100644 --- a/test/Nonlinear/ReverseAD.jl +++ b/test/Nonlinear/ReverseAD.jl @@ -1,3 +1,9 @@ +# Copyright (c) 2017: Miles Lubin and contributors +# Copyright (c) 2017: Google Inc. +# +# Use of this source code is governed by an MIT-style license that can be found +# in the LICENSE.md file or at https://opensource.org/licenses/MIT. + module TestReverseAD using Test From f125e164523ad2ea492a24ac427492601ce411f7 Mon Sep 17 00:00:00 2001 From: odow Date: Tue, 26 Apr 2022 06:17:53 +1200 Subject: [PATCH 10/16] Update for latest changes --- test/Nonlinear/ReverseAD.jl | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/test/Nonlinear/ReverseAD.jl b/test/Nonlinear/ReverseAD.jl index 556fea7cc3..52ddea52be 100644 --- a/test/Nonlinear/ReverseAD.jl +++ b/test/Nonlinear/ReverseAD.jl @@ -187,13 +187,13 @@ end function test_constraint_quadratic_univariate() x = MOI.VariableIndex(1) model = Nonlinear.Model() - Nonlinear.add_constraint(model, :($x^2 <= 2.0)) + Nonlinear.add_constraint(model, :($x^2), MOI.LessThan(2.0)) evaluator = Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x]) MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) g = [NaN] x_val = [1.2] MOI.eval_constraint(evaluator, g, x_val) - @test g == x_val .^ 2 .- 2 + @test g == x_val .^ 2 @test MOI.jacobian_structure(evaluator) == [(1, 1)] J = [NaN] MOI.eval_constraint_jacobian(evaluator, J, x_val) @@ -209,14 +209,14 @@ function test_constraint_quadratic_multivariate() x = MOI.VariableIndex(1) y = MOI.VariableIndex(2) model = Nonlinear.Model() - Nonlinear.add_constraint(model, :($x^2 + $x * $y + $y^2 <= 2.0)) + Nonlinear.add_constraint(model, :($x^2 + $x * $y + $y^2), MOI.LessThan(2.0)) evaluator = Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x, y]) MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) g = [NaN] x_val = [1.2, 2.3] MOI.eval_constraint(evaluator, g, x_val) - @test g == [x_val[1]^2 + x_val[1] * x_val[2] + x_val[2]^2] .- 2 + @test g == [x_val[1]^2 + x_val[1] * x_val[2] + x_val[2]^2] @test MOI.jacobian_structure(evaluator) == [(1, 1), (1, 2)] J = [NaN, NaN] MOI.eval_constraint_jacobian(evaluator, J, x_val) @@ -236,14 +236,14 @@ function test_constraint_quadratic_multivariate_subexpressions() ex = Nonlinear.add_expression(model, :($x^2)) ey = Nonlinear.add_expression(model, :($y^2)) exy = Nonlinear.add_expression(model, :($ex + $x * $y)) - Nonlinear.add_constraint(model, :($exy + $ey <= 2.0)) + Nonlinear.add_constraint(model, :($exy + $ey), MOI.LessThan(2.0)) evaluator = Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x, y]) MOI.initialize(evaluator, [:Grad, :Jac, :Hess]) g = [NaN] x_val = [1.2, 2.3] MOI.eval_constraint(evaluator, g, x_val) - @test g ≈ [x_val[1]^2 + x_val[1] * x_val[2] + x_val[2]^2] .- 2 + @test g ≈ [x_val[1]^2 + x_val[1] * x_val[2] + x_val[2]^2] # Jacobian @test MOI.jacobian_structure(evaluator) == [(1, 1), (1, 2)] J = [NaN, NaN] @@ -436,10 +436,10 @@ end function test_NLPBlockData() model = Nonlinear.Model() x = MOI.VariableIndex(1) - Nonlinear.add_constraint(model, :($x <= 1)) - Nonlinear.add_constraint(model, :($x >= 2)) - Nonlinear.add_constraint(model, :($x == 3)) - Nonlinear.add_constraint(model, :(4 <= $x <= 5)) + Nonlinear.add_constraint(model, :($x - 1), MOI.LessThan(0.0)) + Nonlinear.add_constraint(model, :($x - 2), MOI.GreaterThan(0.0)) + Nonlinear.add_constraint(model, :($x - 3), MOI.EqualTo(0.0)) + 
Nonlinear.add_constraint(model, :($x), MOI.Interval(4.0, 5.0)) evaluator = Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x]) block = MOI.NLPBlockData(evaluator) @test block.constraint_bounds == [ From db384d94b5c84d47c7af3a63003daa484317998e Mon Sep 17 00:00:00 2001 From: odow Date: Tue, 3 May 2022 08:56:11 +1200 Subject: [PATCH 11/16] Re-add function evaluations --- src/Nonlinear/ReverseAD/Coloring/Coloring.jl | 12 +- src/Nonlinear/ReverseAD/reverse_mode.jl | 135 ++++++++++++++++--- 2 files changed, 122 insertions(+), 25 deletions(-) diff --git a/src/Nonlinear/ReverseAD/Coloring/Coloring.jl b/src/Nonlinear/ReverseAD/Coloring/Coloring.jl index 8a4ee89e3c..9a3e295c93 100644 --- a/src/Nonlinear/ReverseAD/Coloring/Coloring.jl +++ b/src/Nonlinear/ReverseAD/Coloring/Coloring.jl @@ -94,7 +94,7 @@ function UndirectedGraph(I, J, nel) offsets[k+1] = offsets[k] + adjcount[k] end fill!(adjcount, 0) - edges = Array{Tuple{Int,Int}}(undef, n_edges) + edges = Vector{Tuple{Int,Int}}(undef, n_edges) adjlist = Vector{Int}(undef, offsets[nel+1] - 1) edgeindex = Vector{Int}(undef, length(adjlist)) edge_count = 0 @@ -321,7 +321,7 @@ function recovery_preprocess( edge_count[idx] += 1 end # edges sorted by twocolor subgraph - sorted_edges = Array{Vector{Tuple{Int,Int}}}(undef, seen_twocolors) + sorted_edges = Vector{Vector{Tuple{Int,Int}}}(undef, seen_twocolors) for idx in 1:seen_twocolors sorted_edges[idx] = Tuple{Int,Int}[] sizehint!(sorted_edges[idx], edge_count[idx]) @@ -334,9 +334,9 @@ function recovery_preprocess( push!(sorted_edges[idx], (u, v)) end # list of unique vertices in each twocolor subgraph - vertexmap = Array{Vector{Int}}(undef, seen_twocolors) - postorder = Array{Vector{Int}}(undef, seen_twocolors) - parents = Array{Vector{Int}}(undef, seen_twocolors) + vertexmap = Vector{Vector{Int}}(undef, seen_twocolors) + postorder = Vector{Vector{Int}}(undef, seen_twocolors) + parents = Vector{Vector{Int}}(undef, seen_twocolors) # temporary lookup map from global index to subgraph index revmap = zeros(Int, _num_vertices(g)) adjcount = zeros(Int, _num_vertices(g)) @@ -485,7 +485,7 @@ end Allocate a seed matrix for the Coloring. """ function seed_matrix(rinfo::RecoveryInfo) - return Array{Float64}(undef, length(rinfo.local_indices), rinfo.num_colors) + return Matrix{Float64}(undef, length(rinfo.local_indices), rinfo.num_colors) end """ diff --git a/src/Nonlinear/ReverseAD/reverse_mode.jl b/src/Nonlinear/ReverseAD/reverse_mode.jl index a6758aa05c..136e04232c 100644 --- a/src/Nonlinear/ReverseAD/reverse_mode.jl +++ b/src/Nonlinear/ReverseAD/reverse_mode.jl @@ -108,25 +108,122 @@ function _forward_eval( elseif node.type == Nonlinear.NODE_CALL_MULTIVARIATE children_indices = SparseArrays.nzrange(f.adj, k) N = length(children_indices) - f_input = _UnsafeVectorView(d.jac_storage, N) - ∇f = _UnsafeVectorView(d.user_output_buffer, N) - for (r, i) in enumerate(children_indices) - f_input[r] = f.forward_storage[children_arr[i]] - ∇f[r] = 0.0 - end - f.forward_storage[k] = Nonlinear.eval_multivariate_function( - operators, - operators.multivariate_operators[node.index], - f_input, - ) - Nonlinear.eval_multivariate_gradient( - operators, - operators.multivariate_operators[node.index], - ∇f, - f_input, - ) - for (r, i) in enumerate(children_indices) - f.partials_storage[children_arr[i]] = ∇f[r] + # TODO(odow); + # With appropriate benchmarking, the special-cased if-statements can + # be removed in favor of the generic user-defined function case. 
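+            # The hard-coded indices below rely on the ordering of the default
+            # multivariate operators: 1 = :+, 2 = :-, 3 = :*, 4 = :^, 5 = :/,
+            # and 6 = :ifelse. Each branch evaluates the operator and fills in
+            # the partials of its children directly, without dispatching
+            # through the generic operator registry.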
+ if node.index == 1 # :+ + tmp_sum = zero(T) + for c_idx in children_indices + @inbounds ix = children_arr[c_idx] + @inbounds f.partials_storage[ix] = one(T) + @inbounds tmp_sum += f.forward_storage[ix] + end + f.forward_storage[k] = tmp_sum + elseif node.index == 2 # :- + @assert N == 2 + child1 = first(children_indices) + @inbounds ix1 = children_arr[child1] + @inbounds ix2 = children_arr[child1+1] + @inbounds tmp_sub = f.forward_storage[ix1] + @inbounds tmp_sub -= f.forward_storage[ix2] + @inbounds f.partials_storage[ix1] = one(T) + @inbounds f.partials_storage[ix2] = -one(T) + f.forward_storage[k] = tmp_sub + elseif node.index == 3 # :* + tmp_prod = one(T) + for c_idx in children_indices + @inbounds tmp_prod *= f.forward_storage[children_arr[c_idx]] + end + if tmp_prod == zero(T) || N <= 2 + # This is inefficient if there are a lot of children. + # 2 is chosen as a limit because (x*y)/y does not always + # equal x for floating-point numbers. This can produce + # unexpected error in partials. There's still an error when + # multiplying three or more terms, but users are less likely + # to complain about it. + for c_idx in children_indices + prod_others = one(T) + for c_idx2 in children_indices + (c_idx == c_idx2) && continue + ix = children_arr[c_idx2] + prod_others *= f.forward_storage[ix] + end + f.partials_storage[children_arr[c_idx]] = prod_others + end + else + # Compute all-minus-one partial derivatives by dividing from + # the total product. + for c_idx in children_indices + ix = children_arr[c_idx] + f.partials_storage[ix] = + tmp_prod / f.forward_storage[ix] + end + end + @inbounds f.forward_storage[k] = tmp_prod + elseif node.index == 4 # :^ + @assert N == 2 + idx1 = first(children_indices) + idx2 = last(children_indices) + @inbounds ix1 = children_arr[idx1] + @inbounds ix2 = children_arr[idx2] + @inbounds base = f.forward_storage[ix1] + @inbounds exponent = f.forward_storage[ix2] + if exponent == 2 + @inbounds f.forward_storage[k] = base * base + @inbounds f.partials_storage[ix1] = 2 * base + elseif exponent == 1 + @inbounds f.forward_storage[k] = base + @inbounds f.partials_storage[ix1] = 1.0 + else + f.forward_storage[k] = pow(base, exponent) + f.partials_storage[ix1] = exponent * pow(base, exponent - 1) + end + f.partials_storage[ix2] = f.forward_storage[k] * log(base) + elseif node.index == 5 # :/ + @assert N == 2 + idx1 = first(children_indices) + idx2 = last(children_indices) + @inbounds ix1 = children_arr[idx1] + @inbounds ix2 = children_arr[idx2] + @inbounds numerator = f.forward_storage[ix1] + @inbounds denominator = f.forward_storage[ix2] + recip_denominator = 1 / denominator + @inbounds f.partials_storage[ix1] = recip_denominator + f.partials_storage[ix2] = + -numerator * recip_denominator * recip_denominator + f.forward_storage[k] = numerator * recip_denominator + elseif node.index == 6 # ifelse + @assert N == 3 + idx1 = first(children_indices) + @inbounds condition = f.forward_storage[children_arr[idx1]] + @inbounds lhs = f.forward_storage[children_arr[idx1+1]] + @inbounds rhs = f.forward_storage[children_arr[idx1+2]] + @inbounds f.partials_storage[children_arr[idx1+1]] = + condition == 1 + @inbounds f.partials_storage[children_arr[idx1+2]] = + !(condition == 1) + f.forward_storage[k] = ifelse(condition == 1, lhs, rhs) + else + f_input = _UnsafeVectorView(d.jac_storage, N) + ∇f = _UnsafeVectorView(d.user_output_buffer, N) + for (r, i) in enumerate(children_indices) + f_input[r] = f.forward_storage[children_arr[i]] + ∇f[r] = 0.0 + end + f.forward_storage[k] = 
Nonlinear.eval_multivariate_function( + operators, + operators.multivariate_operators[node.index], + f_input, + ) + Nonlinear.eval_multivariate_gradient( + operators, + operators.multivariate_operators[node.index], + ∇f, + f_input, + ) + for (r, i) in enumerate(children_indices) + f.partials_storage[children_arr[i]] = ∇f[r] + end end elseif node.type == Nonlinear.NODE_CALL_UNIVARIATE child_idx = children_arr[f.adj.colptr[k]] From 8420fb722f0085f60e29f8b07037d9f56d8ef884 Mon Sep 17 00:00:00 2001 From: odow Date: Tue, 3 May 2022 09:12:17 +1200 Subject: [PATCH 12/16] Address mlubins comments --- docs/src/submodules/Nonlinear/overview.md | 101 ++++++++++++---------- test/Nonlinear/ReverseAD.jl | 2 + 2 files changed, 56 insertions(+), 47 deletions(-) diff --git a/docs/src/submodules/Nonlinear/overview.md b/docs/src/submodules/Nonlinear/overview.md index 24d699a6ae..a5c42f9240 100644 --- a/docs/src/submodules/Nonlinear/overview.md +++ b/docs/src/submodules/Nonlinear/overview.md @@ -303,7 +303,7 @@ However, we cannot call gradient terms such as not have the capability to differentiate a nonlinear expression. If, instead, we pass [`Nonlinear.SparseReverseMode`](@ref), then we get access -to `:Grad`, the gradient of the objective function, `:Jac`, the jacobian matrix +to `:Grad`, the gradient of the objective function, `:Jac`, the Jacobian matrix of the constraints, `:JacVec`, the ability to compute Jacobian-vector products, and `:ExprGraph`. ```jldoctest nonlinear_developer @@ -319,7 +319,7 @@ Nonlinear.Evaluator with available features: * :ExprGraph ``` -However, before calling anything, we need to call [`initialize`](@ref): +However, before using the evaluator, we need to call [`initialize`](@ref): ```jldoctest nonlinear_developer julia> MOI.initialize(evaluator, [:Grad, :Jac, :JacVec, :ExprGraph]) ``` @@ -335,9 +335,9 @@ julia> MOI.eval_objective(evaluator, x) ``` and [`eval_objective_gradient`](@ref): ```jldoctest nonlinear_developer -julia> grad = [NaN] +julia> grad = [0.0] 1-element Vector{Float64}: - NaN + 0.0 julia> MOI.eval_objective_gradient(evaluator, grad, x) @@ -551,52 +551,59 @@ subexpressions in the model. optimization problem using sparse reverse-mode automatic differentiation (AD). This section does not attempt to explain how sparse reverse-mode AD works, but -instead explains why MOI contains it's own implementation, and highlights +instead explains why MOI contains its own implementation, and highlights notable differences from similar packages. !!! warning - You should not interact with `ReverseAD` directly. Instead, you should - create a [`Nonlinear.Evaluator`](@ref) object with - [`Nonlinear.SparseReverseMode`](@ref), and then query the MOI API methods. + Don't use the API in `ReverseAD` to compute derivatives. Instead, create a + [`Nonlinear.Evaluator`](@ref) object with [`Nonlinear.SparseReverseMode`](@ref) + as the backend, and then query the MOI API methods. -### Why another AD package? +### Design goals The JuliaDiff organization maintains a [list of packages](https://juliadiff.org) -for doing AD in Julia. At last count, there were at least ten packages–not -including `ReverseAD`–for reverse-mode AD in Julia. Given this multitude, why -does MOI maintain another implementation instead of re-using existing tooling? - -Here are four reasons: - - * **Scale and Sparsity:** the types of functions that MOI computes derivatives - of have two key characteristics: they can be very large scale (10^5 or more - functions across 10^5 or more variables) and they are very sparse. 
For large - problems, it is common for the hessian to have `O(n)` non-zero elements - instead of `O(n^2)` if it was dense. To the best of our knowledge, - `ReverseAD` is the only reverse-mode AD system in Julia that handles sparsity - by default. The lack of sparsity support is _the_ main reason why we do no - use a generic package. - * **Limited scope:** most other AD packages accept arbitrary Julia functions as - input and then trace an expression graph using operator overloading. This - means they must deal (or detect and ignore) with control flow, I/O, and other - vagaries of Julia. In contrast, `ReverseAD` only accepts functions in the - form of [`Nonlinear.Expression`](@ref), which greatly limits the range of - syntax that it must deal with. By reducing the scope of what we accept as - input to functions relevant for mathematical optimization, we can provide a - simpler implementation with various performance optimizations. - * **Historical:** `ReverseAD` started life as [ReverseDiffSparse.jl](https://github.com/mlubin/ReverseDiffSparse.jl), - development of which begain in early 2014(!). This was well before the other - packages started development. Because we had a well-tested, working AD in - JuMP, there was less motivation to contribute to and explore other AD - packages. The lack of historical interaction also meant that other packages - were not optimized for the types of problems that JuMP is built for (i.e., - large-scale sparse problems). When we first created MathOptInterface, we kept - the AD in JuMP to simplify the transition, and post-poned the development of - a first-class nonlinear interface in MathOptInterface. - * **Technical debt** Prior to the introduction of `Nonlinear`, JuMP's nonlinear - implementation was a confusing mix of functions and types spread across the - code base and in the private `_Derivatives` submodule. This made it hard to - swap the AD system for another. The main motivation for refactoring JuMP to - create the `Nonlinear` submodule in MathOptInterface was to abstract the - interface between JuMP and the AD system, allowing us to swap-in and test new - AD systems in the future. +for doing AD in Julia. At last count, there were at least ten packages–-not +including `ReverseAD`-–for reverse-mode AD in Julia. `ReverseAD` exists because +it has a different set of design goals. + + * **Goal: handle scale and sparsity** + The types of functions that MOI computes derivatives of have two key + characteristics: they can be very large scale (10^5 or more functions across + 10^5 or more variables) and they are very sparse. For large problems, it is + common for the hessian to have `O(n)` non-zero elements instead of `O(n^2)` + if it was dense. To the best of our knowledge, `ReverseAD` is the only + reverse-mode AD system in Julia that handles sparsity by default. + * **Goal: limit the scope to improve robustness** + Most other AD packages accept arbitrary Julia functions as input and then + trace an expression graph using operator overloading. This means they must + deal (or detect and ignore) with control flow, I/O, and other vagaries of + Julia. In contrast, `ReverseAD` only accepts functions in the form of + [`Nonlinear.Expression`](@ref), which greatly limits the range of syntax that + it must deal with. By reducing the scope of what we accept as input to + functions relevant for mathematical optimization, we can provide a simpler + implementation with various performance optimizations. 
+  * **Goal: provide outputs which match what solvers expect**
+    Other AD packages focus on differentiating individual Julia functions. In
+    contrast, `ReverseAD` has a very specific use case: to generate outputs
+    needed by the MOI nonlinear API. This means it needs to efficiently compute
+    sparse Hessians, and it needs subexpression handling to avoid recomputing
+    subexpressions that are shared between functions.
+
+### History
+
+`ReverseAD` started life as [ReverseDiffSparse.jl](https://github.com/mlubin/ReverseDiffSparse.jl),
+development of which began in early 2014(!). This was well before the other
+packages started development. Because we had a well-tested, working AD in JuMP,
+there was less motivation to contribute to and explore other AD packages. The
+lack of historical interaction also meant that other packages were not optimized
+for the types of problems that JuMP is built for (i.e., large-scale sparse
+problems). When we first created MathOptInterface, we kept the AD in JuMP to
+simplify the transition, and postponed the development of a first-class
+nonlinear interface in MathOptInterface.
+
+Prior to the introduction of `Nonlinear`, JuMP's nonlinear implementation was a
+confusing mix of functions and types spread across the code base and in the
+private `_Derivatives` submodule. This made it hard to swap the AD system for
+another. The main motivation for refactoring JuMP to create the `Nonlinear`
+submodule in MathOptInterface was to abstract the interface between JuMP and the
+AD system, allowing us to swap in and test new AD systems in the future.
diff --git a/test/Nonlinear/ReverseAD.jl b/test/Nonlinear/ReverseAD.jl
index 52ddea52be..67786e2d57 100644
--- a/test/Nonlinear/ReverseAD.jl
+++ b/test/Nonlinear/ReverseAD.jl
@@ -288,6 +288,7 @@ function test_hessian_sparsity_registered_function()
     evaluator =
         Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x, y, z])
     @test_broken :Hess in MOI.features_available(evaluator)
+    # TODO(odow): re-enable these tests when user-defined hessians are supported
     # MOI.initialize(evaluator, [:Grad, :Jac, :Hess])
     # @test MOI.hessian_lagrangian_structure(evaluator) ==
     #       [(1, 1), (2, 2), (3, 3), (3, 1)]
@@ -318,6 +319,7 @@ function test_hessian_sparsity_registered_rosenbrock()
     evaluator =
         Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x, y])
     @test_broken :Hess in MOI.features_available(evaluator)
+    # TODO(odow): re-enable these tests when user-defined hessians are supported
     # MOI.initialize(evaluator, [:Grad, :Jac, :Hess])
     # @test MOI.hessian_lagrangian_structure(evaluator) == [(1, 1), (2, 2), (2, 1)]
     # H = fill(NaN, 3)

From b0b85f460189d39419689192c196e57412d341dd Mon Sep 17 00:00:00 2001
From: Oscar Dowson
Date: Tue, 3 May 2022 19:40:35 +1200
Subject: [PATCH 13/16] Update src/Nonlinear/ReverseAD/Coloring/Coloring.jl
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Mathieu Besançon
---
 src/Nonlinear/ReverseAD/Coloring/Coloring.jl | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/src/Nonlinear/ReverseAD/Coloring/Coloring.jl b/src/Nonlinear/ReverseAD/Coloring/Coloring.jl
index 9a3e295c93..e0d89ee35e 100644
--- a/src/Nonlinear/ReverseAD/Coloring/Coloring.jl
+++ b/src/Nonlinear/ReverseAD/Coloring/Coloring.jl
@@ -28,10 +28,8 @@ function Base.push!(v::IndexedSet, i::Integer)
 end

 function Base.empty!(v::IndexedSet)
-    nzidx = v.nzidx
-    empty = v.empty
     for i in 1:v.nnz
-        empty[nzidx[i]] = true
+        v.empty[v.nzidx[i]] = true
     end
     v.nnz = 0
     return v

From 
bbd366527c71c82ce36f58d966ca3f534704f17a Mon Sep 17 00:00:00 2001 From: Oscar Dowson Date: Wed, 4 May 2022 09:22:30 +1200 Subject: [PATCH 14/16] Update docs/src/submodules/Nonlinear/overview.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Mathieu Besançon --- docs/src/submodules/Nonlinear/overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/src/submodules/Nonlinear/overview.md b/docs/src/submodules/Nonlinear/overview.md index a5c42f9240..98b743e669 100644 --- a/docs/src/submodules/Nonlinear/overview.md +++ b/docs/src/submodules/Nonlinear/overview.md @@ -368,7 +368,7 @@ MathOptInterface as follows: import MathOptInterface const MOI = MathOptInterface -function build_model(model; backend) +function build_model(model; backend::Nonlinear.AbstractAutomaticDifferentiation) x = MOI.add_variable(model) y = MOI.add_variable(model) MOI.set(model, MOI.ObjectiveSense(), MOI.MIN_SENSE) From 72f12185ebe4f55e8a9d60f5acdc61cd4cae2808 Mon Sep 17 00:00:00 2001 From: Oscar Dowson Date: Wed, 4 May 2022 10:07:46 +1200 Subject: [PATCH 15/16] Update docs/src/submodules/Nonlinear/overview.md --- docs/src/submodules/Nonlinear/overview.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/docs/src/submodules/Nonlinear/overview.md b/docs/src/submodules/Nonlinear/overview.md index 98b743e669..1ab9cba736 100644 --- a/docs/src/submodules/Nonlinear/overview.md +++ b/docs/src/submodules/Nonlinear/overview.md @@ -368,7 +368,10 @@ MathOptInterface as follows: import MathOptInterface const MOI = MathOptInterface -function build_model(model; backend::Nonlinear.AbstractAutomaticDifferentiation) +function build_model( + model::MOI.ModelLike; + backend::MOI.Nonlinear.AbstractAutomaticDifferentiation, +) x = MOI.add_variable(model) y = MOI.add_variable(model) MOI.set(model, MOI.ObjectiveSense(), MOI.MIN_SENSE) From 3d8e547b141f103d542ce703728bb19f46f86026 Mon Sep 17 00:00:00 2001 From: odow Date: Fri, 6 May 2022 08:56:23 +1200 Subject: [PATCH 16/16] Tweak scale and sparsity goal --- docs/src/submodules/Nonlinear/overview.md | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/docs/src/submodules/Nonlinear/overview.md b/docs/src/submodules/Nonlinear/overview.md index 1ab9cba736..a69e28d142 100644 --- a/docs/src/submodules/Nonlinear/overview.md +++ b/docs/src/submodules/Nonlinear/overview.md @@ -570,12 +570,11 @@ including `ReverseAD`-–for reverse-mode AD in Julia. `ReverseAD` exists becaus it has a different set of design goals. * **Goal: handle scale and sparsity** - The types of functions that MOI computes derivatives of have two key - characteristics: they can be very large scale (10^5 or more functions across - 10^5 or more variables) and they are very sparse. For large problems, it is - common for the hessian to have `O(n)` non-zero elements instead of `O(n^2)` - if it was dense. To the best of our knowledge, `ReverseAD` is the only - reverse-mode AD system in Julia that handles sparsity by default. + The types of nonlinear optimization problems that MOI represents can be large + scale (10^5 or more functions across 10^5 or more variables) with very sparse + derivatives. The ability to compute a sparse Hessian matrix is essential. To + the best of our knowledge, `ReverseAD` is the only reverse-mode AD system in + Julia that handles sparsity by default. 
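
To make the sparse-Hessian goal concrete, here is a minimal, hypothetical sketch (not part of the patch) of how a solver queries the Hessian of the Lagrangian through the standard MOI evaluator API once `:Hess` is initialized. The model, evaluation point, and multipliers are illustrative assumptions; `:Hess` is available here only because the model uses built-in operators, since (as the test changes above note) user-defined Hessians are not yet supported.

```julia
import MathOptInterface
const MOI = MathOptInterface
const Nonlinear = MOI.Nonlinear

# Illustrative model: minimize x^4 subject to x^2 <= 1.
model = Nonlinear.Model()
x = MOI.VariableIndex(1)
Nonlinear.set_objective(model, :($x^4))
Nonlinear.add_constraint(model, :($x^2), MOI.LessThan(1.0))

evaluator = Nonlinear.Evaluator(model, Nonlinear.SparseReverseMode(), [x])
MOI.initialize(evaluator, [:Grad, :Jac, :Hess])

# Only structural non-zeros of the symmetric Hessian are reported, as
# (row, column) pairs; `H` holds one value per reported entry.
pattern = MOI.hessian_lagrangian_structure(evaluator)
H = zeros(length(pattern))

# Hessian of σ * f(x) + μ' * g(x) at x = [0.5], with σ = 1.0 and μ = [2.0].
MOI.eval_hessian_lagrangian(evaluator, H, [0.5], 1.0, [2.0])
```

Storing only the structural non-zeros is what "handles sparsity by default" means in practice: the output has the sparse triplet layout that interior-point and other nonlinear solvers consume directly.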
* **Goal: limit the scope to improve robustness** Most other AD packages accept arbitrary Julia functions as input and then trace an expression graph using operator overloading. This means they must