A collection of quasi-Newton optimization algorithms.
Author: Cooper Simpson
DISCLAIMER: This package is still very much a work in progress, and not everything has been thoroughly tested. For example, the AD functionality has only been tested (in a limited capacity) with the Enzyme.jl backend.
All source code is made available under an MIT license. You can freely use and modify the code, without warranty, so long as you provide attribution to the authors. See LICENSE for the full text.
This repository can be cited using the "Cite this repository" button in the sidebar, or using the metadata in CITATION.cff. See Publications for a full list of publications related to R-SFN and influencing this package. If any of these are useful to your own work, please cite them individually.
Feel free to open issues, ask questions, or otherwise contribute!
Our use of the term quasi should not be taken in the traditional sense, i.e. referring only to methods such as BFGS, but rather in its literal sense: algorithms that resemble Newton's method. As such, our algorithms are intended for second-order unconstrained convex or non-convex optimization, i.e. we consider problems of the following form:
$$
\min_{x \in \mathbb{R}^n} f(x),
$$

where $f : \mathbb{R}^n \to \mathbb{R}$ is twice continuously differentiable.
All algorithms are implemented matrix-free using Krylov subspace methods to compute the update and can leverage mixed-mode automatic differentiation to compute cheap Hessian-vector products.
Each method differs in how the Newton-type update is chosen and regularized; the precise update formulas, the form of the regularization term, and the analytic choice of regularization can be found in the publications listed below.
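To make the matrix-free approach concrete, a schematic Newton-type iteration (an illustration only, not the exact update rule of any particular method in this package) is

$$
\left(\nabla^2 f(x_k) + \lambda_k I\right) p_k = -\nabla f(x_k), \qquad x_{k+1} = x_k + p_k,
$$

where the linear system is solved with a Krylov method that only requires the action $v \mapsto \nabla^2 f(x_k)\, v$, i.e. a Hessian-vector product, supplied either via AD or via a user-defined linear operator.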
This package can be installed just like any other Julia package. From the terminal, after starting the Julia REPL, run the following:
```julia
using Pkg
Pkg.add("QuasiNewton")
```

This will install the package and its direct dependencies. For automatic differentiation (AD), we use DifferentiationInterface.jl, which is agnostic to the backend. Backends such as Enzyme.jl or Zygote.jl can be installed by the user and specified in the optimization call. See JuliaDiff for current details on the Julia AD ecosystem.
To test the package, run the following command in the REPL:
```julia
using Pkg
Pkg.test(test_args=[<"optional specific tests">], coverage=true)
```

Loading the package as usual,

```julia
using QuasiNewton
```

will export four in-place optimization routines: `optimize!`, `newton!`, `rsfn!`, and `arc!`. We will only cover `optimize!` in detail, as it subsumes the others. The optimization algorithm is specified via a symbol, and depending on the function information provided, the call is dispatched to an AD-based or LinearOperators.jl-based execution. All methods return a `Stats` object containing relevant information about the procedure, with some information only being stored if `history=true`.
First, we consider the following example of minimizing a simple least-squares problem with Newton's method.
```julia
using LinearAlgebra #for norm
using LinearOperators #for Hessian
d = 100 #problem dimension
A = randn(d,d) #random data
b = A*randn(d) #random rhs
f(x) = 0.5*norm(A*x-b)^2 #objective function
function fg!(g, x) #computes gradient in-place and returns objective function
g .= A'*(A*x - b)
return f(x)
end
H(x) = LinearOperator(A'*A) #computes Hessian
stats = optimize!(randn(d), f, fg!, H, Val(:newton);
itmax=100,
time_limit=60,
posdef=true,
M=0.0)
show(stats)
#=
Converged: true
Iterations: 1
Function Evals: 3
Hvp Evals: 151
Run Time (s): 3.59e-02
Minimum: 1.600e-12
Gradient Norm: 2.259e-05
Max/Avg. Residual Norm: 0.000e+00, 0.000e+00
Max/Avg. Regularization: 0.000e+00, 0.000e+00
Avg. Krylov Iterations: 1.510e+02
Status:
=#
```
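In this example the matrix A'*A is small enough to form explicitly, but the same interface works fully matrix-free. The sketch below is our illustration rather than part of the package (the names Hop and buf are arbitrary, and it reuses A and d from the example above); it builds the Hessian operator from products with A and A' only, using LinearOperators.jl's function-based constructor with its (res, v, α, β) product convention:

```julia
using LinearAlgebra #for mul!
using LinearOperators

#reuses A and d from the example above
buf = zeros(d) #workspace holding A*v
Hop = LinearOperator(Float64, d, d, true, true,
                     (res, v, α, β) -> begin
                         mul!(buf, A, v)          #buf = A*v
                         mul!(res, A', buf, α, β) #res = α*A'*(A*v) + β*res
                     end)

H(x) = Hop #Hessian operator, constant for this least-squares problem
```

Passing H built this way to optimize! avoids ever forming A'*A explicitly.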
Second, we consider the following example of minimizing the non-convex Rosenbrock function with R-SFN.

```julia
using DifferentiationInterface #for AD
import Enzyme #for AD
function rosenbrock(x) #objective function
res = 0.0
for i = 1:size(x,1)-1
res += 100*(x[i+1]-x[i]^2)^2 + (1-x[i])^2
end
return res
end
stats = optimize!(zeros(10), rosenbrock, Val(:rsfn), AutoEnzyme();
itmax=100,
time_limit=60,
M=1e-8)
show(stats)
#=
Converged: true
Iterations: 73
Function Evals: 74
Hvp Evals: 353
Run Time (s): 3.30e-04
Minimum: 6.415e-12
Gradient Norm: 1.458e-05
Max/Avg. Residual Norm: 7.101e+00, 5.267e-01
Max/Avg. Regularization: 1.288e+02, 1.264e+01
Avg. Krylov Iterations: 4.836e+00
Status:
=#
```

More information about R-SFN can be found in the following publications. Please cite them if they are useful for your own work!
```bibtex
@mastersthesis{rsfn,
  title  = {{Regularized Saddle-Free Newton: Saddle Avoidance and Efficient Implementation}},
  author = {Cooper Simpson},
  school = {Dept. of Applied Mathematics, CU Boulder},
  year   = {2022},
  type   = {{M.S.} Thesis}
}
```