QuasiNewton.jl

A collection of quasi-Newton optimization algorithms.

DISCLAIMER: This package is still very much a work in progress and certainly not everything has been tested thoroughly. For example, the AD functionality has only been tested (in a limited capacity) with the Enzyme.jl backend.

License & Citation

All source code is made available under an MIT license. You can freely use and modify the code, without warranty, so long as you provide attribution to the authors. See LICENSE for the full text.

This repository can be cited using the GitHub action in the sidebar, or using the metadata in CITATION.cff. See Publications for a full list of publications related to R-SFN and influencing this package. If any of these are useful to your own work, please cite them individually.

Contributing

Feel free to open issues, ask questions, or otherwise contribute!

Methods

Our use of the term quasi should not be taken in the traditional sense, i.e. as referring only to methods such as BFGS, but rather in its broader meaning: algorithms that are similar to, or resemble, Newton's method. As such, our algorithms are intended for second-order unconstrained convex or non-convex optimization, i.e. we consider problems of the following form

$$\min_{\mathbf{x}\in \mathbb{R}^n}f(\mathbf{x})$$

where $f:\mathbb{R}^n\to\mathbb{R}$ is a twice continuously differentiable function, possibly with a Lipschitz continuous Hessian. We then consider updates of the following form:

$$\mathbf{x}_{(k+1)} = \mathbf{x}_{(k)} - \eta_{(k)}\mathbf{B}_{(k)}^{-1}\nabla f(\mathbf{x}_{(k)})$$

where $\mathbf{B}_{(k)}$ is some matrix constructed as a function of the Hessian $\mathbf{H}_{(k)} = \nabla^2 f(\mathbf{x}_{(k)})$; different constructions recover different algorithms.

All algorithms are implemented matrix-free using Krylov subspace methods to compute the update and can leverage mixed-mode automatic differentiation to compute cheap Hessian-vector products.
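
For context, the mixed-mode Hessian-vector products rely on the standard forward-over-reverse identity: the Hessian applied to a vector is the directional derivative of the gradient,

$$\nabla^2 f(\mathbf{x}_{(k)})\,\mathbf{v} = \left.\frac{d}{dt}\nabla f(\mathbf{x}_{(k)} + t\mathbf{v})\right|_{t=0}$$

so differentiating a reverse-mode gradient once in forward mode yields $\mathbf{H}_{(k)}\mathbf{v}$ without ever forming the Hessian, at roughly the cost of a few gradient evaluations.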

(Regularized) Newton

$$\mathbf{B}_{(k)} = \mathbf{H}_{(k)} + \lambda_{(k)}\mathbf{I}$$

where choosing $\lambda_{(k)}=0$ recovers vanilla Newton's method and $\lambda_{(k)}\propto\|\nabla f(\mathbf{x}_{(k)})\|^{1/2}$ recovers the regularized Newton's method.
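
As a rough illustration only (the package itself never forms the Hessian and instead solves the system matrix-free with a Krylov method), a dense version of this step might look as follows, where the helper name and the constant c are purely illustrative:

using LinearAlgebra

#illustrative dense regularized Newton step: solves (H + λI)s = -g
function regularized_newton_step(H, g; c=1.0)
	λ = c*sqrt(norm(g)) #λ ∝ ‖∇f(x)‖^(1/2); λ = 0 gives vanilla Newton
	return -((H + λ*I)\g)
end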

Regularized Saddle-Free Newton (R-SFN)

$$\mathbf{B}_{(k)} = \left(\mathbf{H}_{(k)}^2 + \lambda_{(k)}\mathbf{I}\right)^{1/2}$$

where the regularization term is $\lambda_{(k)}\propto\|\nabla f(\mathbf{x}_{(k)})\|$. The matrix function in question is a smooth approximation to the absolute value and is computed via the Lanczos process. See Publications for more details.
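
As a rough dense illustration of what this matrix function does (again, the package never forms the Hessian; it applies the inverse square root via the Lanczos process), one might write, with a purely illustrative constant c:

using LinearAlgebra

#illustrative dense R-SFN step: -(H² + λI)^(-1/2)g via an eigendecomposition of H
function rsfn_step_dense(H, g; c=1.0)
	λ = c*norm(g) #λ ∝ ‖∇f(x)‖
	Λ, Q = eigen(Symmetric(H)) #H = QΛQᵀ
	return -Q*((Λ.^2 .+ λ).^(-1/2).*(Q'*g))
end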

Adaptive Regularization with Cubics (ARC)

$$\mathbf{B}_{(k)} = \mathbf{H}_{(k)} + \lambda_{(k)}\mathbf{I}$$

where the analytic choice of regularization is $\lambda_{(k)}\propto\|\mathbf{x}_{(k+1)} - \mathbf{x}_{(k)}\|$, but in practice it is chosen adaptively. This particular implementation is based on ARCqK.
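
For context, this form arises from the standard cubic-regularized model that ARC (approximately) minimizes at each iteration,

$$\mathbf{s}_{(k)} = \arg\min_{\mathbf{s}}\; \nabla f(\mathbf{x}_{(k)})^\top\mathbf{s} + \frac{1}{2}\mathbf{s}^\top\mathbf{H}_{(k)}\mathbf{s} + \frac{\sigma_{(k)}}{3}\|\mathbf{s}\|^3$$

whose minimizer satisfies $(\mathbf{H}_{(k)} + \sigma_{(k)}\|\mathbf{s}_{(k)}\|\mathbf{I})\mathbf{s}_{(k)} = -\nabla f(\mathbf{x}_{(k)})$, i.e. the update above with $\lambda_{(k)} = \sigma_{(k)}\|\mathbf{s}_{(k)}\| \propto \|\mathbf{x}_{(k+1)} - \mathbf{x}_{(k)}\|$.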

Installation

This package can be installed just like any other Julia package. From the terminal, after starting the Julia REPL, run the following:

using Pkg
Pkg.add("QuasiNewton")

which will install the package and its direct dependencies. For automatic differentiation (AD), we use DifferentiationInterface.jl, which is agnostic to the backend. Backends such as Enzyme.jl or Zygote.jl can be installed by the user and specified in the optimization call. See JuliaDiff for current details on the Julia AD ecosystem.
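
For example, one possible setup for the Enzyme.jl backend (used in the second example below) is to add it, together with DifferentiationInterface.jl, to the active environment; Zygote.jl or another supported backend could be substituted:

using Pkg
Pkg.add(["DifferentiationInterface", "Enzyme"]) #or "Zygote", or another supported backend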

Testing

To test the package, run the following command in the REPL:

using Pkg
Pkg.test(test_args=["<optional specific tests>"], coverage=true)

Usage

Loading the package as usual

using QuasiNewton

will export four in-place optimization routines: optimize!, newton!, rsfn!, and arc!. We will only cover the details of the first, as it subsumes the others. The optimization algorithm can be specified via a symbol, and depending on the function information provided, the method dispatches to either an AD-based or a LinearOperators.jl-based execution. All methods return a Stats object containing relevant information about the procedure, with some information only stored if history=true.

First, we consider the following example of minimizing a simple least-squares problem with Newton's method.

using LinearAlgebra #for norm
using LinearOperators #for Hessian

d = 100 #problem dimension

A = randn(d,d) #random data
b = A*randn(d) #random rhs

f(x) = 0.5*norm(A*x-b)^2 #objective function
function fg!(g, x) #computes gradient in-place and returns objective function
	g .= A'*(A*x - b)
	return f(x)
end
H(x) = LinearOperator(A'*A) #returns the (constant) Hessian as a linear operator

stats = optimize!(randn(d), f, fg!, H, Val(:newton);
					itmax=100,
					time_limit=60,
					posdef=true,
					M=0.0)

show(stats)
#=
Converged:                    true
Iterations:                      1
Function Evals:                  3
Hvp Evals:                     151
Run Time (s):             3.59e-02
Minimum:                 1.600e-12
Gradient Norm:           2.259e-05
Max/Avg. Residual Norm:  0.000e+00, 0.000e+00
Max/Avg. Regularization: 0.000e+00, 0.000e+00
Avg. Krylov Iterations:  1.510e+02
Status:  
=#

Second, we consider the following example of minimizing the non-convex Rosenbrock function with R-SFN.

using DifferentiationInterface #for AD
import Enzyme #for AD

function rosenbrock(x) #objective function
	res = 0.0
	for i = 1:size(x,1)-1
		res += 100*(x[i+1]-x[i]^2)^2 + (1-x[i])^2
	end

	return res
end

stats = optimize!(zeros(10), rosenbrock, Val(:rsfn), AutoEnzyme();
					itmax=100,
					time_limit=60,
					M=1e-8)

show(stats)
#=
Converged:                    true
Iterations:                     73
Function Evals:                 74
Hvp Evals:                     353
Run Time (s):             3.30e-04
Minimum:                 6.415e-12
Gradient Norm:           1.458e-05
Max/Avg. Residual Norm:  7.101e+00, 5.267e-01
Max/Avg. Regularization: 1.288e+02, 1.264e+01
Avg. Krylov Iterations:  4.836e+00
Status:
=#

Publications

More information about R-SFN can be found in the following publications. Please cite them if they are useful to your own work!

@mastersthesis{rsfn,
	title = {{Regularized Saddle-Free Newton: Saddle Avoidance and Efficient Implementation}},
	author = {Cooper Simpson},
	school = {Dept. of Applied Mathematics, CU Boulder},
	year = {2022},
	type = {{M.S.} Thesis}
}
