docs/src/optimization_packages/optimization.md
There are some solvers that are available in the Optimization.jl package directly, without the need to install any of the solver wrapper packages.

## Methods

- `LBFGS`: The popular quasi-Newton method that uses a limited-memory BFGS approximation of the inverse of the Hessian, provided through a wrapper over the [L-BFGS-B](https://users.iems.northwestern.edu/%7Enocedal/lbfgsb.html) Fortran routine from the [LBFGSB.jl](https://github.com/Gnimuc/LBFGSB.jl/) package. It directly supports box constraints.

  It can also handle arbitrary nonlinear constraints through an Augmented Lagrangian method with bound constraints, as described in Section 17.4 of Numerical Optimization by Nocedal and Wright, making it a general-purpose nonlinear optimization solver available directly in Optimization.jl (see the sketches after this list).

- `Sophia`: Based on the paper https://arxiv.org/abs/2305.14342. It incorporates second-order information in the form of the diagonal of the Hessian matrix, avoiding the need to compute the full Hessian. It has been shown to converge faster than first-order methods such as Adam and SGD. A usage sketch follows the method list below.

  `solve(problem, Sophia(; η, βs, ϵ, λ, k, ρ))`

  + `η` is the learning rate
  + `βs` are the decay rates for the moving averages of the gradient and Hessian-diagonal estimates
  + `ϵ` is the epsilon value used for numerical stability
  + `λ` is the weight decay parameter
  + `k` is the number of iterations between re-computations of the diagonal of the Hessian matrix
  + `ρ` is the momentum

  Defaults:

    * `η = 0.001`
    * `βs = (0.9, 0.999)`
    * `ϵ = 1e-8`
    * `λ = 0.1`
    * `k = 10`
    * `ρ = 0.04`
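
To make the call pattern concrete, here is a minimal sketch of solving a box-constrained problem with `LBFGS`. The Rosenbrock objective, the `AutoForwardDiff()` backend, and the bound values are illustrative assumptions, not requirements:

```julia
using Optimization, ForwardDiff

# Rosenbrock function as a stand-in objective
rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2

u0 = zeros(2)
p = [1.0, 100.0]

# LBFGS needs gradients, so attach an AD backend to the objective
optf = OptimizationFunction(rosenbrock, AutoForwardDiff())

# Box constraints are passed directly through lb/ub
prob = OptimizationProblem(optf, u0, p; lb = [-1.0, -1.0], ub = [1.0, 1.0])

sol = solve(prob, Optimization.LBFGS())
```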
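
A general nonlinear constraint goes through the Augmented Lagrangian path described above. Below is a sketch with an illustrative unit-disk inequality constraint, using the standard Optimization.jl constraint interface (in-place `cons` plus `lcons`/`ucons` bounds); the constraint itself and `maxiters` are example choices:

```julia
using Optimization, ForwardDiff

rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2
u0 = zeros(2)
p = [1.0, 100.0]

# In-place constraint function: res holds the constraint values
cons(res, u, p) = (res .= [u[1]^2 + u[2]^2 - 1.0])

optf = OptimizationFunction(rosenbrock, AutoForwardDiff(); cons = cons)

# -Inf <= u[1]^2 + u[2]^2 - 1.0 <= 0.0 keeps the solution inside the unit disk
prob = OptimizationProblem(optf, u0, p; lcons = [-Inf], ucons = [0.0])

sol = solve(prob, Optimization.LBFGS(), maxiters = 100)
```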
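
And a minimal sketch of `Sophia` with the default hyperparameters written out. The Rosenbrock objective, the `AutoZygote()` backend, and the `maxiters` value are assumptions for illustration; any subset of the keywords can be omitted to fall back to the defaults listed above:

```julia
using Optimization, Zygote

rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2
u0 = zeros(2)
p = [1.0, 100.0]

# Sophia uses gradient and Hessian-diagonal information, so an AD backend is required
optf = OptimizationFunction(rosenbrock, AutoZygote())
prob = OptimizationProblem(optf, u0, p)

# Keywords mirror the signature above, shown here at their default values
sol = solve(prob,
    Sophia(η = 0.001, βs = (0.9, 0.999), ϵ = 1e-8, λ = 0.1, k = 10, ρ = 0.04);
    maxiters = 1000)
```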