-
Notifications
You must be signed in to change notification settings - Fork 45
Add RankUpdateEuclideanMetric #443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
AdvancedHMC.jl documentation for PR #443 is available at: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had some comments about docstrings and tests and such, but would be good to also get a review from someone who understands the algorithm that is being implemented here.
Thanks @ErikQQY for pushing this forward! I'm pretty sure this implementation would work with Pathfinder (though we would probably need the expectations of the I'm happy to review, but it would first be nice to get an answer from the maintainers to this question, which I don't think has been answered yet and which I also raised in #411:
|
On a semi-related note, this rank-update covariance is the same as the marginal covariance of latent factor models of the following structure: There's a rich literature on algorithms for estimating |
IIUC, the main reason why the metrics in AHMC don't use PDMats.jl is that the introduction of PDMats.jl would introduce unnecessary overhead, for example, need some wrappers to store unnecessary matrices etc. But as for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ErikQQY! I have a few more small points, I'm admittedly being picky about the docs.
""" | ||
struct RankUpdateEuclideanMetric{T,AM<:AbstractVecOrMat{T},AB,AD,F} <: AbstractMetric | ||
# Diagnal of the inverse of the mass matrix | ||
"Diagnal of the inverse of the mass matrix" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Diagnal of the inverse of the mass matrix" | |
"Diagonal of the inverse of the mass matrix" |
""" | ||
struct RankUpdateEuclideanMetric{T,AM<:AbstractVecOrMat{T},AB,AD,F} <: AbstractMetric | ||
# Diagnal of the inverse of the mass matrix | ||
"Diagnal of the inverse of the mass matrix" | ||
M⁻¹::AM | ||
B::AB | ||
D::AD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
D::AD | |
D::AD | |
"Woodbury factorisation of M⁻¹ + B D transpose(B)" |
@@ -99,25 +99,36 @@ function Base.show(io::IO, dem::DenseEuclideanMetric) | |||
end | |||
|
|||
""" | |||
RankUpdateEuclideanMetric{T,M} <: AbstractMetric | |||
RankUpdateEuclideanMetric{T,AM,AB,AD,F} <: AbstractMetric |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RankUpdateEuclideanMetric{T,AM,AB,AD,F} <: AbstractMetric | |
RankUpdateEuclideanMetric{T,AM<:AbstractVecOrMat{T},AB,AD,F} <: AbstractMetric |
B::AB | ||
D::AD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there some nice explanation of what the purpose of B
and D
is, why we want this Woodbury factorisation, or how this is important for RankUpdateEuclideanMetric? If not, can just refer to somewhere else for an explanation, but would be good to say something about them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't seen one place that explains it all. D
is not needed, since it can be absorbed into B
with any square-root factorization; it's just sometimes more convenient to have it (e.g. Pathfinder directly constructs it). On the other hand, when fitting a factor model, D
is usually constrained to be I
.
One interpretation comes from the factor model written in #443 (comment). It's useful when most of the structure of the covariance lies on a low-dimensional subspace. One such example is when the a small number of parameters are highly correlated with each other and not with any other parameters, and all remaining parameters are uncorrelated. Typically efficiency of HMC would be hindered by the high correlations, and one would need a dense covariance matrix, but then you end up with quadratic cost for each leapfrog stop. But if you can roughly upper bound the number of correlated parameters as << the number of sampled parameters, you can use a rank-update metric and get linear cost.
It's a mouthful, so I'm not certain the explanation belongs here. I think an explanation could wait until adaptation methods (e.g. Bales') are included here.
|
||
# References | ||
|
||
- Ben Bales, Arya Pourzanjani, Aki Vehtari, Linda Petzold, Selecting the Metric in Hamiltonian Monte Carlo, 2019 | ||
""" | ||
struct RankUpdateEuclideanMetric{T,AM<:AbstractVecOrMat{T},AB,AD,F} <: AbstractMetric |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More a question than a request: Is there a reason why AM
can be a Vector? Also, is it intentional that AB
and AD
don't have to have the same element type?
- Construct a Gaussian Euclidean metric of size `(n, n)` with `M⁻¹` being diagonal matrix. | ||
- Construct a Gaussian Euclidean metric of `M⁻¹`, where `M⁻¹` should be a full rank positive definite matrix, | ||
and `B` `D` must be chose so that the Woodbury matrix `W = M⁻¹ + B D B^\\mathrm{T}` is positive definite. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the other metrics M⁻¹
is defined as the inverse of the metric. Wouldn't it be more consistent to here define M⁻¹ = A + B D B'
as the inverse metric and require A be diagonal and positive-definite? (also, "full-rank" for diagonal PD-mat is redundant).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Pathfinder to safely convert its WoodburyPDMat
to this type, we'll need the contents/form of the factorization
field to be documented.
end | ||
|
||
function woodbury_factorize(A, B, D) | ||
cholA = cholesky(A isa Diagonal ? A : Symmetric(A)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pathfinder's implementation allows for arbitrary PD A
but for the use-cases in Pathfinder and Bales's paper, diagonal A
is sufficient. Since you've already documented A
as diagonal, perhaps you can drop this check.
U = cholA.U | ||
Q, R = qr(U' \ B) | ||
V = cholesky(Symmetric(muladd(R, D * R', I))).U | ||
return (U=U, Q=Q, V=V) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return (U=U, Q=Q, V=V) | |
return (; U, Q, V) |
function RankUpdateEuclideanMetric(n::Int) | ||
M⁻¹ = Diagonal(ones(n)) | ||
B = zeros(n, 0) | ||
D = zeros(0, 0) | ||
factorization = woodbury_factorize(M⁻¹, B, D) | ||
return RankUpdateEuclideanMetric(M⁻¹, B, D, factorization) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: this is probably fine for now, but there are multiple ways to form the identity matrix here, and later it might be better to initialize a different way (e.g. for tuning a covariance matrix for factor analysis, B=0
and D=0
prohibits convergence)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it breaks with the API of other metric constructors, but I think you want to be able to initialize the rank of the update as well. Otherwise you have no way to fix the rank for future tuning algorithms.
Base.size(metric::RankUpdateEuclideanMetric, dim...) = size(metric.M⁻¹.diag, dim...) | ||
|
||
function Base.show(io::IO, ::MIME"text/plain", metric::RankUpdateEuclideanMetric) | ||
return print(io, "RankUpdateEuclideanMetric(diag=$(diag(metric.M⁻¹)))") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's another example where calling the full-rank part of the inverse-metric M⁻¹
leads to a different diagonal being shown than the diagonal of the full inverse-metric. The diagonal entries needed can be efficiently computed, see https://github.com/mlcolab/Pathfinder.jl/blob/f4ca90dc3d91f077f479d13904a2b6bf99e8ee25/src/woodbury.jl#L326-L329
|
||
# References | ||
|
||
- Ben Bales, Arya Pourzanjani, Aki Vehtari, Linda Petzold, Selecting the Metric in Hamiltonian Monte Carlo, 2019 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The factorized form used here is due to the Pathfinder paper, so I recommend also citing http://jmlr.org/papers/v23/21-0889.html
|
||
- Construct a Gaussian Euclidean metric of size `(n, n)` with `M⁻¹` being diagonal matrix. | ||
- Construct a Gaussian Euclidean metric of `M⁻¹`, where `M⁻¹` should be a full rank positive definite matrix, | ||
and `B` `D` must be chose so that the Woodbury matrix `W = M⁻¹ + B D B^\\mathrm{T}` is positive definite. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and `B` `D` must be chose so that the Woodbury matrix `W = M⁻¹ + B D B^\\mathrm{T}` is positive definite. | |
and `B` `D` must be chosen so that the Woodbury matrix `W = M⁻¹ + B D B^\\mathrm{T}` is positive definite. |
""" | ||
RankUpdateEuclideanMetric{T,AM,AB,AD,F} <: AbstractMetric | ||
|
||
A Gaussian Euclidean metric whose inverse is constructed by rank-updates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A Gaussian Euclidean metric whose inverse is constructed by rank-updates. | |
A Gaussian Euclidean metric whose inverse is constructed by low-rank updates to a diagonal matrix. |
h::Hamiltonian{<:RankUpdateEuclideanMetric,<:GaussianKinetic}, r::T, θ::T | ||
) where {T<:AbstractVecOrMat} | ||
M⁻¹ = h.metric.M⁻¹ | ||
return -r' * M⁻¹ * r / 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that because you've defined M
as just the diagonal part, this equation is incorrect. The full one is https://github.com/mlcolab/Pathfinder.jl/blob/f4ca90dc3d91f077f479d13904a2b6bf99e8ee25/src/integration/advancedhmc.jl#L82, which uses https://github.com/mlcolab/Pathfinder.jl/blob/f4ca90dc3d91f077f479d13904a2b6bf99e8ee25/src/woodbury.jl#L384-L388
Fix: #411
Part of #277
This PR ports
RankUpdateEuclideanMetric
from Pathfinder.jl, but don't depend on PDMats.jl, see why not use PDMats.jl in #34.With this metric in, we can take a step forward low-rank metric adaption, and Pathfinder.jl can directly use this metric from AHMC without unsafe overloading.