Skip to content

Add RankUpdateEuclideanMetric #443

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Conversation

ErikQQY
Copy link
Collaborator

@ErikQQY ErikQQY commented May 9, 2025

Fix: #411
Part of #277

This PR ports RankUpdateEuclideanMetric from Pathfinder.jl, but don't depend on PDMats.jl, see why not use PDMats.jl in #34.

With this metric in, we can take a step forward low-rank metric adaption, and Pathfinder.jl can directly use this metric from AHMC without unsafe overloading.

@ErikQQY ErikQQY requested review from yebai and sethaxen May 9, 2025 10:02
Copy link
Contributor

github-actions bot commented May 9, 2025

AdvancedHMC.jl documentation for PR #443 is available at:
https://TuringLang.github.io/AdvancedHMC.jl/previews/PR443/

@yebai yebai requested a review from mhauru May 11, 2025 19:05
Copy link
Member

@mhauru mhauru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some comments about docstrings and tests and such, but would be good to also get a review from someone who understands the algorithm that is being implemented here.

@sethaxen
Copy link
Member

Thanks @ErikQQY for pushing this forward! I'm pretty sure this implementation would work with Pathfinder (though we would probably need the expectations of the factorization field to be documented to safely use it with Pathfinder).

I'm happy to review, but it would first be nice to get an answer from the maintainers to this question, which I don't think has been answered yet and which I also raised in #411:

This PR ports RankUpdateEuclideanMetric from Pathfinder.jl, but don't depend on PDMats.jl, see why not use PDMats.jl in #34.

@sethaxen
Copy link
Member

sethaxen commented May 12, 2025

On a semi-related note, this rank-update covariance is the same as the marginal covariance of latent factor models of the following structure:

$$\begin{aligned} z &\sim \mathcal{N}(0, D)\\\ y &\sim \mathcal{N}(0, A)\\\ x &= \mu + y + B z \implies x \sim \mathcal{N}(\mu, A + B D B^\top) \end{aligned}$$

There's a rich literature on algorithms for estimating $(A,B,D)$ from draws that we could draw on to implement efficient mass matrix adaptation. I'm working on experiments to see how well this works and how it compares to Pathfinder and to the methods in https://arxiv.org/abs/1905.11916 and whether some combination of these methods outperforms either in isolation.

@ErikQQY
Copy link
Collaborator Author

ErikQQY commented May 14, 2025

I'm happy to review, but it would first be nice to get an answer from the maintainers to this question, which I don't think has been answered yet, and which I also raised in #411:

IIUC, the main reason why the metrics in AHMC don't use PDMats.jl is that the introduction of PDMats.jl would introduce unnecessary overhead, for example, need some wrappers to store unnecessary matrices etc. But as for the RankUpdateEuclideanMetric, I think PDMats.jl is suitable here, especially with the invwhite and invunwhitten functionality in JuliaStats/PDMats.jl#221.

@ErikQQY ErikQQY requested a review from mhauru May 15, 2025 19:55
Copy link
Member

@mhauru mhauru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ErikQQY! I have a few more small points, I'm admittedly being picky about the docs.

"""
struct RankUpdateEuclideanMetric{T,AM<:AbstractVecOrMat{T},AB,AD,F} <: AbstractMetric
# Diagnal of the inverse of the mass matrix
"Diagnal of the inverse of the mass matrix"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Diagnal of the inverse of the mass matrix"
"Diagonal of the inverse of the mass matrix"

"""
struct RankUpdateEuclideanMetric{T,AM<:AbstractVecOrMat{T},AB,AD,F} <: AbstractMetric
# Diagnal of the inverse of the mass matrix
"Diagnal of the inverse of the mass matrix"
M⁻¹::AM
B::AB
D::AD
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
D::AD
D::AD
"Woodbury factorisation of M⁻¹ + B D transpose(B)"

@@ -99,25 +99,36 @@ function Base.show(io::IO, dem::DenseEuclideanMetric)
end

"""
RankUpdateEuclideanMetric{T,M} <: AbstractMetric
RankUpdateEuclideanMetric{T,AM,AB,AD,F} <: AbstractMetric
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
RankUpdateEuclideanMetric{T,AM,AB,AD,F} <: AbstractMetric
RankUpdateEuclideanMetric{T,AM<:AbstractVecOrMat{T},AB,AD,F} <: AbstractMetric

Comment on lines 133 to 134
B::AB
D::AD
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some nice explanation of what the purpose of B and D is, why we want this Woodbury factorisation, or how this is important for RankUpdateEuclideanMetric? If not, can just refer to somewhere else for an explanation, but would be good to say something about them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't seen one place that explains it all. D is not needed, since it can be absorbed into B with any square-root factorization; it's just sometimes more convenient to have it (e.g. Pathfinder directly constructs it). On the other hand, when fitting a factor model, D is usually constrained to be I.

One interpretation comes from the factor model written in #443 (comment). It's useful when most of the structure of the covariance lies on a low-dimensional subspace. One such example is when the a small number of parameters are highly correlated with each other and not with any other parameters, and all remaining parameters are uncorrelated. Typically efficiency of HMC would be hindered by the high correlations, and one would need a dense covariance matrix, but then you end up with quadratic cost for each leapfrog stop. But if you can roughly upper bound the number of correlated parameters as << the number of sampled parameters, you can use a rank-update metric and get linear cost.

It's a mouthful, so I'm not certain the explanation belongs here. I think an explanation could wait until adaptation methods (e.g. Bales') are included here.


# References

- Ben Bales, Arya Pourzanjani, Aki Vehtari, Linda Petzold, Selecting the Metric in Hamiltonian Monte Carlo, 2019
"""
struct RankUpdateEuclideanMetric{T,AM<:AbstractVecOrMat{T},AB,AD,F} <: AbstractMetric
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More a question than a request: Is there a reason why AM can be a Vector? Also, is it intentional that AB and AD don't have to have the same element type?

Comment on lines +115 to +117
- Construct a Gaussian Euclidean metric of size `(n, n)` with `M⁻¹` being diagonal matrix.
- Construct a Gaussian Euclidean metric of `M⁻¹`, where `M⁻¹` should be a full rank positive definite matrix,
and `B` `D` must be chose so that the Woodbury matrix `W = M⁻¹ + B D B^\\mathrm{T}` is positive definite.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the other metrics M⁻¹ is defined as the inverse of the metric. Wouldn't it be more consistent to here define M⁻¹ = A + B D B' as the inverse metric and require A be diagonal and positive-definite? (also, "full-rank" for diagonal PD-mat is redundant).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Pathfinder to safely convert its WoodburyPDMat to this type, we'll need the contents/form of the factorization field to be documented.

end

function woodbury_factorize(A, B, D)
cholA = cholesky(A isa Diagonal ? A : Symmetric(A))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pathfinder's implementation allows for arbitrary PD A but for the use-cases in Pathfinder and Bales's paper, diagonal A is sufficient. Since you've already documented A as diagonal, perhaps you can drop this check.

U = cholA.U
Q, R = qr(U' \ B)
V = cholesky(Symmetric(muladd(R, D * R', I))).U
return (U=U, Q=Q, V=V)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return (U=U, Q=Q, V=V)
return (; U, Q, V)

Comment on lines +146 to +152
function RankUpdateEuclideanMetric(n::Int)
M⁻¹ = Diagonal(ones(n))
B = zeros(n, 0)
D = zeros(0, 0)
factorization = woodbury_factorize(M⁻¹, B, D)
return RankUpdateEuclideanMetric(M⁻¹, B, D, factorization)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this is probably fine for now, but there are multiple ways to form the identity matrix here, and later it might be better to initialize a different way (e.g. for tuning a covariance matrix for factor analysis, B=0 and D=0 prohibits convergence)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it breaks with the API of other metric constructors, but I think you want to be able to initialize the rank of the update as well. Otherwise you have no way to fix the rank for future tuning algorithms.

Base.size(metric::RankUpdateEuclideanMetric, dim...) = size(metric.M⁻¹.diag, dim...)

function Base.show(io::IO, ::MIME"text/plain", metric::RankUpdateEuclideanMetric)
return print(io, "RankUpdateEuclideanMetric(diag=$(diag(metric.M⁻¹)))")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's another example where calling the full-rank part of the inverse-metric M⁻¹ leads to a different diagonal being shown than the diagonal of the full inverse-metric. The diagonal entries needed can be efficiently computed, see https://github.com/mlcolab/Pathfinder.jl/blob/f4ca90dc3d91f077f479d13904a2b6bf99e8ee25/src/woodbury.jl#L326-L329


# References

- Ben Bales, Arya Pourzanjani, Aki Vehtari, Linda Petzold, Selecting the Metric in Hamiltonian Monte Carlo, 2019
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The factorized form used here is due to the Pathfinder paper, so I recommend also citing http://jmlr.org/papers/v23/21-0889.html


- Construct a Gaussian Euclidean metric of size `(n, n)` with `M⁻¹` being diagonal matrix.
- Construct a Gaussian Euclidean metric of `M⁻¹`, where `M⁻¹` should be a full rank positive definite matrix,
and `B` `D` must be chose so that the Woodbury matrix `W = M⁻¹ + B D B^\\mathrm{T}` is positive definite.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
and `B` `D` must be chose so that the Woodbury matrix `W = M⁻¹ + B D B^\\mathrm{T}` is positive definite.
and `B` `D` must be chosen so that the Woodbury matrix `W = M⁻¹ + B D B^\\mathrm{T}` is positive definite.

"""
RankUpdateEuclideanMetric{T,AM,AB,AD,F} <: AbstractMetric

A Gaussian Euclidean metric whose inverse is constructed by rank-updates.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
A Gaussian Euclidean metric whose inverse is constructed by rank-updates.
A Gaussian Euclidean metric whose inverse is constructed by low-rank updates to a diagonal matrix.

h::Hamiltonian{<:RankUpdateEuclideanMetric,<:GaussianKinetic}, r::T, θ::T
) where {T<:AbstractVecOrMat}
M⁻¹ = h.metric.M⁻¹
return -r' * M⁻¹ * r / 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Making rand_momentum part of the API
3 participants