@dellaert
Member

@dellaert dellaert commented Nov 19, 2025

WIP: a new non-linear factor that collects many smaller factors but stores them with one malloc, and also creates a large single-malloc Jacobian factor.

Did some more work, including trying to linearize to a Hessian, and ran on a larger problem. Verdict: whatever we gain in linearization is dwarfed by the slower elimination:

Raw Data

[100%] Built target timeSFMBAL
Optimizing Regular Graph...
Initial error: 4.18566e+06, values: 22122
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0       101682      4.1e+06     0.0001      1      0.013       0.27        0.3
   1      5.3e+04      4.9e+04    3.3e-05      1      0.012       0.26       0.29
   2      1.9e+04      3.4e+04    3.3e-05      1      0.012       0.26       0.29
   3      1.8e+04      1.1e+03    1.1e-05      1      0.014       0.26       0.29
   4      1.8e+04           87    3.8e-06      1       0.02       0.26       0.29
   5      1.8e+04          0.2    1.3e-06      1      0.014       0.26       0.29
   6      1.8e+04      6.6e-05    4.2e-07      1      0.012       0.26       0.29
-Total: 0 CPU (0 times, 0 wall, 2.7 children, min: 0 max: 0)
|   -regular: 0 CPU (0 times, 0 wall, 2.7 children, min: 0 max: 0)
|   |   -optimize: 2.7 CPU (1 times, 2.2 wall, 2.7 children, min: 2.7 max: 2.7)
Optimizing Regular Graph (METIS ordering)...
Initial error: 4.2e+06, values: 22122
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0        1e+05      4.1e+06     0.0001      1      0.011       0.27       0.29
   1      5.3e+04      4.9e+04    3.3e-05      1      0.011       0.26       0.29
   2      1.9e+04      3.4e+04    3.3e-05      1      0.011       0.26       0.28
   3      1.8e+04      1.1e+03    1.1e-05      1      0.011       0.26       0.28
   4      1.8e+04           87    3.8e-06      1      0.011       0.26       0.29
   5      1.8e+04          0.2    1.3e-06      1      0.012       0.26       0.28
   6      1.8e+04      6.6e-05    4.2e-07      1      0.011       0.25       0.28
-Total: 0 CPU (0 times, 0 wall, 2.7 children, min: 0 max: 0)
|   -regular metis: 0 CPU (0 times, 0 wall, 2.7 children, min: 0 max: 0)
|   |   -optimize: 2.7 CPU (1 times, 2.2 wall, 2.7 children, min: 2.7 max: 2.7)
Optimizing Batch Graph...
Initial error: 4.2e+06, values: 22122
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0        1e+05      4.1e+06     0.0001      1     0.0072       0.29       0.33
   1      5.3e+04      4.9e+04    3.3e-05      1     0.0069       0.29       0.34
   2      1.9e+04      3.4e+04    3.3e-05      1     0.0071       0.29       0.33
   3      1.8e+04      1.1e+03    1.1e-05      1     0.0069       0.29       0.33
   4      1.8e+04           87    3.8e-06      1     0.0075       0.29       0.33
   5      1.8e+04          0.2    1.3e-06      1     0.0066       0.29       0.33
   6      1.8e+04      6.6e-05    4.2e-07      1     0.0065       0.29       0.34
-Total: 0 CPU (0 times, 0 wall, 2.8 children, min: 0 max: 0)
|   -batch: 0 CPU (0 times, 0 wall, 2.8 children, min: 0 max: 0)
|   |   -optimize: 2.8 CPU (1 times, 2.5 wall, 2.8 children, min: 2.8 max: 2.8)
Optimizing Batch (Hessian) Graph...
Initial error: 4.2e+06, values: 22122
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0        1e+05      4.1e+06     0.0001      1      0.033       0.28       0.35
   1      5.3e+04      4.9e+04    3.3e-05      1      0.026       0.27       0.32
   2      1.9e+04      3.4e+04    3.3e-05      1      0.026       0.28       0.33
   3      1.8e+04      1.1e+03    1.1e-05      1      0.026       0.26       0.32
   4      1.8e+04           87    3.8e-06      1      0.023       0.26       0.32
   5      1.8e+04          0.2    1.3e-06      1       0.03       0.26       0.32
   6      1.8e+04      6.6e-05    4.2e-07      1      0.025       0.27       0.32
-Total: 0 CPU (0 times, 0 wall, 3.6 children, min: 0 max: 0)
|   -batch hessian: 0 CPU (0 times, 0 wall, 3.6 children, min: 0 max: 0)
|   |   -optimize: 3.6 CPU (1 times, 2.5 wall, 3.6 children, min: 3.6 max: 3.6)
[100%] Built target timeSFMBAL.run
(gtsfm-v1) dellaert@ipsec-10-2-224-104 ~/git/github/build$ timing/timeSFMBAL ~/Downloads/problem-88-64298-pre.txt 
Optimizing Regular Graph...
Initial error: 1.3906e+12, values: 64386
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0  2.82082e+27     -2.8e+27     0.0001      1      0.062        1.8          2
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      8.9e+34     -8.9e+34     0.0002      1      0.062        1.5        3.4
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      9.2e+37     -9.2e+37     0.0008      1      0.062        1.4        4.9
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      6.5e+21     -6.5e+21     0.0064      1      0.062        1.4        6.3
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      3.8e+21     -3.8e+21        0.1      1      0.062        1.4        7.8
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      1.7e+15     -1.7e+15        3.3      1      0.062        1.4        9.2
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      4.3e+31     -4.3e+31    2.1e+02      1      0.062        1.5         11
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      1.4e+12      9.2e+08    2.7e+04      1      0.062        1.4         12
   1      4.7e+22     -4.7e+22    8.9e+03      1      0.052        1.4        1.5
   1      1.2e+18     -1.2e+18    4.6e+06      1      0.052        1.5          3
   1      1.4e+12      5.3e+03    4.7e+09      1      0.052        1.4        4.5
-Total: 0 CPU (0 times, 0 wall, 17 children, min: 0 max: 0)
|   -regular: 0 CPU (0 times, 0 wall, 17 children, min: 0 max: 0)
|   |   -optimize: 17 CPU (1 times, 17 wall, 17 children, min: 17 max: 17)
Optimizing Regular Graph (METIS ordering)...
Initial error: 1.4e+12, values: 64386
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      2.8e+27     -2.8e+27     0.0001      1      0.051        1.5        1.6
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      8.9e+34     -8.9e+34     0.0002      1      0.051        1.4          3
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      9.2e+37     -9.2e+37     0.0008      1      0.051        1.4        4.5
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      6.5e+21     -6.5e+21     0.0064      1      0.051        1.5        5.9
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      3.8e+21     -3.8e+21        0.1      1      0.051        1.8        7.7
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      1.7e+15     -1.7e+15        3.3      1      0.051        1.4        9.2
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      4.3e+31     -4.3e+31    2.1e+02      1      0.051        1.4         11
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      1.4e+12      9.2e+08    2.7e+04      1      0.051        1.4         12
   1      4.7e+22     -4.7e+22    8.9e+03      1      0.053        1.4        1.5
   1      1.2e+18     -1.2e+18    4.6e+06      1      0.053        1.4        2.9
   1      1.4e+12      5.3e+03    4.7e+09      1      0.053        1.4        4.4
-Total: 0 CPU (0 times, 0 wall, 17 children, min: 0 max: 0)
|   -regular metis: 0 CPU (0 times, 0 wall, 17 children, min: 0 max: 0)
|   |   -optimize: 17 CPU (1 times, 17 wall, 17 children, min: 17 max: 17)
Optimizing Batch Graph...
Initial error: 1.4e+12, values: 64386
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      2.8e+27     -2.8e+27     0.0001      1      0.047        2.7          3
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      8.9e+34     -8.9e+34     0.0002      1      0.047        2.6        5.6
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      9.2e+37     -9.2e+37     0.0008      1      0.047        2.5        8.2
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      6.5e+21     -6.5e+21     0.0064      1      0.047        2.7         11
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      3.8e+21     -3.8e+21        0.1      1      0.047        2.6         13
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      1.7e+15     -1.7e+15        3.3      1      0.047        2.7         16
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      4.3e+31     -4.3e+31    2.1e+02      1      0.047        2.6         19
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      1.4e+12      9.2e+08    2.7e+04      1      0.047        2.5         21
   1      4.7e+22     -4.7e+22    8.9e+03      1      0.042        2.7        2.9
   1      1.2e+18     -1.2e+18    4.6e+06      1      0.042        2.5        5.5
   1      1.4e+12      5.3e+03    4.7e+09      1      0.042        2.6        8.1
-Total: 0 CPU (0 times, 0 wall, 30 children, min: 0 max: 0)
|   -batch: 0 CPU (0 times, 0 wall, 30 children, min: 0 max: 0)
|   |   -optimize: 30 CPU (1 times, 30 wall, 30 children, min: 30 max: 30)
Optimizing Batch (Hessian) Graph...
Initial error: 1.4e+12, values: 64386
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      2.8e+27     -2.8e+27     0.0001      1       0.53        4.9        6.3
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      8.9e+34     -8.9e+34     0.0002      1       0.53        4.6         11
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      9.2e+37     -9.2e+37     0.0008      1       0.53        3.9         15
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      6.5e+21     -6.5e+21     0.0064      1       0.53          3         18
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      3.8e+21     -3.8e+21        0.1      1       0.53        3.7         22
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      1.7e+15     -1.7e+15        3.3      1       0.53        3.7         25
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      4.3e+31     -4.3e+31    2.1e+02      1       0.53        3.3         29
iter      cost      cost_change    lambda  success lin_time  solve_time total_time
   0      1.4e+12      9.2e+08    2.7e+04      1       0.53        3.9         33
   1      4.7e+22     -4.7e+22    8.9e+03      1       0.39        3.9        4.8
   1      1.2e+18     -1.2e+18    4.6e+06      1       0.39        3.6        8.4
   1      1.4e+12      5.3e+03    4.7e+09      1       0.39        4.5         13
-Total: 0 CPU (0 times, 0 wall, 40 children, min: 0 max: 0)
|   -batch hessian: 0 CPU (0 times, 0 wall, 40 children, min: 0 max: 0)
|   |   -optimize: 40 CPU (1 times, 46 wall, 40 children, min: 40 max: 40)
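
To make the verdict concrete, here is a rough worked comparison of Regular vs. Batch, using only the first printed iteration row of each run in the raw data above. The time units are assumed to be seconds (the log does not say), and later rows vary slightly, so this is an order-of-magnitude reading rather than a careful benchmark.

```latex
\begin{aligned}
\text{values } 22122:\quad
  & \Delta t_{\text{lin}} = 0.013 - 0.0072 = 0.0058 \ \text{(saved)},
  & \Delta t_{\text{solve}} = 0.29 - 0.27 = 0.02 \ \text{(lost)} \\
\text{values } 64386:\quad
  & \Delta t_{\text{lin}} = 0.062 - 0.047 = 0.015 \ \text{(saved)},
  & \Delta t_{\text{solve}} = 2.7 - 1.8 = 0.9 \ \text{(lost)}
\end{aligned}
```

In both cases the extra solve time exceeds the linearization saving, which matches the `optimize` totals going from 2.7 to 2.8 CPU on the smaller problem and from 17 to 30 CPU on the larger one.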

@dellaert dellaert requested a review from Copilot November 19, 2025 06:34
Copilot finished reviewing on behalf of dellaert November 19, 2025 06:37
Contributor

Copilot AI left a comment

Pull Request Overview

This PR introduces a new BatchFactor class that aggregates multiple identical factors into a single container, optimizing memory allocation through single malloc operations and creating large single-malloc Jacobian factors during linearization. This is designed to improve performance for Structure-from-Motion and SLAM applications with many similar factors.

Key Changes

  • New BatchFactor template class that wraps collections of factors
  • Map-based constructors for convenient factor batching
  • Optimized linearization that produces a single JacobianFactor
  • Test suite demonstrating usage with ProjectionFactor and BetweenFactor
  • Timing example in timeSFMBAL.cpp comparing regular and batched approaches
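
The single-allocation idea behind the points above can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: `StackedJacobian`, `block`, and `fillDemo` are hypothetical names, the layout is a simple row-major buffer, and none of this is the PR's `BatchFactor` code or any GTSAM API.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical illustration only: not the PR's BatchFactor nor any GTSAM type.
// One contiguous row-major buffer holds the Jacobian rows of many small factors.
class StackedJacobian {
 public:
  // rowsPerFactor[i] is the residual dimension of factor i;
  // cols is the (assumed shared) number of columns.
  StackedJacobian(const std::vector<std::size_t>& rowsPerFactor,
                  std::size_t cols)
      : cols_(cols), offsets_(rowsPerFactor.size() + 1, 0) {
    // Prefix sums give each factor's first row, so lookup is O(1).
    std::partial_sum(rowsPerFactor.begin(), rowsPerFactor.end(),
                     offsets_.begin() + 1);
    data_.assign(offsets_.back() * cols_, 0.0);  // the single allocation
  }

  std::size_t numFactors() const { return offsets_.size() - 1; }
  std::size_t totalRows() const { return offsets_.back(); }

  std::size_t firstRow(std::size_t i) const {
    assert(i < numFactors());
    return offsets_[i];
  }

  // Pointer to the start of factor i's block (row-major, cols_ wide).
  double* block(std::size_t i) { return data_.data() + firstRow(i) * cols_; }

  double at(std::size_t row, std::size_t col) const {
    assert(row < totalRows() && col < cols_);
    return data_[row * cols_ + col];
  }

 private:
  std::size_t cols_;
  std::vector<std::size_t> offsets_;  // offsets_[i] = first row of factor i
  std::vector<double> data_;          // all blocks, back to back
};

// Each factor writes into its slice of the shared buffer instead of
// allocating its own small matrix.
inline StackedJacobian fillDemo() {
  StackedJacobian J({2, 2, 3}, 4);  // three factors with 2, 2 and 3 rows
  for (std::size_t i = 0; i < J.numFactors(); ++i) {
    double* b = J.block(i);
    b[0] = static_cast<double>(i + 1);  // mark the first entry of each block
  }
  return J;
}
```

Calling `fillDemo()` yields a 7-row buffer in which rows 0, 2 and 4 start the three blocks. A real implementation would also have to handle per-key column blocks and noise models, which this sketch leaves out.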

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| gtsam/nonlinear/BatchFactor.h | New header file defining the BatchFactor template class with constructors, linearization, and helper methods |
| gtsam/nonlinear/tests/testBatchFactor.cpp | New test file with three test cases covering different constructor patterns and factor types |
| timing/timeSFMBAL.cpp | Modified to add BatchFactor usage example alongside existing conventional factor graph approach |

@dellaert
Member Author

@ProfFan FYI

Compare with Metis
Dramatically simplified Hessian path (and way faster!)
Add linearize time
Hessian!
Kill "fast" path
revert that
More micro-optimization
ScratchMatrix
offsets_
added a Value cache
feat: Optimize BatchFactor linearization by enabling zero-malloc Jacobian computation via OptionalJacobian stride support.
refactor: optimize BatchFactor key and dimension indexing.
refactor: cache factor-specific key indices in BatchFactor for improved linearization and enable LM sequential Cholesky solver with summary verbosity.
refactor: Pre-calculate key dimensions and indices in BatchFactor, add vector constructors, and enable LM parameters in timeSFMBAL.
Address comments