Significant runtime difference between Ceres and GTSAM on the same BAL problem (#2299)

### Description:
I noticed a large runtime discrepancy between Ceres's `bundle_adjuster` binary and GTSAM's `SFMExample_bal` example when both run on the same BAL dataset: GTSAM is roughly 5x slower in wall-clock time.

### Dataset:
problem-16-22106-pre.txt from https://grail.cs.washington.edu/projects/bal/dubrovnik.html
(Problem size: 16 cameras, 22106 points, 83718 observations)

### Version
96e53555e6ee115f4bff5d471ddef3d3c80ca935

### Build command:

```
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j
```

### Timing

**ceres**
```
root@nixos:/tinynav/output/ceres-solver/build# time ./bin/bundle_adjuster --input /tinynav/problem-16-22106-pre.txt -num_threads=1 -linear_solver=dense_schur
iter      cost      cost_change  |gradient|   |step|    tr_ratio  tr_radius  ls_iter  iter_time  total_time
   0  4.185660e+06    0.00e+00    2.16e+07   0.00e+00   0.00e+00  1.00e+04        0    4.42e-02    7.22e-02
   1  1.980525e+05    3.99e+06    5.34e+06   0.00e+00   9.60e-01  3.00e+04        1    8.01e-02    1.52e-01
   2  5.086543e+04    1.47e+05    2.11e+06   1.01e+03   8.22e-01  4.09e+04        1    7.46e-02    2.27e-01
   3  1.859667e+04    3.23e+04    2.87e+05   2.64e+02   9.85e-01  1.23e+05        1    7.48e-02    3.02e-01
   4  1.803857e+04    5.58e+02    2.69e+04   8.66e+01   9.93e-01  3.69e+05        1    7.44e-02    3.76e-01
   5  1.803391e+04    4.66e+00    3.11e+02   1.02e+01   1.00e+00  1.11e+06        1    7.31e-02    4.49e-01

Solver Summary (v 2.2.0-eigen-(3.4.0)-lapack-suitesparse-(5.10.1)-eigensparse-cuda-(12020))

                                     Original                  Reduced
Parameter blocks                        22122                    22122
Parameters                              66462                    66462
Residual blocks                         83718                    83718
Residuals                              167436                   167436

Minimizer                        TRUST_REGION

Dense linear algebra library            EIGEN 
Trust region strategy     LEVENBERG_MARQUARDT
                                        Given                     Used
Linear solver                     DENSE_SCHUR              DENSE_SCHUR
Threads                                     1                        1
Linear solver ordering               22106,16                 22106,16
Schur structure                         2,3,9                    2,3,9

Cost:
Initial                          4.185660e+06
Final                            1.803391e+04
Change                           4.167626e+06

Minimizer iterations                        6
Successful steps                            6
Unsuccessful steps                          0

Time (in seconds):
Preprocessor                         0.028064

  Residual only evaluation           0.034154 (5)
  Jacobian & residual evaluation     0.210194 (6)
  Linear solver                      0.131609 (5)
Minimizer                            0.422234

Postprocessor                        0.002126
Total                                0.452424

Termination:                   NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 5.)


real	0m0.589s
user	0m0.527s
sys	0m0.060s

```
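To see where Ceres spends its time, the "Time (in seconds)" table above can be turned into shares of the minimizer total. This is only a small arithmetic sketch over the numbers printed in the summary; the variable names are illustrative and not part of Ceres.

```python
# Timings copied from the "Time (in seconds)" table of the Ceres summary above.
minimizer_total = 0.422234
phases = {
    "Residual only evaluation": 0.034154,
    "Jacobian & residual evaluation": 0.210194,
    "Linear solver": 0.131609,
}

# Fraction of the minimizer time spent in each reported phase.
shares = {name: seconds / minimizer_total for name, seconds in phases.items()}

# Whatever the three phases do not account for (bookkeeping, logging, ...).
unaccounted_share = (minimizer_total - sum(phases.values())) / minimizer_total

for name, share in shares.items():
    print(f"{name:32s} {share:6.1%}")
print(f"{'Other minimizer overhead':32s} {unaccounted_share:6.1%}")
```

On these numbers, Jacobian evaluation takes about half of the minimizer time and the dense Schur linear solver a bit under a third.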

**gtsam**
```
root@nixos:/tinynav/output/gtsam/build# time ./examples/SFMExample_bal /tinynav/problem-16-22106-pre.txt
read 22106 tracks on 16 cameras
Initial error: 4.18566e+06
newError: 2.84007e+06
errorThreshold: 2.84007e+06 > 0
absoluteDecrease: 1345595.21265 >= 1e-05
relativeDecrease: 0.321477369734 >= 1e-05
newError: 428396.143992
errorThreshold: 428396.143992 > 0
absoluteDecrease: 2411669.40624 >= 1e-05
relativeDecrease: 0.84915976888 >= 1e-05
newError: 82756.334431
errorThreshold: 82756.334431 > 0
absoluteDecrease: 345639.809561 >= 1e-05
relativeDecrease: 0.806822877396 >= 1e-05
newError: 28524.7237205
errorThreshold: 28524.7237205 > 0
absoluteDecrease: 54231.6107104 >= 1e-05
relativeDecrease: 0.655316732953 >= 1e-05
newError: 21028.9020562
errorThreshold: 21028.9020562 > 0
absoluteDecrease: 7495.82166434 >= 1e-05
relativeDecrease: 0.262783322207 >= 1e-05
newError: 20482.7928745
errorThreshold: 20482.7928745 > 0
absoluteDecrease: 546.109181693 >= 1e-05
relativeDecrease: 0.0259694576651 >= 1e-05
newError: 20475.2556405
errorThreshold: 20475.2556405 > 0
absoluteDecrease: 7.53723400232 >= 1e-05
relativeDecrease: 0.000367978822444 >= 1e-05
newError: 20475.2532671
errorThreshold: 20475.2532671 > 0
absoluteDecrease: 0.00237339307205 >= 1e-05
relativeDecrease: 1.15915186297e-07 < 1e-05
converged
errorThreshold: 20475.2532671 <? 0
absoluteDecrease: 0.00237339307205 <? 1e-05
relativeDecrease: 1.15915186297e-07 <? 1e-05
iterations: 8 >? 100
final error: 20475.2532671

real	0m3.145s
user	0m13.053s
sys	0m1.102s

```
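The GTSAM log prints three quantities per iteration (`errorThreshold`, `absoluteDecrease`, `relativeDecrease`) and stops once the relative decrease falls below `1e-05`. The sketch below replays that check on the errors from the log. It assumes that meeting any one of the three criteria is enough to stop, which is consistent with the log but is not taken from the GTSAM source; the function name and the tolerances as defaults are illustrative.

```python
# Error values copied from the GTSAM log above (initial error first).
errors = [
    4.18566e+06, 2.84007e+06, 428396.143992, 82756.334431, 28524.7237205,
    21028.9020562, 20482.7928745, 20475.2556405, 20475.2532671,
]

def converged(prev_error, new_error,
              error_tol=0.0, absolute_tol=1e-5, relative_tol=1e-5):
    """Assumed stopping rule: any one of the three printed criteria suffices."""
    absolute_decrease = prev_error - new_error
    relative_decrease = absolute_decrease / prev_error
    return (new_error <= error_tol
            or absolute_decrease < absolute_tol
            or relative_decrease < relative_tol)

# First iteration (1-based) at which the assumed rule reports convergence.
first_converged = next(
    i for i in range(1, len(errors)) if converged(errors[i - 1], errors[i])
)

# Relative decrease of the last step, for comparison with the log.
last_relative_decrease = (errors[-2] - errors[-1]) / errors[-2]

print(first_converged, last_relative_decrease)
```

Under this rule the run stops at iteration 8, matching `iterations: 8` in the log. The Ceres run above instead stopped with `NO_CONVERGENCE`, so the two programs did not do the same amount of work.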

### Expected behavior:
Both programs solve the same BAL problem and both were built in Release mode, so I expected comparable runtimes. Instead, `SFMExample_bal` takes 3.145 s of wall-clock time against 0.589 s for `bundle_adjuster`.
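The gap is larger in CPU time than in wall-clock time. The following sketch computes the ratios from the two `time` outputs above; the dictionary and helper names are illustrative only.

```python
# `time` output copied from the two runs above (seconds).
ceres = {"real": 0.589, "user": 0.527, "sys": 0.060}
gtsam = {"real": 3.145, "user": 13.053, "sys": 1.102}

def cpu_seconds(timing):
    """Total CPU time consumed by the process (user + sys)."""
    return timing["user"] + timing["sys"]

# How much slower GTSAM is in wall-clock time and in total CPU time.
wall_ratio = gtsam["real"] / ceres["real"]
cpu_ratio = cpu_seconds(gtsam) / cpu_seconds(ceres)

# Average number of cores kept busy: CPU time divided by wall-clock time.
ceres_cores = cpu_seconds(ceres) / ceres["real"]
gtsam_cores = cpu_seconds(gtsam) / gtsam["real"]

print(f"wall-clock ratio: {wall_ratio:.2f}x")
print(f"CPU-time ratio:   {cpu_ratio:.2f}x")
print(f"cores busy: ceres {ceres_cores:.2f}, gtsam {gtsam_cores:.2f}")
```

GTSAM keeps about 4.5 cores busy on average while Ceres was limited to one thread with `-num_threads=1`, so the GTSAM binary appears to run multithreaded. The comparison is therefore about 5x in wall-clock time but about 24x in CPU time.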

### perf result

![Image](https://github.com/user-attachments/assets/7a6a5fd4-7546-463d-9cba-ff947dadf6af)

