Keep RFLUFactorization off the direct dual AD path (fixes #1052) #1057

Merged
ChrisRackauckas merged 1 commit into SciML:main from ChrisRackauckas-Claude:fix-rflu-dual-regression on Jun 21, 2026


@ChrisRackauckas-Claude
Contributor

Fixes #1052.

What happened

PR #1041 added RFLUFactorization (and PureKLUFactorization) to _use_direct_dual_solve, so both were sent down the direct dual path whenever A carries duals. For ForwardDiff over an ODE solve (the issue's Rodas5P case), the Rosenbrock W matrix is the A of each linear solve and carries duals, so every stage's linear solve factorized W in Dual arithmetic via RecursiveFactorization instead of taking the split primal/partials path.

RFLUFactorization's Float64 factorization is BLAS/SIMD-grade (cache-blocked, vectorized). Routing the Dual problem through it falls back to generic scalar Dual arithmetic, which loses that speedup entirely. The split path, which 3.85.1 used, factorizes the primal W once, reuses that factorization for the back-solves of the partials, and is both correct and far cheaper when A carries duals.
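The split strategy follows from expanding both sides to first order in the dual parts. Writing A = A0 + Σᵢ εᵢ Aᵢ and b = b0 + Σᵢ εᵢ bᵢ, the solution x = x0 + Σᵢ εᵢ xᵢ satisfies A0 x0 = b0 and A0 xᵢ = bᵢ - Aᵢ x0, so a single factorization of A0 serves every partial: one O(n³) Float64 factorization plus N O(n²) back-solves, instead of carrying all N partials through every O(n³) step in Dual arithmetic. The sketch below is a minimal illustration using only the LinearAlgebra stdlib; `split_dual_solve` is a hypothetical helper, not LinearSolve.jl's implementation, and it stores the partials as plain Float64 matrices and vectors rather than ForwardDiff Dual numbers.

```julia
using LinearAlgebra, Random

# Hypothetical helper, NOT LinearSolve.jl's API: a sketch of the "split" strategy.
# Solves (A0 + Σᵢ εᵢ Aᵢ) x = b0 + Σᵢ εᵢ bᵢ to first order in the εᵢ.
function split_dual_solve(A0::AbstractMatrix, As::Vector{<:AbstractMatrix},
                          b0::AbstractVector, bs::Vector{<:AbstractVector})
    F = lu(A0)        # one Float64 factorization of the primal matrix (LAPACK-backed)
    x0 = F \ b0       # primal solve: A0 * x0 = b0
    # Each partial needs only a back-solve with the same factorization:
    # A0 * xᵢ = bᵢ - Aᵢ * x0
    xs = [F \ (bs[i] - As[i] * x0) for i in eachindex(As, bs)]
    return x0, xs
end

# Usage on a small, diagonally dominant random system with two partials.
Random.seed!(1234)
n = 6
A0 = rand(n, n) + n * I
As = [rand(n, n) for _ in 1:2]
b0 = rand(n)
bs = [rand(n) for _ in 1:2]
x0, xs = split_dual_solve(A0, As, b0, bs)

# Cross-check the first partial with a central finite difference in t,
# where x(t) = (A0 + t*A1) \ (b0 + t*b1).
h = 1e-6
x_of(t) = (A0 + t * As[1]) \ (b0 + t * bs[1])
fd = (x_of(h) - x_of(-h)) / (2h)
println("finite-difference mismatch: ", norm(fd - xs[1]))
```

In the package, the primal factorization would presumably come from whichever algorithm the user selected (for example RFLU), which is how the split path keeps that algorithm's fast Float64 kernel.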

This PR

Drops only RFLUFactorization from _use_direct_dual_solve. GenericLU/SpecializedLU/SpecializedQR (genuinely cheap in dual arithmetic) and PureKLUFactorization are left on the direct path unchanged.
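One way to picture an opt-in predicate like _use_direct_dual_solve is Julia multiple dispatch with a conservative `false` fallback, paired with a test in the spirit of the new regression test. This is a minimal sketch under that assumption: every name here (`SketchAlg`, `SketchRFLU`, `use_direct_dual_solve_sketch`, ...) is a hypothetical stand-in, SpecializedLU/SpecializedQR are omitted for brevity, and LinearSolve.jl's real definition may be structured quite differently.

```julia
using Test

# Hypothetical stand-ins, NOT LinearSolve.jl's real algorithm types or predicate.
abstract type SketchAlg end
struct SketchGenericLU <: SketchAlg end
struct SketchPureKLU <: SketchAlg end
struct SketchRFLU <: SketchAlg end

# Fallback: every algorithm uses the split primal/partials path...
use_direct_dual_solve_sketch(::SketchAlg) = false
# ...unless it explicitly opts in to factorizing in Dual arithmetic.
use_direct_dual_solve_sketch(::SketchGenericLU) = true   # cheap in Dual arithmetic
use_direct_dual_solve_sketch(::SketchPureKLU) = true     # left unchanged by this PR
# SketchRFLU defines no method of its own, so it hits the `false` fallback.

@testset "RFLU stays off the direct dual path (sketch)" begin
    @test use_direct_dual_solve_sketch(SketchRFLU()) == false
    @test use_direct_dual_solve_sketch(SketchGenericLU())
    @test use_direct_dual_solve_sketch(SketchPureKLU())
end
```

With this shape, a new factorization type lands on the split path by default and only reaches the direct path if someone deliberately adds a method for it.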

Verification (local, Julia 1.10)

Reproduced the issue's benchmark on the same seeded problem, identical base commit:

| | RFLU time | RFLU alloc | derivative ‖d‖ |
|---|---|---|---|
| current main (regressed) | 2.87 s | 199.9 MiB | 2.9659001954483846 |
| this PR | 0.071 s | 15.6 MiB | 2.9659001954483837 |

~40× faster and ~13× less allocation, with derivatives agreeing to within 1e-15.
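As a quick sanity check, the summary ratios follow directly from the table's figures. This short Julia snippet is plain arithmetic on the numbers copied from the table; nothing is re-measured.

```julia
# Figures copied from the benchmark table; nothing here is re-measured.
time_main, time_pr = 2.87, 0.071        # seconds
alloc_main, alloc_pr = 199.9, 15.6      # MiB
d_main = 2.9659001954483846             # derivative norm, current main
d_pr = 2.9659001954483837               # derivative norm, this PR

speedup = time_main / time_pr           # ≈ 40.4
alloc_ratio = alloc_main / alloc_pr     # ≈ 12.8
absdiff = abs(d_main - d_pr)            # ≈ 8.9e-16
reldiff = absdiff / abs(d_main)         # ≈ 3.0e-16

println("speedup ≈ ", round(speedup; digits = 1), "×, allocation ratio ≈ ",
        round(alloc_ratio; digits = 1), "×")
println("derivative abs diff = ", absdiff, ", rel diff = ", reldiff)
```

The two derivative norms differ by about 9e-16 absolute (about 3e-16 relative), which is at the level of Float64 rounding for values near 3.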

Correctness: the RFLU derivative (split path) differs from the GenericLU derivative (direct path) by a relative 2.3e-15, and from the default linsolve's by exactly 0.0.

test/Core/forwarddiff_overloads.jl runs green end-to-end, including the existing RFLU duals-in-A and duals-in-b correctness checks and the new regression test "RFLU stays off the direct dual path (#1052)", which asserts _use_direct_dual_solve(RFLUFactorization()) == false.

Runic --check clean on the changed files.

Note on PureKLU

I measured PureKLUFactorization on the direct path too: the penalty is small (~10–25%, no BLAS-grade fast path to lose), so it is intentionally left unchanged here to keep the PR surgical.


Please ignore until reviewed by @ChrisRackauckas.

🤖 Generated with Claude Code

PR SciML#1041 added RFLUFactorization to the direct dual solve path, taken
whenever A carries duals. For ForwardDiff over an ODE solve (e.g.
Rodas5P), the Rosenbrock W matrix carries duals, so every stage's linear
solve factorized W in Dual arithmetic via RecursiveFactorization.

RFLU's Float64 factorization is BLAS/SIMD-grade (cache-blocked,
vectorized); routing the Dual problem through it falls back to generic
scalar dual arithmetic and loses that speedup entirely. Measured on the
issue's case: ~40x slower (0.07s -> 2.9s) and ~13x more allocation
(15.6 MiB -> 199.9 MiB), with identical derivatives.

The split primal/partials path already handles duals-in-A correctly: it
factorizes the primal W once and reuses that factorization across the
partial back-solves, which is what 3.85.1 did. Drop RFLU from
_use_direct_dual_solve so it stays on that path. GenericLU/SpecializedLU/
SpecializedQR (cheap in dual arithmetic) and PureKLU are unaffected.

Adds a regression test asserting RFLU is not routed to the direct path.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ChrisRackauckas ChrisRackauckas marked this pull request as ready for review June 21, 2026 10:44
@ChrisRackauckas ChrisRackauckas merged commit f632098 into SciML:main Jun 21, 2026
48 of 57 checks passed


Development

Successfully merging this pull request may close these issues.

Major performance regression after opting RFLUFactorization out of split dual AD path
