Introduction
This document contains the release notes for release 2.1 of Clad, the automatic differentiation plugin for Clang. Clad is built on top of
the Clang and LLVM compiler infrastructure. Here we describe the status of Clad in some detail, including major improvements from the previous release and new feature work.
Note that if you are reading this file from a git checkout, this document applies to the next release, not the current one.
What's New in Clad 2.1?
Clad 2.1 introduces major advancements in reverse mode differentiation, bringing smarter handling of loops, assignments, and method calls, alongside the new clad::restore_tracker for functions that modify their inputs. Forward mode gains static scheduling for Hessians and higher-order derivatives, while CUDA support expands with custom derivatives for key Thrust algorithms such as reduce, transform, and transform_reduce, plus optimizations that reduce unnecessary GPU atomics. The release also strengthens error estimation, simplifies adjoint initialization, improves tape efficiency, and enhances diagnostics. With a migration to C++17, support extended up to clang-21, and numerous bug fixes, Clad 2.1 delivers faster, safer, and more reliable automatic differentiation across CPU and GPU workflows.
Some of the major new features and improvements to Clad are listed here. Generic improvements to Clad as a whole or to its underlying infrastructure are described first.
External Dependencies
- Clad now works with clang-11 to clang-21.
- Switch to C++17 standard.
Forward Mode & Reverse Mode
- Improve support for differentiation of function/method calls:
  - Move call/argument differentiation into unified helpers.
  - Improve handling of base initializers, delegating constructors, and method bases.
- Optimizations in To-Be-Recorded (TBR) analysis:
  - Enable TBR for pointer arithmetic, nested calls, local variables, loops, and constructors.
  - Improve analysis of pointers, nested derivatives, and division denominators.
- Remove redundant or unsafe getZeroInit initializations in aggregates and init lists.
- Statically schedule higher-order derivatives for Hessians.
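The TBR items above concern which values the forward sweep has to store so that the reverse sweep can read them. The following is a hand-written sketch of that idea, not code generated by Clad; the names `f`, `f_grad`, and `y_saved` are made up for illustration. Only a value that a later adjoint computation actually reads is saved before it is overwritten.

```cpp
// Hand-written illustration only; Clad's generated code looks different.
// f(x) = x * x + 3
double f(double x) {
  double y = x;
  y = y * y;    // overwrites y; the adjoint of this statement reads the old y
  y = y + 3.0;  // overwrites y; the adjoint of this statement does not read y
  return y;
}

// Hypothetical hand-written reverse pass for f, returning df/dx.
double f_grad(double x) {
  // Forward sweep.
  double y = x;
  double y_saved = y;  // recorded: needed by the adjoint of y = y * y
  y = y * y;
  y = y + 3.0;         // nothing recorded: d(y + 3)/dy is the constant 1

  // Reverse sweep, statements visited in reverse order.
  double d_y = 1.0;
  // Adjoint of y = y + 3.0 leaves d_y unchanged.
  // Adjoint of y = y * y uses the value y had before the assignment.
  d_y = 2.0 * y_saved * d_y;
  return d_y;
}
```

Calling `f_grad(3.0)` in this sketch yields 6, matching the analytic derivative 2x. Storing the second overwritten value of `y` as well would be correct but wasteful, which is the kind of storage such an analysis aims to avoid.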
Reverse Mode
- Better handling of pullbacks and reverse_forw:
  - Avoid generating empty or redundant pullbacks.
  - Improve type consistency in reverse_forw.
- Introduce clad::restore_tracker to support functions modifying their arguments.
- Improve differentiation ordering (differentiate method bases before arguments).
- Simplify assignments when LHS and RHS are independent.
- Support for for-loops without increments.
- Add optimizations to avoid unnecessary storage of unused return values.
- Consider unions differentiable.
- Support for not differentiating w.r.t. const references.
- Numerous fixes in reverse_forw handling of types and aggregates.
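Two of the items above, for-loops without an increment expression and callees that overwrite their arguments, can be illustrated with a hand-written reverse pass. This is a sketch under stated assumptions: it does not use or reproduce the clad::restore_tracker API, and every name in it (`scale_in_place`, `scale_in_place_pullback`, `f_grad_x`, `saved_r`) is hypothetical. It only shows why the value an argument had before the call has to be available again during the reverse sweep.

```cpp
#include <vector>

// Overwrites its first argument: v becomes v * x.
void scale_in_place(double& v, double x) { v *= x; }

// f(x, n) = x^n, written with a for-loop that has no increment expression.
double f(double x, int n) {
  double r = 1.0;
  for (int i = 0; i < n;) {
    scale_in_place(r, x);
    ++i;
  }
  return r;
}

// Hypothetical hand-written pullback of scale_in_place.
// It needs v_old, the value of v before the call overwrote it.
void scale_in_place_pullback(double v_old, double x, double& d_v, double& d_x) {
  d_x += v_old * d_v;  // d(v_old * x)/dx = v_old
  d_v = x * d_v;       // d(v_old * x)/dv_old = x
}

// Hand-written reverse pass returning df/dx.
double f_grad_x(double x, int n) {
  // Forward sweep: save r before every call that overwrites it.
  std::vector<double> saved_r;
  double r = 1.0;
  for (int i = 0; i < n;) {
    saved_r.push_back(r);
    scale_in_place(r, x);
    ++i;
  }

  // Reverse sweep: one step per forward iteration, in reverse order.
  double d_r = 1.0;
  double d_x = 0.0;
  while (!saved_r.empty()) {
    double r_old = saved_r.back();  // restore the pre-call value
    saved_r.pop_back();
    scale_in_place_pullback(r_old, x, d_r, d_x);
  }
  return d_x;
}
```

In this sketch the number of reverse steps comes from the saved values rather than from the loop header, so the missing increment expression does not matter. With Clad itself, the gradient would be requested through `clad::gradient` instead of being written by hand.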
CUDA
- Add custom derivatives for several Thrust algorithms: thrust::reduce, thrust::transform, thrust::transform_reduce, thrust::copy, thrust::inner_product.
- Improve handling of CUDA atomics:
  - Avoid atomics for injective index computations.
  - Add liveness analysis for removing unnecessary atomics.
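The reasoning behind skipping atomics for injective index computations can be sketched on the host in plain C++. This is an illustration only: Clad's analysis works on the index expressions at compile time, whereas the hypothetical helper `is_injective` below checks a concrete index array at run time, and `gather_pullback` is likewise a made-up name. The adjoint of a gather `out[i] = in[idx[i]]` is a scatter-add into `d_in`; two threads can collide on the same slot only if two different `i` map to the same index.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Returns true if no index value occurs twice, i.e. the map i -> idx[i]
// is injective.
bool is_injective(const std::vector<std::size_t>& idx) {
  std::unordered_set<std::size_t> seen;
  for (std::size_t i : idx)
    if (!seen.insert(i).second)
      return false;
  return true;
}

// Adjoint of the gather out[i] = in[idx[i]]: d_in[idx[i]] += d_out[i].
// Sequential host version. With one GPU thread per i, this update is the one
// that would need atomicAdd, unless idx is injective, in which case every
// slot of d_in is touched by at most one thread.
std::vector<double> gather_pullback(const std::vector<std::size_t>& idx,
                                    const std::vector<double>& d_out,
                                    std::size_t n_in) {
  std::vector<double> d_in(n_in, 0.0);
  for (std::size_t i = 0; i < idx.size(); ++i)
    d_in[idx[i]] += d_out[i];
  return d_in;
}
```

For an index map such as `idx[i] = 2 * i + 1` every adjoint slot receives at most one contribution, so a plain store is enough. For a map with repeated indices the contributions accumulate, and a parallel version has to serialize them.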
Error Estimation
- Reverse mode error estimation now uses pullbacks consistently.
- Fix _final_error value propagation.
Misc
- Improve diagnostics for propagator signature mismatches.
- Full qualification logic for generated code.
- Better handling of consteval functions, decl refs, and unused parameters.
- Remove unused STL derivatives and redundant infrastructure.
- CI improvements:
  - Added valgrind test runs.
  - Updated MacOS runners, godbolt build (clang 21), and codecov handling.
- Documentation: Update README to clarify plugin usage, C++14 requirement, and plugin args.
Fixed Bugs
428 691 752 768 1274 1301 1346 1349 1411 1419 1436 1445 1457 1466 1469 1472 1473 1490 1497 1498 1555 1557 1559 1560 1565 1573 1581 1582
Special Kudos
This release wouldn't have happened without the efforts of our contributors,
listed in the form of Firstname Lastname (#contributions):
Petro Zarytskyi (60; major ad/tbr work)
Vassil Vassilev (30; infrastructure, ci, compiler integration)
Abdelrhman Elrawy (7; cuda/thrust support)
Max Andriychuk (7; va/tbr infrastructure, cfg/loop analysis, cuda atomics)
aditimjoshi (4; tape improvements, benchmarks)
Vipul Cariappa (1; godbolt integration)
Timo Nicolai (1; docs, usage improvements)
Rohan Timmaraju (1; tensor-specific support)
mcbarton (1; ci, runners)
Jonas Rembser (1; build fixes)
Christina Koutsou (1; nondiff structs support)
What's Changed
- [ci] Do not use the latest and potentially unstable package channel by @vgvassilev in #1467
- [ci] Bump the codecov action to address random failures. by @vgvassilev in #1471
- Nondiff structs and Don't create adjoints for nondiff params and call args by @kchristin22 in #1447
- Support std::pair constructors natively by @PetroZarytskyi in #1440
- Check if m_AnalysisDC is valid by @ovdiiuv in #1459
- [clang-tidy] Bump the version of clang-tidy to 20. by @vgvassilev in #1475
- Don't create dynamic arrays for static arrays declarations inside loops by @PetroZarytskyi in #1468
- [clang-tidy] Do not exclude benchmarks and demos. by @vgvassilev in #1476
- Don't create CUDA atomics for basic indices by @ovdiiuv in #1441
- Don't store unused return values of reverse_forw by @PetroZarytskyi in #1477
- Don't unnecessarily move loop condition to the loop body by @PetroZarytskyi in #1478
- Add custom derivative for thrust::reduce by @a-elrawy in #1472
- Account for implicit exprs in hasUnusedReturnValue by @PetroZarytskyi in #1479
- Added support for thrust::inner_product by @a-elrawy in #1480
- Enable TBR in the tests of the produced code by @PetroZarytskyi in #1481
- Analyze local DeclStmt in TBR by @PetroZarytskyi in #1484
- Refactor updateCall and friends. NFC by @vgvassilev in #1487
- Move TBR infrastructure to AnalysisBase by @ovdiiuv in #1456
- Update plugin usage information by @Time0o in #1489
- Consider unions differentiable. by @PetroZarytskyi in #1493
- [clang-tidy] Disable any-all-of suggestions by @vgvassilev in #1495
- Add custom derivatives for reduce overloads by @a-elrawy in #1492
- Improve support for pointers in TBR by @PetroZarytskyi in #1494
- Add initial support for nested calls in TBR by @PetroZarytskyi in #1490
- Use clearer implementation, do not rely on finding the old DeclRefExpr. NFC by @vgvassilev in #1499
- [codecov] Bump up the allowed thresholds of coverage. by @vgvassilev in #1500
- Move the args variables next to its use. NFC by @vgvassilev in #1501
- Remove unused interfaces. NFCI by @vgvassilev in #1503
- Drop lookuping up custom reverse_forw derivatives in favor of static scheduling by @vgvassilev in #1502
- Only use the reverse mode in error estimation by @PetroZarytskyi in #1504
- Rework setting the final error. by @vgvassilev in #1505
- Improve consistency using high-level interfaces. by @vgvassilev in #1507
- Refactor the logic discovering the correct call operator. NFC by @vgvassilev in #1509
- Refactor nested TBR by @PetroZarytskyi in #1510
- Add custom derivative for thrust::copy by @a-elrawy in #1511
- Simplify assignments with independent RHS/LHS by @PetroZarytskyi in #1512
- Remove unsupported compilers. by @vgvassilev in #1514
- Enable TBR in pointer arithmetics by @PetroZarytskyi in #1516
- Find used parameter values in nested TBR by @PetroZarytskyi in #1515
- Support not differentiating w.r.t. const references in reverse mode by @PetroZarytskyi in #1513
- Remove exceptions in shouldBeRecorded by @PetroZarytskyi in #1517
- Respect the selected by the user overload. by @vgvassilev in #1488
- Rework finding of the target function to differentiate. by @vgvassilev in #1518
- Schedule first order derivatives for hessians statically by @PetroZarytskyi in #1523
- Add support for thrust::transform by @a-elrawy in #1520
- Remove redundant calls to PerformPendingInstantiations. by @vgvassilev in #1525
- Adjust failing test, still failing on i586. by @vgvassilev in #1519
- Enable static custom derivatives lookups for topmost diffrequests by @PetroZarytskyi in #1524
- Reimplement VA using AnalysisBase infrastructure by @ovdiiuv in #1508
- Tensor-specific Clad changes by @Rohan-T144 in #1462
- Use Stmt* as identifiers in TBR instead of SourceLocation by @PetroZarytskyi in #1527
- Remove ID str from range-based loops iterators by @ovdiiuv in #1528
- Ask valgrind to not output anything if tests are not broken. by @vgvassilev in #1529
- Remove Iterations(1) from tape benchmarks by @aditimjoshi in #1533
- Add support for thrust::transform_reduce by @a-elrawy in #1532
- Initialize deallocation count and display per iteration counters. by @vgvassilev in #1534
- Change m_DelayedCalls to std::deque by @PetroZarytskyi in #1535
- [ci] Add a build that runs clad's testsuite with valgrind. by @vgvassilev in #1537
- Add support for functions that modify their parameters by @PetroZarytskyi in #1445
- Move argument differentiation into a separate function by @PetroZarytskyi in #1541
- Add a test for #1349 to avoid regression by @PetroZarytskyi in #1544
- [clang-tidy] Enforce CamelCase for class members by @vgvassilev in #1543
- Fix range-based loops and add early traverse for regular loops in VA by @ovdiiuv in #1536
- Add a test for #1346 to avoid regression. by @PetroZarytskyi in #1542
- Do not generate empty pullbacks by @PetroZarytskyi in #1545
- Avoid early return in RMV::VisitCallExpr by @PetroZarytskyi in #1546
- Add support for LLVM 21 by @vgvassilev in #1530
- Remove dead code. NFC by @vgvassilev in #1547
- update godbolt build for clang 21 by @Vipul-Cariappa in #1549
- Move MacOS x86 runners to MacOS 15 by @mcbarton in #1552
- Do not add a pullback parameter for functions with reference return types by @PetroZarytskyi in #1550
- Add tail pointer and capacity variable to tape by @aditimjoshi in #1531
- Account for division with unstored denominator in TBR by @PetroZarytskyi in #1556
- Refactor pointer/reference variable differentiation to reduce code repetition by @PetroZarytskyi in #1558
- Consider user-provided derivatives with no bodies non-differentiable by @PetroZarytskyi in #1564
- Disable analyses for dynamically scheduled derivatives by @PetroZarytskyi in #1561
- Don't copy-initialize reverse_forw diffrequests with pullback diffrequests by @PetroZarytskyi in #1566
- Build adjoint init lists during differentiation instead of getZeroInit by @PetroZarytskyi in #1563
- Don't generate pullbacks of static methods with non-diff args. by @PetroZarytskyi in #1572
- Improve the diagnostics for propagator signature mismatch. by @vgvassilev in #1568
- Add missing includes in case NDEBUG is not defined by @vgvassilev in #1574
- Support for-loops without increments in the reverse mode. by @PetroZarytskyi in #1570
- Move derivative call building to a separate function by @PetroZarytskyi in #1567
- Switch to c++17. by @vgvassilev in #1576
- Don't change variable types in reverse_forw. by @PetroZarytskyi in #1579
- Don't initialize aggregates with getZeroInit in the reverse mode by @PetroZarytskyi in #1578
- Differentiate method base before its arguments by @PetroZarytskyi in #1580
- Always initialize adjoints in reverse_forw by @PetroZarytskyi in #1584
- Add custom reverse_forw for thrust::reduce by @a-elrawy in #1585
Full Changelog: v2.0...v2.1