Skip to content

feat(chain)!: taint-aware CanonicalView::balance via canonical ancestor walk#2235

Draft
evanlinjin wants to merge 6 commits into
bitcoindevkit:masterfrom
evanlinjin:feat/canonical-ancestors
Draft

feat(chain)!: taint-aware CanonicalView::balance via canonical ancestor walk#2235
evanlinjin wants to merge 6 commits into
bitcoindevkit:masterfrom
evanlinjin:feat/canonical-ancestors

Conversation

@evanlinjin

@evanlinjin evanlinjin commented Jun 23, 2026

Copy link
Copy Markdown
Member

Warning

This PR — including this description — is currently entirely AI-generated.
It still needs proper human review (by me, @evanlinjin) before it should be taken seriously. Treat everything below as a draft proposal, not a vetted design.

Description

This is an alternative direction to #2221. Where #2221 makes CanonicalView::balance less restrictive by removing the min_confirmations parameter, this PR instead tries to make it more useful — handing control to the caller via closures, deriving trust from ancestry rather than a per-script heuristic, and exposing a per-output spend-eligibility classifier that's the building block for coin control.

The core new primitive is CanonicalView::classify_outpoints, built on a reusable, sans-TxGraph canonical ancestor walk. balance becomes a thin fold over it.

What's here (commit by commit)

1. Add CanonicalAncestors reverse-topological walk. Canonical::ancestors(seeds, map, should_walk) walks the canonical ancestors of a set of seed txids backwards (the seeds are the leaf-most txs, closer to the leaves of the ancestry DAG than its roots). Each tx is visited once in reverse-topological order, folding a Mergeable accumulator from descendants into ancestors (diamonds merge correctly). should_walk prunes per-tx with the ChainPosition available; the iterator is ExactSizeIterator.

2. Add Canonical::ancestors_inclusive. A variant that also yields the seeds (each with its own final accumulator), so a per-seed contribution can be folded uniformly with the ancestry.

3. Taint-aware CanonicalView::balance (breaking). Replaces trust_predicate + min_confirmations with two closures:

  • does_taint(CanonicalTx) -> bool: a pending output is untrusted if it, or any of its unsettled ancestors, taints. The unsettled ancestry is walked once (deduped across outputs, stopping at settled txs) via ancestors_inclusive. This lets callers demote unconfirmed coins received from — or chained on top of — a third party. Trust is now derived from ancestry instead of a per-script heuristic.
  • is_settled(&ChainPosition) -> bool: generalizes the numeric min_confirmations into a caller-defined "confident this won't be replaced" boundary, and is the sole authority on it — a settled output is never dropped or tainted (even an unconfirmed output a caller chooses to treat as settled is counted, not lost).
  • balance also drops its O identifier generic and now takes bare OutPoints.

4. Rename Balance::confirmed to Balance::settled (breaking). The bucket is driven by is_settled (confidence a tx won't be replaced), not strictly confirmation status, so the field is renamed to match. Breaking serde key change (confirmedsettled, no alias).

5. Add classify_outpoints spend-eligibility classifier. CanonicalView::classify_outpoints pairs each unspent output with a chain-level Eligibility (Settled / Immature / TrustedPending / UntrustedPending) from is_settled + the does_taint walk. This is the primitive for coin selection / coin control (e.g. "prefer settled coins, fall back to trusted-pending"): the caller decides how to aggregate and can layer wallet-specific categories like "locked" on top, instead of being constrained to the fixed Balance buckets. balance is re-expressed as a thin fold over it.

Direction / open questions for human review

  • The longer-term idea is that classify_outpoints is the real API and Balance is just one possible fold — wallets that want categories like "locked"/"reserved" aggregate themselves. This PR keeps Balance (as a fold) for now; a follow-up could move it down into bdk_wallet and drop it from bdk_chain.
  • Is replacing trust_predicate with ancestry-based does_taint the right call, or should both coexist? (does_taint can't express per-output trust, but per-tx ancestry is arguably the more principled model.)
  • Balance::confirmedBalance::settled: worth the rename / serde break, or keep the conventional name?
  • bdk_wallet will need its balance call updated (out of this repo's tree).

Notes to reviewers

  • Tests: ancestor-walk tests (diamond merge / reverse-topo order / pruning / exact size), a balance + classify_outpoints ancestry-taint test, and a regression test that a settled-everything is_settled never drops an unconfirmed output. Existing balance tests were migrated (the old per-keychain trust test is recomputed for the new model, since none of its txs actually spend third-party coins).
  • cargo fmt, clippy --all-features --all-targets -p bdk_chain, and RUSTDOCFLAGS="-D warnings" cargo doc are clean; the bdk_chain suite + doctests pass.

Changelog notice

Changed

  • CanonicalView::balance now takes does_taint and is_settled closures instead of trust_predicate and min_confirmations, and accepts bare OutPoints.
  • Renamed Balance::confirmed to Balance::settled.

Added

  • CanonicalView::classify_outpoints + Eligibility: per-output spend-eligibility classification (the building block for coin control).
  • Canonical::ancestors / Canonical::ancestors_inclusive: reverse-topological canonical ancestor walk with a Mergeable accumulator.

🤖 Generated with Claude Code

@codecov

codecov Bot commented Jun 23, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 92.52336% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.86%. Comparing base (6d03fc3) to head (71c35d9).

Files with missing lines Patch % Lines
crates/chain/src/canonical.rs 87.65% 10 Missing ⚠️
crates/chain/src/balance.rs 0.00% 3 Missing ⚠️
crates/chain/src/canonical_ancestors.rs 97.69% 1 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2235      +/-   ##
==========================================
+ Coverage   78.65%   78.86%   +0.20%     
==========================================
  Files          30       31       +1     
  Lines        5909     6076     +167     
  Branches      279      285       +6     
==========================================
+ Hits         4648     4792     +144     
- Misses       1185     1209      +24     
+ Partials       76       75       -1     
Flag Coverage Δ
rust 78.86% <92.52%> (+0.20%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@evanlinjin evanlinjin force-pushed the feat/canonical-ancestors branch 2 times, most recently from 1fb7787 to 0a9fa74 Compare June 24, 2026 03:11
evanlinjin and others added 4 commits June 24, 2026 03:27
Add `Canonical::ancestors`, returning a `CanonicalAncestors` iterator that
walks the canonical ancestors of a set of seed transactions.

- Ancestors are yielded in reverse topological order (descendants before
  the ancestors they spend) and each transaction is visited exactly once.
- Accumulation is a fold over the ancestor DAG: `map` computes each tx's own
  contribution from its `CanonicalTx`, and a tx's final accumulator is its
  contribution `Merge`d with the accumulators of every in-set descendant.
- `should_walk` prunes the walk per-tx, with the chain position available so
  callers can decide cutoffs (e.g. stop at confirmed).

Seeds are given as txids (looked up in the canonical set) and seed the
accumulation without being yielded.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a variant of `Canonical::ancestors` that also yields the seed
transactions (each with its own final accumulator), not just their ancestors.
Pruning, deduplication and the reverse-topological order are unchanged; the
seeds are emitted alongside the walked ancestors.

Useful when the per-seed contribution must be folded uniformly with the
ancestry (e.g. a seed that taints itself), avoiding a separate pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace `balance`'s `trust_predicate` and `min_confirmations` parameters with
two closures:

* `does_taint(CanonicalTx) -> bool` decides whether a transaction taints its
  descendants. A pending output is `untrusted_pending` if it, or any of its
  unsettled ancestors, taints; otherwise `trusted_pending`. The unsettled
  ancestry of each pending output is walked via `ancestors_inclusive` (deduped
  across outputs, stopping at settled transactions). This lets callers demote
  unconfirmed coins received from (or chained on top of) a third party.
* `is_settled(&ChainPosition) -> bool` decides the confirmed/pending boundary,
  generalizing the old numeric `min_confirmations` (e.g. it can treat
  shallowly-confirmed outputs as pending and taintable).

`does_taint` replaces the per-spk `trust_predicate`: trust is now derived from
ancestry rather than a per-script heuristic. Call sites and tests are updated;
the old per-keychain trust test is recomputed for the new model.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The balance "confirmed" bucket is now driven by the caller-supplied
`is_settled` predicate (transactions we are confident will not be replaced),
not strictly by confirmation status. Rename the field to match the concept and
update `Display`, `Add`, `total`, `trusted_spendable` and all call sites.

This is a breaking change: the serde key changes from "confirmed" to "settled"
(no alias).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add `CanonicalView::classify_outpoints`, which pairs each unspent output with a
chain-level `Eligibility` (`Settled` / `Immature` / `TrustedPending` /
`UntrustedPending`) computed from `is_settled` and the `does_taint` ancestry
walk. This is the primitive for coin selection / coin control: the caller
decides how to aggregate, and can layer wallet-specific categories (e.g.
"locked") on top — rather than being constrained to the fixed `Balance` buckets.

`CanonicalView::balance` is re-expressed as a thin fold over
`classify_outpoints`, adding each output's value to the `Balance` bucket that
matches its `Eligibility`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@evanlinjin evanlinjin force-pushed the feat/canonical-ancestors branch from a286ab4 to ab0c5a2 Compare June 24, 2026 05:01
… walk

Close review-identified coverage gaps:

- `test_balance_taint_stops_at_settled_ancestor`: a settled ancestor that
  `does_taint` would flag must not taint its unsettled descendant — isolates the
  `!is_settled` guard in the taint walk.
- `test_classify_immature_and_settled`: `classify_outpoints` returns `Immature`
  for an immature coinbase and `Settled` for a mature output.
- `ancestors_inclusive_yields_seeds`: the inclusive walk yields the seeds (with
  their own accumulators) and `len()` accounts for them.
- `ancestors_multiple_seeds_dedup_shared_ancestor`: an ancestor shared by two
  seeds is visited once.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@evanlinjin evanlinjin force-pushed the feat/canonical-ancestors branch from c9c6f24 to 71c35d9 Compare June 25, 2026 11:37
@evanlinjin

Copy link
Copy Markdown
Member Author

@oleonardolima It will be ideal if this can reuse logic from #2219.

@Dmenec Dmenec left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @evanlinjin thank you for pushing this.

As there are many things involved, I tried to go point-by-point on some thoughts I have in here.

Related to your questions:

The longer-term idea is that classify_outpoints is the real API and Balance is just one possible fold — wallets that want categories like "locked"/"reserved" aggregate themselves. This PR keeps Balance (as a fold) for now; a follow-up could move it down into bdk_wallet and drop it from bdk_chain.

Agree, classify_outpoints + Eligibility is a good primitive, Balance is just a convenience fold on top so it makes sense to me. I think moving Balance out of bdk_chain would be a breaking change for consumers calling it directly, so maybe the migration needs care.

Is replacing trust_predicate with ancestry-based does_taint the right call, or should both coexist? (does_taint can't express per-output trust, but per-tx ancestry is arguably the more principled model.)

As said in one comment below, trust_predicate had this identifier that provided specific assumptions per-output. I don't think that's a problem though, that policy was always wallet-specific. It makes more sense for bdk_wallet to handle it externally.

bdk_wallet will need its balance call updated (out of this repo's tree).

This is the part I'm least sure about, so I'll just lay out what I found and leave it open.

While looking into the walk I tried a different approach: a per-UTXO memoized DFS (one bool cached per tx, shared across UTXOs so each ancestor is visited once) instead of the topological accumulator pass. Same does_taint/is_settled contract, and I checked it gives the same Eligibility as classify_outpoints on a diamond. Benchmarked both across a few topologies:

topology size memoized classify_outpoints speedup
fan-in 10w×3d 9.0 µs 21.9 µs 2.4x
fan-in 50w×5d 77.7 µs 192.9 µs 2.5x
fan-in 100w×10d 300 µs 878 µs 2.9x
fan-in 500w×20d 3.3 ms 14.5 ms 4.4x
disjoint 10w×3d 9.9 µs 24.4 µs 2.5x
disjoint 50w×5d 83.9 µs 194.6 µs 2.3x
disjoint 100w×10d 311 µs 842 µs 2.7x
disjoint 500w×20d 3.3 ms 14.0 ms 4.2x
untrusted-fan-in 10w×3d 8.1 µs 24.2 µs 3.0x
untrusted-fan-in 50w×5d 73.2 µs 209.7 µs 2.9x
untrusted-fan-in 100w×10d 295 µs 940 µs 3.2x
untrusted-fan-in 500w×20d 3.2 ms 16.3 ms 5.1x
diamond 5s×20u 13.5 µs 39.2 µs 2.9x
diamond 10s×50u 49.2 µs 169.1 µs 3.4x
diamond 20s×100u 158 µs 627 µs 4.0x
diamond 50s×500u 1.8 ms 8.3 ms 4.6x

(w = number of parallel chains, d = depth of each chain; for diamond s = shared ancestor txs, u = UTXOs spending all of them.)

Both walk the same O(U+S) txs but the BTreeSet accumulator grows with the number of downstream UTXOs, while the memoized walk just stores one bool per tx and stops early once does_taint fires.

What I genuinely can't decide is where this belongs. It doesn't use CanonicalAncestors at all, just the public tx/txout, so it feels a bit out of place sitting next to classify_outpoints in bdk_chain, not really taking advantage of the ancestor-walk machinery this PR adds. It could just as easily live in bdk_wallet. But then direct bdk_chain consumers of balance() wouldn't benefit.

I'm not sure if it makes sense here, wallet-side, or not at all, some thoughts?

Code + bench: https://github.com/Dmenec/bdk/tree/bench/trust-classification

Comment on lines -13 to +15
/// Confirmed and immediately spendable balance
pub confirmed: Amount,
/// Settled balance: outputs whose transaction we are confident will not be replaced (e.g.
/// confirmed deeply enough), as determined by the balance query's `is_settled` predicate.
pub settled: Amount,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, settled better reflects that the threshold is a wallet policy decision.

/// Pending (not settled), and neither it nor any of its unsettled ancestors taints it.
TrustedPending,
/// Pending (not settled), and it or one of its unsettled ancestors taints it.
UntrustedPending,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the taint concept more than saying some previous transaction is untrusted, what's actually untrusted is the pending UTXO, not the ancestor.

pub fn classify_outpoints(
&self,
outpoints: impl IntoIterator<Item = OutPoint>,
mut does_taint: impl FnMut(CanonicalTx<ChainPosition<A>>) -> bool,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trust_predicate let callers express per-output trust by keychain. does_taint only sees the ancestor transaction, which does not allow for a "blind trust assumption" (i.e. this output belongs to my keychain so trust it).

Not sure if this policy now has to be layered on top of the Eligibility results.

Comment on lines +516 to +520
for (descendants, c_tx) in self.ancestors_inclusive::<BTreeSet<Txid>, _, _>(
seeds.iter().copied(),
|c_tx| core::iter::once(c_tx.txid).collect(),
|c_tx| !is_settled(&c_tx.pos),
) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does_taint is called eagerly for every unsettled ancestor here and before the returned iterator is built. So if a caller provides an expensive closure, we'd have to evaluate it against every walked ancestor upfront, even if we only end up needing a small part of the UTXOs.

Comment on lines +29 to +37
/// Builds a diamond of confirmed transactions and returns the chain, graph and the four txs.
///
/// ```text
/// tx_a (2 outputs)
/// / \
/// tx_b tx_c
/// \ /
/// tx_d (root we walk ancestors from)
/// ```

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe checking with different topolgies would be useful to cover.

For instance: Disjoint seeds with no shared ancestry or a seed that is itself an ancestor of another seed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants