SBR M1: opt-in tail-pin SSM-snapshot eviction (8× warm-resume on deep agentic convs) by tbraun96 · Pull Request #207 · Avarok-Cybersecurity/atlas

tbraun96 · 2026-06-28T01:38:16Z

Problem

On a warm multi-turn hit, KV is reused, but the GDN/Mamba SSM layers must replay
their recurrent state from the deepest surviving Marconi checkpoint at or before
the match point. As the snapshot pool fills, the forecast eviction
(last_access·(1+hit_count)) drops a live conversation's deep per-turn checkpoints
first: their hit_count≈0 (they were never the resume anchor), so they look like
one-shot traffic. The next resume is then left to replay from a much shallower
survivor. Replay distance grows with depth, so warm-hit TTFT climbs (measured
~9.5 s at ~24k depth; the 1s→21s pathology).
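
To make the failure mode concrete, here is a minimal Python model of the baseline forecast. It is a sketch under stated assumptions, not the `SsmSnapshotIndex` code: the `Snapshot` fields, the helper names (`forecast_score`, `evict_forecast`, `replay_distance`, `toy_pool`) and the toy numbers are all hypothetical; only the score `last_access·(1+hit_count)` comes from the text above. A conversation resumed once at depth 16,000 and then left idle loses all of its deeper per-turn checkpoints to fresher one-shot traffic.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical fields; the real index entry may look different.
    session: str        # conversation id
    depth: int          # token position of the SSM checkpoint
    last_access: float  # logical clock of the last touch
    hit_count: int      # times this checkpoint served as a resume anchor


def forecast_score(s: Snapshot) -> float:
    # Baseline forecast from the PR text: the lowest score is evicted first.
    return s.last_access * (1 + s.hit_count)


def evict_forecast(pool: list[Snapshot], n: int) -> list[Snapshot]:
    """Return the n entries the baseline forecast would evict."""
    return sorted(pool, key=forecast_score)[:n]


def replay_distance(pool: list[Snapshot], session: str, match_depth: int) -> int:
    # Replay starts at the deepest surviving checkpoint at or before the match point.
    anchors = [s.depth for s in pool if s.session == session and s.depth <= match_depth]
    return match_depth - max(anchors, default=0)


def toy_pool() -> list[Snapshot]:
    # Conversation "A": one old resume anchor plus deeper per-turn checkpoints (hit_count 0).
    conv = [Snapshot("A", 16_000, 30.0, 1)]
    conv += [Snapshot("A", d, 40.0 + i, 0) for i, d in enumerate(range(18_000, 24_001, 2_000))]
    # Fresh one-shot traffic: shallow, never reused, but accessed more recently.
    one_shot = [Snapshot(f"x{i}", 500, 100.0 + i, 0) for i in range(4)]
    return conv + one_shot


if __name__ == "__main__":
    pool = toy_pool()
    victims = evict_forecast(pool, 4)
    survivors = [s for s in pool if s not in victims]
    print([v.depth for v in victims])               # the four deep "A" checkpoints
    print(replay_distance(survivors, "A", 24_000))  # replay from the 16k anchor
```

Running it prints the four deep `A` checkpoints as victims and a replay distance of 8000 tokens: the stale anchor survives because its `hit_count` doubles its score, and the one-shots survive because they are fresher.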

Fix (M1) — opt-in, OFF by default

SsmSnapshotIndex::evict_lru gains a tail-pin mode: pin the top-K (=8) deepest
snapshots of each resumable session (a conversation resumed at least once),
evicting only non-pinned entries. A resuming deep conversation then finds a
near-tail SSM anchor instead of a far one.
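
A minimal Python sketch of the tail-pin policy, written for illustration rather than taken from the Rust implementation. It assumes a hypothetical `Snapshot` model and a toy pool (one conversation resumed once at depth 16,000 with deeper per-turn checkpoints, plus four fresher one-shot entries). A session counts as resumable if any of its snapshots has `hit_count > 0`; its K deepest snapshots are pinned, and victims are taken from unpinned entries in forecast order. Two behaviors are assumptions of this sketch: with `pin=False` it reduces to the plain forecast, and if unpinned entries run out it falls back to pinned ones in forecast order so the pool can still shrink. The name `select_victims` is hypothetical; the PR only says victim selection lives in an `evict_lru_inner(pin, k)` helper.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    session: str        # conversation id (hypothetical field)
    depth: int          # token position of the SSM checkpoint
    last_access: float  # logical clock of the last touch
    hit_count: int      # times this checkpoint served as a resume anchor


def forecast_score(s: Snapshot) -> float:
    return s.last_access * (1 + s.hit_count)


def pinned_set(pool: list[Snapshot], k: int) -> set[Snapshot]:
    """Top-k deepest snapshots of every resumable session."""
    by_session: dict[str, list[Snapshot]] = defaultdict(list)
    for s in pool:
        by_session[s.session].append(s)
    pinned: set[Snapshot] = set()
    for entries in by_session.values():
        if any(e.hit_count > 0 for e in entries):  # resumed at least once
            pinned.update(sorted(entries, key=lambda e: e.depth, reverse=True)[:k])
    return pinned


def select_victims(pool: list[Snapshot], n: int, pin: bool, k: int = 8) -> list[Snapshot]:
    ordered = sorted(pool, key=forecast_score)
    if not pin:
        return ordered[:n]  # tail-pin off: pure baseline forecast
    pinned = pinned_set(pool, k)
    unpinned = [s for s in ordered if s not in pinned]
    fallback = [s for s in ordered if s in pinned]  # assumption: only reached under extreme pressure
    return (unpinned + fallback)[:n]


def replay_distance(pool: list[Snapshot], session: str, match_depth: int) -> int:
    anchors = [s.depth for s in pool if s.session == session and s.depth <= match_depth]
    return match_depth - max(anchors, default=0)


def toy_pool() -> list[Snapshot]:
    conv = [Snapshot("A", 16_000, 30.0, 1)]
    conv += [Snapshot("A", d, 40.0 + i, 0) for i, d in enumerate(range(18_000, 24_001, 2_000))]
    one_shot = [Snapshot(f"x{i}", 500, 100.0 + i, 0) for i in range(4)]
    return conv + one_shot


if __name__ == "__main__":
    pool = toy_pool()
    victims = select_victims(pool, 4, pin=True)
    survivors = [s for s in pool if s not in victims]
    print(sorted(v.session for v in victims))
    print(replay_distance(survivors, "A", 24_000))
```

On the toy pool, K=8 pins all five of conversation A's checkpoints, so the four one-shot entries are evicted instead and a resume at depth 24,000 replays zero tokens. The same pinning is what hurts balanced round-robin: when every session is resumable, the pins cover live working sets and override the forecast.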

Measured on dgx2 (Qwen3-Next-80B-NVFP4; a deep conversation left idle under
one-shot pressure, then resumed):

| Metric | baseline (forecast) | tail-pin ON |
|---|---|---|
| warm-resume TTFT (mean) | 9.53 s | 1.18 s (8×) |
| best-cycle TTFT | 9.74 s | 0.45 s (21×) |
| replay distance | ~7,600 tok | 11–984 tok |

The 0.45 s warm-cycle latency matches llama.cpp's continuous-sequence resume,
without keeping the sequence live. The change is exact by construction: only
which checkpoint is restored changes, and replay still runs the unchanged,
bit-exact WY4 path.

Honest scope — why OFF by default

Enabling tail-pin regresses balanced many-conversation round-robin by ~30%
(5.9 s→7.7 s). In that regime the recency·hit forecast is already near-optimal
and pinning fights it, and the regime cannot be reliably detected from the
snapshot index's local view (four gating variants were tried; all failed). So
tail-pin is OFF by default: with ATLAS_SBR_TAIL_PIN unset, eviction is exactly
the baseline forecast (do-no-harm everywhere). Set ATLAS_SBR_TAIL_PIN=1
(optionally with ATLAS_SBR_TAIL_PIN_K) for deep single- or few-conversation
agentic workloads, where it delivers the 8×.
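
A minimal sketch of how the two switches could be read, written in Python for illustration; the real parsing lives in the Rust runtime and is not shown in this PR. Assumptions: only the exact value `1` enables tail-pin, K defaults to 8 (the validated value) when ATLAS_SBR_TAIL_PIN_K is unset, and malformed or non-positive K values fall back to that default. The function name `tail_pin_config` is hypothetical.

```python
import os
from typing import Mapping, Optional, Tuple

DEFAULT_TAIL_PIN_K = 8  # assumption: default K equals the validated top-K=8


def tail_pin_config(env: Optional[Mapping[str, str]] = None) -> Tuple[bool, int]:
    """Return (enabled, k) from ATLAS_SBR_TAIL_PIN / ATLAS_SBR_TAIL_PIN_K (hypothetical parser)."""
    env = os.environ if env is None else env
    # Unset (or anything other than "1") keeps the baseline forecast: the do-no-harm default.
    enabled = env.get("ATLAS_SBR_TAIL_PIN") == "1"
    k = DEFAULT_TAIL_PIN_K
    raw_k = env.get("ATLAS_SBR_TAIL_PIN_K")
    if raw_k is not None:
        try:
            parsed = int(raw_k)
        except ValueError:
            parsed = 0  # malformed value: keep the default below
        if parsed > 0:
            k = parsed
    return enabled, k


if __name__ == "__main__":
    print(tail_pin_config())
```

Keeping the enable check to an exact `"1"` makes any accidental value (for example `0` or an empty string) land on the baseline forecast.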

Research artifacts (`research/sbr/`)

Full forensics, benchmark harnesses, and findings, including the negative
results that bound the approach:

- Dense operator-transport is FLOP-negative for GDN; real GDN decay is
  heavy-tailed (median horizon ~163 tok, but ~12% of heads exceed 8192 tok).
- M3 2-D (position × layer) sheaf reconciliation was prototyped → honest
  negative: non-vacuous with an oracle inter-layer map, but circular/net-negative
  with a fittable one (an accurate inter-layer restriction map is the layer's
  forward pass, so there is no shortcut). M1 is the real win, not the sheaf
  framing.
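
The PR does not define "horizon". One common way to turn a per-token decay into a memory horizon is to count how many tokens it takes for old state to shrink below a tolerance. The Python sketch below assumes a constant scalar decay `alpha` per head and a tolerance of 1e-3 (both illustrative choices, and the function names are hypothetical), then computes the two statistics quoted above: the median horizon and the fraction of heads beyond 8192 tokens. Real GDN decay gates are input-dependent, so measured horizons would come from observed gate values rather than a single constant.

```python
import math
import statistics
from typing import Iterable, Tuple


def decay_horizon(alpha: float, tol: float = 1e-3) -> float:
    """Tokens until alpha**h falls to tol, i.e. h = ln(tol) / ln(alpha).

    Assumes a constant per-token scalar decay alpha in (0, 1]; alpha == 1 never decays.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not 0.0 < tol < 1.0:
        raise ValueError("tol must be in (0, 1)")
    if alpha == 1.0:
        return math.inf
    return math.log(tol) / math.log(alpha)


def summarize_horizons(
    alphas: Iterable[float], long_cutoff: float = 8192, tol: float = 1e-3
) -> Tuple[float, float]:
    """Return (median horizon, fraction of heads whose horizon exceeds long_cutoff)."""
    horizons = [decay_horizon(a, tol) for a in alphas]
    if not horizons:
        raise ValueError("need at least one head")
    median = statistics.median(horizons)
    frac_long = sum(h > long_cutoff for h in horizons) / len(horizons)
    return median, frac_long


if __name__ == "__main__":
    # Illustrative decays only, not measured GDN values.
    print(summarize_horizons([0.5, 0.9, 0.99, 0.9999, 1.0]))
```

For example, a head with decay 0.5 and tolerance 0.25 has a horizon of 2 tokens (0.5² = 0.25), while a decay of 0.9999 already reaches tens of thousands of tokens, which is how a handful of slow heads can dominate the tail.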

Tests

`cargo test -p spark-runtime radix_tree::tests::snapshot`: 20 tests pass,
including `test_tail_pin_protects_top_k_deepest` and
`test_tail_pin_off_is_pure_forecast`.

Warm multi-turn hits replay GDN/Mamba SSM state from the nearest deeper Marconi checkpoint to the match point; as the snapshot pool fills, the forecast (last_access x (1+hit_count)) evicts a live conversation's deep intermediate checkpoints (hit_count=0 -- they were never the resume anchor) before transient one-shot traffic, stranding the next resume far from its tail and growing TTFT (measured: ~9.5s, the 1s->21s pathology).

Fix: two-tier eviction in SsmSnapshotIndex::evict_lru -- evict NON-resumable (one-shot / untracked) snapshots before any entry of a RESUMABLE session (one with any hit_count>0). Provably do-no-harm: when every entry is resumable (balanced multi-conversation round-robin) the non-resumable pool is empty and it degrades identically to the baseline forecast. Exact by construction -- only which checkpoint is restored changes; replay is the unchanged bit-exact WY4 path.

Measured (dgx2, Qwen3-Next-80B-NVFP4, deep conv idle under pressure): baseline 9.53s mean / replay ~7600 tok -> SBR 1.18s mean (8.1x), warm cycles 0.45s (21x, replay 11-984 tok), matching llama.cpp continuous-seq resume without keeping the sequence live. Gated by ATLAS_SBR_TAIL_PIN (default on).

Earlier "pin top-K deepest" variants regressed the contended multi-conv regime 5.9s->7.7s by displacing live convs' working sets; the session-level two-tier discriminator avoids that. Full forensics + benchmark harnesses in research/sbr/ (PHASE0/1 findings, M1_RESULTS, sbr_*.py).
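
For comparison, a minimal Python sketch of that two-tier victim order (the variant that was later reverted in favor of top-K pinning). The `Snapshot` model and the function name are hypothetical, and modeling untracked entries as `session=None` is an assumption. Victims come first from non-resumable sessions, then from resumable ones, each tier in forecast order; when every entry is resumable the first tier is empty and the result equals the plain forecast, which is the do-no-harm argument in the message above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    session: Optional[str]  # None models an untracked snapshot (assumption)
    depth: int              # token position of the SSM checkpoint
    last_access: float      # logical clock of the last touch
    hit_count: int          # times this checkpoint served as a resume anchor


def forecast_score(s: Snapshot) -> float:
    return s.last_access * (1 + s.hit_count)


def two_tier_victims(pool: list[Snapshot], n: int) -> list[Snapshot]:
    # A session is resumable if any of its snapshots has been a resume anchor.
    resumable = {s.session for s in pool if s.session is not None and s.hit_count > 0}
    tier0 = sorted((s for s in pool if s.session not in resumable), key=forecast_score)
    tier1 = sorted((s for s in pool if s.session in resumable), key=forecast_score)
    return (tier0 + tier1)[:n]


if __name__ == "__main__":
    pool = [Snapshot("A", 16_000, 30.0, 1), Snapshot("A", 24_000, 43.0, 0),
            Snapshot(None, 500, 100.0, 0), Snapshot("x0", 500, 101.0, 0)]
    print([(s.session, s.depth) for s in two_tier_victims(pool, 2)])
```

The sketch also shows the limitation noted in the follow-up commit: inside the resumable tier it is still the plain forecast, so a resumable conversation's deep checkpoints remain exposed once the non-resumable tier is exhausted.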

CPU-only synthetic prototype testing whether a 2-D (position x layer) cellular-sheaf L0 harmonic reconciliation recovers globally-consistent multi-layer GDN SSM state from cheap lossy contractive-window reconstructions with higher fidelity than the lossy input.

Verdict: the cross-layer sheaf does genuine, non-vacuous work ONLY with an accurate vertical map (oracle: curvature ~0 -> mean 0.9999, beats no-cross-layer baseline by +0.0048, clears the 0.999 gate). With a fittable ridge inter-layer map (cosine 0.978, plaquette curvature 0.48) it is net-negative vs the cheaper horizontal-only (exact GDN recurrence + exact anchors = M1) baseline at practical tau, and never clears the gate.

The win that exists is horizontal+anchors (M1), not sheaf topology. Bottleneck is vertical-map quality, not the sheaf math. Honest negative for the practical lever; do not pursue dgx2 validation absent a high-accuracy inter-layer map.

The two-tier variant was empirically worse (strand cyc0 9.04s) and still regressed balanced multi-conv, so revert to the validated top-K=8 policy: pin the top-K deepest snapshots of each resumable session. Split victim selection into evict_lru_inner(pin, k) for unit-testability.

Multi-conversation sweep settled the scope honestly: enabling tail-pin wins the single/few-deep-conversation regime ~8x (9.53s->1.18s, replay 11-984 tok) but regresses balanced many-conversation round-robin ~30% (the recency*hit forecast is already near-optimal there; pinning fights it, and the regime can't be detected from the index's local view). Therefore DEFAULT OFF -- ATLAS_SBR_TAIL_PIN unset is exactly the baseline forecast (provably do-no-harm everywhere); set =1 for deep agentic single-conversation workloads. Exact in both modes.

Also: M3 2-D (position x layer) sheaf reconciliation prototyped -> honest negative (research/sbr/M3_FINDINGS.md): non-vacuous with an oracle inter-layer map but circular/net-negative with a fittable one; M1 is the real win.

The background plan that produced M3_FINDINGS.md (honest negative).

+            print(f"[{a.label}] r{rnd} s{s} depth~{depth_tok:6d} TTFT={ttft:.3f}s", flush=True)
+
+    # Use a context manager so the output file is closed deterministically.
+    with open(a.out, "w") as f:
+        json.dump({"label": a.label, "sessions": a.sessions, "rounds": a.rounds, "rows": rows},
+                  f, indent=2)


+        text,toks=gen(a.base_url,a.model,msgs,64)
+        res.append({"i":i,"text":text,"toks":toks})
+        print(f"[{a.label}] prompt {i}: {text[:60]!r}",flush=True)
+    with open(a.out, "w") as f:  # close the output file deterministically
+        json.dump({"label": a.label, "res": res}, f, indent=2)


+            rows.append({"round":rnd,"conv":s,"depth":depth,"ttft_s":ttft})
+            print(f"[{a.label}] r{rnd} conv{s} depth~{depth:6d} TTFT={ttft:.3f}s",flush=True)
+
+    with open(a.out, "w") as f:  # close the output file deterministically
+        json.dump({"label": a.label, "convs": a.convs, "rounds": a.rounds, "rows": rows},
+                  f, indent=2)


+    with open(a.out, "w") as f:  # close the output file deterministically
+        json.dump({"label": a.label, "res": res}, f, indent=2)
+
+def compare(pa,pb):
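+    # Load both result files with context managers so the handles are closed.
+    with open(pa) as fa:
+        A = json.load(fa)["res"]
+    with open(pb) as fb:
+        B = json.load(fb)["res"]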


+        rows.append({"phase":"resume","cyc":cyc,"depth":depth,"ttft_s":ttft})
+        print(f"[{a.label}] RESUME cyc{cyc} (after {a.pressure} pressure) depth~{depth:6d} TTFT={ttft:.3f}s",flush=True)
+
+    with open(a.out, "w") as f:  # close the output file deterministically
+        json.dump({"label": a.label, "rows": rows}, f, indent=2)


+  python3 sbr_parity.py --label tailpin  --out p_pin.json  --n 10
+  python3 sbr_parity.py --compare p_base.json p_pin.json
+"""
+import argparse, json, math, time, urllib.request


+    slow_heads = []
+    # aggregate cosine per tau across heads
+    agg = {t: [] for t in taus}
+    h_full_norm_slow = []


AzeezIsh added 4 commits June 27, 2026 20:38

research(sbr): add M3 2-D sheaf reconciliation plan

4465c80

tbraun96 requested a review from AzeezIsh as a code owner June 28, 2026 01:38

github-code-quality Bot found potential problems Jun 28, 2026

tbraun96 wants to merge 4 commits into main from feat/sheaf-based-replaying

tbraun96 commented Jun 28, 2026
