Migrate RSV to volume-invariant sampling and harden skiplist economics #498
The 14.8.4 throughput uplift exposed a dilution gap in the previous sample-budget mechanism: with 20 verifications budgeted per hotkey per tempo and our validator now driving ~7000 dispatches per miner per tempo, the realised per-validator sample rate collapsed from the design target (4%) to ~0.3%. Stake distribution analysis on subnet 2 shows top-2 validators hold 80% of stake (effective N via 1/HHI = 3.0), so our individual detection rate dominates aggregate consensus security rather than averaging it out. Replace the budget gate with a flat probabilistic roll (VERIFICATION_SAMPLES_PER_TEMPO / RSV_EXPECTED_SUBS_PER_TEMPO = 4%), restoring the design-intent sample rate at any throughput. Drop the sample_budget field and its persistence ratchets. Tighten strike threshold to 1 (closes the rate-of-attack loophole where a miner trickling cheats below 3-per-60-min escapes accumulation) and extend skiplist to 20 tempos (~24h zero-weight contribution from us, sized against the ~0.02 TAO/day top-miner emission at our stake share). Separately introduce --dispatch-ceiling Option<usize> so the validator no longer ties in-flight QUIC fan-out to verification CPU. Default None (uncapped) — adaptive per-miner caps and the pending_verifications buffer remain the real backpressure. The verification_concurrency * 8 formula was an artifact of the pre-RSV era when every proof entered the verifier and CPU saturation was the dominant constraint.
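The flat probabilistic roll described above can be sketched as follows. This is a minimal illustration, not the code in `rsv.rs`: the constant values are inferred from the stated 20-per-tempo budget and 4% target (20 / 500), the `SplitMix64` generator is a dependency-free stand-in for whatever RNG the validator actually uses, and `observed_rate` is a hypothetical helper.

```rust
// Constant names follow the PR text; the values are inferred from the
// stated budget of 20 and the 4% design target (20 / 500 = 4%).
const VERIFICATION_SAMPLES_PER_TEMPO: u64 = 20;
const RSV_EXPECTED_SUBS_PER_TEMPO: u64 = 500;

/// Minimal SplitMix64 generator, used only so the sketch needs no crates.
/// The validator's real RNG is not shown in the PR (assumption).
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Flat roll: every dispatch is sampled independently with probability
/// SAMPLES / EXPECTED_SUBS, no matter how many dispatches preceded it.
fn should_sample(rng: &mut SplitMix64) -> bool {
    // Modulo bias is negligible when reducing a 64-bit draw mod 500.
    rng.next_u64() % RSV_EXPECTED_SUBS_PER_TEMPO < VERIFICATION_SAMPLES_PER_TEMPO
}

/// Fraction of `dispatches` that end up sampled (hypothetical helper).
fn observed_rate(seed: u64, dispatches: u64) -> f64 {
    let mut rng = SplitMix64(seed);
    let hits = (0..dispatches).filter(|_| should_sample(&mut rng)).count();
    hits as f64 / dispatches as f64
}
```

Because the roll keeps no per-hotkey state, the expected sample rate is the same at 500 or 7,000 dispatches per tempo; only the variance of the observed rate shrinks as volume grows.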
Walkthrough
Verification constants changed; RsvManager removes per-(hotkey, tempo) sampling budgets and uses volume-invariant random sampling; a new optional …
Changes: Verification and Dispatch Tuning
Nitpick comments (1)
crates/sn2-validator/src/cli.rs (1)
57-58: Add help text for the new CLI flag.

The `--dispatch-ceiling` flag lacks documentation. Users won't understand its purpose, valid values, or when to set it without examining the code or external documentation.

Suggested help text:

```diff
-    #[arg(long)]
+    #[arg(long, help = "Maximum concurrent requests in flight (tasks + verifications + pending). Defaults to unbounded; backpressure from pending_verifications cap and per-miner limits still applies")]
     pub dispatch_ceiling: Option<usize>,
```

Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/sn2-validator/src/cli.rs` around lines 57 - 58, Add user-facing help text for the CLI flag by annotating the dispatch_ceiling field (the struct field named dispatch_ceiling currently annotated with #[arg(long)]) with a descriptive help string explaining its purpose, acceptable values (e.g., positive integer or "none"), and when to use it; update the attribute to include help = "..." (or add a doc comment above the field) so the --dispatch-ceiling flag shows this guidance in the CLI help output.
Files selected for processing (5)
- crates/sn2-types/src/constants.rs
- crates/sn2-validator/src/cli.rs
- crates/sn2-validator/src/config.rs
- crates/sn2-validator/src/rsv.rs
- crates/sn2-validator/src/validator_loop/dispatch.rs
…flag

Two rsv tests asserted behavior that is unreachable at the new STRIKES_REQUIRED=1 setting: record_strike_below_threshold_no_skiplist checked that the first strike was a no-op, and strike_aging_removes_old_strikes required accumulating multiple strikes for the in-window pruning loop to be observable. Both behaviors only manifest at threshold >= 2. The tests are deleted rather than reshaped because the underlying aging code path remains intact for any future threshold tuning.

Annotate --dispatch-ceiling with a help string covering default behavior (uncapped, governed by adaptive caps + pending buffer) and when an operator would set a hard cap.
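To illustrate why a "first strike is a no-op" assertion becomes unreachable at threshold 1, here is a small sketch of strike bookkeeping with in-window aging. `StrikeTracker` and its fields are hypothetical stand-ins for the logic inside `RsvManager`, which the PR does not show; the 60-minute window and the constant values are taken from the PR text.

```rust
use std::collections::{HashMap, VecDeque};

// Values taken from the PR description.
const VERIFICATION_STRIKES_REQUIRED: usize = 1;
const VERIFICATION_SKIPLIST_TEMPOS: u64 = 20;
const STRIKE_WINDOW_SECS: u64 = 60 * 60;

/// Hypothetical stand-in for the strike state kept by RsvManager.
#[derive(Default)]
struct StrikeTracker {
    strikes: HashMap<String, VecDeque<u64>>, // hotkey -> strike timestamps (secs)
    skiplist_until: HashMap<String, u64>,    // hotkey -> first tempo allowed back
}

impl StrikeTracker {
    /// Records a failed verification. Returns true if the hotkey is now
    /// skiplisted. `required` is a parameter so other thresholds can be tried.
    fn record_strike(&mut self, hotkey: &str, now_secs: u64, tempo: u64, required: usize) -> bool {
        let window = self.strikes.entry(hotkey.to_string()).or_default();
        // Aging: drop strikes that fell out of the window before counting.
        while window
            .front()
            .is_some_and(|&t| now_secs.saturating_sub(t) > STRIKE_WINDOW_SECS)
        {
            window.pop_front();
        }
        window.push_back(now_secs);
        if window.len() >= required {
            self.skiplist_until
                .insert(hotkey.to_string(), tempo + VERIFICATION_SKIPLIST_TEMPOS);
            window.clear();
            return true;
        }
        false
    }

    fn is_skiplisted(&self, hotkey: &str, tempo: u64) -> bool {
        self.skiplist_until.get(hotkey).is_some_and(|&until| tempo < until)
    }
}
```

With `required = 1` the very first call returns true, so there is no below-threshold state to test. With `required = 3`, three strikes spaced more than an hour apart never accumulate, which is the trickle pattern the single-strike setting closes.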
Why
The 14.8.4 throughput uplift exposed a sampling gap. RSV used a per-(hotkey, tempo) budget of 20 verifications — designed against ~500 expected submissions per miner per tempo. At our current throughput on this validator (~7,000 dispatches per miner per tempo), the realised sample rate collapsed from the design target of 4% to ~0.3%.
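The dilution is plain arithmetic on the figures above. A small sketch, where `budget_sample_rate` is a hypothetical helper rather than anything in the codebase:

```rust
/// Sample rate realised by a fixed per-tempo budget: at most `budget`
/// verifications are drawn, however many dispatches arrive in the tempo.
fn budget_sample_rate(budget: u64, dispatches: u64) -> f64 {
    if dispatches == 0 {
        return 0.0;
    }
    budget.min(dispatches) as f64 / dispatches as f64
}
```

`budget_sample_rate(20, 500)` gives 0.04 (the 4% design target), while `budget_sample_rate(20, 7_000)` gives roughly 0.0029, the ~0.3% observed after the throughput uplift.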
Stake-weighted consensus magnifies this. On subnet 2:
- The top-2 validators hold 80% of stake.
- The effective validator count (1/HHI) is 3.0.
So aggregate-detection security is dominated by a handful of high-stake validators rather than the count of all validators. Our per-validator detection rate is the load-bearing factor.
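The effective validator count is the inverse Herfindahl-Hirschman index of the stake shares. The sketch below shows the computation; the share vector in `example_shares` is hypothetical, chosen only so that the top two shares sum to 80% and one of them matches the 35.7% figure quoted in this PR. The actual subnet 2 distribution is not given here.

```rust
/// Effective number of validators: 1 / sum(share_i^2), the inverse HHI.
/// `shares` are stake fractions and should sum to 1.
fn effective_n(shares: &[f64]) -> f64 {
    let hhi: f64 = shares.iter().map(|s| s * s).sum();
    1.0 / hhi
}

/// Hypothetical stake shares (assumption, not measured data): two large
/// validators holding 80% between them, and four small ones at 5% each.
fn example_shares() -> Vec<f64> {
    vec![0.443, 0.357, 0.05, 0.05, 0.05, 0.05]
}
```

For these illustrative shares `effective_n` comes out close to 3.0, even though six validators are listed. Four equal validators at 25% each would give exactly 4.0.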
Changes
- Replace the budget gate in `should_sample` with a flat `rng.range(0..RSV_EXPECTED_SUBS_PER_TEMPO) < VERIFICATION_SAMPLES_PER_TEMPO` roll. This restores the 4% sample rate at any throughput. Drop the `sample_budget` HashMap and its retention sweep.
- `VERIFICATION_STRIKES_REQUIRED`: 3 → 1. Closes the rate-of-attack loophole where a miner trickling fewer than 3 cheats per 60-min window never reaches the threshold. zk verification is deterministic and legitimate false positives are ~zero, so single-strike is the cleaner choice.
- `VERIFICATION_SKIPLIST_TEMPOS`: 5 → 20 (~24h zero-weight from us). At 35.7% stake × 24h × ~0.02 TAO/day top-miner emission, per-cheat E[penalty] is ~3× E[gain] at the restored 4% sample rate, so cheating is EV-negative.
- `--dispatch-ceiling Option<usize>`, default `None` (uncapped). The ceiling was `verification_concurrency * 8`, an artifact of the pre-RSV era when every proof entered the verifier. With RSV skipping ~96% of verifications, dispatch CPU and verify CPU decouple. Adaptive per-miner caps (sum currently ~6,400 across the metagraph) and the `pending_verifications` cap remain the real backpressure.

Verification
`cargo check --workspace`, `cargo clippy --workspace --tests -- -D warnings`, `cargo fmt --check`, and `cargo test --workspace --lib` are all clean. A new unit test (`should_sample_rate_is_volume_invariant`) confirms the sample rate stays within 0.5% of the 4% target over 100k trials.

Rollout
Ship as 14.8.5. Operators on default config see:
Operators wanting to keep the conservative cap can pass `--dispatch-ceiling=N`.
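As a rough sketch of how an optional ceiling can gate dispatch, assuming the check is a simple comparison against the in-flight count (the actual logic in `dispatch.rs` is not shown in this PR, and both function names are hypothetical):

```rust
/// Returns true if another request may be dispatched.
/// `ceiling` mirrors the `--dispatch-ceiling` Option<usize>: None means
/// uncapped, leaving per-miner caps and the pending buffer as backpressure.
fn may_dispatch(in_flight: usize, ceiling: Option<usize>) -> bool {
    ceiling.map_or(true, |cap| in_flight < cap)
}

/// The previous coupling described in the PR: ceiling tied to verifier CPU.
fn legacy_ceiling(verification_concurrency: usize) -> usize {
    verification_concurrency * 8
}
```

An operator who wants the old behavior back could pass the value of `legacy_ceiling` for their `verification_concurrency` as `N`; with the default `None`, `may_dispatch` never blocks on its own.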