Investigate validator throughput ceiling and reshape dispatch hot path #494
Profiling on mainnet showed one main-loop thread saturated at 100% while fifteen tokio workers idled, a thirty-thousand-entry dslice queue materialised as `serde_json::Value` trees with by-value `Circuit` clones, and in-flight task fanout capped at thirty-two against ~240 queryable miners. Replace queue payloads with `Arc<Circuit>` and msgpack bytes so each entry collapses from kilobytes of nested `Value` enums to a single contiguous allocation. Introduce a `DispatchCache` that memoises miner capacities, adaptive timeout, and the api-eligible set with a two-second TTL so the per-call `flat_map`+`sort` and HashMap clones no longer pin a core. Replace the per-dispatch full `NeuronInfo` clone with a `Vec<u16>` of UIDs plus just-in-time lookup through the existing `uid_to_idx` index. Raise the dispatch ceiling from 2x to 8x `verification_concurrency` and decouple the `pending_verifications` cap from the `verify_tasks` cap so I/O fanout is no longer gated by CPU-bound proof verification.
Walkthrough: This PR migrates tensor inputs from JSON to MessagePack bytes, adds MessagePack tensor codec helpers and Circuit msgpack validation, and updates the request types and DSlice pipeline to be byte-oriented.
Actionable comments posted: 1
🧹 Nitpick comments (1)

crates/sn2-validator/src/validator_loop/dispatch.rs (1)

Lines 265-266: 🏗️ Heavy lift. Avoid cloning the full `Circuit` back into each dispatched dslice. `Arc<Circuit>` shrinks the queue, but `Some((*dslice.circuit).clone())` reintroduces one full circuit allocation per in-flight request. With the higher dispatch ceiling, that can eat into the memory win from this PR. If the verification path can accept shared ownership, carry `Arc<Circuit>` through `DispatchedRequest`/`MinerResponse` instead.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/sn2-validator/src/validator_loop/dispatch.rs`:
- Around line 226-230: The call to sn2_types::decode_msgpack_to_json currently
uses unwrap_or_default on dslice.inputs which hides decoding errors; replace
that with explicit error handling: call decode_msgpack_to_json(&dslice.inputs)
and match the Result, and on Err log or propagate the decoding error (including
the error details and uid/dslice id) and drop/fail the request instead of
constructing dslice_model and calling self.pipeline.prepare_dslice_request; only
continue to call prepare_dslice_request on Ok(inputs_json). Ensure references in
the fix are to decode_msgpack_to_json, dslice.inputs, prepare_dslice_request and
remove the unwrap_or_default usage.
---
Nitpick comments:
In `@crates/sn2-validator/src/validator_loop/dispatch.rs`:
- Around line 265-266: The code is cloning the full Circuit into each dispatched
dslice via Some((*dslice.circuit).clone()), undoing the Arc memory benefit;
change the field types for task_circuit in DispatchedRequest and MinerResponse
to Option<Arc<Circuit>> and stop cloning the inner Circuit—set task_circuit to
Some(dslice.circuit.clone()) (cloning the Arc, not the Circuit) and update all
downstream consumers (verification path) to accept Arc<Circuit> shared ownership
instead of owned Circuit so no full Circuit allocations are reintroduced.
⛔ Files ignored due to path filters (1)
- `Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (10)
- `Cargo.toml`
- `crates/sn2-types/Cargo.toml`
- `crates/sn2-types/src/circuit.rs`
- `crates/sn2-types/src/lib.rs`
- `crates/sn2-types/src/request.rs`
- `crates/sn2-types/src/tensor_codec.rs`
- `crates/sn2-validator/Cargo.toml`
- `crates/sn2-validator/src/validator_loop/dispatch.rs`
- `crates/sn2-validator/src/validator_loop/dslice.rs`
- `crates/sn2-validator/src/validator_loop/mod.rs`
The previous `Arc<Circuit>` rework still cloned the inner `Circuit` at the dispatch boundary (`Some((*dslice.circuit).clone())`) and silently masked msgpack decode errors via `unwrap_or_default` on inputs, both flagged in review. Propagate `Arc<Circuit>` through `DispatchedRequest.task_circuit` and `MinerResponse.circuit` so dispatch clones only the Arc handle; enable serde's `rc` feature so the existing derives accept `Arc` transparently. The RWR path retains a single `Arc::new` wrap because `ensure_circuit` still returns an owned `Circuit` (out of scope for this change). Replace `unwrap_or_default` on `decode_msgpack_to_json` with explicit `Err` logging keyed on `uid` / `run_uid` / `slice_num` / `tile_idx`; drop the request on decode failure rather than constructing a request with `Null` inputs.
Context
Mainnet profiling captured a validator that had been running for 26 h:

- `[heap]` region at 16 GB in `pmap`
- `stacked_dslice_queue` health log showed ~31k entries, each a `DSliceRequest` holding a cloned `Circuit` plus a `serde_json::Value` input tree

The ceiling was the validator itself: `dispatch_ceiling = verification_concurrency * 2` = 32 in-flight tasks across the entire metagraph, and `dispatch_requests()` was cloning the full `NeuronInfo` Vec plus running `flat_map`+`sort` over every miner's history on each call (triggered by every task/verify completion and every tick).

What this changes
- Queue payload shape — `DSliceRequest.circuit: Circuit` → `Arc<Circuit>`, and `inputs: serde_json::Value` / `outputs: Option<serde_json::Value>` → `bytes::Bytes` / `Option<Bytes>` containing msgpack-encoded values built once at queue-insert time via a new `input_data_payload` helper. The dispatch path decodes once per dispatched request (`decode_msgpack_to_json`) to keep the existing `DSliceProofGenerationDataModel` wire compat. A 30k-entry queue drops from hundreds of MB of fragmented enum trees to a few MB of contiguous blobs plus 16-byte Arc refs. Dead `Request` struct removed.
- Dispatch hot path — new `DispatchCache` (capacities, adaptive_timeout, api_eligible) refreshed lazily with a 2 s TTL, so the per-call `flat_map`+`sort` over miner history and the snapshot HashMap clones run at most once every two seconds instead of hundreds of times per second. Per-dispatch full `NeuronInfo` clone replaced with a `Vec<u16>` of UIDs + index shuffle; `NeuronInfo` resolved just-in-time via the existing O(1) `uid_to_idx` index. `spawn_miner_task` now takes owned `(ip, port, hotkey)` instead of `&NeuronInfo`, removing the aliased self-borrow that forced the full clone.
- Throughput ceiling — `dispatch_ceiling` bumped from `verification_concurrency * 2` to `* 8` (32 → 128 on a 16-core box). The verification backpressure check was conflating in-flight verification (CPU-bound, must stay near `verification_concurrency`) with pending-but-not-yet-verifying results (memory-bound, can buffer). They now have independent caps, so I/O fanout to miners is no longer throttled by CPU-bound proof verification draining.
Deferred to follow-ups

- Carrying `rmpv::Value` through `DSliceProofGenerationDataModel` and the miner handlers. Requires coordinated miner release.
- Allocator swap (`mimalloc` or `tikv-jemallocator`). Re-measure RSS 24 h after this lands; if `[heap]` still climbs after the queue compaction, do it.
- Raising `verification_concurrency` itself once the CPU headroom freed by the dispatch-cache changes is confirmed in production.

Verification
`cargo check --workspace`, `cargo clippy --workspace --tests -- -D warnings`, `cargo fmt --check`, `cargo test --workspace --lib`, and `cargo build -p sn2-validator --release` all clean.