Investigate validator throughput ceiling and reshape dispatch hot path #494
Profiling on mainnet showed one main-loop thread saturated at 100% while fifteen tokio workers idled, a thirty-thousand-entry dslice queue materialised as `serde_json::Value` trees with by-value `Circuit` clones, and in-flight task fanout capped at thirty-two against ~240 queryable miners. Replace queue payloads with `Arc<Circuit>` and msgpack bytes so each entry collapses from kilobytes of nested `Value` enums to a single contiguous allocation. Introduce a `DispatchCache` that memoises miner capacities, adaptive timeout, and the api-eligible set with a two-second TTL so the per-call `flat_map`+`sort` and HashMap clones no longer pin a core. Replace the per-dispatch full `NeuronInfo` clone with a `Vec<u16>` of UIDs plus just-in-time lookup through the existing `uid_to_idx` index. Raise the dispatch ceiling from 2x to 8x `verification_concurrency` and decouple the `pending_verifications` cap from the `verify_tasks` cap so I/O fanout is no longer gated by CPU-bound proof verification.
Walkthrough: This PR migrates tensor inputs from JSON to MessagePack bytes, adds MessagePack tensor codec helpers and Circuit msgpack validation, and updates the request types and DSlice pipeline to be byte-oriented.
Actionable comments posted: 1
🧹 Nitpick comments (1)

crates/sn2-validator/src/validator_loop/dispatch.rs (1)

Lines 265-266: 🏗️ Heavy lift. Avoid cloning the full `Circuit` back into each dispatched dslice. `Arc<Circuit>` shrinks the queue, but `Some((*dslice.circuit).clone())` reintroduces one full circuit allocation per in-flight request. With the higher dispatch ceiling, that can eat into the memory win from this PR. If the verification path can accept shared ownership, carry `Arc<Circuit>` through `DispatchedRequest`/`MinerResponse` instead.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/sn2-validator/src/validator_loop/dispatch.rs`:
- Around line 226-230: The call to sn2_types::decode_msgpack_to_json currently
uses unwrap_or_default on dslice.inputs which hides decoding errors; replace
that with explicit error handling: call decode_msgpack_to_json(&dslice.inputs)
and match the Result, and on Err log or propagate the decoding error (including
the error details and uid/dslice id) and drop/fail the request instead of
constructing dslice_model and calling self.pipeline.prepare_dslice_request; only
continue to call prepare_dslice_request on Ok(inputs_json). Ensure references in
the fix are to decode_msgpack_to_json, dslice.inputs, prepare_dslice_request and
remove the unwrap_or_default usage.
---
Nitpick comments:
In `@crates/sn2-validator/src/validator_loop/dispatch.rs`:
- Around line 265-266: The code is cloning the full Circuit into each dispatched
dslice via Some((*dslice.circuit).clone()), undoing the Arc memory benefit;
change the field types for task_circuit in DispatchedRequest and MinerResponse
to Option<Arc<Circuit>> and stop cloning the inner Circuit—set task_circuit to
Some(dslice.circuit.clone()) (cloning the Arc, not the Circuit) and update all
downstream consumers (verification path) to accept Arc<Circuit> shared ownership
instead of owned Circuit so no full Circuit allocations are reintroduced.
⛔ Files ignored due to path filters (1)
- `Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (10)
- `Cargo.toml`
- `crates/sn2-types/Cargo.toml`
- `crates/sn2-types/src/circuit.rs`
- `crates/sn2-types/src/lib.rs`
- `crates/sn2-types/src/request.rs`
- `crates/sn2-types/src/tensor_codec.rs`
- `crates/sn2-validator/Cargo.toml`
- `crates/sn2-validator/src/validator_loop/dispatch.rs`
- `crates/sn2-validator/src/validator_loop/dslice.rs`
- `crates/sn2-validator/src/validator_loop/mod.rs`
The previous `Arc<Circuit>` rework still cloned the inner `Circuit` at the dispatch boundary (`Some((*dslice.circuit).clone())`) and silently masked msgpack decode errors via `unwrap_or_default` on inputs, both flagged in review. Propagate `Arc<Circuit>` through `DispatchedRequest.task_circuit` and `MinerResponse.circuit` so dispatch clones only the Arc handle; enable serde's `rc` feature so the existing derives accept `Arc` transparently. The RWR path retains a single `Arc::new` wrap because `ensure_circuit` still returns an owned `Circuit` (out of scope for this change). Replace `unwrap_or_default` on `decode_msgpack_to_json` with explicit `Err` logging keyed on `uid` / `run_uid` / `slice_num` / `tile_idx`; drop the request on decode failure rather than constructing a request with `Null` inputs.
Context
Mainnet profiling captured a validator that had been running for 26 h:

- `[heap]` region at 16 GB in `pmap`
- `stacked_dslice_queue` health log showed ~31k entries, each a `DSliceRequest` holding a cloned `Circuit` plus a `serde_json::Value` input tree

The ceiling was the validator itself: `dispatch_ceiling = verification_concurrency * 2` = 32 in-flight tasks across the entire metagraph, and `dispatch_requests()` was cloning the full `NeuronInfo` Vec plus running `flat_map`+`sort` over every miner's history on each call (triggered by every task/verify completion and every tick).

What this changes
- Queue payload shape — `DSliceRequest.circuit: Circuit` → `Arc<Circuit>`, and `inputs: serde_json::Value` / `outputs: Option<serde_json::Value>` → `bytes::Bytes` / `Option<Bytes>` containing msgpack-encoded values built once at queue-insert time via a new `input_data_payload` helper. The dispatch path decodes once per dispatched request (`decode_msgpack_to_json`) to keep the existing `DSliceProofGenerationDataModel` wire compat. A 30k-entry queue drops from hundreds of MB of fragmented enum trees to a few MB of contiguous blobs plus 16-byte Arc refs. Dead `Request` struct removed.
- Dispatch hot path — new `DispatchCache` (capacities, adaptive_timeout, api_eligible) refreshed lazily with a 2 s TTL, so the per-call `flat_map`+`sort` over miner history and the snapshot HashMap clones run at most once every two seconds instead of hundreds of times per second. Per-dispatch full `NeuronInfo` clone replaced with a `Vec<u16>` of UIDs + index shuffle; `NeuronInfo` resolved just-in-time via the existing O(1) `uid_to_idx` index. `spawn_miner_task` now takes owned `(ip, port, hotkey)` instead of `&NeuronInfo`, removing the aliased self-borrow that forced the full clone.
- Throughput ceiling — `dispatch_ceiling` bumped from `verification_concurrency * 2` to `* 8` (32 → 128 on a 16-core box). The verification backpressure check was conflating in-flight verification (CPU-bound, must stay near `verification_concurrency`) with pending-but-not-yet-verifying results (memory-bound, can buffer). They now have independent caps, so I/O fanout to miners is no longer throttled by CPU-bound proof verification draining.
Deferred to follow-ups

- Carrying `rmpv::Value` through `DSliceProofGenerationDataModel` and the miner handlers. Requires coordinated miner release.
- Allocator swap (`mimalloc` or `tikv-jemallocator`). Re-measure RSS 24 h after this lands; if `[heap]` still climbs after the queue compaction, do it.
- Raising `verification_concurrency` itself once the CPU headroom freed by the dispatch-cache changes is confirmed in production.

Verification
`cargo check --workspace`, `cargo clippy --workspace --tests -- -D warnings`, `cargo fmt --check`, `cargo test --workspace --lib`, and `cargo build -p sn2-validator --release` all clean.