[server][da-vinci][router][fast-client] Introduce self-contained control-plane signal stats by m-nagarajan · Pull Request #2839 · linkedin/venice

m-nagarajan · 2026-06-01T09:03:46Z

Problem Statement

Control-plane signals used by Venice's adaptive throttler, Helix group
routing, and server load controller all read their values from Tehuti
sensors via MetricsRepository.getMetric(...).value(). This creates a
hard dependency on Tehuti being enabled: if Tehuti is removed, these
signals silently return zero and throttling/routing/load-shedding
decisions break.

Solution

Introduces a use.self.contained.stats config in VeniceMetricsConfig
(default false) that routes the three control signals to
library-independent implementations when enabled. Existing Tehuti paths
are completely unchanged when the config is false.

New classes (venice-client-common/utils/concurrent):

LatencyPercentileProvider — lock-free ring-buffer reservoir (4096
slots per LatencyType) for the adaptive throttler's p99 read-latency
signal. Write: O(1) atomic slot write. Read: O(n log n) sort of 4096
longs, called only every 30s by the signal refresh scheduler.
SlidingWindowAverage — dual-bucket sliding window for the per-group
routing average latency. Write: O(1) LongAdder increment. Read:
O(cells) LongAdder.sum() merge, cached behind a single volatile
CachedAverage reference for 100ms.

LoadController independent path: replaces SlidingWindowCount
with two plain LongAdders using sumThenReset(). The effective window
becomes rejectionRatioUpdateIntervalInSec (the existing cached
measurement cadence), making windowSizeInSec irrelevant for this path.

Wiring: all three components read the flag through
VeniceMetricsConfig.useSelfContainedStats(metricsRepository) — one
static helper call, no per-component config keys.

Code changes

Added new code behind a config: use.self.contained.stats in
VeniceMetricsConfig (default false). When false, zero behavior
change anywhere.
Introduced new log lines.

Concurrency-Specific Checks

Code has no race conditions or thread safety issues.
LatencyPercentileProvider writes are lock-free (AtomicLongArray);
concurrent reads tolerate a few in-flight writes for an approximate
signal. SlidingWindowAverage uses double-checked locking for bucket
rotation and a single volatile reference for cache atomicity.
Proper synchronization mechanisms are used where needed.
No blocking calls inside critical sections.
Thread-safe collections used (VeniceConcurrentHashMap).
Proper exception handling in multi-threaded code.

How was this PR tested?

New unit tests added: LatencyPercentileProviderTest,
SlidingWindowAverageTest, SlidingWindowCountTest.
Modified or extended existing tests: LoadControllerTest (both
counter paths parameterized), HelixGroupStatsTest, AdaptiveThrottlerSignalServiceTest.
Parity tests validate that new implementations produce equivalent
results to the Tehuti counterparts for the same input sequences.
Verified backward compatibility: all tests pass with default
use.self.contained.stats=false.

Does this PR introduce any user-facing or breaking changes?

No. You can skip the rest of this section.

control-plane signal stats Adds USE_SELF_CONTAINED_STATS config in VeniceMetricsConfig to replace Tehuti-backed control signals with library-independent implementations, keeping throttling, routing, and load-shedding decisions functional when the Tehuti dependency is removed. Default false — all existing Tehuti paths are unchanged. New classes (venice-client-common/utils/concurrent): - LatencyPercentileProvider: lock-free ring-buffer reservoir (4096 slots per LatencyType) for adaptive throttler p99 signal. Moved from da-vinci.kafka.consumer to be accessible to all components. - SlidingWindowAverage: dual-bucket sliding window for per-group routing avg with single-volatile-ref cache for per-request read performance. LoadController independent path: two LongAdders with sumThenReset() replacing SlidingWindowCount; window = rejectionRatioUpdateIntervalInSec. HelixGroupStats/Selector/RoutingStrategy: useSelfContainedStats flag selects SlidingWindowAverage (true) or Tehuti Avg metric (false). ServerHttpRequestStats: observeLatencyForSignal() feeds the provider when enabled; no-op null-check guard when disabled (zero overhead).

Copilot

Pull request overview

This PR adds an opt-in (use.self.contained.stats, default false) path to source key control-plane signals (adaptive throttler read-latency p99, Helix group routing avg latency, and server load-controller request/accept counts) from Venice-owned data structures instead of reading Tehuti sensor values via MetricsRepository.getMetric(...).value(). The intent is to keep throttling/routing/load-shedding decisions functional even if Tehuti is disabled/removed, while preserving existing behavior when the flag is off.

Changes:

Introduces self-contained signal primitives (LatencyPercentileProvider, SlidingWindowAverage) and wires them into server/router/fast-client paths behind VeniceMetricsConfig.useSelfContainedStats(metricsRepository).
Updates LoadController to support an independent counter backend (two LongAdders drained per update interval) and updates server handler wiring accordingly.
Adds/updates unit tests to validate correctness and parity across Tehuti vs self-contained paths.

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
services/venice-server/src/test/java/com/linkedin/venice/listener/ServerLoadControllerHandlerTest.java	Updates handler construction to pass a `MetricsRepository`.
services/venice-server/src/main/java/com/linkedin/venice/stats/ServerHttpRequestStats.java	Adds optional latency-sample sink to feed adaptive-throttler p99 provider.
services/venice-server/src/main/java/com/linkedin/venice/stats/AggServerHttpRequestStats.java	Plumbs `LatencyPercentileProvider` through stats supplier/constructors.
services/venice-server/src/main/java/com/linkedin/venice/server/VeniceServer.java	Wires adaptive-throttler provider into listener initialization.
services/venice-server/src/main/java/com/linkedin/venice/listener/ServerLoadControllerHandler.java	Enables independent load-controller counter path based on `use.self.contained.stats`.
services/venice-server/src/main/java/com/linkedin/venice/listener/ListenerService.java	Adds constructor overload to pass `LatencyPercentileProvider` into channel initializer.
services/venice-server/src/main/java/com/linkedin/venice/listener/HttpChannelInitializer.java	Plumbs provider into request stats + passes repo into load-controller handler.
services/venice-router/src/main/java/com/linkedin/venice/router/api/routing/helix/HelixGroupSelector.java	Enables self-contained per-group avg read path for routing stats.
internal/venice-test-common/src/integrationTest/java/com/linkedin/venice/integration/utils/TestVeniceServer.java	Updates overridden `createListenerService` signature to accept provider.
internal/venice-common/src/main/java/com/linkedin/venice/ConfigKeys.java	Notes config is defined cross-cutting in `VeniceMetricsConfig`.
internal/venice-client-common/src/test/java/com/linkedin/venice/utils/concurrent/SlidingWindowAverageTest.java	Adds unit tests for `SlidingWindowAverage` behavior and Tehuti parity.
internal/venice-client-common/src/test/java/com/linkedin/venice/stats/routing/HelixGroupStatsTest.java	Parameterizes tests for Tehuti vs self-contained avg read path + parity test.
internal/venice-client-common/src/test/java/com/linkedin/venice/reliability/LoadControllerTest.java	Parameterizes tests for Tehuti vs independent counter backends + parity test.
internal/venice-client-common/src/main/java/com/linkedin/venice/utils/concurrent/SlidingWindowAverage.java	Adds self-contained sliding-window average with cached reads.
internal/venice-client-common/src/main/java/com/linkedin/venice/utils/concurrent/LatencyPercentileProvider.java	Adds self-contained ring-buffer reservoir p99 provider.
internal/venice-client-common/src/main/java/com/linkedin/venice/stats/VeniceMetricsConfig.java	Adds `use.self.contained.stats` config + static helper to read it from `VeniceMetricsRepository`.
internal/venice-client-common/src/main/java/com/linkedin/venice/stats/routing/HelixGroupStats.java	Adds self-contained per-group avg path backed by `SlidingWindowAverage`.
internal/venice-client-common/src/main/java/com/linkedin/venice/stats/OpenTelemetryMetricsSetup.java	Threads `useSelfContainedStats` into setup info for downstream consumers.
internal/venice-client-common/src/main/java/com/linkedin/venice/reliability/LoadController.java	Adds independent-counter backend option (LongAdder) alongside Tehuti windowed counts.
clients/venice-client/src/main/java/com/linkedin/venice/fastclient/meta/HelixGroupRoutingStrategy.java	Uses cross-cutting flag to select Helix group avg latency read path.
clients/da-vinci-client/src/test/java/com/linkedin/davinci/kafka/consumer/LatencyPercentileProviderTest.java	Adds unit tests for reservoir percentile behavior + concurrency.
clients/da-vinci-client/src/test/java/com/linkedin/davinci/kafka/consumer/AdaptiveThrottlerSignalServiceTest.java	Splits tests by Tehuti vs provider path and adds parity coverage.
clients/da-vinci-client/src/main/java/com/linkedin/davinci/kafka/consumer/AdaptiveThrottlerSignalService.java	Adds self-contained p99 provider path controlled by cross-cutting flag.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+      if (hostSingleGetLatencyP99Metric != null) {
+        hostSingleGetLatencyP99 = hostSingleGetLatencyP99Metric.value();
+      }
+      if (hostMultiGetLatencyP99Metric != null) {
+        hostMultiGetLatencyP99 = hostMultiGetLatencyP99Metric.value();
+      }
+      if (hostReadComputeLatencyP99Metric != null) {
+        hostReadComputeLatencyP99 = hostReadComputeLatencyP99Metric.value();
+      }
+      applyLatencySignals(hostSingleGetLatencyP99, hostMultiGetLatencyP99, hostReadComputeLatencyP99);


Copilot AI review requested due to automatic review settings June 1, 2026 09:03

Copilot started reviewing on behalf of m-nagarajan June 1, 2026 09:04 View session

Copilot AI reviewed Jun 1, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[server][da-vinci][router][fast-client] Introduce self-contained control-plane signal stats#2839

[server][da-vinci][router][fast-client] Introduce self-contained control-plane signal stats#2839
m-nagarajan wants to merge 1 commit into
linkedin:mainfrom
m-nagarajan:mnagaraj/replaceTehutiWithBaseClasses

m-nagarajan commented Jun 1, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

m-nagarajan commented Jun 1, 2026

Problem Statement

Solution

Code changes

Concurrency-Specific Checks

How was this PR tested?

Does this PR introduce any user-facing or breaking changes?

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants