Skip to content

[fast-client][thin-client] Route single-key batchGet through single-GET path'#2845

Open
eldernewborn wants to merge 7 commits into
linkedin:mainfrom
eldernewborn:eldernewborn/single-key-batch-get-optimization
Open

[fast-client][thin-client] Route single-key batchGet through single-GET path'#2845
eldernewborn wants to merge 7 commits into
linkedin:mainfrom
eldernewborn:eldernewborn/single-key-batch-get-optimization

Conversation

@eldernewborn

@eldernewborn eldernewborn commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Problem Statement

batchGet / streamingBatchGet requests always go through the multi-get streaming path, even when the caller passes exactly one key. For a 1-key request this is wasteful and inconsistent: it incurs the full multi-get machinery (scatter/gather, streaming response assembly, batch-get routing) to fetch a single value that the single-GET path already serves more directly. There was no way to distinguish or measure these degenerate single-key batch requests, which are common in practice when callers build key sets dynamically.

Solution

For batchGet / streamingBatchGet requests containing exactly one key, short-circuit to a single-GET lookup instead of issuing a multi-get streaming request. This is implemented in both the fast client and the thin client:

  • Add streamingBatchGetForSingleKey helpers in InternalAvroStoreClient (fast client) and AbstractAvroStoreClient (thin client). Each issues a get(key), adapts the result back onto the StreamingCallback (onRecordReceived + onCompletion), and unwraps CompletionException so callers see the original cause.
  • Short-circuit at the entry points: DispatchingAvroGenericStoreClient and RetriableAvroGenericStoreClient (fast client) detect keys.size() == 1 and route through the single-key helper, so the optimization applies regardless of which layer triggers it.
  • Add a new fast-client metric, batch_get_routed_to_single_get_request_count (Tehuti) / request.batch_get_routed_to_single_get_count (OTel COUNTER, dimensions store/cluster/request-method), recorded at the single-key short-circuit choke point so every 1-key request served via single-GET is counted exactly once. It is wired through the dual-backend MetricEntityStateBase pattern (built in buildFastClientOtelStats so it survives cluster-name rebuilds), consistent with retry_request_win.
  • Document the behavior change at the short-circuit sites: a 1-key batch get now follows the single-get long-tail retry configuration rather than the batch-get one, and the server sees a single-get request type instead of a multi-get — which shifts server-side routing, read-quota accounting (single-get and batch-get quotas are throttled separately), and per-request-type server metrics for these requests.

Trade-off worth calling out: a store that enabled only batch-get long-tail retry will get no long-tail retry on 1-key batch gets, since those now follow the single-get retry config.

Testing: added dedicated SingleKeyBatchGetAsSingleGetTest in both clients (single-GET routing, non-existing key, error propagation, multi-key contrast), extended TestClientSimulator with single-get expectation support, and fixed pre-existing batch-get tests that issued 1-key requests by switching them to 5-key requests so they remain genuine multi-get tests. Confirmed diffCoverage branch coverage (venice-client 58.33%, venice-thin-client 66.67%, min 0.45); SpotBugs and spotless clean.

Code changes

  • Added new code behind a config. (No new config; behavior is unconditional for 1-key requests.)
  • Introduced new log lines. (No new log lines — only a new metric.)
    • Confirmed if logs need to be rate limited to avoid excessive logging.

Concurrency-Specific Checks

Both reviewer and PR author to verify

  • Code has no race conditions or thread safety issues. (Single-key helper composes on the existing get future via whenComplete; introduces no shared mutable state.)
  • Proper synchronization mechanisms (e.g., synchronized, RWLock) are used where needed. (None needed.)
  • No blocking calls inside critical sections that could lead to deadlocks or performance degradation.
  • Verified thread-safe collections are used (e.g., ConcurrentHashMap, CopyOnWriteArrayList). (N/A — no new collections.)
  • Validated proper exception handling in multi-threaded code to avoid silent thread termination. (whenComplete callback unwraps and forwards exceptions through the callback; CompletionException is unwrapped to the original cause.)

How was this PR tested?

  • New unit tests added. (SingleKeyBatchGetAsSingleGetTest in both fast and thin clients; new FastClientStats / FastClientMetricEntity metric tests.)
  • New integration tests added.
  • Modified or extended existing tests. (Updated BatchGetAvroStoreClientUnitTest, DispatchingAvroGenericStoreClientTest, RequestBasedMetadataTest, FastClientStatsFastClientTehutiMetricNameTest, and TestClientSimulator to account for single-GET routing.)
  • Verified backward compatibility (if applicable). (Result semantics of 1-key batchGet/streamingBatchGet are unchanged; only the underlying request path differs.)

Does this PR introduce any user-facing or breaking changes?

  • No. You can skip the rest of this section.
  • Yes. Clearly explain the behavior change and its impact.

For batchGet / streamingBatchGet with exactly one key, the client now issues a single-GET request instead of a multi-get streaming request. The returned values/map are unchanged, but observable behavior shifts:

  • Retry config: these requests follow the single-get long-tail retry configuration rather than the batch-get one. A store that enabled only batch-get long-tail retry will get no long-tail retry on 1-key batch gets.
  • Server-side: the server sees a single-get request type, changing routing, read-quota accounting (single-get and batch-get quotas are throttled separately), and per-request-type server metrics for these 1-key requests.
  • Client metrics: a new metric batch_get_routed_to_single_get_request_count / request.batch_get_routed_to_single_get_count is emitted for each such routed request.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Optimizes 1-key batchGet / streamingBatchGet by short-circuiting to the single-GET path (fast client + thin client) while preserving response/missing-key semantics and the streaming callback contract, and adds targeted unit coverage to ensure the multi-get streaming request path is not used for 1-key cases.

Changes:

  • Thin client: route single-key streamingBatchGet through get() and adapt callback completion/metric tracking to match streaming expectations.
  • Fast client: short-circuit single-key streaming batch-get to single-GET in both dispatching and retriable layers; extend the simulator to validate/serve single-GET requests.
  • Tests: add new unit tests asserting that 1-key streaming batch-get uses single-GET and validating resulting metrics/key counts.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
clients/venice-thin-client/src/test/java/com/linkedin/venice/client/store/AbstractAvroStoreClientTest.java Adds unit test asserting single-key batchGet uses get() and not streaming multi-get.
clients/venice-thin-client/src/main/java/com/linkedin/venice/client/store/AbstractAvroStoreClient.java Implements thin-client single-key streaming batch-get short-circuit + exception unwrapping helper.
clients/venice-client/src/test/java/com/linkedin/venice/fastclient/utils/TestClientSimulator.java Extends simulator to distinguish/validate single-GET vs multi-get requests and build single-GET responses.
clients/venice-client/src/test/java/com/linkedin/venice/fastclient/BatchGetAvroStoreClientUnitTest.java Adds unit test asserting single-key streaming batch-get uses single-GET on fast client.
clients/venice-client/src/main/java/com/linkedin/venice/fastclient/RetriableAvroGenericStoreClient.java Adds single-key streaming batch-get short-circuit before batch-get retry threshold logic.
clients/venice-client/src/main/java/com/linkedin/venice/fastclient/InternalAvroStoreClient.java Adds shared helper for serving single-key streaming batch-get via single-GET and normalizing exceptions.
clients/venice-client/src/main/java/com/linkedin/venice/fastclient/DispatchingAvroGenericStoreClient.java Adds single-key streaming batch-get short-circuit to single-GET dispatch path.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@eldernewborn eldernewborn force-pushed the eldernewborn/single-key-batch-get-optimization branch from a2b7d56 to e51b37c Compare June 3, 2026 17:45
@eldernewborn eldernewborn requested a review from FelixGV June 5, 2026 02:22
Copilot AI and others added 7 commits June 4, 2026 20:39
…ET path

For batchGet/streamingBatchGet requests containing exactly one key, issue a
single-GET request instead of a multi-get streaming request. Adds
streamingBatchGetForSingleKey helpers in InternalAvroStoreClient (fast client)
and AbstractAvroStoreClient (thin client), with short-circuits in
DispatchingAvroGenericStoreClient and RetriableAvroGenericStoreClient.

Includes unit tests and TestClientSimulator support for single-get
expectations.
Add code comments at the single-key short-circuit sites explaining that 1-key
batchGet is now served via single-GET, which (a) follows the single-get
long-tail
retry config rather than the batch-get one - so a store with only batch-get
retry
enabled gets no long-tail retry on 1-key batch gets - and (b) shifts the
server
request type, affecting routing, read-quota accounting, and per-request-type
metrics.
…ookups

Add a new fast-client metric, batch_get_routed_to_single_get_request_count
(Tehuti) / request.batch_get_routed_to_single_get_count (OTel COUNTER,
dimensions
store/cluster/request-method), recorded at the single-key short-circuit choke
point
(InternalAvroStoreClient#streamingBatchGetForSingleKey) so it counts every
1-key
batchGet/streamingBatchGet served via a single-GET lookup, regardless of
which layer
(Retriable or Dispatching) triggered the short-circuit.

Wired through the dual-backend MetricEntityStateBase pattern (built in
buildFastClientOtelStats so it survives cluster-name rebuilds), consistent
with
retry_request_win. Adds a FastClientMetricEntity entry plus unit tests
covering the
Tehuti + OTel emission and the end-to-end single-key routing.
requestSubmissionWithStatsHandling

Replace non-standard {@param x} references with {@code x} in the method's
Javadoc.
…e-GET

routing

Pre-existing tests issued 1-key batch-gets that now route to single-GET,
breaking
their multi-get expectations:
- BatchGetAvroStoreClientUnitTest.testSimpleStreamingBatchGettingTimeout: use
5 keys
  on one partition so it stays a multi-get timeout test.
- DispatchingAvroGenericStoreClientTest route-blocking tests
(testBatchGetWithTimeoutV4,
  testStreamingBatchGetToUnreachableClientV1/V2): block a replica with a
5-key request
  whose keys all route to that single replica (generated via
DefaultVenicePartitioner),
  instead of a 1-key request. Decouple request_key_count.Max from numKeys in
  validateMetrics, since the 5-key blocking request raises the cumulative max
above the
  follow-up 2-key request's size.
- FastClientStatsFastClientTehutiMetricNameTest: register the new
  batch_get_routed_to_single_get_request_count Tehuti metric name.
SpotBugs and diff-coverage CI

- Remove unused validateMultiGetMetrics overload flagged by SpotBugs
  (UPM_UNCALLED_PRIVATE_METHOD) in DispatchingAvroGenericStoreClientTest.
- Add SingleKeyBatchGetAsSingleGetTest in both clients covering single-GET
  routing, non-existing key, error propagation, and a multi-key contrast.
- Make SimpleStoreClient/VALUE_SCHEMA/valueSerializer package-private so the
  dedicated thin-client test can reuse them.

Confirmed diffCoverage branch coverage: venice-client 58.33%,
venice-thin-client 66.67% (min 0.45). SpotBugs and spotless clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The test dispatched start() onto the shared ForkJoinPool.commonPool() via
CompletableFuture.runAsync. start() blocks indefinitely on
isReadyLatch.await()
in this all-failures scenario, and under CI load the common-pool task could
sit
queued past the 3s verification timeout, intermittently failing the first
verify(updateCache(false)).

Run start() on a dedicated daemon thread so it begins promptly regardless of
pool load, and deterministically tear it down (interrupt + join + assert it
exited) instead of leaking a blocked thread until JVM teardown.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@eldernewborn eldernewborn force-pushed the eldernewborn/single-key-batch-get-optimization branch from 4be6732 to f2661f4 Compare June 5, 2026 03:42
@eldernewborn eldernewborn changed the title Eldernewborn/single key batch get optimization [fast-client][thin-client] Route single-key batchGet through single-GET path' Jun 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants