[fast-client][thin-client] Route single-key batchGet through single-GET path'#2845
Open
eldernewborn wants to merge 7 commits into
Open
Conversation
There was a problem hiding this comment.
Pull request overview
Optimizes 1-key batchGet / streamingBatchGet by short-circuiting to the single-GET path (fast client + thin client) while preserving response/missing-key semantics and the streaming callback contract, and adds targeted unit coverage to ensure the multi-get streaming request path is not used for 1-key cases.
Changes:
- Thin client: route single-key
streamingBatchGetthroughget()and adapt callback completion/metric tracking to match streaming expectations. - Fast client: short-circuit single-key streaming batch-get to single-GET in both dispatching and retriable layers; extend the simulator to validate/serve single-GET requests.
- Tests: add new unit tests asserting that 1-key streaming batch-get uses single-GET and validating resulting metrics/key counts.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| clients/venice-thin-client/src/test/java/com/linkedin/venice/client/store/AbstractAvroStoreClientTest.java | Adds unit test asserting single-key batchGet uses get() and not streaming multi-get. |
| clients/venice-thin-client/src/main/java/com/linkedin/venice/client/store/AbstractAvroStoreClient.java | Implements thin-client single-key streaming batch-get short-circuit + exception unwrapping helper. |
| clients/venice-client/src/test/java/com/linkedin/venice/fastclient/utils/TestClientSimulator.java | Extends simulator to distinguish/validate single-GET vs multi-get requests and build single-GET responses. |
| clients/venice-client/src/test/java/com/linkedin/venice/fastclient/BatchGetAvroStoreClientUnitTest.java | Adds unit test asserting single-key streaming batch-get uses single-GET on fast client. |
| clients/venice-client/src/main/java/com/linkedin/venice/fastclient/RetriableAvroGenericStoreClient.java | Adds single-key streaming batch-get short-circuit before batch-get retry threshold logic. |
| clients/venice-client/src/main/java/com/linkedin/venice/fastclient/InternalAvroStoreClient.java | Adds shared helper for serving single-key streaming batch-get via single-GET and normalizing exceptions. |
| clients/venice-client/src/main/java/com/linkedin/venice/fastclient/DispatchingAvroGenericStoreClient.java | Adds single-key streaming batch-get short-circuit to single-GET dispatch path. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
a2b7d56 to
e51b37c
Compare
…ET path For batchGet/streamingBatchGet requests containing exactly one key, issue a single-GET request instead of a multi-get streaming request. Adds streamingBatchGetForSingleKey helpers in InternalAvroStoreClient (fast client) and AbstractAvroStoreClient (thin client), with short-circuits in DispatchingAvroGenericStoreClient and RetriableAvroGenericStoreClient. Includes unit tests and TestClientSimulator support for single-get expectations.
Add code comments at the single-key short-circuit sites explaining that 1-key batchGet is now served via single-GET, which (a) follows the single-get long-tail retry config rather than the batch-get one - so a store with only batch-get retry enabled gets no long-tail retry on 1-key batch gets - and (b) shifts the server request type, affecting routing, read-quota accounting, and per-request-type metrics.
…ookups Add a new fast-client metric, batch_get_routed_to_single_get_request_count (Tehuti) / request.batch_get_routed_to_single_get_count (OTel COUNTER, dimensions store/cluster/request-method), recorded at the single-key short-circuit choke point (InternalAvroStoreClient#streamingBatchGetForSingleKey) so it counts every 1-key batchGet/streamingBatchGet served via a single-GET lookup, regardless of which layer (Retriable or Dispatching) triggered the short-circuit. Wired through the dual-backend MetricEntityStateBase pattern (built in buildFastClientOtelStats so it survives cluster-name rebuilds), consistent with retry_request_win. Adds a FastClientMetricEntity entry plus unit tests covering the Tehuti + OTel emission and the end-to-end single-key routing.
requestSubmissionWithStatsHandling
Replace non-standard {@param x} references with {@code x} in the method's
Javadoc.
…e-GET routing Pre-existing tests issued 1-key batch-gets that now route to single-GET, breaking their multi-get expectations: - BatchGetAvroStoreClientUnitTest.testSimpleStreamingBatchGettingTimeout: use 5 keys on one partition so it stays a multi-get timeout test. - DispatchingAvroGenericStoreClientTest route-blocking tests (testBatchGetWithTimeoutV4, testStreamingBatchGetToUnreachableClientV1/V2): block a replica with a 5-key request whose keys all route to that single replica (generated via DefaultVenicePartitioner), instead of a 1-key request. Decouple request_key_count.Max from numKeys in validateMetrics, since the 5-key blocking request raises the cumulative max above the follow-up 2-key request's size. - FastClientStatsFastClientTehutiMetricNameTest: register the new batch_get_routed_to_single_get_request_count Tehuti metric name.
SpotBugs and diff-coverage CI - Remove unused validateMultiGetMetrics overload flagged by SpotBugs (UPM_UNCALLED_PRIVATE_METHOD) in DispatchingAvroGenericStoreClientTest. - Add SingleKeyBatchGetAsSingleGetTest in both clients covering single-GET routing, non-existing key, error propagation, and a multi-key contrast. - Make SimpleStoreClient/VALUE_SCHEMA/valueSerializer package-private so the dedicated thin-client test can reuse them. Confirmed diffCoverage branch coverage: venice-client 58.33%, venice-thin-client 66.67% (min 0.45). SpotBugs and spotless clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The test dispatched start() onto the shared ForkJoinPool.commonPool() via CompletableFuture.runAsync. start() blocks indefinitely on isReadyLatch.await() in this all-failures scenario, and under CI load the common-pool task could sit queued past the 3s verification timeout, intermittently failing the first verify(updateCache(false)). Run start() on a dedicated daemon thread so it begins promptly regardless of pool load, and deterministically tear it down (interrupt + join + assert it exited) instead of leaking a blocked thread until JVM teardown. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
4be6732 to
f2661f4
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem Statement
batchGet/streamingBatchGetrequests always go through the multi-get streaming path, even when the caller passes exactly one key. For a 1-key request this is wasteful and inconsistent: it incurs the full multi-get machinery (scatter/gather, streaming response assembly, batch-get routing) to fetch a single value that the single-GET path already serves more directly. There was no way to distinguish or measure these degenerate single-key batch requests, which are common in practice when callers build key sets dynamically.Solution
For
batchGet/streamingBatchGetrequests containing exactly one key, short-circuit to a single-GET lookup instead of issuing a multi-get streaming request. This is implemented in both the fast client and the thin client:streamingBatchGetForSingleKeyhelpers inInternalAvroStoreClient(fast client) andAbstractAvroStoreClient(thin client). Each issues aget(key), adapts the result back onto theStreamingCallback(onRecordReceived+onCompletion), and unwrapsCompletionExceptionso callers see the original cause.DispatchingAvroGenericStoreClientandRetriableAvroGenericStoreClient(fast client) detectkeys.size() == 1and route through the single-key helper, so the optimization applies regardless of which layer triggers it.batch_get_routed_to_single_get_request_count(Tehuti) /request.batch_get_routed_to_single_get_count(OTel COUNTER, dimensions store/cluster/request-method), recorded at the single-key short-circuit choke point so every 1-key request served via single-GET is counted exactly once. It is wired through the dual-backendMetricEntityStateBasepattern (built inbuildFastClientOtelStatsso it survives cluster-name rebuilds), consistent withretry_request_win.Trade-off worth calling out: a store that enabled only batch-get long-tail retry will get no long-tail retry on 1-key batch gets, since those now follow the single-get retry config.
Testing: added dedicated
SingleKeyBatchGetAsSingleGetTestin both clients (single-GET routing, non-existing key, error propagation, multi-key contrast), extendedTestClientSimulatorwith single-get expectation support, and fixed pre-existing batch-get tests that issued 1-key requests by switching them to 5-key requests so they remain genuine multi-get tests. ConfirmeddiffCoveragebranch coverage (venice-client 58.33%, venice-thin-client 66.67%, min 0.45); SpotBugs and spotless clean.Code changes
Concurrency-Specific Checks
Both reviewer and PR author to verify
getfuture viawhenComplete; introduces no shared mutable state.)synchronized,RWLock) are used where needed. (None needed.)ConcurrentHashMap,CopyOnWriteArrayList). (N/A — no new collections.)whenCompletecallback unwraps and forwards exceptions through the callback;CompletionExceptionis unwrapped to the original cause.)How was this PR tested?
SingleKeyBatchGetAsSingleGetTestin both fast and thin clients; newFastClientStats/FastClientMetricEntitymetric tests.)BatchGetAvroStoreClientUnitTest,DispatchingAvroGenericStoreClientTest,RequestBasedMetadataTest,FastClientStatsFastClientTehutiMetricNameTest, andTestClientSimulatorto account for single-GET routing.)batchGet/streamingBatchGetare unchanged; only the underlying request path differs.)Does this PR introduce any user-facing or breaking changes?
For
batchGet/streamingBatchGetwith exactly one key, the client now issues a single-GET request instead of a multi-get streaming request. The returned values/map are unchanged, but observable behavior shifts:batch_get_routed_to_single_get_request_count/request.batch_get_routed_to_single_get_countis emitted for each such routed request.