Skip to content

[server][dvc] Fail-fast on blob-transfer PartitionState/StoreVersionState schema-version mismatch#2811

Open
jingy-li wants to merge 12 commits into
linkedin:mainfrom
jingy-li:partition-version-check
Open

[server][dvc] Fail-fast on blob-transfer PartitionState/StoreVersionState schema-version mismatch#2811
jingy-li wants to merge 12 commits into
linkedin:mainfrom
jingy-li:partition-version-check

Conversation

@jingy-li

Copy link
Copy Markdown
Contributor

Problem Statement

P2P blob transfer ships RocksDB snapshot files plus a BlobTransferPartitionMetadata payload that embeds Avro-serialized
PartitionState (offset record) and StoreVersionState. Today the two peers do not negotiate the protocol versions they used to serialize that metadata, the receiver only discovers a mismatch when it tries to deserialize the body, which is:

  • Late: the client has already paid for every file byte by the time metadata is parsed, so a doomed transfer holds the bootstrap slot until per-host receive timeout.
  • Indistinguishable from real corruption: the failure surfaces as a generic Avro deserialization error rather than a typed signal the orchestrator can use to fall over to the next peer / Kafka bootstrap.

This is the fast path during rolling deploys, where peer-vs-local binary skew on PartitionState / StoreVersionState versions is exactly when this misfires.

Solution

Both sides (client/server) advertise their compiled-in AvroProtocolDefinition.{PARTITION_STATE,STORE_VERSION_STATE}.getCurrentProtocolVersion() and check at the earliest possible point:

  • Client → server sets X-Blob-Transfer-Partition-State-Schema-Version / …-Store-Version-State-Schema-Version on the GET in NettyFileTransferClient.
  • Server-side gate (P2PFileTransferServerHandler) compares against local versions next to the existing table-format check, before any file work begins. On mismatch it returns 400 BAD_REQUEST with marker header X-Blob-Transfer-Schema-Mismatch: true plus the server's own versions echoed back.
  • Server → client stamps the same two version headers on the metadata response.
  • Client-side gate (P2PFileTransferClientHandler) validates the metadata response headers at header-parse time (before consuming the body) and on the 400 error path, throwing the new typed VeniceBlobTransferIncompatibleSchemaException (in venice-client-common). The exception carries peer host + peer/local versions for both protocols so the orchestrator can log/route on it.

Policy is exact equality (not "peer ≤ local"): blob transfer is the fast path; on skew we'd rather step aside to Kafka bootstrap than rely on cross-version Avro promotion of partition metadata.

Rolling-deploy compatibility is intentional:

  • Both headers absent → pass through (peer is on an older binary that doesn't emit them).
  • One header present, one absent → pass through; check only what's advertised.
  • Non-numeric / out-of-byte-range value → log a warning and pass through. The existing deserialization-time exception remains the safety net for the truly incompatible case, so there is no regression for old peers.

Code changes

  • Added new code behind a config. If so list the config names and their default values in the PR description.
  • Introduced new log lines.
    • Confirmed if logs need to be rate limited to avoid excessive logging.

Concurrency-Specific Checks

Both reviewer and PR author to verify

  • Code has no race conditions or thread safety issues.
  • Proper synchronization mechanisms (e.g., synchronized, RWLock) are used where needed.
  • No blocking calls inside critical sections that could lead to deadlocks or performance degradation.
  • Verified thread-safe collections are used (e.g., ConcurrentHashMap, CopyOnWriteArrayList).
  • Validated proper exception handling in multi-threaded code to avoid silent thread termination.

How was this PR tested?

  • New unit tests added.
  • New integration tests added.
  • Modified or extended existing tests.
  • Verified backward compatibility (if applicable).

Does this PR introduce any user-facing or breaking changes?

  • No. You can skip the rest of this section.
  • Yes. Clearly explain the behavior change and its impact.

Jingyan Li and others added 5 commits April 30, 2026 16:23
The previous commit added a fast-fail check on the metadata response, but
the server's response order is files first, metadata last — so a client
that catches the mismatch at the metadata stage has already paid for the
entire file transfer. This commit moves the primary check to the request
side, modeled on the existing snapshot-table-format check next to it.

Client `prepareRequest` now stamps the two schema-version headers on the
GET. Server `channelRead0` calls a new
`BlobTransferUtils.compareRequestedSchemaVersionsAgainstLocal(...)` right
after the table-format validation. On mismatch it returns 400 BAD_REQUEST
with an `X-Blob-Transfer-Schema-Mismatch: true` marker header and echoes
its own protocol versions in the response, BEFORE any file work begins.

Client `P2PFileTransferClientHandler.channelRead0` recognizes the marker
on a non-OK response and throws the typed
`VeniceBlobTransferIncompatibleSchemaException` with full peer-vs-local
context (peer = server's local versions echoed in the rejection).

Backward compat preserved: a request without the new headers (older
client) is not rejected — the server falls through to the existing flow.
The response-side check from the prior commit stays as the safety net for
the inverse case (older server that does not yet validate requests will
still send full metadata; new client catches at metadata-parse time
instead of waiting for the receive timeout).

Tests cover: server rejects mismatched request with 400 + marker + no
file responses; server doesn't reject requests without the headers
(backward compat); client builds the typed exception from the rejection
response with correct peer/local versions populated. Existing integration
tests in TestNettyP2PBlobTransferManager (real client↔server flow) all
pass unchanged because the new client always advertises versions that
match the new server.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

[blob-transfer] add fast-fail schema version check on metadata response

The P2P blob-transfer client deserialized the peer's PartitionState bytes
inside P2PMetadataTransferHandler without a SchemaReader, so any peer that
serialized PartitionState with a protocol version higher than this binary
knows triggered VeniceMessageException at the Netty pipeline tail after
the body had been fully transferred. The transfer future was never failed
explicitly, so the replica had to wait for blobReceiveTimeoutInMin before
falling back to Kafka bootstrap. We saw this on ltx1-app12860.stg with
PartitionState v21 during the rolling deploy that introduced PR linkedin#2707.

Server now stamps two new headers on the metadata response:
  X-Blob-Transfer-Partition-State-Schema-Version
  X-Blob-Transfer-Store-Version-State-Schema-Version
each carrying the local AvroProtocolDefinition.X.getCurrentProtocolVersion().

Client compares peer's value to its own current version at HTTP header-parse
time, before any body is consumed, and throws the new
VeniceBlobTransferIncompatibleSchemaException on mismatch. Throwing from
channelRead0 flows through the existing exceptionCaught ->
completeExceptionally
-> ctx.close() path, so the per-host transfer future fails immediately and
the orchestrator can pick the next peer (or fall back to Kafka) without
waiting on the receive timeout.

Policy is exact equality. Blob transfer is the fast path; if binaries are
not in lock-step we'd rather skip P2P and let Kafka handle bootstrap than
rely on cross-version metadata promotion. Skew is bounded to rolling-deploy
windows.

Backward-compat is preserved during rollout: a missing header passes through
(peer not yet upgraded), and a malformed/out-of-range header logs a warning
and passes through (parse bug must not crash the channel). The existing
deserialize-time exception in InternalAvroSpecificSerializer remains as the
safety net for the truly incompatible case, so no regression vs. today.

Tests cover: header stamped on metadata response only (not on file responses),
known-version pass-through, mismatched PartitionState fails fast, mismatched
StoreVersionState fails fast, single-header-missing pass-through, and
malformed/out-of-range pass-through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VeniceBlobTransferIncompatibleSchemaException

Covers the constructor, all getters, message formatting, and both
branches of render() (known version and VERSION_UNKNOWN sentinel).
Bumps venice-client-common diff branch coverage from 91.6% to 92.74%.
testIsVeniceException compared a VeniceBlobTransferIncompatibleSchemaException
against VeniceException via instanceof. Since the class extends
VeniceException,
the relationship is enforced at compile time and SpotBugs flags the runtime
check as vacuous (BC_VACUOUS_INSTANCEOF). The remaining three tests still
cover the constructor, all getters, both branches of render(), and the
VERSION_UNKNOWN sentinel.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds an early “protocol/schema version negotiation” for P2P blob-transfer partition metadata so peers can fail fast on PartitionState / StoreVersionState Avro protocol-version mismatches before any snapshot files are transferred, enabling quicker fallback to another peer or Kafka bootstrap during rolling deploy skew.

Changes:

  • Added request/response headers to advertise PartitionState and StoreVersionState protocol versions, plus a server-side early rejection path with a marker header.
  • Added a typed VeniceBlobTransferIncompatibleSchemaException carrying peer/local versions for both protocols, thrown on both the metadata fast-fail path and the server 400 mismatch path.
  • Added/extended unit tests for server and client handlers to cover mismatch, match, and backward-compatibility behaviors (missing/malformed headers).

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
internal/venice-client-common/src/main/java/com/linkedin/venice/exceptions/VeniceBlobTransferIncompatibleSchemaException.java Introduces typed exception carrying peer/local protocol versions and a diagnostic message.
internal/venice-client-common/src/test/java/com/linkedin/venice/exceptions/VeniceBlobTransferIncompatibleSchemaExceptionTest.java Unit tests for exception fields and message rendering (including <unknown> sentinel).
clients/da-vinci-client/src/main/java/com/linkedin/davinci/blobtransfer/BlobTransferUtils.java Defines new blob-transfer version/mismatch headers and provides client/server validation helpers.
clients/da-vinci-client/src/main/java/com/linkedin/davinci/blobtransfer/server/P2PFileTransferServerHandler.java Adds server-side pre-file-work schema version gating and echoes local versions on 400 mismatch.
clients/da-vinci-client/src/main/java/com/linkedin/davinci/blobtransfer/client/P2PFileTransferClientHandler.java Adds client-side fast-fail on metadata header mismatch and typed exception handling on 400 mismatch.
clients/da-vinci-client/src/main/java/com/linkedin/davinci/blobtransfer/client/NettyFileTransferClient.java Advertises local protocol versions on the outgoing GET request headers.
clients/da-vinci-client/src/test/java/com/linkedin/davinci/blobtransfer/TestP2PFileTransferServerHandler.java Adds server tests for mismatch rejection, matching versions success, and missing-headers pass-through.
clients/da-vinci-client/src/test/java/com/linkedin/davinci/blobtransfer/TestP2PFileTransferClientHandler.java Adds client tests for metadata mismatch fast-fail, old-peer missing headers, and 400 mismatch typed exception.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

sendSchemaMismatchResponse: use StandardCharsets.UTF_8 for the body
bytes and set Content-Type: text/plain so the diagnostic response no
longer depends on the platform default charset and matches the
NettyUtils.setupResponseAndFlush convention used by sibling error
paths in this handler.

parseProtocolVersionHeader: lower the malformed/out-of-range WARN
logs to DEBUG. The header values are peer-controlled and could be
hit on every request, so a misbehaving peer would otherwise produce
unbounded WARN spam. VERSION_UNKNOWN already routes through the
caller's pass-through path, the actual mismatch case is logged at
WARN by the caller with full peer-host context, and malformed values
that slip through still surface via the existing deserialization-time
exception.
Jingyan Li added 6 commits May 28, 2026 12:48
The server-side gate already rejects a mismatched request with a 400
+ BLOB_TRANSFER_SCHEMA_MISMATCH marker before any file work begins.
By the time the client receives a metadata response, the versions
are known to match (NEW server) or were never advertised (OLD server
didn't stamp headers either), so the client-side check at
metadata-parse time was always dead.

Remove the client-side validateMetadataResponseSchemaVersions call,
delete the now-unused BlobTransferUtils method, stop stamping the
version headers on the server's metadata response, and drop the
tests that exercised the dead path. Update the constants' javadoc to
reflect that they now only appear on the client request and the
server's 400 echo.

The live fail-fast path is unchanged: client advertises versions on
the request, server gate rejects mismatched requests with 400 +
marker + echoed versions, client builds the typed
VeniceBlobTransferIncompatibleSchemaException from those echoed
versions.
…tion

Replace the hand-rolled sendSchemaMismatchResponse helper with a direct
NettyUtils.setupResponseAndFlush call. Switch the rejection status from
400 BAD_REQUEST + custom marker header to 412 PRECONDITION_FAILED, which
is semantically a precondition (the requester's advertised schema
versions don't satisfy the server's required versions) and lets the
client differentiate purely on status without parsing a custom header.

Drop now-redundant machinery:
- BLOB_TRANSFER_SCHEMA_MISMATCH marker header constant.
- Server-side echoing of local schema versions on the rejection
  response (the body string already carries the same diagnostic).
- Client-side buildSchemaMismatchExceptionFromErrorResponse and
  readVersionHeader helpers; the client now branches on status code.
- Per-protocol peer/local version fields and accessors on
  VeniceBlobTransferIncompatibleSchemaException; the exception is now
  (peerHost, message) with the diagnostic body as the message.
- VERSION_UNKNOWN moved from VeniceBlobTransferIncompatibleSchemaException
  to BlobTransferUtils as a private sentinel — it was only used by the
  pass-through path in parseProtocolVersionHeader.

The live fast-fail path is unchanged: client advertises versions on the
request, server compares against local and returns 412 with a
diagnostic body before any file work, client throws the typed exception
on 412 so the orchestrator can fall over to the next peer or to Kafka
bootstrap.
handlePeerFetchException currently cleans up both the partition
directory and the temp transferred dir in its default branch. For a
schema-version mismatch the server rejects with 412 PRECONDITION_FAILED
before any file work begins, so there is nothing on disk to clean up
from this attempt — and wiping the dir would also clobber whatever the
caller had there for unrelated reasons.

Add a VeniceBlobTransferIncompatibleSchemaException branch that logs
and returns without touching the partition dir, so the chain-of-peers
walker can advance to the next peer at the speed of the 412 round-trip.

Cover both the new path and the contrast (generic exception still
triggers the cleanup) in TestNettyP2PBlobTransferManager.
The previous commit relaxed handlePeerFetchException to package-private
just so a test could invoke it directly. That bleeds a test convenience
into the production API. Restore private and rewrite the test to drive
the schema-mismatch branch through the public manager.get(...) surface
by stubbing the NettyFileTransferClient spy to return a future that
completes with VeniceBlobTransferIncompatibleSchemaException — the same
pattern testFailedConnectPeer and siblings already use for the other
branches. The post-condition (canary file in the partition dir survives
the failed attempt) stays the same.
VeniceBlobTransferIncompatibleSchemaException extends VeniceException,
so the runtime instanceof check is provably true at compile time and
SpotBugs flags it (BC_VACUOUS_INSTANCEOF, plus a duplicate detector
hit). This is the same issue prior commit eb767b6 already removed
once; my exception-test rewrite re-introduced it. Drop the test —
the inheritance is enforced by the compiler, no runtime assertion adds
coverage.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants