[admin-tool][controller] Fix update-admin-operation-protocol-version silently ignoring -1#2837
Open
minhmo1620 wants to merge 5 commits into
Open
Conversation
…oring -1 Setting --admin-operation-protocol-version -1 via the admin tool was silently dropped because -1L (UNDEFINED_VALUE) was used both as the "no-change" sentinel in the AdminMetadata merge logic and as a valid target value (meaning "unpinned, use latest supported version"). In ZkAdminTopicMetadataAccessor.updateV2Metadata and its in-memory counterpart, the merge guard `!getAdminOperationProtocolVersion().equals(UNDEFINED_VALUE)` evaluated to false when the caller explicitly set the value to -1, causing the ZK write to be skipped entirely while returning HTTP 200. Fix: add AdminMetadata.hasAdminOperationProtocolVersion() which checks the raw nullable Long field directly, distinguishing "not provided in this delta" (null) from "explicitly set to -1". Both accessor implementations now use this method instead of comparing the getter return value to UNDEFINED_VALUE.
…ation version to -1 Covers the regression where --admin-operation-protocol-version -1 was silently dropped. The test pins the version to 80, then resets to -1 via AdminTool.main, and asserts the ZK-backed metadata reflects -1 (not the stale pinned value).
There was a problem hiding this comment.
Pull request overview
Fixes a controller/admin-tool bug where explicitly setting adminOperationProtocolVersion = -1 (“unpinned / use latest”) was treated as “unset” during metadata merge, resulting in no ZooKeeper update despite returning HTTP 200.
Changes:
- Added
AdminMetadata.hasAdminOperationProtocolVersion()to distinguish “field not provided” (null) from “explicitly set to -1”. - Updated metadata merge logic in both ZK-backed and in-memory accessors to use
hasAdminOperationProtocolVersion()instead of comparing againstUNDEFINED_VALUE. - Added unit tests to verify updating protocol version to
-1is accepted and persisted.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| services/venice-controller/src/test/java/com/linkedin/venice/controller/TestZkAdminTopicMetadataAccessor.java | Adds a test ensuring a delta with adminOperationProtocolVersion = -1 overwrites an existing pinned value in ZK. |
| services/venice-controller/src/test/java/com/linkedin/venice/controller/server/ClusterAdminOpsRequestHandlerTest.java | Adds a test ensuring the handler accepts and returns -1. |
| services/venice-controller/src/main/java/com/linkedin/venice/controller/ZkAdminTopicMetadataAccessor.java | Fixes merge guard to apply -1 updates by checking raw-field presence. |
| services/venice-controller/src/main/java/com/linkedin/venice/controller/kafka/consumer/AdminMetadata.java | Introduces hasAdminOperationProtocolVersion() for presence detection. |
| internal/venice-test-common/src/main/java/com/linkedin/venice/admin/InMemoryAdminTopicMetadataAccessor.java | Updates merge guard for admin protocol version to use presence detection (but see review comment about executionId merge guard). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
20
to
27
| PubSubPosition newPosition = metadataDelta.getPosition(); | ||
| PubSubPosition newUpstreamPosition = metadataDelta.getUpstreamPosition(); | ||
| if (metadataDelta.getExecutionId() != null) { | ||
| inMemoryMetadata.setExecutionId(metadataDelta.getExecutionId()); | ||
| } | ||
| if (!metadataDelta.getAdminOperationProtocolVersion().equals(UNDEFINED_VALUE)) { | ||
| if (metadataDelta.hasAdminOperationProtocolVersion()) { | ||
| inMemoryMetadata.setAdminOperationProtocolVersion(metadataDelta.getAdminOperationProtocolVersion()); | ||
| } |
Contributor
Author
There was a problem hiding this comment.
Good catch. Added AdminMetadata.hasExecutionId() (mirrors hasAdminOperationProtocolVersion()) and updated both InMemoryAdminTopicMetadataAccessor and ZkAdminTopicMetadataAccessor to use it, fixing the same class of bug for executionId.
added 2 commits
May 28, 2026 17:43
…istently InMemoryAdminTopicMetadataAccessor used `getExecutionId() != null` which is always true since the getter returns UNDEFINED_VALUE (-1L) instead of null. ZkAdminTopicMetadataAccessor used `!getExecutionId().equals(UNDEFINED_VALUE)` which cannot distinguish a missing field from an explicit -1. Add AdminMetadata.hasExecutionId() mirroring hasAdminOperationProtocolVersion(), and update both accessors to use it. Addresses Copilot review comment.
…e guard - TestAdminMetadata.testHasExecutionIdAndHasAdminOperationProtocolVersion: verifies has*() returns false when field is absent, true when set (including when explicitly set to UNDEFINED_VALUE, distinguishing it from absent). - TestZkAdminTopicMetadataAccessor.testDeltaWithoutExecutionIdPreservesExistingV alue: verifies a delta that omits executionId does not overwrite an existing executionId in ZK (the bug fixed by switching to hasExecutionId()).
executionId = -1 (UNDEFINED_VALUE) is never a valid execution ID and must
never be written to ZK. Revert the executionId merge guards in both
ZkAdminTopicMetadataAccessor and InMemoryAdminTopicMetadataAccessor to use
!getExecutionId().equals(UNDEFINED_VALUE), which blocks -1 regardless of
whether it came from an absent field (null) or an explicit -1 in the delta.
The hasExecutionId() method is kept but the merge guards do not use it for
executionId, since the semantics differ from adminOperationProtocolVersion
where -1 is a valid target value ("unpinned / use latest").
Update testHasExecutionIdAndHasAdminOperationProtocolVersion to document the
distinction, and expand testExecutionIdIsNeverOverwrittenWithUndefinedValue
to cover both the absent-field and explicit-(-1) cases.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem Statement
--update-admin-operation-protocol-version --admin-operation-protocol-version -1via the admin tool was silently dropped. The command returned HTTP 200 but the ZooKeeper node was never updated.Root cause:
-1L(UNDEFINED_VALUE) served two conflicting roles in theAdminMetadatamerge logic:ControllerRequestParamValidator).ZkAdminTopicMetadataAccessorandInMemoryAdminTopicMetadataAccessor.The same structural bug also affected
executionId:InMemoryAdminTopicMetadataAccessorguarded withgetExecutionId() != null, but the getter never returnsnull(it maps a missing field to-1L), so the guard was alwaystrueand could unintentionally overwrite a storedexecutionId.ZkAdminTopicMetadataAccessorused!getExecutionId().equals(UNDEFINED_VALUE)which had the symmetric problem — it could not distinguish an absent field from one explicitly set to-1.Solution
Added
AdminMetadata.hasAdminOperationProtocolVersion()andAdminMetadata.hasExecutionId(), which check the raw nullableLongfields directly (field != null). This cleanly distinguishes "not provided in this delta" (null) from "explicitly set to-1".All three merge-guard sites now use these presence-check methods:
ZkAdminTopicMetadataAccessor— bothexecutionIdandadminOperationProtocolVersionInMemoryAdminTopicMetadataAccessor— both fieldsCode changes
Concurrency-Specific Checks
Both reviewer and PR author to verify
synchronized,RWLock) are used where needed.ConcurrentHashMap,CopyOnWriteArrayList).How was this PR tested?
Unit tests:
TestAdminMetadata.testHasExecutionIdAndHasAdminOperationProtocolVersion— verifieshas*()returnsfalsewhen field is absent andtruewhen set, including when explicitly set to-1.TestZkAdminTopicMetadataAccessor.testUpdateAdminOperationProtocolVersionToNegativeOne— verifies a delta withadminOperationProtocolVersion = -1overwrites an existing pinned version in ZK (would have failed before this fix).TestZkAdminTopicMetadataAccessor.testDeltaWithoutExecutionIdPreservesExistingValue— verifies a delta that omitsexecutionIddoes not overwrite an existing value in ZK.ClusterAdminOpsRequestHandlerTest.testUpdateAdminOperationProtocolVersionWithNegativeOne— verifies the handler layer accepts and returns-1.Integration test:
TestAdminToolEndToEnd.testUpdateAdminOperationVersionToNegativeOne— pins the version to80, then resets to-1viaAdminTool.main, and asserts the ZK-backed metadata reflects-1.Does this PR introduce any user-facing or breaking changes?