Skip to content

[admin-tool][controller] Fix update-admin-operation-protocol-version silently ignoring -1#2837

Open
minhmo1620 wants to merge 5 commits into
linkedin:mainfrom
minhmo1620:minhmo1620/fix-admin-protocol-version-negative-one
Open

[admin-tool][controller] Fix update-admin-operation-protocol-version silently ignoring -1#2837
minhmo1620 wants to merge 5 commits into
linkedin:mainfrom
minhmo1620:minhmo1620/fix-admin-protocol-version-negative-one

Conversation

@minhmo1620

@minhmo1620 minhmo1620 commented May 29, 2026

Copy link
Copy Markdown
Contributor

Problem Statement

--update-admin-operation-protocol-version --admin-operation-protocol-version -1 via the admin tool was silently dropped. The command returned HTTP 200 but the ZooKeeper node was never updated.

Root cause: -1L (UNDEFINED_VALUE) served two conflicting roles in the AdminMetadata merge logic:

  1. A valid target value meaning "unpinned — use the latest protocol version supported by the running code" (allowed by ControllerRequestParamValidator).
  2. The "this field was not set in this delta" sentinel used by the merge guards in ZkAdminTopicMetadataAccessor and InMemoryAdminTopicMetadataAccessor.

The same structural bug also affected executionId: InMemoryAdminTopicMetadataAccessor guarded with getExecutionId() != null, but the getter never returns null (it maps a missing field to -1L), so the guard was always true and could unintentionally overwrite a stored executionId. ZkAdminTopicMetadataAccessor used !getExecutionId().equals(UNDEFINED_VALUE) which had the symmetric problem — it could not distinguish an absent field from one explicitly set to -1.

Solution

Added AdminMetadata.hasAdminOperationProtocolVersion() and AdminMetadata.hasExecutionId(), which check the raw nullable Long fields directly (field != null). This cleanly distinguishes "not provided in this delta" (null) from "explicitly set to -1".

All three merge-guard sites now use these presence-check methods:

  • ZkAdminTopicMetadataAccessor — both executionId and adminOperationProtocolVersion
  • InMemoryAdminTopicMetadataAccessor — both fields

Code changes

  • Added new code behind a config. If so list the config names and their default values in the PR description.
  • Introduced new log lines.
    • Confirmed if logs need to be rate limited to avoid excessive logging.

Concurrency-Specific Checks

Both reviewer and PR author to verify

  • Code has no race conditions or thread safety issues.
  • Proper synchronization mechanisms (e.g., synchronized, RWLock) are used where needed.
  • No blocking calls inside critical sections that could lead to deadlocks or performance degradation.
  • Verified thread-safe collections are used (e.g., ConcurrentHashMap, CopyOnWriteArrayList).
  • Validated proper exception handling in multi-threaded code to avoid silent thread termination.

How was this PR tested?

  • New unit tests added.
  • New integration tests added.
  • Modified or extended existing tests.
  • Verified backward compatibility (if applicable).

Unit tests:

  • TestAdminMetadata.testHasExecutionIdAndHasAdminOperationProtocolVersion — verifies has*() returns false when field is absent and true when set, including when explicitly set to -1.
  • TestZkAdminTopicMetadataAccessor.testUpdateAdminOperationProtocolVersionToNegativeOne — verifies a delta with adminOperationProtocolVersion = -1 overwrites an existing pinned version in ZK (would have failed before this fix).
  • TestZkAdminTopicMetadataAccessor.testDeltaWithoutExecutionIdPreservesExistingValue — verifies a delta that omits executionId does not overwrite an existing value in ZK.
  • ClusterAdminOpsRequestHandlerTest.testUpdateAdminOperationProtocolVersionWithNegativeOne — verifies the handler layer accepts and returns -1.

Integration test:

  • TestAdminToolEndToEnd.testUpdateAdminOperationVersionToNegativeOne — pins the version to 80, then resets to -1 via AdminTool.main, and asserts the ZK-backed metadata reflects -1.

Does this PR introduce any user-facing or breaking changes?

  • No. You can skip the rest of this section.

…oring -1

Setting --admin-operation-protocol-version -1 via the admin tool was silently
dropped because -1L (UNDEFINED_VALUE) was used both as the "no-change"
sentinel
in the AdminMetadata merge logic and as a valid target value (meaning
"unpinned,
use latest supported version").

In ZkAdminTopicMetadataAccessor.updateV2Metadata and its in-memory
counterpart,
the merge guard `!getAdminOperationProtocolVersion().equals(UNDEFINED_VALUE)`
evaluated to false when the caller explicitly set the value to -1, causing the
ZK write to be skipped entirely while returning HTTP 200.

Fix: add AdminMetadata.hasAdminOperationProtocolVersion() which checks the raw
nullable Long field directly, distinguishing "not provided in this delta"
(null)
from "explicitly set to -1". Both accessor implementations now use this method
instead of comparing the getter return value to UNDEFINED_VALUE.
Copilot AI review requested due to automatic review settings May 29, 2026 00:37
…ation

version to -1

Covers the regression where --admin-operation-protocol-version -1 was silently
dropped. The test pins the version to 80, then resets to -1 via
AdminTool.main,
and asserts the ZK-backed metadata reflects -1 (not the stale pinned value).

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a controller/admin-tool bug where explicitly setting adminOperationProtocolVersion = -1 (“unpinned / use latest”) was treated as “unset” during metadata merge, resulting in no ZooKeeper update despite returning HTTP 200.

Changes:

  • Added AdminMetadata.hasAdminOperationProtocolVersion() to distinguish “field not provided” (null) from “explicitly set to -1”.
  • Updated metadata merge logic in both ZK-backed and in-memory accessors to use hasAdminOperationProtocolVersion() instead of comparing against UNDEFINED_VALUE.
  • Added unit tests to verify updating protocol version to -1 is accepted and persisted.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
services/venice-controller/src/test/java/com/linkedin/venice/controller/TestZkAdminTopicMetadataAccessor.java Adds a test ensuring a delta with adminOperationProtocolVersion = -1 overwrites an existing pinned value in ZK.
services/venice-controller/src/test/java/com/linkedin/venice/controller/server/ClusterAdminOpsRequestHandlerTest.java Adds a test ensuring the handler accepts and returns -1.
services/venice-controller/src/main/java/com/linkedin/venice/controller/ZkAdminTopicMetadataAccessor.java Fixes merge guard to apply -1 updates by checking raw-field presence.
services/venice-controller/src/main/java/com/linkedin/venice/controller/kafka/consumer/AdminMetadata.java Introduces hasAdminOperationProtocolVersion() for presence detection.
internal/venice-test-common/src/main/java/com/linkedin/venice/admin/InMemoryAdminTopicMetadataAccessor.java Updates merge guard for admin protocol version to use presence detection (but see review comment about executionId merge guard).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 20 to 27
PubSubPosition newPosition = metadataDelta.getPosition();
PubSubPosition newUpstreamPosition = metadataDelta.getUpstreamPosition();
if (metadataDelta.getExecutionId() != null) {
inMemoryMetadata.setExecutionId(metadataDelta.getExecutionId());
}
if (!metadataDelta.getAdminOperationProtocolVersion().equals(UNDEFINED_VALUE)) {
if (metadataDelta.hasAdminOperationProtocolVersion()) {
inMemoryMetadata.setAdminOperationProtocolVersion(metadataDelta.getAdminOperationProtocolVersion());
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Added AdminMetadata.hasExecutionId() (mirrors hasAdminOperationProtocolVersion()) and updated both InMemoryAdminTopicMetadataAccessor and ZkAdminTopicMetadataAccessor to use it, fixing the same class of bug for executionId.

Minh Nguyen added 2 commits May 28, 2026 17:43
…istently

InMemoryAdminTopicMetadataAccessor used `getExecutionId() != null` which is
always true since the getter returns UNDEFINED_VALUE (-1L) instead of null.
ZkAdminTopicMetadataAccessor used `!getExecutionId().equals(UNDEFINED_VALUE)`
which cannot distinguish a missing field from an explicit -1.

Add AdminMetadata.hasExecutionId() mirroring
hasAdminOperationProtocolVersion(),
and update both accessors to use it. Addresses Copilot review comment.
…e guard

- TestAdminMetadata.testHasExecutionIdAndHasAdminOperationProtocolVersion:
  verifies has*() returns false when field is absent, true when set (including
  when explicitly set to UNDEFINED_VALUE, distinguishing it from absent).
-
TestZkAdminTopicMetadataAccessor.testDeltaWithoutExecutionIdPreservesExistingV
alue:
  verifies a delta that omits executionId does not overwrite an existing
  executionId in ZK (the bug fixed by switching to hasExecutionId()).
Copilot AI review requested due to automatic review settings May 29, 2026 00:45

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

executionId = -1 (UNDEFINED_VALUE) is never a valid execution ID and must
never be written to ZK. Revert the executionId merge guards in both
ZkAdminTopicMetadataAccessor and InMemoryAdminTopicMetadataAccessor to use
!getExecutionId().equals(UNDEFINED_VALUE), which blocks -1 regardless of
whether it came from an absent field (null) or an explicit -1 in the delta.

The hasExecutionId() method is kept but the merge guards do not use it for
executionId, since the semantics differ from adminOperationProtocolVersion
where -1 is a valid target value ("unpinned / use latest").

Update testHasExecutionIdAndHasAdminOperationProtocolVersion to document the
distinction, and expand testExecutionIdIsNeverOverwrittenWithUndefinedValue
to cover both the absent-field and explicit-(-1) cases.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants