[controller][schema] Coerce legacy numeric defaults during store migration#2802

Merged
xunyin8 merged 7 commits into
linkedin:mainfrom
xunyin8:StoreMigrationRejectedDueToLegacySchema
Jun 4, 2026
@xunyin8 xunyin8 commented May 16, 2026

Contributor

Problem Statement

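Store migration fails because the destination controller applies a strict type check to numeric default values. Legacy stores were registered before this check was enforced, so their schemas are now rejected and the stores cannot be migrated.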

Solution

Adds a destination-side rewrite ({0 -> 0.0} on float-typed fields, etc.) so legacy schemas registered before validateNumericDefaultValueTypes was enforced can be migrated into clusters where the controller's STRICT parser now rejects them. Gated on storeConfig.migrationDestCluster, so non-migration writes are unaffected; defensively re-strict-parses the output so non-numeric violations (bad names, dangling content, union default not first branch) still fail loudly.
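
For readers who want a concrete picture of the rewrite, here is a minimal sketch. It is not the code in this PR: the class name `NumericDefaultCoercionSketch` and its methods are hypothetical, the schema is modeled as nested `Map`/`List` values (as a generic JSON parser might produce) rather than Jackson nodes, and treating `double` fields the same way as `float` fields is an assumption based on the "etc." above. It returns the input instance unchanged when nothing needs rewriting.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; the PR's real walker operates on the schema JSON itself. */
final class NumericDefaultCoercionSketch {

  /** Returns the same instance for clean input, otherwise a rewritten copy. */
  @SuppressWarnings("unchecked")
  static Object coerce(Object node) {
    if (node instanceof List) {
      List<Object> in = (List<Object>) node;
      List<Object> out = new ArrayList<>(in.size());
      boolean changed = false;
      for (Object child : in) {
        Object rewritten = coerce(child);
        changed |= (rewritten != child);
        out.add(rewritten);
      }
      return changed ? out : node;
    }
    if (!(node instanceof Map)) {
      return node; // strings, numbers, booleans, null: nothing to walk
    }
    Map<String, Object> in = (Map<String, Object>) node;
    Map<String, Object> out = new LinkedHashMap<>();
    boolean changed = false;
    for (Map.Entry<String, Object> entry : in.entrySet()) {
      Object rewritten = coerce(entry.getValue()); // recurse into nested records and field arrays
      changed |= (rewritten != entry.getValue());
      out.put(entry.getKey(), rewritten);
    }
    Object type = in.get("type");
    Object dflt = in.get("default");
    // Only textual types are handled; unions and nested type objects pass through.
    if (type instanceof String && dflt instanceof Number) {
      Object fixed = coerceNumber((String) type, (Number) dflt);
      if (fixed != dflt) {
        out.put("default", fixed);
        changed = true;
      }
    }
    return changed ? out : node;
  }

  /** Assumption: integral defaults on float/double fields are widened to a decimal value. */
  static Object coerceNumber(String type, Number value) {
    boolean integral = value instanceof Integer || value instanceof Long;
    if (integral && (type.equals("float") || type.equals("double"))) {
      return value.doubleValue();
    }
    return value;
  }
}
```

In this sketch a field `{"name":"score","type":"float","default":0}` comes back with its default as a `Double`, while the input maps are left unmodified.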

Code changes

  • Added new code behind a config. If so list the config names and their default values in the PR description.
  • Introduced new log lines.
    • Confirmed if logs need to be rate limited to avoid excessive logging.

Concurrency-Specific Checks

Both reviewer and PR author to verify

  • Code has no race conditions or thread safety issues.
  • Proper synchronization mechanisms (e.g., synchronized, RWLock) are used where needed.
  • No blocking calls inside critical sections that could lead to deadlocks or performance degradation.
  • Verified thread-safe collections are used (e.g., ConcurrentHashMap, CopyOnWriteArrayList).
  • Validated proper exception handling in multi-threaded code to avoid silent thread termination.

How was this PR tested?

  • New unit tests added.
  • New integration tests added.
  • Modified or extended existing tests.
  • Verified backward compatibility (if applicable).

Does this PR introduce any user-facing or breaking changes?

  • No. You can skip the rest of this section.
  • Yes. Clearly explain the behavior change and its impact.

@xunyin8 xunyin8 force-pushed the StoreMigrationRejectedDueToLegacySchema branch from fc5908f to 9c88232 Compare May 20, 2026 18:51
@xunyin8 xunyin8 requested a review from namithanivead May 20, 2026 20:08
namithanivead previously approved these changes May 27, 2026
@xunyin8 xunyin8 requested a review from namithanivead June 4, 2026 05:28
@xunyin8 xunyin8 enabled auto-merge (squash) June 4, 2026 05:29
xunyin8 and others added 6 commits June 3, 2026 22:33
…ation

Adds a destination-side rewrite ({0 -> 0.0} on float-typed fields, etc.)
so legacy schemas registered before validateNumericDefaultValueTypes was
enforced can be migrated into clusters where the controller's STRICT
parser now rejects them. Gated on storeConfig.migrationDestCluster, so
non-migration writes are unaffected; defensively re-strict-parses the
output so non-numeric violations (bad names, dangling content, union
default not first branch) still fail loudly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The existing tests for parseSchemaFromJSONLooseNumericValidation and
coerceNumericDefaultsToFieldType live in venice-push-job; diff coverage
on venice-client-common (where the class lives) only sees branches
exercised by tests in the same module. Adds 16 focused tests covering
the strict/loose/loose-numeric parsers, the parseSchemaFromJSON wrapper
with both extendedSchemaValidityCheckEnabled values, and every branch
of the JSON walker (each numeric tier, nested record recursion,
non-textual type passthrough, identity short-circuit for clean input,
IOException → VeniceException wrap).

Diff coverage on the changed lines: 76.92% branches (45% required).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Assert.assertSame on String operands compiles to ==, which trips
ES_COMPARING_STRINGS_WITH_EQ. Switch to Assert.assertEquals — the
observable behavior (walker returns the input unchanged for clean or
non-textual-type schemas) is still verified.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Jackson parses any JSON decimal literal (e.g. 0.0) into a DoubleNode
regardless of the declared field type. The "float" branch of
coerceNumber short-circuits on value.isDouble(), so legacy schemas
written as {"type":"float","default":0.0} are NOT rewritten by the
walker — they pass through unchanged. Empirically avro-util1's STRICT
parser accepts DoubleNode-on-float (the numeric-tier check is
asymmetric: rejects IntNode-on-float, accepts DoubleNode-on-float), so
the output is still strict-clean.

Adds a regression-pinning test for this combination. If avro-util1 ever
tightens the float numeric-tier check to be symmetric, this test will
fail and the "float" branch will need to coerce DoubleNode -> FloatNode.
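
The asymmetry described above can be modeled in a few lines. This is a simplified stand-in, not Jackson or avro-util code: `FloatDefaultTierSketch`, `classify`, and `walkerRewrites` are hypothetical names, the literal is assumed to fit in a `long` when it has no fraction or exponent, and treating every non-double default on a float field as "rewritten" is an assumption.

```java
/** Hypothetical model of how a JSON number literal maps to a numeric tier. */
final class FloatDefaultTierSketch {
  enum Kind { INT, LONG, DOUBLE }

  /** Literals with a fraction or exponent are treated as doubles, as a JSON parser would. */
  static Kind classify(String jsonNumberLiteral) {
    if (jsonNumberLiteral.contains(".")
        || jsonNumberLiteral.contains("e")
        || jsonNumberLiteral.contains("E")) {
      return Kind.DOUBLE;
    }
    long value = Long.parseLong(jsonNumberLiteral);
    return (value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE) ? Kind.INT : Kind.LONG;
  }

  /** Mirrors the short-circuit: a double-valued default on a float field is left alone. */
  static boolean walkerRewrites(String fieldType, Kind kind) {
    return "float".equals(fieldType) && kind != Kind.DOUBLE;
  }
}
```

Under this model, `0.0` on a float field is classified as `DOUBLE` and passes through, while `0` is classified as `INT` and is rewritten.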

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the initial strict parse fails inside normalizeSchemaForMigration,
log strictFailure at INFO before attempting the LOOSE_NUMERICS-based
coercion, and on the post-coercion strict re-check attach strictFailure
as a suppressed exception of whatever the second parse throws.

For non-numeric violations (union default not first branch, bad names,
dangling content) the post-coercion strict parse still fails — and
without chaining, the operator only sees that second exception and has
no idea what was wrong with the source schema. The suppressed entry
puts the original message into the same stack trace.
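
A minimal sketch of that chaining pattern, using only the JDK. The class `MigrationNormalizeSketch` and its parameters are hypothetical stand-ins: `strictParse` represents the strict parser (it throws on invalid input) and `coerce` represents the numeric-default rewrite. The real normalizeSchemaForMigration signature is not shown in this conversation and is not reproduced here.

```java
import java.util.function.Consumer;
import java.util.function.UnaryOperator;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical illustration of strict parse, coercion, strict re-check, and failure chaining. */
final class MigrationNormalizeSketch {
  private static final Logger LOGGER = Logger.getLogger(MigrationNormalizeSketch.class.getName());

  static String normalize(String schema, Consumer<String> strictParse, UnaryOperator<String> coerce) {
    RuntimeException strictFailure;
    try {
      strictParse.accept(schema);
      return schema; // already strict-clean: return the input untouched
    } catch (RuntimeException e) {
      strictFailure = e;
    }
    // Record why the source schema was rejected before attempting the rewrite.
    LOGGER.log(Level.INFO, "Strict parse failed; attempting numeric default coercion", strictFailure);

    String coerced = coerce.apply(schema);
    try {
      strictParse.accept(coerced); // defensive re-check of the rewritten schema
      return coerced;
    } catch (RuntimeException second) {
      if (second != strictFailure) {
        // Keep the original rejection visible in the same stack trace.
        second.addSuppressed(strictFailure);
      }
      throw second;
    }
  }
}
```

With this shape, a caller that catches the second exception can still read the original rejection through `getSuppressed()`.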

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@xunyin8 xunyin8 force-pushed the StoreMigrationRejectedDueToLegacySchema branch from 9f4261b to 4008fa6 Compare June 4, 2026 05:34
VeniceParentHelixAdmin#createStore and #addValueSchema now route value
schemas through VeniceHelixAdmin#normalizeSchemaForMigration. The internal
admin is a Mockito mock in TestVeniceParentHelixAdmin, so the unstubbed
method returned null and blanked out the value schema, producing NPEs
during admin-message serialization and strict schema parsing.

Mirror production's non-migration passthrough behavior by stubbing the
method to return its schema argument verbatim.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@xunyin8 xunyin8 merged commit 6b527ed into linkedin:main Jun 4, 2026
113 checks passed