
@ymuppala ymuppala commented Feb 12, 2026

Problem Statement

Spark KIF repush fails for stores with chunking enabled due to two bugs; a third, unrelated CI issue is also addressed here:

  1. Schema validation rejects Venice internal columns: validateDataFrame() rejects all fields starting with _, but chunked KIF DataFrames contain Venice internal columns (__schema_id__, __offset__, __message_type__, __chunked_key_suffix__, __replication_metadata_version_id__) that are required for chunk assembly. These columns are consumed by applyChunkAssembly() and dropped later in runComputeJob(), but validation runs before they are dropped.
  2. NotSerializableException in Spark lambdas: the lambdas in applyChunkAssembly() and applyTTLFilter() reference the instance field accumulatorsForDataWriterJob directly, which captures the enclosing DataWriterSparkJob instance. Since DataWriterSparkJob does not implement Serializable, Spark fails when serializing the task to send to executors.
  3. GitHub Actions updated its Docker version, which in turn caused the PulsarIntegrationTests to fail.

Solution

  1. In validateDataFrame(), when isSourceKafka is true, the five known Venice internal column names are added to an allowlist and exempted from the underscore check; for regular VPJ pushes, unknown underscore-prefixed columns are still rejected (see the first sketch after this list).
  2. Following the existing pattern in applyDedup(), the accumulators are extracted into final local variables before the lambda, so only the serializable LongAccumulator is captured instead of this (see the second sketch after this list).
  3. Pin the Docker version used by the GitHub Actions workflow.
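
Below is a minimal sketch of the allowlist exemption from item 1, assuming the validation reduces to a name check over the DataFrame schema; the class, method, and parameter names are illustrative, not the actual Venice implementation.

```java
import java.util.Set;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Illustrative sketch only; names do not match the real validateDataFrame() code.
public final class InternalColumnAllowlistSketch {
  // The five Venice internal columns required for chunk assembly.
  private static final Set<String> KIF_INTERNAL_COLUMNS = Set.of(
      "__schema_id__",
      "__offset__",
      "__message_type__",
      "__chunked_key_suffix__",
      "__replication_metadata_version_id__");

  static void validateFieldNames(StructType schema, boolean isSourceKafka) {
    for (StructField field: schema.fields()) {
      String name = field.name();
      if (!name.startsWith("_")) {
        continue;
      }
      // Exempt the known internal columns only for KIF (source = Kafka) repush;
      // regular VPJ pushes still reject every underscore-prefixed column.
      if (isSourceKafka && KIF_INTERNAL_COLUMNS.contains(name)) {
        continue;
      }
      throw new IllegalArgumentException("Field name '" + name + "' must not start with an underscore");
    }
  }
}
```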
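
And a sketch of the closure-capture fix from item 2, following the same local-variable pattern as applyDedup(); the class shape, the timestamp column, and the filter logic below are illustrative only.

```java
import org.apache.spark.api.java.function.FilterFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.util.LongAccumulator;

// Illustrative sketch only; not the real DataWriterSparkJob code.
class TtlFilterSketch {
  // Registered on the SparkContext elsewhere; the enclosing class itself is not Serializable.
  private LongAccumulator recordsSkippedByTtl;

  Dataset<Row> applyTtlFilter(Dataset<Row> dataFrame, long minTimestampMs) {
    // Copy the accumulator into a final local so the lambda captures only the
    // serializable LongAccumulator, never `this`.
    final LongAccumulator skippedCounter = this.recordsSkippedByTtl;
    return dataFrame.filter((FilterFunction<Row>) row -> {
      long timestampMs = row.getLong(row.fieldIndex("timestamp"));
      if (timestampMs < minTimestampMs) {
        skippedCounter.add(1L);
        return false;
      }
      return true;
    });
  }
}
```

The key point is that the lambda's closure now holds a reference to the LongAccumulator, which Spark can serialize, rather than to the enclosing job object.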

Code changes

  • Added new code behind a config. If so, list the config names and their default values in the PR description.
  • Introduced new log lines.
    • Confirmed if logs need to be rate limited to avoid excessive logging.

Concurrency-Specific Checks

Both reviewer and PR author to verify

  • Code has no race conditions or thread safety issues.
  • Proper synchronization mechanisms (e.g., synchronized, RWLock) are used where needed.
  • No blocking calls inside critical sections that could lead to deadlocks or performance degradation.
  • Verified thread-safe collections are used (e.g., ConcurrentHashMap, CopyOnWriteArrayList).
  • Validated proper exception handling in multi-threaded code to avoid silent thread termination.

How was this PR tested?

The PR was tested against a test instance in Airflow rdev: http://rdev-aks-wus3-14.privatelink.westus3.azmk8s.io:42229/dags/vpj-spark-repush-test_faro__voldemort-build-and-push/grid?dag_run_id=manual__2026-02-12T18%3A23%3A00.588766%2B00%3A00&task_id=spark_repush_job&tab=logs

Does this PR introduce any user-facing or breaking changes?

  • No. You can skip the rest of this section.
  • Yes. Clearly explain the behavior change and its impact.

@ymuppala ymuppala force-pushed the repush_integration_test branch 2 times, most recently from 79eb63a to 87b9351 on February 12, 2026 21:20
@ymuppala ymuppala force-pushed the repush_integration_test branch from 87b9351 to 71679f7 on February 12, 2026 21:43
@ymuppala ymuppala merged commit 3c0d285 into linkedin:main Feb 13, 2026
69 of 72 checks passed