misyel (Contributor) commented Jan 23, 2026

Problem Statement

Heartbeats are sent every 60 seconds, so heartbeat delay does not always reflect the replication lag the system is actually operating at. This PR introduces record-level delay tracking alongside heartbeat delay.

Solution

This PR introduces record-level timestamp tracking for Venice's ingestion pipeline, providing more granular visibility into data freshness and ingestion delays. The implementation enhances the existing heartbeat monitoring system to track timestamps for individual data records in addition to periodic heartbeat control messages.
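To make the gap concrete, here is a minimal sketch of the two lag signals (not Venice code; the class, method name, and numbers are purely illustrative):

```java
// Illustrative only: contrasts heartbeat-based lag with record-based lag.
final class LagExample {
    // Lag is "now" minus the source (producer-side) timestamp of the last
    // heartbeat or record this replica processed.
    static long lagMs(long sourceTimestampMs, long nowMs) {
        return nowMs - sourceTimestampMs;
    }
}
```

If the last heartbeat was consumed 90s ago but a record produced 2s ago was just processed, the heartbeat signal reports roughly 90s of lag while the true replication lag is closer to 2s; the record-level signal captures the latter.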

Key Features

  • Record-Level Timestamp Tracking: Track timestamps for every individual data record processed during ingestion
  • OpenTelemetry Integration: Emit per-record OTel metrics for real-time monitoring of ingestion delays
  • Configurable Feature Flags: Two new config keys to control the feature:
    SERVER_RECORD_LEVEL_TIMESTAMP_ENABLED: Enable record-level timestamp tracking
    SERVER_PER_RECORD_OTEL_METRICS_ENABLED: Enable per-record OTel metrics emission
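A plausible shape for the gating these two flags could control; the hook class, its fields, and the early-exit structure below are assumptions for illustration, not the actual Venice code:

```java
// Hypothetical sketch of how the two flags could gate the per-record path.
final class RecordTimestampHook {
    private final boolean recordLevelTimestampEnabled;
    private final boolean perRecordOtelMetricsEnabled;
    long lastDelayMs = -1; // last tracked delay; -1 means nothing tracked yet
    int otelEmissions = 0; // stand-in counter for OTel metric emissions

    RecordTimestampHook(boolean timestampEnabled, boolean otelEnabled) {
        this.recordLevelTimestampEnabled = timestampEnabled;
        this.perRecordOtelMetricsEnabled = otelEnabled;
    }

    void onRecord(long producerTimestampMs, long nowMs) {
        if (!recordLevelTimestampEnabled) {
            return; // early exit: the disabled path does no per-record work
        }
        lastDelayMs = nowMs - producerTimestampMs;
        if (perRecordOtelMetricsEnabled) {
            otelEmissions++; // a real implementation would record an OTel histogram here
        }
    }
}
```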

Implementation Details

  • Data Model Enhancement: Renamed HeartbeatTimeStampEntry → IngestionTimestampEntry to track both heartbeat and record timestamps
  • Monitoring Service Updates: Extended HeartbeatMonitoringService with new methods for record-level timestamp tracking
  • OTel Metrics: Added RecordOtelStats class for OpenTelemetry metrics with dimensions for region, version role, replica type, and replica state
  • Ingestion Pipeline Integration: Added hooks in LeaderFollowerStoreIngestionTask to record timestamps during record processing
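As a mental model, the renamed entry can be pictured as a small value class. Apart from the consumedFromUpstream field, which is quoted from the diff later in this thread, every name in this sketch is an assumption:

```java
// Sketch only: the real IngestionTimestampEntry lives in Venice and differs.
final class IngestionTimestampEntrySketch {
    final long timestampMs;             // producer timestamp of the heartbeat or record
    final boolean consumedFromUpstream; // true if consumed; false if default-initialized

    IngestionTimestampEntrySketch(long timestampMs, boolean consumedFromUpstream) {
        this.timestampMs = timestampMs;
        this.consumedFromUpstream = consumedFromUpstream;
    }

    long delayMs(long nowMs) {
        return nowMs - timestampMs;
    }
}
```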

Suggested Review Order
Configuration and Data Model (Start here for context)

  • ConfigKeys.java: New configuration flags
  • IngestionTimestampEntry.java: Enhanced data model for both heartbeat and record timestamps

Core Implementation

  • HeartbeatMonitoringService.java: Core implementation of record-level timestamp tracking
  • RecordOtelStats.java: New class for OpenTelemetry metrics
  • HeartbeatVersionedStats.java: Integration with metrics system

Integration with Ingestion Pipeline

  • LeaderFollowerStoreIngestionTask.java: Hook for recording timestamps during processing
  • StoreIngestionTask.java: Early exit checks and optimizations

Tests

  • HeartbeatMonitoringServiceTest.java: Tests for record-level timestamp tracking
  • RecordOtelStatsTest.java: Tests for OTel metrics
  • HeartbeatVersionedStatsTest.java: Tests for metrics integration

Configuration Integration

  • VeniceServerConfig.java: Exposing new configuration options

Code changes

  • Added new code behind a config. If so, list the config names and their default values in the PR description.
  • Introduced new log lines.
    • Confirmed if logs need to be rate limited to avoid excessive logging.

Concurrency-Specific Checks

Both reviewer and PR author to verify

  • Code has no race conditions or thread safety issues.
  • Proper synchronization mechanisms (e.g., synchronized, RWLock) are used where needed.
  • No blocking calls inside critical sections that could lead to deadlocks or performance degradation.
  • Verified thread-safe collections are used (e.g., ConcurrentHashMap, CopyOnWriteArrayList).
  • Validated proper exception handling in multi-threaded code to avoid silent thread termination.

How was this PR tested?

  • New unit tests added.

  • New integration tests added.

  • Modified or extended existing tests.

  • Verified backward compatibility (if applicable).

  • Unit tests for both heartbeat and record-level timestamp tracking

  • Tests for OTel metrics emission

  • Tests for different configuration combinations

Does this PR introduce any user-facing or breaking changes?

  • No. You can skip the rest of this section.
  • Yes. Clearly explain the behavior change and its impact.

@ZacAttack
Copy link
Contributor

Why is this important? This seems like an awfully big expense for dubious observability gain.

@misyel
Copy link
Contributor Author

misyel commented Feb 2, 2026

Why is this important? This seems like an awfully big expense for dubious observability gain.

Hi Zac - we want to measure the true end-to-end replication latency and enforce an SLA for it. Heartbeats are only emitted every 60s, so the heartbeat metric may not be an accurate representation if we do not process a heartbeat for a few minutes but are actually continuing to process records between the heartbeats.

@misyel misyel marked this pull request as ready for review February 3, 2026 22:10
@misyel misyel changed the title [WIP][server] Introduce record level delay with heartbeat delay [server] Introduce record level delay with heartbeat delay Feb 3, 2026
/**
* Whether this timestamp entry was consumed from input or if the system initialized it as a default entry
*/
public final boolean consumedFromUpstream;
Contributor:
This seems unnecessary to me. I know this is existing code but why can't we use sentinel values for timestamp to achieve the same goal?

Contributor Author:
I agree, it is possible to achieve the same goal by using sentinel values for the timestamp, but we would have to update existing code to account for the sentinel value, and I feel that affects readability in the long term. Since we also made a change to reuse the same IngestionTimestampEntry, the extra space it uses shouldn't be too bad.
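The trade-off under discussion can be sketched as two hypothetical variants (neither is the actual Venice code, and the sentinel value is an assumption):

```java
// Option A (the discussed alternative): a sentinel timestamp encodes
// "default entry"; it saves a field, but every reader must know and
// preserve the sentinel convention.
final class SentinelEntry {
    static final long DEFAULT_TS = -1L; // assumed sentinel value
    final long timestampMs;
    SentinelEntry(long timestampMs) { this.timestampMs = timestampMs; }
    boolean consumedFromUpstream() { return timestampMs != DEFAULT_TS; }
}

// Option B (what the PR keeps): an explicit flag costs a boolean per entry
// but states the intent directly at every use site.
final class FlaggedEntry {
    final long timestampMs;
    final boolean consumedFromUpstream;
    FlaggedEntry(long timestampMs, boolean consumedFromUpstream) {
        this.timestampMs = timestampMs;
        this.consumedFromUpstream = consumedFromUpstream;
    }
}
```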

@sushantmane (Contributor) left a comment:
Thanks a ton, @misyel, for the changes. Your changes LGTM, but the current state of the HeartbeatMonitoringService class is not great. I've left some comments to improve it. Let me know what you think!

leaderProducedRecordContext,
currentTimeMs);
// Record regular record timestamp for heartbeat monitoring if enabled
if (recordLevelTimestampEnabled) {
Contributor:
Just throwing it out there: should we fold this into if (recordLevelMetricEnabled.get()) {?

* is acceptable. The cached keys are then reused on the per-record hot path to avoid repeated allocation
* and hash computation.
*/
private void refreshCachedHeartbeatKeys(PartitionConsumptionState pcs) {
Contributor:
Why is this required? We are not embedding role of the replica in HB key, right?

Contributor Author:
Removed

result.put(replicaId + "-" + region.getKey(), replicaHeartbeatInfo);
}
}
for (Map.Entry<HeartbeatKey, IngestionTimestampEntry> entry: heartbeatTimestampMap.entrySet()) {
Contributor:
This can be optimized IMO. We can get all the PCSs for a given version topic from the SIT and then only fetch the entries for the keys inside those PCS entries.

* should be able to tell us all the lag information.
*/
for (Map.Entry<String, HeartbeatTimeStampEntry> entry: replicaTimestampMap.entrySet()) {
for (Map.Entry<HeartbeatKey, IngestionTimestampEntry> entry: getLeaderHeartbeatTimeStamps().entrySet()) {
Contributor:
Why do we need to iterate through all of the keys? We already have a handle to partitionConsumptionState. We can just iterate through a subset of keys (3).

Contributor:
Could you please check other places where we iterate through the whole map? We should not do that. Almost all lookups in this class should happen with the PCS as an arg.
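The suggestion amounts to replacing full-map scans with a handful of direct lookups derived from the replica's PartitionConsumptionState. A schematic sketch with simplified stand-in types (none of these names are the real Venice classes):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class LagLookup {
    // Simplified stand-in for HeartbeatKey: version topic + partition + region.
    record Key(String versionTopic, int partition, String region) {}

    final Map<Key, Long> timestamps = new ConcurrentHashMap<>();

    // Anti-pattern flagged in review: scans every entry in the service.
    long maxLagScanAll(long nowMs) {
        long max = 0;
        for (Map.Entry<Key, Long> e : timestamps.entrySet()) {
            max = Math.max(max, nowMs - e.getValue());
        }
        return max;
    }

    // Suggested shape: build the few keys for this replica from its PCS
    // and look them up directly, O(regions) instead of O(all entries).
    long maxLagForReplica(String versionTopic, int partition, List<String> regions, long nowMs) {
        long max = 0;
        for (String region : regions) {
            Long ts = timestamps.get(new Key(versionTopic, partition, region));
            if (ts != null) {
                max = Math.max(max, nowMs - ts);
            }
        }
        return max;
    }
}
```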
