
[Bug] LookupMergeFunction.pickHighLevel() ignores sequence.field when selecting high level record #7220

@Liulietong


Search before asking

  • I searched in the issues and found no similar issues.

Paimon version

master (latest)

Compute Engine

None

Minimal reproduce step

When using changelog-producer = lookup with sequence.field configured, LookupMergeFunction.pickHighLevel() may select the wrong "old" record when out-of-order data arrives.

Configuration:

CREATE TABLE test (
    id INT PRIMARY KEY NOT ENFORCED,
    value INT,
    update_time BIGINT
) WITH (
    'changelog-producer' = 'lookup',
    'sequence.field' = 'update_time'
);

Scenario:

Initial state after compaction:
  L1: (id=1, value=100, update_time=7)
  L2: (id=1, value=200, update_time=8)  ← Actually newer!

New out-of-order data arrives at L0:
  L0: (id=1, value=50, update_time=6)   ← Old data arriving late

Expected behavior:

  • pickHighLevel() should select L2 (update_time=8) as the "latest" high-level record
  • Result should reflect the record with highest sequence value

Actual behavior:

  • pickHighLevel() selects L1 (because level 1 < level 2), ignoring sequence.field
  • Wrong changelog is generated
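The expected and actual behavior above can be reproduced outside Paimon with a small stand-alone model. This is only a sketch: `Kv` is a hypothetical stand-in for Paimon's `KeyValue`, carrying just the level, the `update_time` sequence value, and the `value` column, and the level-0 filter is an assumption about what counts as a "high-level" candidate. The selection loop copies the level-only comparison quoted in this issue.

```java
import java.util.Arrays;
import java.util.List;

final class LevelOnlyPickDemo {

    // Hypothetical stand-in for Paimon's KeyValue (not the real class).
    static final class Kv {
        final int level;
        final long sequence; // value of the configured sequence.field (update_time)
        final int value;

        Kv(int level, long sequence, int value) {
            this.level = level;
            this.sequence = sequence;
            this.value = value;
        }
    }

    // Level-only selection, as quoted in the issue: the lowest level wins.
    static Kv pickByLevelOnly(List<Kv> candidates) {
        Kv highLevel = null;
        for (Kv kv : candidates) {
            if (kv.level <= 0) {
                continue; // assumption: level-0 records are not high-level candidates
            }
            if (highLevel == null || kv.level < highLevel.level) {
                highLevel = kv;
            }
        }
        return highLevel;
    }

    // The three records for id=1 from the scenario in this issue.
    static List<Kv> scenario() {
        return Arrays.asList(
                new Kv(0, 6L, 50),   // late, out-of-order record at L0
                new Kv(1, 7L, 100),  // L1
                new Kv(2, 8L, 200)); // L2, highest update_time
    }

    public static void main(String[] args) {
        Kv picked = pickByLevelOnly(scenario());
        // Prints: level=1 update_time=7
        System.out.println("level=" + picked.level + " update_time=" + picked.sequence);
    }
}
```

With these inputs the level-only rule returns the L1 record (update_time=7), even though the L2 record (update_time=8) has the higher sequence value.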

What doesn't meet your expectations?

LookupMergeFunction.pickHighLevel() only compares level numbers, ignoring sequence.field:

// LookupMergeFunction.java:88 - Current behavior
if (highLevel == null || kv.level() < highLevel.level()) {
    highLevel = kv;  // Always picks lowest level, ignores sequence
}

Reproducible scenario:

// When candidates contain:
// L1: (key=1, sequence=7)  <- level 1
// L2: (key=1, sequence=8)  <- level 2, but higher sequence (newer!)

// pickHighLevel() returns L1 (because level 1 < 2)
// But should return L2 (because sequence 8 > 7)

When sequence.field is configured, pickHighLevel() should compare candidates with the sequence.field comparator, similar to how SortMergeReaderWithMinHeap handles it correctly at lines 61-67.
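One possible shape for a sequence-aware selection is sketched below. It is not the actual patch and does not use Paimon's real types: `Kv`, `BY_SEQUENCE`, and the nullable `sequenceComparator` parameter are hypothetical names introduced for illustration, and the tie-break on the lower level is an assumption. When no comparator is supplied, the sketch falls back to the level-only rule.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class SequenceAwarePickSketch {

    // Hypothetical stand-in for Paimon's KeyValue (not the real class).
    static final class Kv {
        final int level;
        final long sequence; // value of the configured sequence.field

        Kv(int level, long sequence) {
            this.level = level;
            this.sequence = sequence;
        }
    }

    // Hypothetical comparator standing in for the user-defined sequence comparator.
    static final Comparator<Kv> BY_SEQUENCE = Comparator.comparingLong(kv -> kv.sequence);

    // sequenceComparator == null models a table without sequence.field.
    static Kv pickHighLevel(List<Kv> candidates, Comparator<Kv> sequenceComparator) {
        Kv best = null;
        for (Kv kv : candidates) {
            if (kv.level <= 0) {
                continue; // assumption: level-0 records are not high-level candidates
            }
            if (best == null || isBetter(kv, best, sequenceComparator)) {
                best = kv;
            }
        }
        return best;
    }

    private static boolean isBetter(Kv kv, Kv best, Comparator<Kv> sequenceComparator) {
        if (sequenceComparator != null) {
            int cmp = sequenceComparator.compare(kv, best);
            if (cmp != 0) {
                return cmp > 0; // higher sequence value wins
            }
        }
        // No comparator, or equal sequence: fall back to the lower level.
        return kv.level < best.level;
    }

    public static void main(String[] args) {
        List<Kv> candidates = Arrays.asList(new Kv(0, 6L), new Kv(1, 7L), new Kv(2, 8L));
        // Prints: with sequence.field: level=2
        System.out.println("with sequence.field: level="
                + pickHighLevel(candidates, BY_SEQUENCE).level);
        // Prints: without sequence.field: level=1
        System.out.println("without sequence.field: level="
                + pickHighLevel(candidates, null).level);
    }
}
```

Keeping the level comparison as the fallback means tables without sequence.field would behave exactly as they do today under this sketch.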

Anything else?

This issue only affects the changelog-producer = lookup scenario. Normal queries (batch/streaming scan) and lookup join are not affected.

I'm working on a fix and will submit a PR shortly. The PR includes a complete unit test to reproduce this issue.
