
[Fix][Connector-V2] Fix possible data loss in scenarios of request_tablet_size is less than the number of BUCKETS #8768

Merged
8 commits merged into apache:dev from fix_sr_data_loss on Feb 21, 2025

Conversation

xiaochen-zhou (Contributor)

Purpose of this pull request

When users explicitly set QUERY_TABLET_SIZE instead of relying on the default value Integer.MAX_VALUE, the returned List<QueryPartition> partitions can contain multiple partitions with the same beAddress.

List<StarRocksSourceSplit> getStarRocksSourceSplit() {
  List<StarRocksSourceSplit> sourceSplits = new ArrayList<>();
  // Each QueryPartition becomes one split; with a small request_tablet_size,
  // several partitions (and therefore several splits) can share the same beAddress.
  List<QueryPartition> partitions = starRocksQueryPlanReadClient.findPartitions();
  for (int i = 0; i < partitions.size(); i++) {
    sourceSplits.add(
      new StarRocksSourceSplit(
        partitions.get(i), String.valueOf(partitions.get(i).hashCode())));
  }
  return sourceSplits;
}

To avoid establishing a new connection to the BE for every split during the read, the StarRocksBeReadClient is cached by the beAddress of the split's partition. However, StarRocksBeReadClient#openScanner() does not reset some of the client's state, so stale values from the previous scan can be reused. For example, if eos (end of stream) was left as true by a previous partition read, reading the next partition on the same BE terminates immediately and loses data.
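As a rough sketch of the fix idea (the class and field names below are simplified stand-ins, not the actual StarRocksBeReadClient code), the cached client has to re-initialize its per-scan state every time a new scanner is opened:

// Minimal sketch, assuming a client that is cached per BE address and reused across splits.
// Only the reset idea matters here; the real connector resets its own fields in openScanner().
class CachedBeReadClientSketch {
  private boolean eos;       // end-of-stream flag left over from the previous scan
  private long readRowCount; // rows read in the current scan

  void openScanner(String partitionId) {
    // Reset per-scan state before reusing the cached client for a new partition.
    // Without this, eos == true from the previous partition makes the next partition
    // on the same BE look already exhausted, so its rows are silently skipped.
    this.eos = false;
    this.readRowCount = 0;
    // ... then build and send the open-scanner request for partitionId ...
  }

  boolean hasNext() {
    return !eos;
  }
}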

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added new tests.

Check list

@xiaochen-zhou xiaochen-zhou changed the title [Feature][Connector-V2] Fix possible data loss in certain scenarios of starrocks [Fix][Connector-V2] Fix possible data loss in certain scenarios of starrocks Feb 19, 2025
@xiaochen-zhou (Contributor, Author)

Friendly ping: do you have time to take a look, @Hisoka-X 🙏?

@hailin0 (Member) commented Feb 19, 2025

Please add a test case (reproduce & verify).

@xiaochen-zhou (Contributor, Author)

Please add a test case (reproduce & verify).

Ok. I'll try to complete it today.

@xiaochen-zhou (Contributor, Author)

I added a test case, StarRocksIT#testStarRocksReadRowCount(), to verify whether the number of rows written to the sink matches the number of rows read from the source in scenarios where request_tablet_size is less than the number of BUCKETS.
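In essence the test reduces to a row-count assertion like the sketch below (illustrative only; the real logic lives in StarRocksIT#testStarRocksReadRowCount() and runs a full SeaTunnel job first):

import org.junit.jupiter.api.Assertions;

// Sketch of the check, not the actual StarRocksIT code.
class ReadRowCountCheckSketch {
  void assertNoRowsLost(long rowsWrittenToSource, long rowsReadIntoSink) {
    // Before the fix this failed (31 != 100) whenever request_tablet_size < BUCKETS;
    // after the fix every row written to the source table reaches the sink.
    Assertions.assertEquals(rowsWrittenToSource, rowsReadIntoSink);
  }
}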

When I set the table's buckets to 3:

DISTRIBUTED BY HASH(`BIGINT_COL`) BUCKETS 3

At the same time, when request_tablet_size is set to a value less than 3:

(screenshot: source options with request_tablet_size set to a value less than 3)

The StarRocksIT#testStarRocksReadRowCount() test could not pass before the fix:

(screenshot: failing test run before the fix)

In this case, the row count is 31, which is less than the expected 100.

(screenshot: row count of 31 instead of the expected 100)

After applying the fix, the StarRocksIT#testStarRocksReadRowCount() test now passes successfully:

(screenshot: passing test run after the fix)

@hailin0 @Hisoka-X

@xiaochen-zhou xiaochen-zhou changed the title [Fix][Connector-V2] Fix possible data loss in certain scenarios of starrocks [Fix][Connector-V2] Fix possible data loss in scenarios of request_tablet_size is less than the number of BUCKETS. Feb 20, 2025
@xiaochen-zhou xiaochen-zhou changed the title [Fix][Connector-V2] Fix possible data loss in scenarios of request_tablet_size is less than the number of BUCKETS. [Fix][Connector-V2] Fix possible data loss in scenarios of request_tablet_size is less than the number of BUCKETS Feb 20, 2025
@Hisoka-X (Member) left a comment


@hailin0 (Member) commented Feb 21, 2025

good pr

@hailin0 hailin0 merged commit 3c6f216 into apache:dev Feb 21, 2025
3 checks passed
@xiaochen-zhou xiaochen-zhou deleted the fix_sr_data_loss branch February 22, 2025 16:53