[fix-issue-2676] repair a snapshot-split bug: #2968

Closed
wants to merge 4 commits into from

Conversation

AidenPerce

When using splitOneUnevenlySizedChunk, we sometimes get one very large chunk (for example, when a database, table, or column uses 'utf8mb4_general_ci'). For instance, when the primary-key values look like ['0000','1111','2222','3333','4444','aaaa','bbbb','cccc','ZZZZ',...], almost all of the values that start with a letter [a-zA-Z] end up in a single big chunk.
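A minimal sketch (not code from this PR) of the ordering mismatch behind this: Java's binary String comparison sorts 'ZZZZ' before 'aaaa', while a case-insensitive collation such as utf8mb4_general_ci sorts it after 'cccc'. java.text.Collator at PRIMARY strength is used here only as an approximation of the MySQL collation.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;

public class CollationOrderMismatch {
    public static void main(String[] args) {
        // Primary-key values from the description above.
        List<String> keys = new ArrayList<>(List.of(
                "0000", "1111", "2222", "3333", "4444", "aaaa", "bbbb", "cccc", "ZZZZ"));

        // Java's natural String ordering is binary: 'Z' (0x5A) < 'a' (0x61),
        // so "ZZZZ" sorts before "aaaa".
        List<String> binaryOrder = new ArrayList<>(keys);
        binaryOrder.sort(null);
        System.out.println(binaryOrder);
        // [0000, 1111, 2222, 3333, 4444, ZZZZ, aaaa, bbbb, cccc]

        // A case-insensitive collator (approximating utf8mb4_general_ci)
        // sorts "ZZZZ" after "cccc", which matches how MySQL orders the rows.
        Collator caseInsensitive = Collator.getInstance();
        caseInsensitive.setStrength(Collator.PRIMARY);
        List<String> collationOrder = new ArrayList<>(keys);
        collationOrder.sort(caseInsensitive);
        System.out.println(collationOrder);
        // [0000, 1111, 2222, 3333, 4444, aaaa, bbbb, cccc, ZZZZ]
    }
}
```

When chunk boundaries come from MySQL's ordering but are compared with Java's, the two orderings disagree for letter-prefixed values, which is presumably why they all land in one oversized chunk.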

@yuxiqian
Contributor

Hi @AidenPerce, could you please rebase this PR onto the latest master branch before it can be merged? Renaming packages like com.ververica.cdc to org.apache.flink.cdc might be necessary.

@AidenPerce
Author

> Hi @AidenPerce, could you please rebase this PR onto the latest master branch before it can be merged? Renaming packages like com.ververica.cdc to org.apache.flink.cdc might be necessary.

I have rebased the branch of this PR, but some Db2 connector test cases fail, which keeps me from updating this PR. I don't know why the cases don't pass; this is the check log: https://github.com/AidenPerce/flink-cdc-connectors/actions/runs/8963238039/job/24613239338

@AidenPerce AidenPerce reopened this May 6, 2024
@yuxiqian
Contributor

yuxiqian commented May 6, 2024

Hi @AidenPerce, sorry about the inconvenience. I think it's related to a glitch in the Db2 incremental connector and should be fixed by #3283. Will try to get it merged ASAP.

@yuxiqian
Contributor

yuxiqian commented May 6, 2024

Hi @AidenPerce, the Db2 CI fix has been merged into master branch, could you please rebase this PR and see if this problem persists? Thank you!

AidenPerce added 2 commits May 6, 2024 15:09
 resolved the conflict of org.apache.flink.cdc.connectors.mysql.source.utils.ObjectUtilsTest
 resolved the conflict of org.apache.flink.cdc.connectors.mysql.source.utils.ObjectUtilsTest
@AidenPerce
Author

AidenPerce commented May 6, 2024

> Hi @AidenPerce, the Db2 CI fix has been merged into master branch, could you please rebase this PR and see if this problem persists? Thank you!

I've hit a new problem: a MongoDB test case fails:
NewlyAddedTableITCase.testRemoveAndAddCollectionsOneByOne:330->testRemoveAndAddCollectionsOneByOne:501 expected:<15> but was:<16>
See https://github.com/AidenPerce/flink-cdc-connectors/actions/runs/8965612344/job/24619438529

But when I re-ran the test case, it passed. Are there any known issues with this test case?

@AidenPerce
Author

@yuxiqian could you help merge it, please?

@AidenPerce AidenPerce changed the title [fix] repair a snapshot-split bug: [fix-issue-2676] repair a snapshot-split bug: May 8, 2024
Contributor

@yuxiqian yuxiqian left a comment


Thanks for @AidenPerce's great work; I left some comments. The MongoDB CI seems unstable recently, but that's clearly unrelated to this PR, so no worries.

JdbcConnection jdbcConnection,
TableId tableId,
String splitColumn)
throws SQLException {
Contributor


Should this function throw SQLException, given that there's fallback logic (the normal compare(obj1, obj2))?
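A hypothetical sketch of the fallback shape this comment points at: swallow the SQLException from the collation lookup and fall back to the plain compare(obj1, obj2), so the caller never has to declare it. Only JdbcConnection, TableId, StatementUtils.queryColumnCollation, and the generic compare (ObjectUtils.compare, inferred from the commit messages) come from this PR's diff; the method name, the `_ci` suffix check, and the package paths are assumptions.

```java
import io.debezium.jdbc.JdbcConnection;
import io.debezium.relational.TableId;
import java.sql.SQLException;
import org.apache.flink.cdc.connectors.mysql.source.utils.ObjectUtils;
import org.apache.flink.cdc.connectors.mysql.source.utils.StatementUtils;

// Illustrative only; not the PR's actual implementation.
public class CollationAwareCompare {

    static int compareWithCollationFallback(
            JdbcConnection jdbcConnection, TableId tableId, String splitColumn,
            Object obj1, Object obj2) {
        if (obj1 instanceof String && obj2 instanceof String) {
            try {
                String collation =
                        StatementUtils.queryColumnCollation(jdbcConnection, tableId, splitColumn);
                if (collation != null && collation.endsWith("_ci")) {
                    // Case-insensitive collation: mirror MySQL's ordering in Java.
                    return ((String) obj1).compareToIgnoreCase((String) obj2);
                }
            } catch (SQLException e) {
                // Collation lookup failed: fall through to the generic comparison,
                // so the SQLException never propagates to the caller.
            }
        }
        return ObjectUtils.compare(obj1, obj2);
    }
}
```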

* @throws ClassCastException if the compared objects are not instance of {@link Comparable} or
* not <i>mutually comparable</i> (for example, strings and integers).
*/
@SuppressWarnings("unchecked")
Contributor


Any reason for this suppression?

Comment on lines +106 to +107
* Object)}, we whill get: split1: ['0000','1111'] split2: ['3333','4444'] split3:
* ['aaaa','bbbb','cccc','ZZZZ'....] the split3 whill contains all of the remain rows
Contributor


Suggested change
* Object)}, we whill get: split1: ['0000','1111'] split2: ['3333','4444'] split3:
* ['aaaa','bbbb','cccc','ZZZZ'....] the split3 whill contains all of the remain rows
* Object)}, we will get: split1: ['0000','1111'] split2: ['3333','4444'] split3:
* ['aaaa','bbbb','cccc','ZZZZ'....] the split3 will contains all of the remain rows

Comment on lines +129 to +130
String columnCollation =
StatementUtils.queryColumnCollation(jdbcConnection, tableId, splitColumn);
Contributor


IIUC dbCompare might be called many times during the chunk splitting process, and my concern is that performing a JDBC query on every call might greatly deteriorate performance. Is it possible to implement a native comparison that is compatible with MySQL?

Author

@AidenPerce AidenPerce May 8, 2024


Yeah, you are right. During the chunk splitting process, the number of DB queries goes from one to three. But at this stage I cannot fully cover different MySQL versions as well as the various database character sets and collations, so I temporarily chose a relatively simple approach.
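A minimal sketch, not from this PR, of one way to address the query-count concern above: cache the collation lookup per table and column so the extra JDBC round trip happens once per table rather than on every comparison. The class and field names are illustrative; queryColumnCollation is the helper shown in the diff, and its package is inferred from the commit messages.

```java
import io.debezium.jdbc.JdbcConnection;
import io.debezium.relational.TableId;
import java.sql.SQLException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.flink.cdc.connectors.mysql.source.utils.StatementUtils;

// Illustrative only; not the PR's actual implementation.
public class ColumnCollationCache {

    private final Map<String, String> collationByColumn = new ConcurrentHashMap<>();

    /** Returns the column's collation, querying the database only on the first call. */
    public String getCollation(
            JdbcConnection jdbcConnection, TableId tableId, String splitColumn) {
        String cacheKey = tableId + "#" + splitColumn;
        return collationByColumn.computeIfAbsent(
                cacheKey,
                key -> {
                    try {
                        return StatementUtils.queryColumnCollation(
                                jdbcConnection, tableId, splitColumn);
                    } catch (SQLException e) {
                        // Unknown collation: callers can fall back to the plain comparison.
                        return "";
                    }
                });
    }
}
```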

Contributor


Got it, this solution looks reasonable.

@yuxiqian
Contributor

Hi @AidenPerce, are there any updates on this PR? Feel free to comment here if you need any help.


This pull request has been automatically marked as stale because it has not had recent activity for 60 days. It will be closed in 30 days if no further activity occurs.

@github-actions github-actions bot added Stale and removed Stale labels Sep 11, 2024

This pull request has been automatically marked as stale because it has not had recent activity for 60 days. It will be closed in 30 days if no further activity occurs.

@github-actions github-actions bot added the Stale label Nov 11, 2024

This pull request has been closed because it has not had recent activity. You can reopen it if you want to continue your work, and anyone who is interested is encouraged to continue working on this pull request.

@github-actions github-actions bot closed this Dec 11, 2024