Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-35774][cdc-runtime] Fix the cache of transform is not updated after process schema change event #3455

Closed
wants to merge 3 commits into from

Conversation

aiwenmo
Copy link
Contributor

@aiwenmo aiwenmo commented Jul 6, 2024

This closes FLINK-35774.

@aiwenmo
Copy link
Contributor Author

aiwenmo commented Jul 6, 2024

@yuxiqian @lvyanquan PTAL

Copy link
Contributor

@yuxiqian yuxiqian Jul 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit unsure if transformProjectionProcessorMap and transformFilterProcessorMap will be flushed when upstream schema change event arrives at TransformDataOperator?

Got it, it will be refreshed by updating tableInfo.

@aiwenmo
Copy link
Contributor Author

aiwenmo commented Jul 31, 2024

I have rebased this branch onto master.

aiwenmo and others added 2 commits August 7, 2024 23:33
…after process schema change event when merging with route
Comment on lines +402 to +403
"DataChangeEvent{tableId=default_namespace.default_schema.table1, before=[1, 11, 1], after=[], op=DELETE, meta=()}",
"DataChangeEvent{tableId=default_namespace.default_schema.table1, before=[2, 22, ], after=[2, 22, x], op=UPDATE, meta=()}");
Copy link
Contributor

@yuxiqian yuxiqian Aug 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is transform semantic changed here? Seems after the schema evolution, upstream schema will be [col1, newCol3], and downstream schema should be inferred as [col1, newCol3, col12] instead of [col1, col12, newCol3].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Transform semantic has changed. The previous discussion resulted in the current code architecture being unable to achieve the expected semantics.
Based on the following code, it may be possible to achieve.
#3285

"DataChangeEvent{tableId=default_namespace.default_schema.table1, before=[1, 1, 10], after=[], op=DELETE, meta=()}",
"DataChangeEvent{tableId=default_namespace.default_schema.table1, before=[2, , 20], after=[2, x, 20], op=UPDATE, meta=()}");
"DataChangeEvent{tableId=default_namespace.default_schema.table1, before=[1, 10, 1], after=[], op=DELETE, meta=()}",
"DataChangeEvent{tableId=default_namespace.default_schema.table1, before=[2, 20, ], after=[2, 20, x], op=UPDATE, meta=()}");
Copy link
Contributor

@yuxiqian yuxiqian Aug 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please also add tests to verify if downstream schema (could be accessed by ValuesDatabase#getTableSchema) is as expected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will have a try tonight.

@aiwenmo
Copy link
Contributor Author

aiwenmo commented Aug 8, 2024

The issue has been fixed by #3285.

@aiwenmo aiwenmo closed this Aug 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants