-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-36794] [cdc-composer/cli] pipeline cdc connector support multiple data sources #3844
base: master
Are you sure you want to change the base?
Conversation
stream = stream.union(streamBranch); | ||
} | ||
} | ||
boolean isParallelMetadataSource = dataSource.isParallelMetadataSource(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think multi data sources should be regarded as parallelized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already modified
|
||
```yaml | ||
source: | ||
type: mysql_mutiple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use a new key like 'sources' to describe multiple sources? The '_multiple' suffix in value seems a bit odd. Because the YAML content does not correspond one-to-one with the PipelineDef.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already modified
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sources:
- type: mysql
name: mysql-instance-00
hostname: localhost
port: 3306
....
- type: mysql
name: mysql-instance-01
hostname: localhost
port: 3307
....
And the corresponding PipelineDef looks like this:
public class PipelineDef {
@Nullable private List<SourceDef> sources;
private final SourceDef source;
...
}
If the sources
is not null then we use these data sources, otherwise we use source
to build up DataStream. In this way, the previous usage will not be affected.
I want to hear your opinion. @yuxiqian
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do like @ChaomingZhangCN's proposed syntax for a fully multiple data source, they're intuitive and expressive, but might be a chore if users just want to connect to a MySQL cluster with multiple servers, as they have to copy all identical configurations to both source definition.
@linjianchang's solution for now seems like MySQL specific, especially for multi-host clusters. It could not be extended for hetero-sources (like concatenating data from different DBMS), or when one wants to use different configs for each node. These cases don't exist for now since all we have is MySQL source connector, but as we're modifying composer and YAML API (instead of MySQL connector itself), such possibility should be discussed more carefully.
As for multiple sources in pipeline itself, I remembered the idea has been informally discussed with @leonardBang and @PatrickRen long time ago, and the conclusion was running multiple sources in one single job actually makes the pipeline more fragile, since any single-point failure would easily escalate and cause a global failover. Things might have changed since then, still needs hearing from senior developers on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sources: - type: mysql name: mysql-instance-00 hostname: localhost port: 3306 .... - type: mysql name: mysql-instance-01 hostname: localhost port: 3307 ....And the corresponding PipelineDef looks like this:
public class PipelineDef { @Nullable private List<SourceDef> sources; private final SourceDef source; ... }If the
sources
is not null then we use these data sources, otherwise we usesource
to build up DataStream. In this way, the previous usage will not be affected. I want to hear your opinion. @yuxiqian @ChaomingZhangCN
It has been modified according to comments, please review it again, thanks!
private static final String HOST_NAME = "hostname"; | ||
private static final String PORT = "port"; | ||
private static final String COLON = ":"; | ||
private static final String MUTIPLE = "_mutiple"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be _multiple
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already modified
5a9bd3e
to
f8524d7
Compare
f8524d7
to
994d17a
Compare
pipeline cdc connector support multiple data sources