
Commit e73f3ad

[FLINK-34679][cdc][docs] Add core concept pages for Flink CDC docs
This closes #3153.
1 parent 8fd28f1 commit e73f3ad


6 files changed, +199 -0 lines changed


docs/content/docs/core-concept/data-pipeline.md

+77
@@ -23,3 +23,80 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Definition

Since events in Flink CDC flow from upstream to downstream in a pipeline manner, the whole ETL task is referred to as a **Data Pipeline**.

# Parameters

A pipeline corresponds to a chain of operators in Flink.
To describe a Data Pipeline, the following parts are required:
- [source]({{< ref "docs/core-concept/data-source" >}})
- [sink]({{< ref "docs/core-concept/data-sink" >}})
- [pipeline](#pipeline-configurations)

The following parts are optional:
- [route]({{< ref "docs/core-concept/route" >}})
- [transform]({{< ref "docs/core-concept/transform" >}})

# Example

## Only required

The following YAML file defines a concise Data Pipeline that synchronizes all tables in the MySQL database `app_db` to Doris:

```yaml
source:
  type: mysql
  hostname: localhost
  port: 3306
  username: root
  password: 123456
  tables: app_db.\.*

sink:
  type: doris
  fenodes: 127.0.0.1:8030
  username: root
  password: ""

pipeline:
  name: Sync MySQL Database to Doris
  parallelism: 2
```

## With optional

The following YAML file defines a more complex Data Pipeline that synchronizes all tables in the MySQL database `app_db` to Doris and routes the `orders`, `shipments`, and `products` tables to the target database `ods_db`, adding the table-name prefix `ods_`:

```yaml
source:
  type: mysql
  hostname: localhost
  port: 3306
  username: root
  password: 123456
  tables: app_db.\.*

sink:
  type: doris
  fenodes: 127.0.0.1:8030
  username: root
  password: ""

route:
  - source-table: app_db.orders
    sink-table: ods_db.ods_orders
  - source-table: app_db.shipments
    sink-table: ods_db.ods_shipments
  - source-table: app_db.products
    sink-table: ods_db.ods_products

pipeline:
  name: Sync MySQL Database to Doris
  parallelism: 2
```

# Pipeline Configurations

The following pipeline-level configuration options are supported:

| parameter       | meaning                                                                                | optional/required |
|-----------------|----------------------------------------------------------------------------------------|-------------------|
| name            | The name of the pipeline, which is submitted to the Flink cluster as the job name.     | optional          |
| parallelism     | The global parallelism of the pipeline.                                                | required          |
| local-time-zone | The local time zone, i.e. the session time zone ID used by the pipeline.               | optional          |
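For illustration, a `pipeline` block that sets all three options could look like the fragment below. The value `Asia/Shanghai` is only an assumed example of a time zone ID; check which ID formats your Flink CDC version accepts.

```yaml
pipeline:
  name: Sync MySQL Database to Doris   # optional; submitted as the Flink job name
  parallelism: 2                       # required; global parallelism of the pipeline
  local-time-zone: Asia/Shanghai       # optional; illustrative session time zone ID (assumption)
```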

docs/content/docs/core-concept/data-sink.md

+25

# Definition

**Data Sink** is used to apply schema changes and write change data to external systems.
A Data Sink can write to multiple tables simultaneously.

# Parameters

A Data Sink is described by the following parameters:

| parameter                   | meaning                                                                                          | optional/required |
|-----------------------------|--------------------------------------------------------------------------------------------------|-------------------|
| type                        | The type of the sink, such as doris or starrocks.                                                | required          |
| name                        | The name of the sink, which is user-defined (a default value is provided).                       | optional          |
| configurations of Data Sink | Configurations to build the Data Sink, e.g. connection configurations and sink table properties. | optional          |

# Example

The following YAML defines a Doris sink:
```yaml
sink:
  type: doris
  name: doris-sink   # Optional parameter for description purposes
  fenodes: 127.0.0.1:8030
  username: root
  password: ""
  table.create.properties.replication_num: 1   # Optional parameter for advanced functionality
```
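Because `type` also accepts `starrocks`, a StarRocks sink follows the same shape. The sketch below assumes the StarRocks pipeline connector's `jdbc-url` and `load-url` options and a local StarRocks FE listening on its default ports; these option names are not defined on this page, so verify them against the connector documentation for your version.

```yaml
sink:
  type: starrocks
  name: starrocks-sink                     # optional, for description purposes
  jdbc-url: jdbc:mysql://127.0.0.1:9030    # assumed option: FE MySQL-protocol endpoint
  load-url: 127.0.0.1:8030                 # assumed option: FE HTTP endpoint used for loading
  username: root
  password: ""
```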

docs/content/docs/core-concept/data-source.md

+26

# Definition

**Data Source** is used to access metadata and read change data from external systems.
A Data Source can read data from multiple tables simultaneously.

# Parameters

A Data Source is described by the following parameters:

| parameter                     | meaning                                                                                              | optional/required |
|-------------------------------|------------------------------------------------------------------------------------------------------|-------------------|
| type                          | The type of the source, such as mysql.                                                               | required          |
| name                          | The name of the source, which is user-defined (a default value is provided).                         | optional          |
| configurations of Data Source | Configurations to build the Data Source, e.g. connection configurations and source table properties. | optional          |

# Example

The following YAML defines a MySQL source:
```yaml
source:
  type: mysql
  name: mysql-source   # optional, for description purposes
  hostname: localhost
  port: 3306
  username: admin
  password: pass
  tables: adb.\.*, bdb.user_table_[0-9]+, [app|web].order_\.*
```
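In the MySQL source's `tables` option, an unescaped `.` separates the database name from the table name, so a regex "any character" in the table part has to be written as `\.`. The sketch below reuses only the options from the example above; the database and table names are illustrative.

```yaml
source:
  type: mysql
  hostname: localhost
  port: 3306
  username: admin
  password: pass
  # Exactly one table: database `app_db`, table `orders`
  tables: app_db.orders
  # Alternatives (keep only one `tables` line):
  # tables: app_db.\.*                      # every table in `app_db`
  # tables: app_db.orders, app_db.products  # an explicit, comma-separated list
```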

docs/content/docs/core-concept/route.md

+49

# Definition

**Route** specifies rules that match source tables and map them to sink tables. The most typical scenario is merging sharded databases and tables, i.e. routing multiple upstream source tables to the same sink table.

# Parameters

A route rule is described by the following parameters:

| parameter    | meaning                                               | optional/required |
|--------------|-------------------------------------------------------|-------------------|
| source-table | Source table id, supports regular expressions         | required          |
| sink-table   | Sink table id, supports regular expressions           | required          |
| description  | Routing rule description (a default value is provided) | optional          |

A route module can contain a list of source-table/sink-table rules.
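As a sketch of such a list, the fragment below combines a many-to-one rule (sharded tables merged into one sink table) with a one-to-one rule. The table names and the `[0-9]+` shard suffix are hypothetical.

```yaml
route:
  # Many-to-one: every table in `mydb` whose name matches web_order_[0-9]+
  - source-table: mydb.web_order_[0-9]+
    sink-table: ods_db.ods_web_order
    description: merge sharded web_order tables
  # One-to-one: a single table written under a new name
  - source-table: mydb.products
    sink-table: ods_db.ods_products
    description: sync products table to ods_products
```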

# Example

## Route one Data Source table to one Data Sink table

To synchronize the table `web_order` in the database `mydb` to a Doris table `ods_web_order`, use the following YAML to define the route:

```yaml
route:
  - source-table: mydb.web_order
    sink-table: mydb.ods_web_order
    description: sync table to one destination table with given prefix ods_
```

## Route multiple Data Source tables to one Data Sink table

To synchronize all sharded tables in the database `mydb` to a single Doris table `ods_web_order`, use the following YAML to define the route:
```yaml
route:
  - source-table: mydb.\.*
    sink-table: mydb.ods_web_order
    description: sync sharding tables to one destination table
```

## Complex Route via combining route rules

To specify several different mapping rules, list them under `route`:
```yaml
route:
  - source-table: mydb.orders
    sink-table: ods_db.ods_orders
    description: sync orders table to ods_orders
  - source-table: mydb.shipments
    sink-table: ods_db.ods_shipments
    description: sync shipments table to ods_shipments
  - source-table: mydb.products
    sink-table: ods_db.ods_products
    description: sync products table to ods_products
```

docs/content/docs/core-concept/table-id.md

+15

# Definition

When connecting to an external system, Flink CDC must map its tables to the storage objects of that system. The identifier used for this mapping is the **Table Id**.

# Example

To be compatible with most external systems, a Table Id is represented by a 3-tuple: (namespace, schemaName, tableName).
Connectors should establish the mapping between Table Ids and storage objects in external systems.

The following table lists the parts of a Table Id in different data systems:

| data system           | parts in Table Id       | string example      |
|-----------------------|-------------------------|---------------------|
| Oracle/PostgreSQL     | database, schema, table | mydb.default.orders |
| MySQL/Doris/StarRocks | database, table         | mydb.orders         |
| Kafka                 | topic                   | orders              |
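In pipeline definitions, Table Ids are written as dot-separated strings whose parts follow the table above. For example, a route rule between MySQL and Doris uses two-part `database.table` ids on both sides; the names below are illustrative.

```yaml
route:
  # MySQL Table Id `mydb.orders` (database, table)
  # routed to the Doris Table Id `ods_db.ods_orders` (database, table)
  - source-table: mydb.orders
    sink-table: ods_db.ods_orders
```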

docs/content/docs/core-concept/transform.md

+7

# Definition

The **Transform** module helps users drop columns and derive new columns from the existing columns of a table.
It also helps users filter out unnecessary data during synchronization.

# Example

This feature will be supported soon.
