
Commit ff371b3

[FLINK-34679][cdc] add doc under core-concept.
1 parent 84ca4dd commit ff371b3

6 files changed (+199 -0 lines)

docs/content/docs/core-concept/data-pipeline.md

+77
@@ -23,3 +23,80 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Definition

Since events in Flink CDC flow from the upstream to the downstream in a pipeline manner, the data synchronization task is also referred to as a Data Pipeline.

# Parameters

A pipeline corresponds to a chain of operators in Flink.
To describe a Data Pipeline, the following parts are required:

- [source](data-source.md)
- [sink](data-sink.md)
- [pipeline](data-pipeline.md#global-parameters) (global parameters)

The following parts are optional:

- [route](route.md)
- [transform](transform.md)
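
As a rough illustration of the required/optional split above, the following Python sketch checks a pipeline definition that has already been parsed into a dictionary (for example, by loading the YAML file). The function `check_pipeline_definition` is hypothetical and is not part of Flink CDC; it only mirrors the two lists above.

```python
from typing import Any

# Top-level parts of a pipeline definition, as listed in the documentation.
REQUIRED_PARTS = ("source", "sink", "pipeline")
OPTIONAL_PARTS = ("route", "transform")


def check_pipeline_definition(definition: dict[str, Any]) -> tuple[list[str], list[str]]:
    """Return (missing required parts, optional parts that are present).

    Hypothetical helper for illustration; an empty first list means every
    required part is present.
    """
    missing = [part for part in REQUIRED_PARTS if part not in definition]
    present_optional = [part for part in OPTIONAL_PARTS if part in definition]
    return missing, present_optional


if __name__ == "__main__":
    definition = {
        "source": {"type": "mysql"},
        "sink": {"type": "doris"},
        "pipeline": {"name": "Sync MySQL Database to Doris", "parallelism": 2},
    }
    print(check_pipeline_definition(definition))
    print(check_pipeline_definition({"source": {"type": "mysql"}}))
```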

# Example

## Only required

The following concise YAML file defines a pipeline that uses only the required parts:

```yaml
source:
  type: mysql
  hostname: localhost
  port: 3306
  username: root
  password: 123456
  tables: app_db.\.*

sink:
  type: doris
  fenodes: 127.0.0.1:8030
  username: root
  password: ""

pipeline:
  name: Sync MySQL Database to Doris
  parallelism: 2
```

## With optional

The following more complete YAML file also defines the optional `route` part:

```yaml
source:
  type: mysql
  hostname: localhost
  port: 3306
  username: root
  password: 123456
  tables: app_db.\.*

sink:
  type: doris
  fenodes: 127.0.0.1:8030
  username: root
  password: ""

route:
  - source-table: app_db.orders
    sink-table: ods_db.ods_orders
  - source-table: app_db.shipments
    sink-table: ods_db.ods_shipments
  - source-table: app_db.products
    sink-table: ods_db.ods_products

pipeline:
  name: Sync MySQL Database to Doris
  parallelism: 2
```

# Global Parameters

The following parameters are global parameters of the pipeline:

| parameter       | meaning                                                                            |
|-----------------|------------------------------------------------------------------------------------|
| name            | The name of the pipeline, which is submitted to the Flink cluster as the job name. |
| parallelism     | The global parallelism of the pipeline.                                            |
| local-time-zone | The local time zone, which defines the time zone ID of the current session.       |

docs/content/docs/core-concept/data-sink.md

+25
@@ -23,3 +23,28 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Definition

The Data Sink is used to apply schema changes and write change data to external systems.
A Data Sink can write to multiple tables simultaneously.
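
The two responsibilities above can be pictured with a toy, in-memory sink. In this Python sketch, schema-change events create a table or add columns, data-change events append rows, and one sink instance serves several tables at once. The classes `SchemaChange`, `RowInsert` and `InMemorySink` are hypothetical illustrations, not the Flink CDC sink API; a real sink writes to an external system rather than to dictionaries.

```python
from dataclasses import dataclass


@dataclass
class SchemaChange:
    """Hypothetical schema-change event: create a table or add columns to it."""
    table_id: str
    columns: list[str]


@dataclass
class RowInsert:
    """Hypothetical data-change event carrying one inserted row."""
    table_id: str
    row: dict[str, object]


class InMemorySink:
    """Toy sink that keeps the schema and rows of every table it receives."""

    def __init__(self) -> None:
        self.schemas: dict[str, list[str]] = {}
        self.rows: dict[str, list[dict[str, object]]] = {}

    def write(self, event: object) -> None:
        if isinstance(event, SchemaChange):
            # Apply the schema change: create the table or append new columns.
            columns = self.schemas.setdefault(event.table_id, [])
            for column in event.columns:
                if column not in columns:
                    columns.append(column)
            self.rows.setdefault(event.table_id, [])
        elif isinstance(event, RowInsert):
            # Write change data; the table's schema must already be known.
            if event.table_id not in self.schemas:
                raise KeyError(f"unknown table: {event.table_id}")
            unknown = set(event.row) - set(self.schemas[event.table_id])
            if unknown:
                raise ValueError(f"columns not in schema: {sorted(unknown)}")
            self.rows[event.table_id].append(dict(event.row))
        else:
            raise TypeError(f"unsupported event: {event!r}")


if __name__ == "__main__":
    sink = InMemorySink()
    sink.write(SchemaChange("app_db.orders", ["id", "amount"]))
    sink.write(SchemaChange("app_db.products", ["id", "name"]))
    sink.write(RowInsert("app_db.orders", {"id": 1, "amount": 10}))
    sink.write(RowInsert("app_db.products", {"id": 7, "name": "pen"}))
    print(sink.rows)
```

Rejecting rows for tables whose schema has not been seen yet reflects the ordering a sink relies on: schema changes are applied before the change data that depends on them.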

# Parameters

A data sink is described by the following parameters:

| parameter                   | meaning                                                                                  |
|-----------------------------|------------------------------------------------------------------------------------------|
| type                        | The type of the sink, such as doris or starrocks (required).                             |
| name                        | The name of the sink, which is user-defined (optional, with a default value provided).  |
| other custom configurations | Custom configurations for the sink that specify the connection config and table config. |
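
The parameter table above separates two shared keys, `type` and `name`, from connector-specific options. Below is a minimal Python sketch of that split, assuming the block has already been parsed into a dictionary. The helper `split_connector_block` is hypothetical, and its fallback name `<type>-<kind>` is invented for this sketch; the actual default name is provided by Flink CDC and may differ.

```python
from typing import Any

# Keys with a shared meaning across connectors, per the parameter table.
RESERVED_KEYS = ("type", "name")


def split_connector_block(block: dict[str, Any], kind: str) -> tuple[str, str, dict[str, Any]]:
    """Split a sink (or source) block into (type, name, custom options)."""
    if "type" not in block:
        raise ValueError(f"the {kind} block requires a 'type' parameter")
    connector_type = str(block["type"])
    # Hypothetical fallback name; not the real Flink CDC default.
    name = str(block.get("name", f"{connector_type}-{kind}"))
    # Everything else is passed through untouched as connector-specific options.
    options = {key: value for key, value in block.items() if key not in RESERVED_KEYS}
    return connector_type, name, options
```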

# Example

The following YAML file defines a Doris sink:

```yaml
sink:
  type: doris
  name: doris-sink                            # Optional, for description purposes
  fenodes: 127.0.0.1:8030
  username: root
  password: ""
  table.create.properties.replication_num: 1  # Optional, for advanced functionality
```

docs/content/docs/core-concept/data-source.md

+26
@@ -23,3 +23,29 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Definition

A Data Source is used to access metadata and read changed data from external systems.
A Data Source can read data from multiple tables simultaneously.

# Parameters

A data source is described by the following parameters:

| parameter                   | meaning                                                                                    |
|-----------------------------|--------------------------------------------------------------------------------------------|
| type                        | The type of the source, such as mysql (required).                                          |
| name                        | The name of the source, which is user-defined (optional, with a default value provided).  |
| other custom configurations | Custom configurations for the source that specify the connection config and table config. |

# Example

The following YAML file defines a MySQL source:

```yaml
source:
  type: mysql
  name: mysql-source   # Optional, for description purposes
  hostname: localhost
  port: 3306
  username: admin
  password: pass
  tables: adb.\.*, bdb.user_table_[0-9]+, [app|web].order_\.*
```
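
The `tables` value in a MySQL source is a comma-separated list of `database.table` patterns. The Python sketch below shows one way such a list could be interpreted. It makes three assumptions: an unescaped `.` separates the database pattern from the table pattern, an escaped `\.` stands for the regular-expression wildcard `.`, and entries contain no commas of their own. The helpers `split_table_pattern` and `is_captured` are hypothetical and are not the MySQL connector's implementation.

```python
import re


def split_table_pattern(entry: str) -> tuple[str, str]:
    r"""Split one 'database.table' entry on its first unescaped dot.

    An escaped '\.' is turned into the regex '.', i.e. "any character".
    """
    entry = entry.strip()
    i = 0
    while i < len(entry):
        if entry[i] == "\\":
            i += 2  # skip an escaped character such as '\.'
        elif entry[i] == ".":
            database, table = entry[:i], entry[i + 1:]
            return database.replace("\\.", "."), table.replace("\\.", ".")
        else:
            i += 1
    raise ValueError(f"expected '<database>.<table>', got {entry!r}")


def is_captured(tables_option: str, database: str, table: str) -> bool:
    """Return True if database.table is selected by the comma-separated option."""
    for entry in tables_option.split(","):
        db_regex, table_regex = split_table_pattern(entry)
        # Both parts must match completely, not just as a prefix.
        if re.fullmatch(db_regex, database) and re.fullmatch(table_regex, table):
            return True
    return False
```

For example, with `adb.\.*, bdb.user_table_[0-9]+`, every table in `adb` is captured, `bdb.user_table_42` is captured, and `bdb.user_table_x` is not.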

docs/content/docs/core-concept/route.md

+49
@@ -23,3 +23,52 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Definition

Route specifies rules that match a list of source tables and map them to a sink table. The most typical scenario is merging sharded databases and tables, that is, routing multiple upstream source tables to the same sink table.

# Parameters

A route rule is described by the following parameters:

| parameter    | meaning                                                             |
|--------------|---------------------------------------------------------------------|
| source-table | Source table id, supports regular expressions                       |
| sink-table   | Sink table id, supports regular expressions                         |
| description  | Routing rule description (optional, with a default value provided) |

A route module can contain a list of source-table/sink-table rules.
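
A minimal Python sketch of how a list of route rules could be applied to a table id such as `mydb.orders`. It assumes a dotted pattern syntax in which an unescaped `.` separates database and table while `\.` means "any character", and that rules are tried in order with the first match winning. Both are assumptions of this sketch, and `selector_to_regex` and `route_table` are hypothetical helpers rather than Flink CDC APIs.

```python
import re
from typing import Optional


def selector_to_regex(pattern: str) -> str:
    r"""Translate a dotted table pattern into a Python regex over 'db.table'.

    Assumed syntax: an unescaped '.' is the literal database/table separator,
    while '\.' means "any character".
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("\\.", i):
            out.append(".")  # escaped dot: regex wildcard
            i += 2
        elif pattern[i] == "\\" and i + 1 < len(pattern):
            out.append(pattern[i:i + 2])  # keep other escapes, e.g. '\d'
            i += 2
        elif pattern[i] == ".":
            out.append(r"\.")  # unescaped dot: literal separator
            i += 1
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)


def route_table(table_id: str, rules: list[dict[str, str]]) -> Optional[str]:
    """Return the sink-table of the first rule whose source-table matches."""
    for rule in rules:
        if re.fullmatch(selector_to_regex(rule["source-table"]), table_id):
            return rule["sink-table"]
    return None  # no rule matched: in this sketch the table is not routed
```

With one rule for `mydb.orders` followed by a catch-all rule for `mydb.\.*`, the table `mydb.orders` goes to the first rule's sink table and every other table in `mydb` falls through to the catch-all. Listing rules from specific to general is a design choice of this sketch.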

# Example

## One to one

To synchronize the table `web_order` in the database `mydb` to the Doris table `ods_web_order`, use this YAML file to define the route:

```yaml
route:
  - source-table: mydb.web_order
    sink-table: mydb.ods_web_order
    description: sync table to one destination table with given prefix ods_
```

## Many to one

To synchronize the sharded tables in the database `mydb` to the single Doris table `ods_web_order`, use this YAML file to define the route:

```yaml
route:
  - source-table: mydb.\.*
    sink-table: mydb.ods_web_order
    description: sync sharding tables to one destination table
```

## Many rules

To specify several different mapping rules, use this YAML file to define the route:

```yaml
route:
  - source-table: mydb.orders
    sink-table: ods_db.ods_orders
    description: sync orders table to ods_orders
  - source-table: mydb.shipments
    sink-table: ods_db.ods_shipments
    description: sync shipments table to ods_shipments
  - source-table: mydb.products
    sink-table: ods_db.ods_products
    description: sync products table to ods_products
```

docs/content/docs/core-concept/table-id.md

+15
@@ -23,3 +23,18 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Definition

When connecting to external systems, it is necessary to establish a mapping relationship with the storage objects of the external system. This is what Table Id refers to.

# Example

To be compatible with most external systems, a Table Id is represented by a 3-tuple: (namespace, schemaName, tableName).
Connectors should establish the mapping between Table Id and storage objects in external systems.

The following table lists the parts of a Table Id in different data systems:

| data system           | parts in Table Id       | string example      |
|-----------------------|-------------------------|---------------------|
| Oracle/PostgreSQL     | database, schema, table | mydb.default.orders |
| MySQL/Doris/StarRocks | database, table         | mydb.orders         |
| Kafka                 | topic                   | orders              |
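
The mapping in the table above can be sketched in Python by right-aligning the dot-separated parts of a string into the 3-tuple, so that the last part is always the table name. The `TableId` class and `parse_table_id` function here are hypothetical illustrations. Mapping a two-part MySQL id to (schemaName, tableName) with no namespace is an assumption, and names that themselves contain dots are not handled.

```python
from typing import NamedTuple, Optional


class TableId(NamedTuple):
    """Illustrative (namespace, schemaName, tableName) 3-tuple."""
    namespace: Optional[str]
    schema_name: Optional[str]
    table_name: str

    def __str__(self) -> str:
        # Omit missing leading parts, e.g. "mydb.orders" for MySQL.
        return ".".join(part for part in self if part is not None)


def parse_table_id(text: str) -> TableId:
    """Parse a dotted string into a TableId; assumes no dots inside names."""
    parts = text.split(".")
    if not 1 <= len(parts) <= 3 or not all(parts):
        raise ValueError(f"not a valid table id: {text!r}")
    if len(parts) == 3:
        return TableId(parts[0], parts[1], parts[2])  # e.g. Oracle/PostgreSQL
    if len(parts) == 2:
        return TableId(None, parts[0], parts[1])      # e.g. MySQL/Doris/StarRocks
    return TableId(None, None, parts[0])              # e.g. a Kafka topic
```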

docs/content/docs/core-concept/transform.md

+7
@@ -23,3 +23,10 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Definition

The transform module helps users delete existing data columns and add new ones based on the columns in the table.
It also helps users filter out unnecessary data during synchronization.

# Example

This feature is not implemented in version 3.0. It is planned for version 3.1; please wait for that release.
