Commit

[FLINK-34729][docs] Translate "Core Concept" Pages of Flink CDC into Chinese

This closes #3901
lvyanquan authored Feb 6, 2025
1 parent aeebf67 commit d1d334d
Showing 6 changed files with 247 additions and 252 deletions.
36 changes: 18 additions & 18 deletions docs/content.zh/docs/core-concept/data-pipeline.md
@@ -24,23 +24,23 @@ specific language governing permissions and limitations
under the License.
-->

# Definition
Since events in Flink CDC flow from upstream to downstream in a pipeline manner, the whole ETL task is referred to as a **Data Pipeline**.
# 定义
由于在 Flink CDC 中,事件从上游流转到下游遵循 Pipeline 的模式,因此整个 ETL 作业也被称为 **Data Pipeline**

# Parameters
A pipeline corresponds to a chain of operators in Flink.
To describe a Data Pipeline, the following parts are required:
# 参数
一个 pipeline 包含着 Flink 的一组算子链。
为了描述 Data Pipeline,我们需要定义以下部分:
- [source]({{< ref "docs/core-concept/data-source" >}})
- [sink]({{< ref "docs/core-concept/data-sink" >}})
- [pipeline](#pipeline-configurations)

The following parts are optional:
下面 是 Data Pipeline 的一些可选配置:
- [route]({{< ref "docs/core-concept/route" >}})
- [transform]({{< ref "docs/core-concept/transform" >}})

# Example
## Only required
We can use the following YAML file to define a concise Data Pipeline that synchronizes all tables in the MySQL `app_db` database to Doris:
# 示例
## 只包含必须部分
我们可以使用以下 yaml 文件来定义一个简单的 Data Pipeline 来同步 MySQL app_db 数据库下的所有表到 Doris

```yaml
source:
@@ -62,8 +62,8 @@ We could use following yaml file to define a concise Data Pipeline describing sy
parallelism: 2
```
## With optional
We can use the following YAML file to define a more complex Data Pipeline that synchronizes all tables in the MySQL `app_db` database to Doris, with the target database name `ods_db` and the target table name prefix `ods_`:
## 包含可选部分
我们可以使用以下 yaml 文件来定义一个复杂的 Data Pipeline 来同步 MySQL app_db 数据库下的所有表到 Doris,并给目标数据库名 ods_db 和目标表名前缀 ods_
```yaml
source:
@@ -108,11 +108,11 @@ We could use following yaml file to define a complicated Data Pipeline describin
classpath: com.example.functions.FormatFunctionClass
```
# Pipeline Configurations
The following pipeline-level config options are supported:
# Pipeline 配置
下面 是 Data Pipeline 的一些可选配置:
| parameter | meaning | optional/required |
|-----------------|-----------------------------------------------------------------------------------------|-------------------|
| name | The name of the pipeline, which will be submitted to the Flink cluster as the job name. | optional |
| parallelism | The global parallelism of the pipeline. Defaults to 1. | optional |
| local-time-zone | The local time zone defines current session time zone id. | optional |
| 参数 | 含义 | optional/required |
|-----------------|---------------------------------------|-------------------|
| name | 这个 pipeline 的名称,会用在 Flink 集群中作为作业的名称。 | optional |
| parallelism | pipeline的全局并发度,默认值是1。 | optional |
| local-time-zone | 作业级别的本地时区。 | optional |
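
As a rough illustration of the table above, the sketch below reads the three pipeline-level options from an already parsed `pipeline:` section and applies the documented default of 1 for `parallelism`. The class `PipelineOptions`, the function `read_pipeline_options`, and the sample name `demo-pipeline` are hypothetical names chosen for this example; they are not part of Flink CDC.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class PipelineOptions:
    """Hypothetical holder for the three pipeline-level options in the table."""

    name: Optional[str] = None             # optional; submitted as the Flink job name
    parallelism: int = 1                   # optional; the table states a default of 1
    local_time_zone: Optional[str] = None  # optional; session time zone id


def read_pipeline_options(section: Mapping[str, Any]) -> PipelineOptions:
    """Read the documented options from a parsed `pipeline:` mapping."""
    parallelism = int(section.get("parallelism", 1))
    if parallelism < 1:
        # Assumption of this sketch: a non-positive parallelism is rejected.
        raise ValueError("parallelism must be a positive integer")
    return PipelineOptions(
        name=section.get("name"),
        parallelism=parallelism,
        local_time_zone=section.get("local-time-zone"),
    )


defaults = read_pipeline_options({})
custom = read_pipeline_options({"name": "demo-pipeline", "parallelism": 2})
```

Since all three options are optional, an empty `pipeline:` section still yields a usable configuration with a parallelism of 1.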
24 changes: 12 additions & 12 deletions docs/content.zh/docs/core-concept/data-sink.md
@@ -24,21 +24,21 @@ specific language governing permissions and limitations
under the License.
-->

# Definition
**Data Sink** is used to apply schema changes and write change data to external systems.
A Data Sink can write to multiple tables simultaneously.
# 定义
**Data Sink** 是 用来应用 schema 变更并写入 change data 到外部系统的组件。
一个 Data Sink 可以同时写入多个表。

# Parameters
To describe a Data Sink, provide the following parameters:
# 参数
为了定义一个 Data Sink,需要提供以下参数:

| parameter | meaning | optional/required |
|-----------------------------|-------------------------------------------------------------------------------------------------|-------------------|
| type | The type of the sink, such as doris or starrocks. | required |
| name | The name of the sink, which is user-defined (a default value provided). | optional |
| configurations of Data Sink | Configurations to build the Data Sink e.g. connection configurations and sink table properties. | optional |
| 参数 | 含义 | optional/required |
|-----------------------------|---------------------------------|-------------------|
| type | sink 的类型,例如 doris 或者 starrocks | required |
| name | sink 的名称,允许用户配置 (提供了一个默认值)。 | optional |
| configurations of Data Sink | 用于构建 sink 组件的配置,例如连接参数或者表属性的配置。 | optional |
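
The parameter table can be read as "`type` is mandatory, `name` is optional, everything else is connector configuration". A minimal sketch of that split is shown below, assuming the `sink:` (or `source:`) section has already been parsed into a mapping. The function `split_connector_section` and the option key `some-connection-option` are hypothetical, and the default value for `name` is left as `None` because the table does not state what the default is.

```python
from typing import Any, Dict, Mapping, Optional, Tuple


def split_connector_section(
    section: Mapping[str, Any],
) -> Tuple[str, Optional[str], Dict[str, Any]]:
    """Split a parsed source/sink section into (type, name, other options)."""
    if "type" not in section:
        # `type` is the only parameter marked as required in the table.
        raise ValueError("'type' is required for a source or sink definition")
    # Everything except `type` and `name` is treated as connector configuration.
    options = {k: v for k, v in section.items() if k not in ("type", "name")}
    return section["type"], section.get("name"), options


sink_type, sink_name, sink_options = split_connector_section(
    {"type": "doris", "some-connection-option": "value"}  # placeholder option key
)
```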

# Example
We can use the following YAML file to define a Doris sink:
# 示例
我们可以使用以下的 yaml 文件来定义一个 doris sink
```yaml
sink:
  type: doris
  # remaining lines are collapsed in the diff view
```
24 changes: 12 additions & 12 deletions docs/content.zh/docs/core-concept/data-source.md
@@ -24,21 +24,21 @@ specific language governing permissions and limitations
under the License.
-->

# Definition
**Data Source** is used to access metadata and read the changed data from external systems.
A Data Source can read data from multiple tables simultaneously.
# 定义
**Data Source** 是 用来访问元数据以及从外部系统读取变更数据的组件。
一个 Data Source 可以同时访问多个表。

# Parameters
To describe a Data Source, provide the following parameters:
# 参数
为了定义一个 Data Source,需要提供以下参数:

| parameter | meaning | optional/required |
|-------------------------------|-----------------------------------------------------------------------------------------------------|-------------------|
| type | The type of the source, such as mysql. | required |
| name | The name of the source, which is user-defined (a default value provided). | optional |
| configurations of Data Source | Configurations to build the Data Source e.g. connection configurations and source table properties. | optional |
| 参数 | 含义 | optional/required |
|-------------------------------|-----------------------------------|-------------------|
| type | source 的类型,例如 mysql | required |
| name | source 的名称,允许用户配置 (提供了一个默认值)。 | optional |
| configurations of Data Source | 用于构建 source 组件的配置,例如连接参数或者表属性的配置。 | optional |

# Example
We can use the following YAML file to define a MySQL source:
# 示例
我们可以使用yaml文件来定义一个mysql source
```yaml
source:
  type: mysql
  # remaining lines are collapsed in the diff view
```
43 changes: 21 additions & 22 deletions docs/content.zh/docs/core-concept/route.md
@@ -24,24 +24,24 @@ specific language governing permissions and limitations
under the License.
-->

# Definition
**Route** specifies a rule that matches a list of source tables and maps them to a sink table. The most typical scenario is merging sharded databases and tables, routing multiple upstream source tables to the same sink table.
# 定义
**Route** 代表一个路由规则,用来匹配一个或多个source 表,并映射到 sink 表。最常见的场景是合并子数据库和子表,将多个上游源表路由到同一个目标表。

# Parameters
To describe a route, provide the following parameters:
# 参数
为了定义一个路由规则,需要提供以下参数:

| parameter | meaning | optional/required |
|----------------|---------------------------------------------------------------------------------------------|-------------------|
| source-table | Source table id, supports regular expressions | required |
| sink-table | Sink table id, supports symbol replacement | required |
| replace-symbol | Special symbol in sink-table for pattern replacing, will be replaced by original table name | optional |
| description | Routing rule description(a default value provided) | optional |
| 参数 | 含义 | optional/required |
|----------------|------------------------------------------|-------------------|
| source-table | Source table id, 支持正则表达式 | required |
| sink-table | Sink table id,支持符号替换 | required |
| replace-symbol | 用于在 sink-table 中进行模式替换的特殊字符串, 会被源表中的表名替换 | optional |
| description | Route 规则的描述(提供了一个默认描述) | optional |

A route module can contain a list of source-table/sink-table rules.
一个 Route 模块可以包含一个或多个 source-table/sink-table 规则。

# Example
## Route one Data Source table to one Data Sink table
To synchronize the table `web_order` in the database `mydb` to a Doris table `ods_web_order`, we can use this YAML file to define the route:
# 示例
## 路由一个 Data Source 表到一个 Data Sink
如果同步一个 `mydb` 数据库中的 `web_order` 表到一个相同库的 `ods_web_order` 表,我们可以使用下面的 yaml 文件来定义这个路由

```yaml
route:
@@ -50,17 +50,16 @@ route:
description: sync table to one destination table with given prefix ods_
```
## Route multiple Data Source tables to one Data Sink table
Furthermore, to synchronize the sharded tables in the database `mydb` to a Doris table `ods_web_order`, we can use this YAML file to define the route:
## 路由多个 Data Source 表到一个 Data Sink
更进一步的,如果同步一个 `mydb` 数据库中的多个分表到一个相同库的 `ods_web_order` 表,我们可以使用下面的 yaml 文件来定义这个路由
```yaml
route:
  - source-table: mydb\.*
    sink-table: mydb.ods_web_order
    description: sync sharding tables to one destination table
```
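
The merge behavior described above can be sketched as follows: every source table id that matches the rule's pattern is mapped to the same sink table. This sketch assumes plain Python regular expressions with full matching, which may differ from how Flink CDC interprets `source-table` patterns. The function `merge_route`, the pattern `mydb\.web_order_\d+`, and the sample table names are hypothetical.

```python
import re
from typing import Dict, Iterable


def merge_route(
    table_ids: Iterable[str], source_pattern: str, sink_table: str
) -> Dict[str, str]:
    """Map every table id fully matched by source_pattern to sink_table."""
    compiled = re.compile(source_pattern)
    return {t: sink_table for t in table_ids if compiled.fullmatch(t)}


# Two hypothetical shards of the same logical table, plus an unrelated table.
tables = ["mydb.web_order_01", "mydb.web_order_02", "mydb.products"]
routed = merge_route(tables, r"mydb\.web_order_\d+", "mydb.ods_web_order")
```

Here both shards end up in `mydb.ods_web_order`, while `mydb.products` is not matched by the rule and is left out of the mapping.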

## Complex Route via combining route rules
Furthermore, to specify several different mapping rules, we can use this YAML file to define the route:
## 使用多个路由规则
更进一步的,如果需要定义多个路由规则,我们可以使用下面的 yaml 文件来定义这个路由:
```yaml
route:
  - source-table: mydb.orders
@@ -74,9 +73,9 @@ route:
    description: sync products table to ods_products
```

## Pattern Replacement in routing rules
## 包含符号替换的路由规则

To route source tables to sink tables and rename them following a specific pattern, `replace-symbol` can be used to reuse the source table names, like this:
如果你想将源表路由到 sink 表,并使用特定的模式替换源表名,那么 `replace-symbol` 就可以做到这一点:

```yaml
route:
@@ -86,4 +85,4 @@ route:
description: route all tables in source_db to sink_db
```

Then, every table `source_db.XXX` will be routed to `sink_db.XXX`.
然后,`source_db` 库下所有的表都会被同步到 `sink_db` 库下。
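
The substitution that `replace-symbol` performs can be pictured with the small sketch below: the symbol inside `sink-table` is replaced by the original table name of the matched source table. The function `route_table` and the symbol `<>` are assumptions made for illustration; the sketch does not cover how Flink CDC matches the `source-table` pattern.

```python
def route_table(source_table_id: str, sink_pattern: str, replace_symbol: str) -> str:
    """Replace replace_symbol in sink_pattern with the source table name."""
    # The original table name is the last dot-separated part of the table id.
    table_name = source_table_id.rsplit(".", 1)[-1]
    return sink_pattern.replace(replace_symbol, table_name)


# `<>` is a hypothetical replace-symbol chosen for this example.
orders_target = route_table("source_db.orders", "sink_db.<>", "<>")
products_target = route_table("source_db.products", "sink_db.<>", "<>")
```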
22 changes: 11 additions & 11 deletions docs/content.zh/docs/core-concept/table-id.md
@@ -24,17 +24,17 @@ specific language governing permissions and limitations
under the License.
-->

# Definition
When connecting to external systems, it is necessary to establish a mapping relationship with the storage objects of the external system. This is what **Table Id** refers to.
# 定义
在连接外部系统时,有必要建立一个与外部系统存储对象(例如表)的映射关系。这就是 **Table Id** 所代表的含义。

# Example
To be compatible with most external systems, the Table Id is represented by a 3-tuple: (namespace, schemaName, tableName).
Connectors should establish the mapping between Table Id and storage objects in external systems.
# 示例
为了兼容大部分外部系统,Table Id 被表示为 3 元组:(namespace, schemaName, tableName)
连接器应该在连接外部系统时建立与外部系统存储对象的映射关系。

The following table lists the parts of the Table Id for different data systems:
下面是不同数据系统对应的 tableId 的格式:

| data system | parts in tableId | String example |
|-----------------------|--------------------------|---------------------|
| Oracle/PostgreSQL | database, schema, table | mydb.default.orders |
| MySQL/Doris/StarRocks | database, table | mydb.orders |
| Kafka | topic | orders |
| 数据系统 | tableId 的组成 | 字符串示例 |
|-----------------------|-------------------------|---------------------|
| Oracle/PostgreSQL | database, schema, table | mydb.default.orders |
| MySQL/Doris/StarRocks | database, table | mydb.orders |
| Kafka | topic | orders |
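
To make the 3-tuple concrete, here is a minimal sketch that parses the string examples from the table into (namespace, schemaName, tableName) and formats them back. The class below is a hypothetical model written for this example, not the Flink CDC class itself; it assumes that ids with fewer than three parts leave the outer levels unset.

```python
from typing import NamedTuple, Optional


class TableId(NamedTuple):
    """Hypothetical model of the (namespace, schemaName, tableName) 3-tuple."""

    namespace: Optional[str]
    schema_name: Optional[str]
    table_name: str

    @classmethod
    def parse(cls, text: str) -> "TableId":
        parts = text.split(".")
        if not 1 <= len(parts) <= 3 or not all(parts):
            raise ValueError(f"invalid table id: {text!r}")
        # Left-pad with None so that shorter ids leave the outer levels unset.
        padded = [None] * (3 - len(parts)) + parts
        return cls(*padded)

    def __str__(self) -> str:
        # Join only the parts that are set, e.g. "mydb.orders".
        return ".".join(part for part in self if part is not None)


oracle_style = TableId.parse("mydb.default.orders")  # database, schema, table
mysql_style = TableId.parse("mydb.orders")           # database, table
kafka_style = TableId.parse("orders")                # topic
```

Formatting with `str()` round-trips each of the three string examples from the table.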