Skip to content

Commit 92081df

Browse files
authored
[FLINK-35167][pipeline-connector/maxcompute] Introduce MaxCompute pipeline DataSink
This closes #3254.
1 parent 98f9f15 commit 92081df

File tree

42 files changed

+5511
-0
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

42 files changed

+5511
-0
lines changed
Lines changed: 328 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,328 @@
1+
---
2+
title: "MaxCompute"
3+
weight: 7
4+
type: docs
5+
aliases:
6+
- /connectors/maxcompute
7+
---
8+
9+
<!--
10+
Licensed to the Apache Software Foundation (ASF) under one
11+
or more contributor license agreements. See the NOTICE file
12+
distributed with this work for additional information
13+
regarding copyright ownership. The ASF licenses this file
14+
to you under the Apache License, Version 2.0 (the
15+
"License"); you may not use this file except in compliance
16+
with the License. You may obtain a copy of the License at
17+
18+
http://www.apache.org/licenses/LICENSE-2.0
19+
20+
Unless required by applicable law or agreed to in writing,
21+
software distributed under the License is distributed on an
22+
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
23+
KIND, either express or implied. See the License for the
24+
specific language governing permissions and limitations
25+
under the License.
26+
-->
27+
28+
# MaxCompute Connector
29+
30+
MaxCompute Pipeline 连接器可以用作 Pipeline 的 *Data Sink*,将数据写入[MaxCompute](https://www.aliyun.com/product/odps)
31+
本文档介绍如何设置 MaxCompute Pipeline 连接器。
32+
33+
## 连接器的功能
34+
35+
* 自动建表
36+
* 表结构变更同步
37+
* 数据实时同步
38+
39+
## 示例
40+
41+
从 MySQL 读取数据同步到 MaxCompute 的 Pipeline 可以定义如下:
42+
43+
```yaml
44+
source:
45+
type: mysql
46+
name: MySQL Source
47+
hostname: 127.0.0.1
48+
port: 3306
49+
username: admin
50+
password: pass
51+
tables: adb.\.*, bdb.user_table_[0-9]+, [app|web].order_\.*
52+
server-id: 5401-5404
53+
54+
sink:
55+
type: maxcompute
56+
name: MaxCompute Sink
57+
accessId: ak
58+
accessKey: sk
59+
endpoint: endpoint
60+
project: flink_cdc
61+
bucketSize: 8
62+
63+
pipeline:
64+
name: MySQL to MaxCompute Pipeline
65+
parallelism: 2
66+
```
67+
68+
## 连接器配置项
69+
70+
<div class="highlight">
71+
<table class="colwidths-auto docutils">
72+
<thead>
73+
<tr>
74+
<th class="text-left" style="width: 25%">Option</th>
75+
<th class="text-left" style="width: 8%">Required</th>
76+
<th class="text-left" style="width: 7%">Default</th>
77+
<th class="text-left" style="width: 10%">Type</th>
78+
<th class="text-left" style="width: 50%">Description</th>
79+
</tr>
80+
</thead>
81+
<tbody>
82+
<tr>
83+
<td>type</td>
84+
<td>required</td>
85+
<td style="word-wrap: break-word;">(none)</td>
86+
<td>String</td>
87+
<td>指定要使用的连接器, 这里需要设置成 <code>'maxcompute'</code>.</td>
88+
</tr>
89+
<tr>
90+
<td>name</td>
91+
<td>optional</td>
92+
<td style="word-wrap: break-word;">(none)</td>
93+
<td>String</td>
94+
<td>Sink 的名称.</td>
95+
</tr>
96+
<tr>
97+
<td>accessId</td>
98+
<td>required</td>
99+
<td style="word-wrap: break-word;">(none)</td>
100+
<td>String</td>
101+
<td>阿里云账号或RAM用户的AccessKey ID。您可以进入<a href="https://ram.console.aliyun.com/manage/ak">
102+
AccessKey管理页面</a> 获取AccessKey ID。</td>
103+
</tr>
104+
<tr>
105+
<td>accessKey</td>
106+
<td>required</td>
107+
<td style="word-wrap: break-word;">(none)</td>
108+
<td>String</td>
109+
<td>AccessKey ID对应的AccessKey Secret。您可以进入<a href="https://ram.console.aliyun.com/manage/ak">
110+
AccessKey管理页面</a> 获取AccessKey Secret。</td>
111+
</tr>
112+
<tr>
113+
<td>endpoint</td>
114+
<td>required</td>
115+
<td style="word-wrap: break-word;">(none)</td>
116+
<td>String</td>
117+
<td>MaxCompute服务的连接地址。您需要根据创建MaxCompute项目时选择的地域以及网络连接方式配置Endpoint。各地域及网络对应的Endpoint值,请参见<a href="https://help.aliyun.com/zh/maxcompute/user-guide/endpoints">
118+
Endpoint</a>。</td>
119+
</tr>
120+
<tr>
121+
<td>project</td>
122+
<td>required</td>
123+
<td style="word-wrap: break-word;">(none)</td>
124+
<td>String</td>
125+
<td>MaxCompute项目名称。您可以登录<a href="https://maxcompute.console.aliyun.com/">
126+
MaxCompute控制台</a>,在 工作区 > 项目管理 页面获取MaxCompute项目名称。</td>
127+
</tr>
128+
<tr>
129+
<td>tunnelEndpoint</td>
130+
<td>optional</td>
131+
<td style="word-wrap: break-word;">(none)</td>
132+
<td>String</td>
133+
<td>MaxCompute Tunnel服务的连接地址,通常这项配置可以根据指定的project所在的region进行自动路由。仅在使用代理等特殊网络环境下使用该配置。</td>
134+
</tr>
135+
<tr>
136+
<td>quotaName</td>
137+
<td>optional</td>
138+
<td style="word-wrap: break-word;">(none)</td>
139+
<td>String</td>
140+
<td>MaxCompute 数据传输使用的独享资源组名称,如不指定该配置,则使用共享资源组。详情可以参考<a href="https://help.aliyun.com/zh/maxcompute/user-guide/purchase-and-use-exclusive-resource-groups-for-dts">
141+
使用 Maxcompute 独享资源组</a></td>
142+
</tr>
143+
<tr>
144+
<td>stsToken</td>
145+
<td>optional</td>
146+
<td style="word-wrap: break-word;">(none)</td>
147+
<td>String</td>
148+
<td>当使用RAM角色颁发的短时有效的访问令牌(STS Token)进行鉴权时,需要指定该参数。</td>
149+
</tr>
150+
<tr>
151+
<td>bucketsNum</td>
152+
<td>optional</td>
153+
<td style="word-wrap: break-word;">16</td>
154+
<td>Integer</td>
155+
<td>自动创建 MaxCompute Delta 表时使用的桶数。使用方式可以参考 <a href="https://help.aliyun.com/zh/maxcompute/user-guide/transaction-table2-0-overview">
156+
Delta Table 概述</a></td>
157+
</tr>
158+
<tr>
159+
<td>compressAlgorithm</td>
160+
<td>optional</td>
161+
<td style="word-wrap: break-word;">zlib</td>
162+
<td>String</td>
163+
<td>写入MaxCompute时使用的数据压缩算法,当前支持<code>raw</code>(不进行压缩),<code>zlib</code>和<code>snappy</code>。</td>
164+
</tr>
165+
<tr>
166+
<td>totalBatchSize</td>
167+
<td>optional</td>
168+
<td style="word-wrap: break-word;">64MB</td>
169+
<td>String</td>
170+
<td>内存中缓冲的数据量大小,单位为分区级(非分区表单位为表级),不同分区(表)的缓冲区相互独立,达到阈值后数据写入到MaxCompute。</td>
171+
</tr>
172+
<tr>
173+
<td>bucketBatchSize</td>
174+
<td>optional</td>
175+
<td style="word-wrap: break-word;">4MB</td>
176+
<td>String</td>
177+
<td>内存中缓冲的数据量大小,单位为桶级,仅写入 Delta 表时生效。不同数据桶的缓冲区相互独立,达到阈值后将该桶数据写入到MaxCompute。</td>
178+
</tr>
179+
<tr>
180+
<td>numCommitThreads</td>
181+
<td>optional</td>
182+
<td style="word-wrap: break-word;">16</td>
183+
<td>Integer</td>
184+
<td>checkpoint阶段,能够同时处理的分区(表)数量。</td>
185+
</tr>
186+
<tr>
187+
<td>numFlushConcurrent</td>
188+
<td>optional</td>
189+
<td style="word-wrap: break-word;">4</td>
190+
<td>Integer</td>
191+
<td>写入数据到MaxCompute时,能够同时写入的桶数量。仅写入 Delta 表时生效。</td>
192+
</tr>
193+
</tbody>
194+
</table>
195+
</div>
196+
197+
## 使用说明
198+
199+
* 链接器 支持自动建表,将MaxCompute表与源表的位置关系、数据类型进行自动映射(参见下文映射表),当源表有主键时,自动创建
200+
MaxCompute Delta 表,否则创建普通 MaxCompute 表(Append表)
201+
* 当写入普通 MaxCompute 表(Append表)时,会忽略`delete`操作,`update`操作会被视为`insert`操作
202+
* 目前仅支持 at-least-once,Delta 表由于主键特性能够实现幂等写
203+
* 对于表结构变更同步
204+
* 新增列只能添加到最后一列
205+
* 修改列类型,只能修改为兼容的类型。兼容表可以参考[ALTER TABLE](https://help.aliyun.com/zh/maxcompute/user-guide/alter-table)
206+
207+
## 表位置映射
208+
209+
链接器自动建表时,使用如下映射关系,将源表的位置信息映射到MaxCompute表的位置。注意,当MaxCompute项目不支持Schema模型时,每个同步任务仅能同步一个Mysql
210+
Database。(其他Datasource同理,链接器会忽略TableId.namespace信息)
211+
212+
<div class="wy-table-responsive">
213+
<table class="colwidths-auto docutils">
214+
<thead>
215+
<tr>
216+
<th class="text-left">Flink CDC 中抽象</th>
217+
<th class="text-left">MaxCompute 位置</th>
218+
<th class="text-left" style="width:60%;">Mysql 位置</th>
219+
</tr>
220+
</thead>
221+
<tbody>
222+
<tr>
223+
<td>配置文件中project</td>
224+
<td>project</td>
225+
<td>(none)</td>
226+
</tr>
227+
<tr>
228+
<td>TableId.namespace</td>
229+
<td>schema(仅当MaxCompute项目支持Schema模型时,如不支持,将忽略该配置)</td>
230+
<td>database</td>
231+
</tr>
232+
<tr>
233+
<td>TableId.tableName</td>
234+
<td>table</td>
235+
<td>table</td>
236+
</tr>
237+
</tbody>
238+
</table>
239+
</div>
240+
241+
## 数据类型映射
242+
243+
<div class="wy-table-responsive">
244+
<table class="colwidths-auto docutils">
245+
<thead>
246+
<tr>
247+
<th class="text-left">Flink Type</th>
248+
<th class="text-left">MaxCompute Type</th>
249+
</tr>
250+
</thead>
251+
<tbody>
252+
<tr>
253+
<td>CHAR/VARCHAR</td>
254+
<td>STRING</td>
255+
</tr>
256+
<tr>
257+
<td>BOOLEAN</td>
258+
<td>BOOLEAN</td>
259+
</tr>
260+
<tr>
261+
<td>BINARY/VARBINARY</td>
262+
<td>BINARY</td>
263+
</tr>
264+
<tr>
265+
<td>DECIMAL</td>
266+
<td>DECIMAL</td>
267+
</tr>
268+
<tr>
269+
<td>TINYINT</td>
270+
<td>TINYINT</td>
271+
</tr>
272+
<tr>
273+
<td>SMALLINT</td>
274+
<td>SMALLINT</td>
275+
</tr>
276+
<tr>
277+
<td>INTEGER</td>
278+
<td>INTEGER</td>
279+
</tr>
280+
<tr>
281+
<td>BIGINT</td>
282+
<td>BIGINT</td>
283+
</tr>
284+
<tr>
285+
<td>FLOAT</td>
286+
<td>FLOAT</td>
287+
</tr>
288+
<tr>
289+
<td>DOUBLE</td>
290+
<td>DOUBLE</td>
291+
</tr>
292+
<tr>
293+
<td>TIME_WITHOUT_TIME_ZONE</td>
294+
<td>STRING</td>
295+
</tr>
296+
<tr>
297+
<td>DATE</td>
298+
<td>DATE</td>
299+
</tr>
300+
<tr>
301+
<td>TIMESTAMP_WITHOUT_TIME_ZONE</td>
302+
<td>TIMESTAMP_NTZ</td>
303+
</tr>
304+
<tr>
305+
<td>TIMESTAMP_WITH_LOCAL_TIME_ZONE</td>
306+
<td>TIMESTAMP</td>
307+
</tr>
308+
<tr>
309+
<td>TIMESTAMP_WITH_TIME_ZONE</td>
310+
<td>TIMESTAMP</td>
311+
</tr>
312+
<tr>
313+
<td>ARRAY</td>
314+
<td>ARRAY</td>
315+
</tr>
316+
<tr>
317+
<td>MAP</td>
318+
<td>MAP</td>
319+
</tr>
320+
<tr>
321+
<td>ROW</td>
322+
<td>STRUCT</td>
323+
</tr>
324+
</tbody>
325+
</table>
326+
</div>
327+
328+
{{< top >}}

0 commit comments

Comments
 (0)