Skip to content

Commit 064bfe4

Browse files
hackerginlsyldliu
authored andcommitted
[FLINK-37187][doc] Add doc for submitting Materialized Table refresh job to Yarn/K8s
---------- Co-authored-by: Ron <[email protected]> This closes #26073
1 parent 14b5d53 commit 064bfe4

File tree

6 files changed

+401
-30
lines changed

6 files changed

+401
-30
lines changed
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,198 @@
1+
---
2+
title: 部署
3+
weight: 3
4+
type: docs
5+
aliases:
6+
- /dev/table/materialized-table/deployment.html
7+
---
8+
<!--
9+
Licensed to the Apache Software Foundation (ASF) under one
10+
or more contributor license agreements. See the NOTICE file
11+
distributed with this work for additional information
12+
regarding copyright ownership. The ASF licenses this file
13+
to you under the Apache License, Version 2.0 (the
14+
"License"); you may not use this file except in compliance
15+
with the License. You may obtain a copy of the License at
16+
17+
http://www.apache.org/licenses/LICENSE-2.0
18+
19+
Unless required by applicable law or agreed to in writing,
20+
software distributed under the License is distributed on an
21+
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
22+
KIND, either express or implied. See the License for the
23+
specific language governing permissions and limitations
24+
under the License.
25+
-->
26+
27+
# Introduction
28+
29+
物化表的创建及运维涉及多个组件的协同工作。本文将从架构解析、环境准备、部署流程到操作实践,系统地说明物化表的完整部署方案。
30+
31+
# 架构介绍
32+
33+
- **Client**: 可以是任何能够与 [Flink SQL Gateway]({{< ref "docs/dev/table/sql-gateway/overview" >}}) 交互的客户端,如 [SQL 客户端]({{< ref "docs/dev/table/sqlClient" >}})、[Flink JDBC 驱动]({{< ref "docs/dev/table/jdbcDriver" >}}) 等。
34+
- **Flink SQL Gateway**: 支持创建、修改和删除物化表。并包含了一个内置的工作流调度器,用于定期刷新全量模式的物化表。
35+
- **Flink Cluster**: 用于运行物化表刷新作业的 Flink 集群。
36+
- **Catalog**: 负责管理物化表元数据的创建、查询、修改和删除。
37+
- **Catalog Store**: 提供 Catalog 属性持久化功能,以便在操作物化表时自动初始化 Catalog 并获取相关的元数据。
38+
39+
{{< img src="/fig/materialized-table-architecture.svg" alt="Illustration of Flink Materialized Table Architecture" width="85%" >}}
40+
41+
42+
# 部署准备
43+
44+
## Flink 集群环境准备
45+
46+
物化表刷新作业目前支持在以下集群环境中运行:
47+
* [Standalone clusters]({{<ref "docs/deployment/resource-providers/standalone/overview">}})
48+
* [YARN clusters]({{<ref "docs/deployment/resource-providers/yarn" >}})
49+
* [Kubernetes clusters]({{<ref "docs/deployment/resource-providers/native_kubernetes" >}})
50+
51+
## 部署 SQL Gateway
52+
53+
物化表必须通过 SQL Gateway 创建,SQL Gateway 需要针对元数据持久化和作业调度进行特定的配置。
54+
55+
### 配置 Catalog Store
56+
57+
`config.yaml` 中增加 `catalog store` 相关配置:
58+
```yaml
59+
table:
60+
catalog-store:
61+
kind: file
62+
file:
63+
path: {path_to_catalog_store} # 替换成实际的路径
64+
```
65+
更多详情配置可参考 [Catalog Store]({{<ref "docs/dev/table/catalogs">}}#catalog-store)。
66+
67+
### 配置工作流调度器插件
68+
69+
`config.yaml` 增加工作流调度器配置,用于定时调度刷新作业。 当前我们仅支持 `embedded` 调度器:
70+
71+
```yaml
72+
workflow-scheduler:
73+
type: embedded
74+
```
75+
76+
### 启动 SQL Gateway
77+
78+
使用以下命令启动 SQL Gateway:
79+
```
80+
./sql-gateway.sh start
81+
```
82+
83+
<span class="label label-danger">注意</span>
84+
Catalog 必须支持创建物化表,目前只有 [Paimon Catalog](https://paimon.apache.org/docs/master/concepts/table-types/#materialized-table) 支持。
85+
86+
# 操作指南
87+
88+
## 连接到 SQL Gateway
89+
90+
使用 SQL Client 的示例:
91+
92+
```shell
93+
./sql-client.sh gateway --endpoint {gateway_endpoint}:{gateway_port}
94+
```
95+
96+
## 创建物化表
97+
98+
### 在 Standalone 集群运行刷新作业
99+
100+
```sql
101+
Flink SQL> SET 'execution.mode' = 'remote';
102+
[INFO] Execute statement succeeded.
103+
104+
FLINK SQL> CREATE MATERIALIZED TABLE my_materialized_table
105+
> ...
106+
[INFO] Execute statement succeeded.
107+
```
108+
109+
### 在 session 模式下运行刷新作业
110+
111+
在 session 模式下执行时,需要提前创建 session 集群,具体可以参考文档 [yarn-session]({{< ref "docs/deployment/resource-providers/yarn" >}}#starting-a-flink-session-on-yarn) 和 [kubernetes-session]({{<ref "docs/deployment/resource-providers/native_kubernetes" >}}#starting-a-flink-session-on-kubernetes)
112+
113+
**Kubernetes session 模式:**
114+
115+
```sql
116+
Flink SQL> SET 'execution.mode' = 'kubernetes-session';
117+
[INFO] Execute statement succeeded.
118+
119+
Flink SQL> SET 'kubernetes.cluster-id' = 'flink-cluster-mt-session-1';
120+
[INFO] Execute statement succeeded.
121+
122+
FLINK SQL> CREATE MATERIALIZED TABLE my_materialized_table
123+
> ...
124+
[INFO] Execute statement succeeded.
125+
```
126+
127+
设置 `execution.mode``kubernetes-session` 并设置参数 `kubernetes.cluster-id` 指向一个已经存在的 Kubernetes session 集群.
128+
129+
**YARN session 模式:**
130+
131+
```sql
132+
Flink SQL> SET 'execution.mode' = 'yarn-session';
133+
[INFO] Execute statement succeeded.
134+
135+
Flink SQL> SET 'yarn.application.id' = 'application-xxxx';
136+
[INFO] Execute statement succeeded.
137+
138+
FLINK SQL> CREATE MATERIALIZED TABLE my_materialized_table
139+
> ...
140+
[INFO] Execute statement succeeded.
141+
```
142+
设置 `execution.mode``yarn-session` 并设置参数 `yarn.application.id` 指向一个已经存在的 YARN session 集群。
143+
144+
### 在 application 模式下运行刷新作业
145+
146+
**Kubernetes application 模式:**
147+
148+
```sql
149+
Flink SQL> SET 'execution.mode' = 'kubernetes-application';
150+
[INFO] Execute statement succeeded.
151+
152+
Flink SQL> SET 'kubernetes.cluster-id' = 'flink-cluster-mt-application-1';
153+
[INFO] Execute statement succeeded.
154+
155+
FLINK SQL> CREATE MATERIALIZED TABLE my_materialized_table
156+
> ...
157+
[INFO] Execute statement succeeded.
158+
```
159+
设置 `execution.mode``kubernetes-application``kubernetes.cluster-id` 是一个可选配置,如果未配置,在提交作业时会自动生成。
160+
161+
**YARN application 模式:**
162+
163+
```sql
164+
Flink SQL> SET 'execution.mode' = 'yarn-application';
165+
[INFO] Execute statement succeeded.
166+
167+
FLINK SQL> CREATE MATERIALIZED TABLE my_materialized_table
168+
> ...
169+
[INFO] Execute statement succeeded.
170+
```
171+
设置 `execution.mode``yarn-application``yarn.application.id` 无需配置。
172+
173+
## 运维操作
174+
175+
集群信息(如 `execution.mode``kubernetes.cluster-id`)已持久化在 Catalog 中,暂停或恢复物化表刷新作业时无需重复设置。
176+
177+
### 暂停刷新作业
178+
```sql
179+
-- 暂停物化表刷新作业
180+
Flink SQL> ALTER MATERIALIZED TABLE my_materialized_table SUSPEND
181+
[INFO] Execute statement succeeded.
182+
```
183+
184+
### 恢复刷新作业
185+
```sql
186+
-- 恢复物化表刷新作业
187+
Flink SQL> ALTER MATERIALIZED TABLE my_materialized_table RESUME
188+
[INFO] Execute statement succeeded.
189+
```
190+
191+
### 修改查询定义
192+
```sql
193+
-- 修改物化表查询定义
194+
Flink SQL> ALTER MATERIALIZED TABLE my_materialized_table
195+
> AS
196+
> ...
197+
[INFO] Execute statement succeeded.
198+
```

docs/content.zh/docs/dev/table/materialized-table/overview.md

-4
Original file line numberDiff line numberDiff line change
@@ -28,10 +28,6 @@ under the License.
2828

2929
物化表是 Flink SQL 引入的一种新的表类型,旨在简化批处理和流处理数据管道,提供一致的开发体验。在创建物化表时,通过指定数据新鲜度和查询,Flink 引擎会自动推导出物化表的 Schema ,并创建相应的数据刷新管道,以达到指定的新鲜度。
3030

31-
{{< hint warning >}}
32-
**注意**:该功能目前是一个 MVP(最小可行产品)功能,仅在 [SQL Gateway]({{< ref "docs/dev/table/sql-gateway/overview" >}})中可用,并且只支持部署作业到 Flink [Standalone]({{< ref "docs/deployment/resource-providers/standalone/overview" >}})集群。
33-
{{< /hint >}}
34-
3531
# 核心概念
3632

3733
物化表包含以下核心概念:数据新鲜度、刷新模式、查询定义和 `Schema`

docs/content.zh/docs/dev/table/materialized-table/quickstart.md

+1-11
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
---
22
title: 快速入门
3-
weight: 3
3+
weight: 4
44
type: docs
55
aliases:
66
- /dev/table/materialized-table/quickstart.html
@@ -29,16 +29,6 @@ under the License.
2929

3030
本入门指南将帮助你快速了解并开始使用物化表。内容包括环境设置,以及创建、修改和删除持续模式和全量模式的物化表。
3131

32-
# 架构介绍
33-
34-
- **Client**: 可以是任何能够与 [Flink SQL Gateway]({{< ref "docs/dev/table/sql-gateway/overview" >}}) 交互的客户端,如 [SQL 客户端]({{< ref "docs/dev/table/sqlClient" >}})、[Flink JDBC 驱动]({{< ref "docs/dev/table/jdbcDriver" >}}) 等。
35-
- **Flink SQL Gateway**: 支持创建、修改和删除物化表。并包含了一个内置的工作流调度器,用于定期刷新全量模式的物化表。
36-
- **Flink Cluster**: 用于运行物化表刷新作业的 Flink 集群。
37-
- **Catalog**: 负责管理物化表元数据的创建、查询、修改和删除。
38-
- **Catalog Store**: 提供 Catalog 属性持久化功能,以便在操作物化表时自动初始化 Catalog 并获取相关的元数据。
39-
40-
{{< img src="/fig/materialized-table-architecture.svg" alt="Illustration of Flink Materialized Table Architecture" width="85%" >}}
41-
4232
# 环境搭建
4333

4434
## 目录准备

0 commit comments

Comments
 (0)