
Commit 50952f3

Exploring data using airbyte, clickhouse, and superset
Signed-off-by: xingcan-ltc <[email protected]>
1 parent 9a3b9e1 commit 50952f3


9 files changed: 187 additions, 0 deletions

Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
# Exploring data using Airbyte, ClickHouse, and Superset

English | [简体中文](../../zh/user-tutorials/exploring-data-using-airbyte-clickhouse-superset.md)

# 1. Introduction

This guide shows how to use the KDP platform for data integration, data processing, and BI visualization with the `airbyte`, `clickhouse`, and `superset` applications, which must be installed beforehand. We recommend getting familiar with each component's quick start before following the steps in this guide.

# 2. Data Integration

Import data from a CSV file into ClickHouse:

1. Add a File-type source in Airbyte.

   ![](../../images/airbyte01.png)

   - Dataset Name: `tmall-order-sample` (do not modify: the raw table name queried in section 3, `default_raw__stream_tmall_order_sample`, is derived from it)
   - URL: `https://gitee.com/xing-can/pic/raw/master/tmall-order-sample.csv`

2. Add a ClickHouse-type destination in Airbyte.

   ![](../../images/airbyte03.png)

   - Host: `clickhouse.kdp-data.svc.cluster.local`
   - Port: `8123`
   - DB Name: `default`
   - User: `default`
   - Password: `ckdba.123`

3. Add a connection in Airbyte with File as the source and ClickHouse as the destination, keep the default configuration, and save it.

   ![](../../images/airbyte02.png)

4. Check the status of the Airbyte job. If it succeeded, the data has been imported into ClickHouse.

   ![](../../images/airbyte04.png)
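
If the ClickHouse destination from step 2 fails Airbyte's connection test, probing the ClickHouse HTTP interface directly with the same host, port, and credentials can help narrow down the cause. The sketch below is a hypothetical helper (`clickhouse_http_check` is not part of KDP or Airbyte) and assumes it runs with `curl` from a pod inside the cluster, since the host is a cluster-internal Service name. It relies on two standard ClickHouse HTTP features: `GET /ping`, which answers `Ok.`, and the `query` URL parameter combined with HTTP basic authentication.

```bash
#!/usr/bin/env bash
# Hypothetical helper: probe the ClickHouse HTTP interface used by the
# Airbyte destination. Defaults mirror the destination settings in step 2.
clickhouse_http_check() {
  local host="${1:-clickhouse.kdp-data.svc.cluster.local}"
  local port="${2:-8123}"
  local user="${3:-default}"
  local password="${4:-ckdba.123}"
  local base="http://${host}:${port}"

  # 1) Liveness: ClickHouse answers GET /ping with "Ok." when the HTTP server is up.
  curl --silent --show-error --fail --max-time 5 "${base}/ping" || return 1

  # 2) Credentials: run a trivial query as the given user (HTTP basic auth).
  #    --get moves the --data-urlencode value into the URL as ?query=...
  curl --silent --show-error --fail --max-time 5 --get \
    --user "${user}:${password}" \
    --data-urlencode "query=SELECT version()" \
    "${base}/" || return 1
}
```

For example, `clickhouse_http_check && echo reachable` with no arguments checks the values from step 2; a failure in the first request points at networking, a failure only in the second points at the credentials.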

After these steps, the EL (Extract, Load) part of ELT (Extract, Load, Transform) is complete; the Transform step is done next in ClickHouse.

# 3. Data Processing

```bash
# Enter the clickhouse container
kubectl exec -it clickhouse-shard0-0 -n kdp-data -c clickhouse -- bash
# Connect to clickhouse
clickhouse-client --user default --password ckdba.123
# Inside clickhouse-client: list the databases and the tables Airbyte created
show databases;
use airbyte_internal;
show tables;
# Confirm that the data has been written
select count(*) from airbyte_internal.default_raw__stream_tmall_order_sample;
```
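
For scripting, the interactive session above can be collapsed into one non-interactive command: `clickhouse-client --query` runs a single statement and exits, and `kubectl exec` without `-it` needs no terminal. A minimal sketch that reuses the pod, namespace, container, and credentials above; `raw_row_count` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count the rows Airbyte wrote to the raw table without
# opening an interactive shell. Pod, namespace, container, and credentials
# are the ones used in the steps above.
raw_row_count() {
  kubectl exec clickhouse-shard0-0 -n kdp-data -c clickhouse -- \
    clickhouse-client --user default --password ckdba.123 \
    --query "SELECT count() FROM airbyte_internal.default_raw__stream_tmall_order_sample"
}
```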

Then run the following three SQL statements in order to process the data.

```sql
DROP TABLE IF EXISTS airbyte_internal.ods_tmall_order;

-- Define the structure of the new table ods_tmall_order
CREATE TABLE airbyte_internal.ods_tmall_order
(
    total_amount          Int32,
    order_number          Int32,
    shipping_address      String,
    payment_time          DateTime64(3, 'GMT'),
    order_creation_time   DateTime64(3, 'GMT'),
    refund_amount         Int32,
    actual_payment_amount Int32
)
ENGINE = MergeTree
-- Assuming order_number is a unique identifier for each order
ORDER BY order_number;

-- Insert data into the new table from the JSON in _airbyte_data
INSERT INTO airbyte_internal.ods_tmall_order
SELECT JSONExtractInt(_airbyte_data, '总金额') AS total_amount,
       JSONExtractInt(_airbyte_data, '订单编号') AS order_number,
       JSONExtractString(_airbyte_data, '收货地址 ') AS shipping_address,
       parseDateTimeBestEffortOrNull(nullIf(JSONExtractString(_airbyte_data, '订单付款时间 '), '')) AS payment_time,
       parseDateTimeBestEffortOrNull(nullIf(JSONExtractString(_airbyte_data, '订单创建时间'), '')) AS order_creation_time,
       JSONExtractInt(_airbyte_data, '退款金额') AS refund_amount,
       JSONExtractInt(_airbyte_data, '买家实际支付金额') AS actual_payment_amount
FROM airbyte_internal.default_raw__stream_tmall_order_sample;
```
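
Two of the JSON keys in the `INSERT` above, `'收货地址 '` and `'订单付款时间 '`, end with a space. Presumably they mirror the CSV header exactly, because the `JSONExtract*` functions match keys literally and return a default value (0 or an empty string) when a key is missing. A minimal sketch for checking the header, assuming a plain comma-separated header with no quoted commas; `show_csv_header` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each CSV header field in brackets so that
# leading or trailing spaces in column names become visible.
# Assumes a simple header with no quoted fields containing commas.
show_csv_header() {
  head -n 1 "$1" |   # first line only
    tr -d '\r' |     # drop carriage returns from Windows line endings
    tr ',' '\n' |    # one field per line
    sed 's/.*/[&]/'  # wrap each field in brackets
}
```

For example, download the file with `curl -sL -o /tmp/tmall-order-sample.csv` followed by the URL from step 1, then run `show_csv_header /tmp/tmall-order-sample.csv`; a field printed as `[收货地址 ]` would confirm the trailing space.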

Confirm that the data has been processed:

```sql
select * from airbyte_internal.ods_tmall_order limit 10;
```
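
Beyond looking at ten rows, a quick consistency check is to compare row counts: the `INSERT ... SELECT` copies every row of the raw table, so right after it runs the two counts should be equal. A sketch assuming it runs inside the ClickHouse container with the credentials used above; `check_ods_counts` and the `CLICKHOUSE_CLIENT` override variable are hypothetical, the override existing only so the client command can be swapped out.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare row counts of the Airbyte raw table and the
# ODS table built from it. Run inside the ClickHouse container.
# CLICKHOUSE_CLIENT is a hypothetical override (defaults to clickhouse-client).
check_ods_counts() {
  local out raw ods
  out="$("${CLICKHOUSE_CLIENT:-clickhouse-client}" --user default --password ckdba.123 --query "
    SELECT
      (SELECT count() FROM airbyte_internal.default_raw__stream_tmall_order_sample),
      (SELECT count() FROM airbyte_internal.ods_tmall_order)
    FORMAT TSV")" || return 1
  # The result is one tab-separated line: raw row count, then ODS row count.
  raw="$(printf '%s\n' "$out" | cut -f1)"
  ods="$(printf '%s\n' "$out" | cut -f2)"
  if [ -n "$raw" ] && [ "$raw" = "$ods" ]; then
    echo "OK: ${raw} rows in both tables"
  else
    echo "MISMATCH: raw=${raw} ods=${ods}" >&2
    return 1
  fi
}
```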
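
# 4. Data Visualization

Visualizing the data requires a ClickHouse data source and a dashboard in Superset. For how to add a data source and build a dashboard manually, refer to the Superset quick start. Here, we use Superset's dashboard import feature to import both the data source and the dashboard.

1. [Download the dashboard](https://gitee.com/xing-can/pic/blob/master/dashboard_export_20240521T102107.zip)
2. Import the dashboard: select the downloaded file to import it.

   ![](../../images/superset01.png)

   Enter the default password `ckdba.123` for the ClickHouse user `default`.

   ![](../../images/superset02.png)

   The result after the import looks like this:

   ![](../../images/superset03.png)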
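
If you add the ClickHouse data source in Superset by hand instead of importing it, Superset asks for a SQLAlchemy URI. For the `clickhouse-connect` driver, Superset's documentation gives the form `clickhousedb://{username}:{password}@{hostname}:{port}/{database}`; whether that driver is the one installed in the KDP Superset image is an assumption. The hypothetical helper below fills in the connection values used earlier in this guide.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a Superset SQLAlchemy URI for ClickHouse,
# assuming the clickhouse-connect driver (dialect "clickhousedb") is installed.
# Defaults are the connection values used earlier in this guide.
# A password containing characters such as @ : / would need URL-encoding.
superset_clickhouse_uri() {
  local user="${1:-default}"
  local password="${2:-ckdba.123}"
  local host="${3:-clickhouse.kdp-data.svc.cluster.local}"
  local port="${4:-8123}"
  local database="${5:-default}"
  printf 'clickhousedb://%s:%s@%s:%s/%s\n' "$user" "$password" "$host" "$port" "$database"
}
```

The tables built in this guide live in the `airbyte_internal` database, so passing it as the fifth argument may be convenient if you want it as the connection's default database.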

docs/images/airbyte01.png (102 KB)

docs/images/airbyte02.png (91.7 KB)

docs/images/airbyte03.png (82.1 KB)

docs/images/airbyte04.png (61.3 KB)

docs/images/superset01.png (42.2 KB)

docs/images/superset02.png (46.3 KB)

docs/images/superset03.png (162 KB)
Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
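# Exploring data using Airbyte, ClickHouse, and Superset

Simplified Chinese | [English](../../en/user-tutorials/exploring-data-using-airbyte-clickhouse-superset.md)

# 1. Introduction

This guide shows how to use the KDP platform for data integration, processing, and BI visualization. The applications involved are `airbyte`, `clickhouse`, and `superset`, which must be installed in advance.
We recommend first getting familiar with each component's quick start and then following the steps in this guide.

# 2. Data Integration

Import data from a CSV file into ClickHouse: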

1. Add a File-type source in Airbyte.

   ![](../../images/airbyte01.png)

   - Dataset Name: `tmall-order-sample` (do not modify)
   - URL: `https://gitee.com/xing-can/pic/raw/master/tmall-order-sample.csv`

2. Add a ClickHouse-type destination in Airbyte.

   ![](../../images/airbyte03.png)

   - Host: `clickhouse.kdp-data.svc.cluster.local`
   - Port: `8123`
   - DB Name: `default`
   - User: `default`
   - Password: `ckdba.123`

3. Add a connection in Airbyte, choose File as the source and ClickHouse as the destination, keep the default configuration, and save it.

   ![](../../images/airbyte02.png)

4. Check the status of the Airbyte job. If it succeeded, the data has been imported into ClickHouse.

   ![](../../images/airbyte04.png)

These steps complete the EL (Extract, Load) part of ELT (Extract, Load, Transform); next, ClickHouse is used for the Transform step.

# 3. Data Processing

```bash
# Enter the clickhouse container
kubectl exec -it clickhouse-shard0-0 -n kdp-data -c clickhouse -- bash
# Connect to clickhouse
clickhouse-client --user default --password ckdba.123
# Inside clickhouse-client: view the databases and tables
show databases;
use airbyte_internal;
show tables;
# Confirm that the data was written successfully
select count(*) from airbyte_internal.default_raw__stream_tmall_order_sample;
```

Then run the following three SQL statements in order to process the data.

```sql
DROP TABLE IF EXISTS airbyte_internal.ods_tmall_order;

-- Define the structure of the new table ods_tmall_order
CREATE TABLE airbyte_internal.ods_tmall_order
(
    total_amount          Int32,
    order_number          Int32,
    shipping_address      String,
    payment_time          DateTime64(3, 'GMT'),
    order_creation_time   DateTime64(3, 'GMT'),
    refund_amount         Int32,
    actual_payment_amount Int32
)
ENGINE = MergeTree
-- Assuming order_number is a unique identifier for each order
ORDER BY order_number;

-- Insert data into the new table from the JSON in _airbyte_data
INSERT INTO airbyte_internal.ods_tmall_order
SELECT JSONExtractInt(_airbyte_data, '总金额') AS total_amount,
       JSONExtractInt(_airbyte_data, '订单编号') AS order_number,
       JSONExtractString(_airbyte_data, '收货地址 ') AS shipping_address,
       parseDateTimeBestEffortOrNull(nullIf(JSONExtractString(_airbyte_data, '订单付款时间 '), '')) AS payment_time,
       parseDateTimeBestEffortOrNull(nullIf(JSONExtractString(_airbyte_data, '订单创建时间'), '')) AS order_creation_time,
       JSONExtractInt(_airbyte_data, '退款金额') AS refund_amount,
       JSONExtractInt(_airbyte_data, '买家实际支付金额') AS actual_payment_amount
FROM airbyte_internal.default_raw__stream_tmall_order_sample;
```

Confirm that the data has been processed:

```sql
select * from airbyte_internal.ods_tmall_order limit 10;
```
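
# 4. Data Visualization

Add a ClickHouse data source in Superset and build a dashboard; for how to add a data source and build a dashboard, refer to the Superset quick start. Here, we use the dashboard import feature to import both the data source and the dashboard.

1. [Download the dashboard](https://gitee.com/xing-can/pic/blob/master/dashboard_export_20240521T102107.zip)
2. Import the dashboard: select the downloaded file to import it.

   ![](../../images/superset01.png)

   Enter the default password `ckdba.123` for the ClickHouse user `default`.

   ![](../../images/superset02.png)

   The result after the import looks like this:

   ![](../../images/superset03.png)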
