| 1 | +# Exploring data using Airbyte, ClickHouse, and Superset |
| 2 | +English | [简体中文](../../zh/user-tutorials/exploring-data-using-airbyte-clickhouse-superset.md) |
| 3 | + |
| 4 | +# 1. Introduction |
| 5 | +This guide shows how to use the KDP platform for data integration, data processing, and BI visualization with `airbyte`, `clickhouse`, and `superset`. We recommend working through each component's quick start before following the steps in this guide.
| 6 | + |
| 7 | +# 2. Data Integration |
| 8 | +Import data from a CSV file into ClickHouse:
| 9 | +1. Add a file type source in airbyte. |
| 10 | +  |
| 11 | + - Dataset Name: `tmall-order-sample` (change as needed)
| 12 | + - URL: `https://gitee.com/xing-can/pic/raw/master/tmall-order-sample.csv` |
| 13 | + |
| 14 | +1. Add a clickhouse type destination in airbyte. |
| 15 | +  |
| 16 | + - Host: `clickhouse.kdp-data.svc.cluster.local` |
| 17 | + - Port: `8123` |
| 18 | + - DB Name: `default` |
| 19 | + - User: `default` |
| 20 | + - Password: `ckdba.123` |
| 21 | + |
| 22 | +1. Add a connection in airbyte, select file as the source and clickhouse as the destination, use the default configuration, and then save. |
| 23 | +  |
| 24 | + |
| 25 | +1. Check the airbyte job status. If the job succeeded, the data has been imported into clickhouse.
| 26 | +  |
| 27 | + |
| 28 | +The steps above complete the Extract and Load stages of the ELT (Extract, Load, Transform) process. The Transform stage is performed in clickhouse, as described in the next section.
| 29 | + |
| 30 | +# 3. Data Processing
| 31 | + |
| 32 | +```bash |
| 33 | +# Enter clickhouse container |
| 34 | +kubectl exec -it clickhouse-shard0-0 -n kdp-data -c clickhouse -- bash |
| 35 | +# Connect to clickhouse |
| 36 | +clickhouse-client --user default --password ckdba.123 |
| 37 | +# View the databases |
| 38 | +show databases; |
| 39 | +use airbyte_internal; |
| 40 | +show tables; |
| 41 | +# Confirm that the data has been successfully written |
| 42 | +select count(*) from airbyte_internal.default_raw__stream_tmall_order_sample; |
| 43 | +``` |
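The interactive check above can also be condensed into a single command. The following is a minimal sketch, assuming the same pod, namespace, container, and credentials used in this guide. `ck_query` is a hypothetical helper name, and the `DRY_RUN` switch is an addition for illustration so the command can be inspected without a cluster.

```shell
# Sketch: run one ClickHouse query without opening an interactive session.
# ck_query is a hypothetical helper; pod, namespace, and credentials are the
# values used in this guide and may differ in your environment.
ck_query() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of executing it.
    printf '%s\n' "kubectl exec clickhouse-shard0-0 -n kdp-data -c clickhouse -- clickhouse-client --user default --password ckdba.123 --query \"$1\""
  else
    kubectl exec clickhouse-shard0-0 -n kdp-data -c clickhouse -- \
      clickhouse-client --user default --password ckdba.123 --query "$1"
  fi
}
```

For example, `ck_query "select count(*) from airbyte_internal.default_raw__stream_tmall_order_sample"` would return the row count directly.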
| 44 | +
| 45 | +Next, execute the following three SQL statements in order to complete the data processing.
| 46 | +
| 47 | +```sql |
| 48 | +DROP TABLE IF EXISTS airbyte_internal.ods_tmall_order; |
| 49 | + |
| 50 | +-- Define the structure of the new table ods_tmall_order |
| 51 | +CREATE TABLE airbyte_internal.ods_tmall_order
| 52 | +(
| 53 | +    total_amount          Int32,
| 54 | +    order_number          Int32,
| 55 | +    shipping_address      String,
| 56 | +    payment_time          DateTime64(3, 'GMT'),
| 57 | +    order_creation_time   DateTime64(3, 'GMT'),
| 58 | +    refund_amount         Int32,
| 59 | +    actual_payment_amount Int32
| 60 | +)
| 61 | +ENGINE = MergeTree
| 62 | +ORDER BY order_number;
| 63 | +-- Assuming order_number is a unique identifier for each order |
| 64 | + |
| 65 | +-- Insert data into the new table from the JSON in _airbyte_data |
| 66 | +INSERT INTO airbyte_internal.ods_tmall_order
| 67 | +SELECT JSONExtractInt(_airbyte_data, '总金额') AS total_amount,
| 68 | +       JSONExtractInt(_airbyte_data, '订单编号') AS order_number,
| 69 | +       JSONExtractString(_airbyte_data, '收货地址 ') AS shipping_address,
| 70 | +       parseDateTimeBestEffortOrNull(nullIf(JSONExtractString(_airbyte_data, '订单付款时间 '), '')) AS payment_time,
| 71 | +       parseDateTimeBestEffortOrNull(nullIf(JSONExtractString(_airbyte_data, '订单创建时间'), '')) AS order_creation_time,
| 72 | +
| 73 | +       JSONExtractInt(_airbyte_data, '退款金额') AS refund_amount,
| 74 | +       JSONExtractInt(_airbyte_data, '买家实际支付金额') AS actual_payment_amount
| 75 | +FROM airbyte_internal.default_raw__stream_tmall_order_sample;
| 76 | + |
| 77 | +``` |
| 78 | +
| 79 | +Confirm that the data processing is complete:
| 80 | +```sql |
| 81 | +select * from airbyte_internal.ods_tmall_order limit 10; |
| 82 | +``` |
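As an additional sanity check, the row count of `ods_tmall_order` can be compared with the raw table. Since the INSERT statement has no WHERE clause, the two counts should match. The following is a minimal sketch that assumes the two counts were fetched separately (for example with `clickhouse-client --query`). `compare_counts` is a hypothetical helper name, not part of any tool used in this guide.

```shell
# Sketch: compare the raw and transformed row counts.
# compare_counts is a hypothetical helper; pass the two counts as arguments.
compare_counts() {
  raw="$1"
  ods="$2"
  if [ "$raw" -eq "$ods" ]; then
    echo "OK: both tables contain $ods rows"
  else
    echo "MISMATCH: raw=$raw ods=$ods"
    return 1
  fi
}
```

A non-zero exit status signals a mismatch, so the helper can be used directly in a script that should stop when the transform dropped or duplicated rows.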
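| 83 | +# 4. Data Visualization
| 84 | +Add clickhouse as a data source in Superset and build a dashboard. For details on adding a data source and building a dashboard, refer to the Superset quick start. Here we use the dashboard import feature to import both the data source and the dashboard.
| 85 | +1. [Download the dashboard](https://gitee.com/xing-can/pic/blob/master/dashboard_export_20240521T102107.zip)
| 86 | +2. Import the dashboard
| 87 | +Select the downloaded file to import.
| 88 | +
| 89 | +Enter the default password `ckdba.123` for the clickhouse user `default`.
| 90 | +
| 91 | +The result after import looks like this:
| 92 | +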
| 93 | + |