| 1 | +# Exploring data using airbyte, clickhouse, and superset |
| 2 | +English | [简体中文](../../zh/user-tutorials/exploring-data-using-airbyte-clickhouse-superset.md) |
| 3 | + |
| 4 | +# 1. Introduction |
| 5 | +This guide shows how to use the KDP platform for data integration, data processing, and BI display with `airbyte`, `clickhouse`, and `superset`. Before following the steps below, it is recommended that you work through each component's quick start. |
| 6 | + |
| 7 | +# 2. Data Integration |
| 8 | +Import data from a CSV file into clickhouse: |
| 9 | +1. Add a file type source in airbyte. |
| 10 | +  |
| 11 | + - Dataset Name: `tmall-order-sample` (can be changed, but the table names used later in this guide assume this value) |
| 12 | + - URL: `https://gitee.com/linktime-cloud/example-datasets/raw/main/airbyte/tmall-order-sample.csv` |
| 13 | + |
| 14 | +1. Add a clickhouse type destination in airbyte. |
| 15 | +  |
| 16 | + - Host: `clickhouse.kdp-data.svc.cluster.local` |
| 17 | + - Port: `8123` |
| 18 | + - DB Name: `default` |
| 19 | + - User: `default` |
| 20 | + - Password: `ckdba.123` |
| 21 | + |
| 22 | +1. Add a connection in airbyte, select file as the source and clickhouse as the destination, use the default configuration, and then save. |
| 23 | +  |
| 24 | + |
| 25 | +1. Check the airbyte job status. If the job succeeded, the data has been imported into clickhouse. |
| 26 | +  |
| 27 | + |
| 28 | +The steps above complete the Extract and Load stages of the ELT (Extract, Load, Transform) process. The Transform stage is performed in clickhouse, as described in the next section. |
| 29 | + |
| 30 | +# 3. Data Processing |
| 31 | + |
| 32 | +```bash |
| 33 | +# Enter clickhouse container |
| 34 | +kubectl exec -it clickhouse-shard0-0 -n kdp-data -c clickhouse -- bash |
| 35 | +# Connect to clickhouse |
| 36 | +clickhouse-client --user default --password ckdba.123 |
| 37 | +# View the databases |
| 38 | +show databases; |
| 39 | +use airbyte_internal; |
| 40 | +show tables; |
| 41 | +# Confirm that the data has been successfully written |
| 42 | +select count(*) from airbyte_internal.default_raw__stream_tmall_order_sample; |
| 43 | +``` |
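The interactive session above can also be run non-interactively, which may be convenient for scripting. The following is a minimal sketch, assuming the pod, namespace, container, user, and password shown in this guide; the helper name `ch_query` is hypothetical and not part of KDP or ClickHouse.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a single SQL statement inside the ClickHouse pod
# without opening an interactive shell. The pod, namespace, container, user,
# and password are the values used in this guide; adjust them for your cluster.
ch_query() {
  kubectl exec clickhouse-shard0-0 -n kdp-data -c clickhouse -- \
    clickhouse-client --user default --password 'ckdba.123' --query "$1"
}

# Example (requires access to the cluster):
#   ch_query "select count(*) from airbyte_internal.default_raw__stream_tmall_order_sample"
```

The function only wraps the two commands already shown (`kubectl exec` and `clickhouse-client`), so it should behave the same as typing the query in the interactive client.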
| 44 | + |
| 45 | +Then run the following three SQL statements in order (drop, create, insert) to complete the data processing. |
| 46 | + |
| 47 | +```sql |
| 48 | +DROP TABLE IF EXISTS airbyte_internal.ods_tmall_order; |
| 49 | + |
| 50 | +-- Define the structure of the new table ods_tmall_order |
| 51 | +CREATE TABLE airbyte_internal.ods_tmall_order |
| 52 | +( |
| 53 | + total_amount Int32, |
| 54 | + order_number Int32, |
| 55 | + shipping_address String, |
| 56 | + payment_time DateTime64(3, 'GMT'), |
| 57 | + order_creation_time DateTime64(3, 'GMT'), |
| 58 | + refund_amount Int32, |
| 59 | + actual_payment_amount Int32 |
| 60 | +) |
| 61 | + ENGINE = MergeTree |
| 62 | +ORDER BY order_number; |
| 63 | +-- Assuming order_number is a unique identifier for each order |
| 64 | + |
| 65 | +-- Insert data into the new table from the JSON in _airbyte_data |
| 66 | +INSERT INTO airbyte_internal.ods_tmall_order |
| 67 | +SELECT JSONExtractInt(_airbyte_data, '总金额') AS total_amount, |
| 68 | + JSONExtractInt(_airbyte_data, '订单编号') AS order_number, |
| 69 | + JSONExtractString(_airbyte_data, '收货地址 ') AS shipping_address, |
| 70 | + parseDateTimeBestEffortOrNull(nullIf(JSONExtractString(_airbyte_data, '订单付款时间 '), '')) AS payment_time, |
| 71 | + parseDateTimeBestEffortOrNull(nullIf(JSONExtractString(_airbyte_data, '订单创建时间'), '')) AS order_creation_time, |
| 72 | + |
| 73 | + JSONExtractInt(_airbyte_data, '退款金额') AS refund_amount, |
| 74 | + JSONExtractInt(_airbyte_data, '买家实际支付金额') AS actual_payment_amount |
| 75 | +FROM airbyte_internal.default_raw__stream_tmall_order_sample; |
| 76 | + |
| 77 | +``` |
| 78 | + |
| 79 | +Verify the processed data: |
| 80 | +```sql |
| 81 | +select * from airbyte_internal.ods_tmall_order limit 10; |
| 82 | +``` |
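Beyond inspecting a few rows, it can help to confirm that the processed table holds as many rows as the raw table. The following is a rough sketch under the same assumptions as the rest of this guide (pod `clickhouse-shard0-0`, namespace `kdp-data`, user `default`); the function names `ch_query` and `check_row_counts` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one SQL statement inside the ClickHouse pod.
ch_query() {
  kubectl exec clickhouse-shard0-0 -n kdp-data -c clickhouse -- \
    clickhouse-client --user default --password 'ckdba.123' --query "$1"
}

# Hypothetical check: compare the row count of the raw Airbyte table with the
# row count of the processed table. Returns non-zero if the counts differ.
check_row_counts() {
  raw_count=$(ch_query "select count(*) from airbyte_internal.default_raw__stream_tmall_order_sample")
  ods_count=$(ch_query "select count(*) from airbyte_internal.ods_tmall_order")
  if [ "$raw_count" = "$ods_count" ]; then
    echo "OK: $ods_count rows in both tables"
  else
    echo "MISMATCH: raw=$raw_count ods=$ods_count"
    return 1
  fi
}

# Example (requires access to the cluster):
#   check_row_counts
```

A mismatch would suggest that the `INSERT ... SELECT` was run more than once without the preceding `DROP TABLE`, or that a new sync added raw rows after processing.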
| 83 | +# 4. Data Display |
| 84 | +In Superset, add a ClickHouse data source and create a dashboard. For instructions on doing this manually, refer to the Superset quick start guide. The steps below instead import both the data source and the dashboard using the dashboard import feature. |
| 85 | +1. [Download the dashboard](https://gitee.com/linktime-cloud/example-datasets/blob/main/superset/dashboard_export_20240521T102107.zip) |
| 86 | +2. Import the dashboard |
| 87 | + - Select the downloaded file to import |
| 88 | + |
| 89 | + - Enter the password for the ClickHouse user `default`: `ckdba.123` |
| 90 | + |
| 91 | + - The imported dashboard will look like this: |
| 92 | + |
| 93 | + |