Commit a464f87

Feat: add airbyte app (#71)
Signed-off-by: xingcan-ltc <[email protected]>
1 parent 9a3b9e1 commit a464f87

8 files changed, +360 -0 lines changed


catalog/airbyte/README.md

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
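### 1. Introduction

Airbyte is an open-source data integration platform available as Airbyte Open Source and Airbyte Cloud. It lets users synchronize data from a wide range of sources to destinations, consolidating that data for analytics and business intelligence.

### 2. Product Features

#### 2.1 Extensive Connector Catalog

A comprehensive catalog of connectors for many kinds of data sources (databases, APIs, SaaS applications, and more). With 300+ pre-built connectors, it is the largest catalog in the industry, and it is doubling every year.

#### 2.2 Ease of Use

A user-friendly interface lets users design and maintain data pipelines with little effort, through the UI, the API, or Terraform. The platform offers hassle-free configuration and simplified connector setup.

#### 2.3 Fully Managed and Scalable

Fully managed and optimized for cloud scalability, the platform keeps up with the growing data demands of individual workloads. It handles large data volumes efficiently and mitigates data skew without issues or failures.

#### 2.4 Open Source Community

The Airbyte open-source community gives users a transparent, accessible platform that encourages knowledge exchange and troubleshooting, and has built a supportive ecosystem for data integration practitioners worldwide.

#### 2.5 Monitoring and Management

The platform comes with built-in monitoring tools that let users track data pipeline performance through email, webhooks, and various target systems.
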
Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
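### 1. Introduction

Airbyte is an open-source data movement infrastructure for building extract and load (EL) data pipelines. It is designed to be versatile, scalable, and easy to use.

### 2. Core Concepts

**Source**

A source is an API, file, database, or data warehouse that you want to ingest data from.

**Destination**

A destination is a data warehouse, data lake, database, or analytics tool where you want to load the ingested data.

**Connector**

A connector is an Airbyte component that pulls data from a source or pushes data to a destination.

**Connection**

A connection is an automated data pipeline that replicates data from a source to a destination.

For more details, refer to the official documentation: https://docs.airbyte.com/using-airbyte/core-concepts/
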
### 3. Quick Start

After installing Airbyte, open the Airbyte ingress (for example, http://airbyte-kdp-data.kdp-e2e.io), enter any email address and organization name, and then create a connection.

#### 3.1 Add a Source

Add a "Faker" source.

Click "Sources", search for "faker", select "Faker", and click "Set up source".
Airbyte tests the source and shows the connection status. (Airbyte launches a Pod to test the connection, which may take a few minutes.)

For more details, refer to the official documentation: https://docs.airbyte.com/using-airbyte/getting-started/add-a-source

#### 3.2 Add a Destination

Add an "S3" destination. Before adding it, create a Minio bucket:

```bash
# Open a shell in the Minio pod
kubectl exec -it airbyte-minio-0 -n kdp-data -- bash
# Inside the pod: register the local Minio server as alias `local`, then create the `tmp` bucket
mc alias set local http://localhost:9000 minio minio123
mc mb local/tmp
```

Click "Destinations", search for "S3", select it, and fill in the following fields:

- S3 Key ID: `minio`
- S3 Access Key: `minio123`
- S3 Bucket Name: `tmp`
- S3 Bucket Path: `/`
- S3 Bucket Region: select any region

Optional fields:

- S3 Endpoint: `http://airbyte-minio-svc:9000`

Click "Set up destination".

#### 3.3 Create a Connection

Click "Connections", click "Create connection", and select the source and destination you just created.

- Define source: `faker`
- Define destination: `S3`
- Select the stream: click the `Next` button
- Configure connection: click the `Finish & Sync` button

The sync task is triggered, and you can follow its status in the UI. (Downloading images and launching Pods may take a few minutes.)

If the sync succeeds, you can see the data in the Minio bucket:

```bash
# Open a shell in the Minio pod
kubectl exec -it airbyte-minio-0 -n kdp-data -- bash
# Inside the pod: list the bucket contents recursively
mc ls -r local/tmp
# Optional: delete the bucket together with everything in it
mc rb --force local/tmp
```
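The example ingress address used in the Quick Start (http://airbyte-kdp-data.kdp-e2e.io) follows the host rule of the `bdos-ingress` trait in this commit's xdefinition: `context["name"] + "-" + context["namespace"] + "." + context["ingress.root_domain"]`. A minimal shell sketch of that rule, assuming an app named `airbyte`, the namespace `kdp-data`, and the root domain `kdp-e2e.io` (all three inferred from the example URL); `airbyte_ingress_host` is a hypothetical helper, not part of KDP:

```bash
#!/bin/sh
# Hypothetical helper: mirrors the host expression of the bdos-ingress trait,
#   <name>-<namespace>.<ingress.root_domain>
airbyte_ingress_host() {
  name="$1"        # context["name"], e.g. airbyte
  namespace="$2"   # context["namespace"], e.g. kdp-data
  root_domain="$3" # context["ingress.root_domain"], e.g. kdp-e2e.io
  printf '%s-%s.%s\n' "$name" "$namespace" "$root_domain"
}

# Prints the host used in the Quick Start example
airbyte_ingress_host airbyte kdp-data kdp-e2e.io
```

If you install the app under another name or namespace, the ingress host changes accordingly.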
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
apiVersion: bdc.kdp.io/v1alpha1
kind: Application
metadata:
  annotations:
    app.core.bdos/catalog: airbyte
  labels:
    app: airbyte
    app.core.bdos/type: system
spec:
  name: airbyte
  type: airbyte
  properties:
    server:
      replicaCount: 1
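The sample above only sets `server.replicaCount`. Assuming `spec.properties` is passed to the `parameter` block of the xdefinition in this commit (which declares `minio.storage.size`, `worker.replicaCount`, `webapp.replicaCount`, `server.replicaCount`, and an optional `env_vars` map), a variant with more workers and a larger Minio volume could be rendered as below. `render_airbyte_app` is a hypothetical helper, and the property mapping is an assumption to check against your KDP version:

```bash
#!/bin/sh
# Hypothetical helper: prints a bdc.kdp.io/v1alpha1 Application for Airbyte.
# Assumes spec.properties maps onto the xdefinition's `parameter` schema.
#   $1  worker replica count (parameter default: 1)
#   $2  Minio volume size such as 1Gi (parameter default: 500Mi)
render_airbyte_app() {
  workers="${1:-1}"
  minio_size="${2:-500Mi}"
  cat <<EOF
apiVersion: bdc.kdp.io/v1alpha1
kind: Application
metadata:
  annotations:
    app.core.bdos/catalog: airbyte
  labels:
    app: airbyte
    app.core.bdos/type: system
spec:
  name: airbyte
  type: airbyte
  properties:
    server:
      replicaCount: 1
    worker:
      replicaCount: ${workers}
    minio:
      storage:
        size: "${minio_size}"
EOF
}

# Render a variant with two workers and a 1Gi Minio volume
render_airbyte_app 2 1Gi
```

If the `bdc.kdp.io/v1alpha1` CRD is installed in your cluster, piping the output to `kubectl apply -f -` is one way to submit it; reviewing the rendered YAML first is advisable.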
Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
### 1. Introduction

Airbyte is an open-source data movement infrastructure for building extract and load (EL) data pipelines. It is designed for versatility, scalability, and ease of use.

### 2. Core Concepts

**Source**

A source is an API, file, database, or data warehouse that you want to ingest data from.

**Destination**

A destination is a data warehouse, data lake, database, or analytics tool where you want to load your ingested data.

**Connector**

A connector is an Airbyte component that pulls data from a source or pushes data to a destination.

**Connection**

A connection is an automated data pipeline that replicates data from a source to a destination.

For more details, refer to the official documentation: https://docs.airbyte.com/using-airbyte/core-concepts/

### 3. Quick Start

After installing Airbyte, open the Airbyte ingress (for example, http://airbyte-kdp-data.kdp-e2e.io), enter any email address and organization name, and then create a connection.

#### 3.1 Add a Source

Add a "Faker" source.

Click "Sources", search for "faker", select "Faker", and click "Set up source".
Airbyte tests the source and shows the connection status. (Airbyte launches a Pod to test the connection, which may take a few minutes.)

For more details, refer to the official documentation: https://docs.airbyte.com/using-airbyte/getting-started/add-a-source

#### 3.2 Add a Destination

Add an "S3" destination. Before adding it, create a Minio bucket:

```bash
# Open a shell in the Minio pod
kubectl exec -it airbyte-minio-0 -n kdp-data -- bash
# Inside the pod: register the local Minio server as alias `local`, then create the `tmp` bucket
mc alias set local http://localhost:9000 minio minio123
mc mb local/tmp
```

Click "Destinations", search for "S3", select it, and fill in the following fields:

- S3 Key ID: `minio`
- S3 Access Key: `minio123`
- S3 Bucket Name: `tmp`
- S3 Bucket Path: `/`
- S3 Bucket Region: select any region

Optional fields:

- S3 Endpoint: `http://airbyte-minio-svc:9000`

Click "Set up destination".

#### 3.3 Create a Connection

Click "Connections", click "Create connection", and select the source and destination you just created.

- Define source: `faker`
- Define destination: `S3`
- Select the stream: click the `Next` button
- Configure connection: click the `Finish & Sync` button

The sync task is triggered, and you can follow its status in the UI. (Downloading images and launching Pods may take a few minutes.)

If the sync succeeds, you can see the data in the Minio bucket:

```bash
# Open a shell in the Minio pod
kubectl exec -it airbyte-minio-0 -n kdp-data -- bash
# Inside the pod: list the bucket contents recursively
mc ls -r local/tmp
# Optional: delete the bucket together with everything in it
mc rb --force local/tmp
```
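The Minio steps above open an interactive shell in the pod. When scripting the setup (for example in CI), the same `mc` commands can be passed to `kubectl exec` directly, without `-it`. A minimal sketch, assuming the pod name `airbyte-minio-0`, the namespace `kdp-data`, and the `minio`/`minio123` credentials used in this guide; `minio_mc`, `create_tmp_bucket`, and `list_tmp_bucket` are hypothetical helpers:

```bash
#!/bin/sh
# Assumed values from this guide; override via the environment if yours differ.
MINIO_POD="${MINIO_POD:-airbyte-minio-0}"
MINIO_NS="${MINIO_NS:-kdp-data}"

# Run a command string inside the Minio pod without an interactive TTY.
minio_mc() {
  kubectl exec "$MINIO_POD" -n "$MINIO_NS" -- bash -c "$1"
}

# Register the `local` alias and create the `tmp` bucket in a single call.
create_tmp_bucket() {
  minio_mc 'mc alias set local http://localhost:9000 minio minio123 && mc mb local/tmp'
}

# Recursively list what the sync wrote to the bucket.
list_tmp_bucket() {
  minio_mc 'mc alias set local http://localhost:9000 minio minio123 && mc ls -r local/tmp'
}
```

Call `create_tmp_bucket` before adding the S3 destination and `list_tmp_bucket` once the sync has finished. Each helper re-registers the alias so it does not depend on state left behind by an earlier session.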
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
version: 0.59.1
alias: Airbyte
description: Airbyte is an open-source data movement infrastructure focused on building extract and load (EL) data pipelines. It is designed to be versatile, scalable, and easy to use. https://artifacthub.io/packages/helm/airbyte/airbyte/0.67.17
isGlobal: false
i18n:
  en:
    description: >-
      Airbyte is an open-source data movement infrastructure for building extract and load (EL) data pipelines. It is designed for versatility, scalability, and ease-of-use. https://artifacthub.io/packages/helm/airbyte/airbyte/0.67.17

catalog/airbyte/i18n/en/README.md

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
### 1. Introduction

Airbyte is an open-source data integration platform available as Airbyte Open Source and Airbyte Cloud. It lets users synchronize data from a wide range of sources to destinations, consolidating that data for analytics and business intelligence.

### 2. Product Features

#### 2.1 Extensive Connector Catalog

A comprehensive catalog of connectors for many kinds of data sources (databases, APIs, SaaS applications, and more). With 300+ pre-built connectors, it is the largest catalog in the industry, and it is doubling every year.

#### 2.2 Ease of Use

A user-friendly interface lets users design and manage data pipelines with little effort, through the UI, the API, or Terraform. The platform offers hassle-free configuration and simplified connector setup.

#### 2.3 Fully Managed and Scalable

Fully managed and optimized for cloud scalability, the platform keeps up with the growing data demands of individual workloads. It handles large data volumes efficiently and mitigates data skew without issues or failures.

#### 2.4 Open Source Community

The Airbyte open-source community gives users a transparent, accessible platform that encourages knowledge exchange and troubleshooting, and fosters a supportive ecosystem for data integration practitioners worldwide.

#### 2.5 Monitoring and Management

The platform comes with built-in monitoring tools that let users track data pipeline performance through email, webhooks, and various target systems.

catalog/airbyte/metadata.yaml

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
name: Airbyte
category: Developer Tools
description: Airbyte is an open-source data movement infrastructure focused on building extract and load (EL) data pipelines. It is designed to be versatile, scalable, and easy to use. https://artifacthub.io/packages/helm/airbyte/airbyte/0.67.17
i18n:
  en:
    category: devtools
    description: >-
      Airbyte is an open-source data movement infrastructure for building extract and load (EL) data pipelines. It is designed for versatility, scalability, and ease-of-use. https://artifacthub.io/packages/helm/airbyte/airbyte/0.67.17
Lines changed: 140 additions & 0 deletions
@@ -0,0 +1,140 @@
"airbyte": {
	annotations: {}
	labels: {}
	attributes: {
		apiResource: {
			definition: {
				apiVersion: "bdc.kdp.io/v1alpha1"
				kind: "Application"
				type: "airbyte"
			}
		}
	}
	description: "airbyte"
	type: "xdefinition"
}

template: {
	output: {
		apiVersion: "core.oam.dev/v1beta1"
		kind: "Application"
		metadata: {
			name: context["name"]
			namespace: context["namespace"]
		}
		spec: {
			components: [
				{
					name: context["name"]
					properties: {
						chart: "airbyte"
						releaseName: context["name"]
						repoType: "oci"
						targetNamespace: context["namespace"]
						url: context["helm_repo_url"]
						values: {
							// replica counts come from the user-facing parameters below
							webapp: {
								replicaCount: parameter.webapp.replicaCount
							}
							server: {
								replicaCount: parameter.server.replicaCount
							}
							worker: {
								replicaCount: parameter.worker.replicaCount
							}
							"airbyte-api-server": {
								replicaCount: 1
							}
							postgresql: {
								enabled: true
							}
							minio: {
								storage: {
									volumeClaimValue: parameter.minio.storage.size
								}
							}
						}
						version: "0.67.17"
					}
					traits: [
						{
							properties: {
								rules: [
									{
										host: context["name"] + "-" + context["namespace"] + "." + context["ingress.root_domain"]
										paths: [
											{
												path: "/"
												serviceName: "airbyte-airbyte-webapp-svc"
												servicePort: 80
											},
										]
									},
								]
								tls: [
									{
										hosts: [
											context["name"] + "-" + context["namespace"] + "." + context["ingress.root_domain"],
										]
										tlsSecretName: context["ingress.tls_secret_name"]
									},
								]
							}
							type: "bdos-ingress"
						},
					]
					type: "helm"
				},
			]
		}
	}

	parameter: {
		// +ui:description=Minio storage configuration
		// +ui:order=1
		minio: {
			// +ui:order=1
			storage: {
				// +ui:description=Storage size
				// +pattern=^([1-9]\d*)(Ti|Gi|Mi)$
				// +err:options={"pattern":"Please enter a valid size, e.g. 1024Mi, 1Gi, 1Ti"}
				size: *"500Mi" | string
			}
		}

		// +ui:description=Worker configuration
		// +ui:order=2
		worker: {
			// +minimum=1
			// +ui:description=Replica count
			replicaCount: *1 | int
		}

		// +ui:description=Webapp configuration
		// +ui:order=3
		webapp: {
			// +minimum=1
			// +ui:description=Replica count
			replicaCount: *1 | int
		}

		// +ui:description=Server configuration
		// +ui:order=4
		server: {
			// +minimum=1
			// +ui:description=Replica count
			replicaCount: *1 | int
		}

		// +ui:description=Environment variables, e.g. MAX_SYNC_WORKERS=5
		// +ui:order=5
		env_vars?: {...}
	}
}
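The `minio.storage.size` parameter above is validated in the UI by `+pattern=^([1-9]\d*)(Ti|Gi|Mi)$`. To check a value before submitting it (for example in a script that generates the Application manifest), the same rule can be written as a POSIX extended regular expression. ERE has no `\d`, so `[0-9]` stands in for it, and awk is used because its `^`/`$` anchor the whole string rather than a single line. `is_valid_minio_size` is a hypothetical helper, not part of KDP:

```bash
#!/bin/sh
# Hypothetical helper: succeeds if $1 matches the parameter's pattern
#   ^([1-9]\d*)(Ti|Gi|Mi)$
# i.e. a positive integer without leading zeros followed by Mi, Gi or Ti.
is_valid_minio_size() {
  SIZE="$1" awk 'BEGIN { exit !(ENVIRON["SIZE"] ~ /^[1-9][0-9]*(Ti|Gi|Mi)$/) }'
}

for size in 500Mi 1Gi 1Ti 0Gi 1.5Gi 10G; do
  if is_valid_minio_size "$size"; then
    echo "$size: valid"
  else
    echo "$size: invalid"
  fi
done
```

Note that the pattern only admits binary units, so a decimal suffix such as `10G` is rejected even though Kubernetes itself would accept it as a quantity.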
