[Feature][Connector-V2] Support typesense connector (#7450)
zhangshenghang authored Aug 29, 2024
1 parent aabfc8e commit 138d2a4
Showing 59 changed files with 4,025 additions and 8 deletions.
6 changes: 6 additions & 0 deletions .github/workflows/labeler/label-scope-conf.yml
@@ -257,6 +257,12 @@ activemq:
      - changed-files:
          - any-glob-to-any-file: seatunnel-connectors-v2/connector-activemq/**
          - all-globs-to-all-files: '!seatunnel-connectors-v2/connector-!(activemq)/**'
typesense:
  - all:
      - changed-files:
          - any-glob-to-any-file: seatunnel-connectors-v2/connector-typesense/**
          - all-globs-to-all-files: '!seatunnel-connectors-v2/connector-!(typesense)/**'

Zeta Rest API:
  - changed-files:
      - any-glob-to-any-file: seatunnel-engine/**/server/rest/**
2 changes: 1 addition & 1 deletion config/plugin_config
@@ -88,5 +88,5 @@ connector-web3j
connector-milvus
connector-activemq
connector-sls
connector-typesense
connector-cdc-opengauss
--end--
93 changes: 93 additions & 0 deletions docs/en/connector-v2/sink/Typesense.md
@@ -0,0 +1,93 @@
# Typesense

## Description

Outputs data to `Typesense`.

## Key Features

- [ ] [Exactly Once](../../concept/connector-v2-features.md)
- [x] [CDC](../../concept/connector-v2-features.md)

## Options

| Name | Type | Required | Default Value |
|------------------|--------|----------|------------------------------|
| hosts | array | Yes | - |
| collection | string | Yes | - |
| schema_save_mode | string | Yes | CREATE_SCHEMA_WHEN_NOT_EXIST |
| data_save_mode | string | Yes | APPEND_DATA |
| primary_keys | array | No | |
| key_delimiter | string | No | `_` |
| api_key | string | No | |
| max_retry_count | int | No | 3 |
| max_batch_size | int | No | 10 |
| common-options | | No | - |

### hosts [array]

The access address for Typesense, formatted as `host:port`, e.g., `["typesense-01:8108"]`.

### collection [string]

The name of the collection to write to, e.g., "seatunnel".

### primary_keys [array]

Primary key fields used to generate the document `id`.

### key_delimiter [string]

Sets the delimiter for composite keys (default is `_`).
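
To illustrate how `primary_keys` and `key_delimiter` interact, here is a minimal Python sketch that joins a row's primary key values into a document `id`. The `build_document_id` helper is hypothetical, and it assumes values are converted to strings and joined in the order listed in `primary_keys`; the connector's actual implementation may differ.

```python
from typing import Any, Mapping, Sequence


def build_document_id(row: Mapping[str, Any],
                      primary_keys: Sequence[str],
                      key_delimiter: str = "_") -> str:
    """Hypothetical helper: join primary key values into one document id."""
    missing = [key for key in primary_keys if key not in row]
    if missing:
        raise KeyError(f"row is missing primary key fields: {missing}")
    # Assumption: values are stringified and joined in the configured order.
    return key_delimiter.join(str(row[key]) for key in primary_keys)


row = {"id": "124", "num_employees": 5215, "company_name": "Stark Industries"}
doc_id = build_document_id(row, ["num_employees", "id"], key_delimiter="=")
print(doc_id)  # 5215=124
```

With a composite key, pick a delimiter that cannot appear inside the key values, otherwise two different rows could produce the same `id`.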

### api_key [string]

The `api_key` for secure access to Typesense.

### max_retry_count [int]

The maximum number of retry attempts for batch requests.

### max_batch_size [int]

The maximum number of documents sent in a single batch request.
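
The following Python sketch shows one way `max_batch_size` and `max_retry_count` could work together: documents are grouped into batches, and each batch is resent when a write fails. The `chunked` and `write_with_retry` helpers and the `send` callback are hypothetical, and whether the first attempt counts toward the retry limit is an assumption, not documented connector behavior.

```python
from typing import Callable, Iterable, Iterator, List


def chunked(docs: Iterable[dict], max_batch_size: int) -> Iterator[List[dict]]:
    """Yield lists of at most max_batch_size documents."""
    if max_batch_size <= 0:
        raise ValueError("max_batch_size must be positive")
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == max_batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def write_with_retry(batch: List[dict],
                     send: Callable[[List[dict]], None],
                     max_retry_count: int = 3) -> int:
    """Send one batch, retrying on failure; return the attempts used.

    Assumption: the first attempt is not counted as a retry, so at most
    max_retry_count + 1 calls to send are made.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            send(batch)
            return attempts
        except Exception:
            if attempts > max_retry_count:
                raise  # retries exhausted, surface the last error


docs = [{"id": str(i)} for i in range(25)]
batch_sizes = [len(b) for b in chunked(docs, max_batch_size=10)]
print(batch_sizes)  # [10, 10, 5]
```

A larger batch reduces the number of requests but increases the amount of data that must be resent when a single batch fails.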

### common options

Common parameters for Sink plugins. Refer to [Common Sink Options](../sink-common-options.md) for more details.

### schema_save_mode [string]

Choose how to handle the target-side schema before starting the synchronization task:
- `RECREATE_SCHEMA`: Creates the table if it doesn’t exist, and deletes and recreates it if it does.
- `CREATE_SCHEMA_WHEN_NOT_EXIST`: Creates the table if it doesn’t exist, skips creation if it does.
- `ERROR_WHEN_SCHEMA_NOT_EXIST`: Throws an error if the table doesn’t exist.
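
The three modes above can be read as a small decision table. The Python sketch below makes that explicit; `plan_schema_actions` and the action names `"drop"` and `"create"` are hypothetical labels for illustration, not connector APIs.

```python
from typing import List


def plan_schema_actions(mode: str, collection_exists: bool) -> List[str]:
    """Hypothetical helper: list the schema steps implied by schema_save_mode."""
    if mode == "RECREATE_SCHEMA":
        # Drop first only when there is something to drop.
        return ["drop", "create"] if collection_exists else ["create"]
    if mode == "CREATE_SCHEMA_WHEN_NOT_EXIST":
        return [] if collection_exists else ["create"]
    if mode == "ERROR_WHEN_SCHEMA_NOT_EXIST":
        if not collection_exists:
            raise RuntimeError("target collection does not exist")
        return []
    raise ValueError(f"unknown schema_save_mode: {mode}")


print(plan_schema_actions("RECREATE_SCHEMA", collection_exists=True))
# ['drop', 'create']
```

Note that `RECREATE_SCHEMA` is the only mode that discards an existing target, so it also discards the data stored in it.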

### data_save_mode [string]

Choose how to handle existing data on the target side before starting the synchronization task:
- `DROP_DATA`: Retains the database structure but deletes the data.
- `APPEND_DATA`: Retains both the database structure and the data.
- `ERROR_WHEN_DATA_EXISTS`: Throws an error if data exists.
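
As a rough illustration of the data modes above, this Python sketch maps each `data_save_mode` value to the step taken before writing starts. `plan_data_actions` and the `"delete_all_documents"` label are hypothetical names used only for this example.

```python
from typing import List


def plan_data_actions(mode: str, document_count: int) -> List[str]:
    """Hypothetical helper: list the data steps implied by data_save_mode."""
    if mode == "DROP_DATA":
        # Keep the structure, remove existing documents if there are any.
        return ["delete_all_documents"] if document_count > 0 else []
    if mode == "APPEND_DATA":
        return []  # existing documents are left untouched
    if mode == "ERROR_WHEN_DATA_EXISTS":
        if document_count > 0:
            raise RuntimeError("target already contains data")
        return []
    raise ValueError(f"unknown data_save_mode: {mode}")


print(plan_data_actions("DROP_DATA", document_count=42))
# ['delete_all_documents']
```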

## Example

Simple example:

```bash
sink {
  Typesense {
    source_table_name = "typesense_test_table"
    hosts = ["localhost:8108"]
    collection = "typesense_to_typesense_sink_with_query"
    max_retry_count = 3
    max_batch_size = 10
    api_key = "xyz"
    primary_keys = ["num_employees","id"]
    key_delimiter = "="
    schema_save_mode = "CREATE_SCHEMA_WHEN_NOT_EXIST"
    data_save_mode = "APPEND_DATA"
  }
}
```

79 changes: 79 additions & 0 deletions docs/en/connector-v2/source/Typesense.md
@@ -0,0 +1,79 @@
# Typesense

> Typesense Source Connector

## Description

Reads data from Typesense.

## Key Features

- [x] [Batch Processing](../../concept/connector-v2-features.md)
- [ ] [Stream Processing](../../concept/connector-v2-features.md)
- [ ] [Exactly-Once](../../concept/connector-v2-features.md)
- [x] [Schema](../../concept/connector-v2-features.md)
- [x] [Parallelism](../../concept/connector-v2-features.md)
- [ ] [User-Defined Splits Support](../../concept/connector-v2-features.md)

## Options

| Name | Type | Required | Default |
|------------|--------|----------|---------|
| hosts | array | yes | - |
| collection | string | yes | - |
| schema | config | yes | - |
| api_key | string | no | - |
| query | string | no | - |
| batch_size | int | no | 100 |

### hosts [array]

The access address of Typesense, for example: `["typesense-01:8108"]`.

### collection [string]

The name of the collection to read from, for example: `"seatunnel"`.

### schema [config]

The columns to be read from Typesense. For more information, please refer to the [guide](../../concept/schema-feature.md#how-to-declare-type-supported).

### api_key [string]

The `api_key` for Typesense security authentication.

### batch_size [int]

The number of records to query per batch when reading data.
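
How the connector maps `batch_size` onto Typesense requests is not stated here. As a sketch under the assumption that each batch corresponds to one search request using Typesense's `page` and `per_page` parameters, the hypothetical `plan_pages` helper below shows how many requests a given result count would need.

```python
import math
from typing import Dict, List


def plan_pages(total_found: int, batch_size: int = 100) -> List[Dict[str, int]]:
    """Hypothetical helper: one page/per_page pair per batch of results."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    page_count = math.ceil(total_found / batch_size)
    # Typesense pages are 1-based.
    return [{"page": page, "per_page": batch_size}
            for page in range(1, page_count + 1)]


requests = plan_pages(total_found=250, batch_size=100)
print(len(requests))  # 3
```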

### Common Options

For common parameters of Source plugins, please refer to [Source Common Options](../source-common-options.md).

## Example

```bash
source {
  Typesense {
    hosts = ["localhost:8108"]
    collection = "companies"
    api_key = "xyz"
    query = "q=*&filter_by=num_employees:>9000"
    schema = {
      fields {
        company_name_list = array<string>
        company_name = string
        num_employees = long
        country = string
        id = string
        c_row = {
          c_int = int
          c_string = string
          c_array_int = array<int>
        }
      }
    }
  }
}
```
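
The `query` value in the example above has the shape of a URL query string. Assuming it is interpreted as a set of Typesense search parameters (an inference from the example, not something this document states), the standard-library sketch below shows how such a string splits into individual parameters.

```python
from urllib.parse import parse_qsl

query = "q=*&filter_by=num_employees:>9000"

# Split the string on '&' and '=' into key/value search parameters.
search_params = dict(parse_qsl(query))

print(search_params)
# {'q': '*', 'filter_by': 'num_employees:>9000'}
```

Here `q=*` matches every document and `filter_by` restricts the result to companies with more than 9000 employees.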

95 changes: 95 additions & 0 deletions docs/zh/connector-v2/sink/Typesense.md
@@ -0,0 +1,95 @@
# Typesense

## Description

Outputs data to `Typesense`.

## Key Features

- [ ] [Exactly Once](../../concept/connector-v2-features.md)
- [x] [CDC](../../concept/connector-v2-features.md)

## Options

| Name | Type | Required | Default Value |
|------------------|--------|----------|------------------------------|
| hosts | array | Yes | - |
| collection | string | Yes | - |
| schema_save_mode | string | Yes | CREATE_SCHEMA_WHEN_NOT_EXIST |
| data_save_mode | string | Yes | APPEND_DATA |
| primary_keys | array | No | |
| key_delimiter | string | No | `_` |
| api_key | string | No | |
| max_retry_count | int | No | 3 |
| max_batch_size | int | No | 10 |
| common-options | | No | - |

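### hosts [array]

The access address for Typesense, formatted as `host:port`, e.g., `["typesense-01:8108"]`.

### collection [string]

The name of the collection to write to, e.g., "seatunnel".

### primary_keys [array]

Primary key fields used to generate the document `id`.

### key_delimiter [string]

Sets the delimiter for composite keys (default is `_`).

### api_key [string]

The `api_key` for Typesense security authentication.

### max_retry_count [int]

The maximum number of retry attempts for batch requests.

### max_batch_size [int]

The maximum number of documents sent in a single batch request.

### common options

Common parameters for Sink plugins. Refer to [Common Sink Options](../sink-common-options.md) for more details.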

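### schema_save_mode [string]

Choose how to handle the existing target-side schema before starting the synchronization task:
- `RECREATE_SCHEMA`: Creates the table if it doesn't exist, and deletes and recreates it if it does.
- `CREATE_SCHEMA_WHEN_NOT_EXIST`: Creates the table if it doesn't exist, skips creation if it does.
- `ERROR_WHEN_SCHEMA_NOT_EXIST`: Throws an error if the table doesn't exist.

### data_save_mode [string]

Choose how to handle existing data on the target side before starting the synchronization task:
- `DROP_DATA`: Retains the database structure but deletes the data.
- `APPEND_DATA`: Retains both the database structure and the data.
- `ERROR_WHEN_DATA_EXISTS`: Throws an error if data exists.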

## Example

Simple example:

```bash
sink {
  Typesense {
    source_table_name = "typesense_test_table"
    hosts = ["localhost:8108"]
    collection = "typesense_to_typesense_sink_with_query"
    max_retry_count = 3
    max_batch_size = 10
    api_key = "xyz"
    primary_keys = ["num_employees","id"]
    key_delimiter = "="
    schema_save_mode = "CREATE_SCHEMA_WHEN_NOT_EXIST"
    data_save_mode = "APPEND_DATA"
  }
}
```

79 changes: 79 additions & 0 deletions docs/zh/connector-v2/source/Typesense.md
@@ -0,0 +1,79 @@
# Typesense

> Typesense Source Connector

## Description

Reads data from Typesense.

## Key Features

- [x] [Batch Processing](../../concept/connector-v2-features.md)
- [ ] [Stream Processing](../../concept/connector-v2-features.md)
- [ ] [Exactly-Once](../../concept/connector-v2-features.md)
- [x] [Schema](../../concept/connector-v2-features.md)
- [x] [Parallelism](../../concept/connector-v2-features.md)
- [ ] [User-Defined Splits Support](../../concept/connector-v2-features.md)

## Options

| Name | Type | Required | Default |
|------------|--------|----------|---------|
| hosts | array | yes | - |
| collection | string | yes | - |
| schema | config | yes | - |
| api_key | string | no | - |
| query | string | no | - |
| batch_size | int | no | 100 |

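### hosts [array]

The access address of Typesense, formatted as `host:port`, for example: `["typesense-01:8108"]`.

### collection [string]

The name of the collection to read from, for example: `"seatunnel"`.

### schema [config]

The columns to be read from Typesense. For more information, please refer to the [guide](../../concept/schema-feature.md#how-to-declare-type-supported).

### api_key [string]

The `api_key` for Typesense security authentication.

### batch_size [int]

The number of records to query per batch when reading data.

### Common Options

For common parameters of Source plugins, please refer to [Source Common Options](../source-common-options.md).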

## Example

```bash
source {
  Typesense {
    hosts = ["localhost:8108"]
    collection = "companies"
    api_key = "xyz"
    query = "q=*&filter_by=num_employees:>9000"
    schema = {
      fields {
        company_name_list = array<string>
        company_name = string
        num_employees = long
        country = string
        id = string
        c_row = {
          c_int = int
          c_string = string
          c_array_int = array<int>
        }
      }
    }
  }
}
```

3 changes: 2 additions & 1 deletion plugin-mapping.properties
@@ -132,8 +132,9 @@ seatunnel.source.Milvus = connector-milvus
seatunnel.sink.Milvus = connector-milvus
seatunnel.sink.ActiveMQ = connector-activemq
seatunnel.source.Sls = connector-sls
seatunnel.source.Typesense = connector-typesense
seatunnel.sink.Typesense = connector-typesense
seatunnel.source.Opengauss-CDC = connector-cdc-opengauss

seatunnel.transform.Sql = seatunnel-transforms-v2
seatunnel.transform.FieldMapper = seatunnel-transforms-v2
seatunnel.transform.Filter = seatunnel-transforms-v2