
[Feature][Engine] Metalake support for data source information storage and management #9687

@wtybxqm

Search before asking

  • I have searched the feature requests and found no similar requirement.

Description

In Apache SeaTunnel's task configuration, sensitive information such as data source usernames and passwords is directly written into task scripts. This approach presents the following issues:

  • Security Risks: Credentials stored in plain text in scripts can be leaked, exposing the data sources they protect.
  • Maintenance Challenges: When a data source configuration changes, every related task script must be edited by hand, which is inefficient and error-prone.

To address these problems, this issue proposes integrating a metalake to centralize the storage and management of data source information. A data source ID mapping mechanism lets users update and manage configurations in one place instead of in every script. The goal is to support mainstream data catalogs such as Apache Gravitino while providing extensible interfaces for future integration with third-party services (e.g., Unity Catalog or DataHub).

  1. Metalake Configuration Adaptation:

Define the metalake configuration in seatunnel-env.sh and load it during startup.
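As a rough illustration of the startup step, the Java sketch below reads metalake settings that seatunnel-env.sh would export as environment variables. The variable names METALAKE_ENABLED, METALAKE_TYPE and METALAKE_URL, the class MetalakeConfig, and the default type "gravitino" are hypothetical, not existing SeaTunnel options.

```java
import java.util.Map;

/**
 * Hypothetical sketch: metalake settings exported from seatunnel-env.sh and
 * read once at startup. The variable names METALAKE_ENABLED, METALAKE_TYPE and
 * METALAKE_URL are assumptions, not existing SeaTunnel options.
 */
public class MetalakeConfig {
    private final boolean enabled;
    private final String type;
    private final String url;

    private MetalakeConfig(boolean enabled, String type, String url) {
        this.enabled = enabled;
        this.type = type;
        this.url = url;
    }

    /** Builds the config from an environment map, e.g. System.getenv(). */
    public static MetalakeConfig fromEnv(Map<String, String> env) {
        boolean enabled = Boolean.parseBoolean(env.getOrDefault("METALAKE_ENABLED", "false"));
        if (!enabled) {
            // Metalake disabled: task scripts keep using inline credentials.
            return new MetalakeConfig(false, null, null);
        }
        // Assumed default: Gravitino is the first supported implementation.
        String type = env.getOrDefault("METALAKE_TYPE", "gravitino");
        String url = env.get("METALAKE_URL");
        if (url == null || url.isBlank()) {
            throw new IllegalStateException("METALAKE_URL must be set when METALAKE_ENABLED=true");
        }
        return new MetalakeConfig(true, type, url);
    }

    public boolean isEnabled() { return enabled; }
    public String getType() { return type; }
    public String getUrl() { return url; }

    public static void main(String[] args) {
        MetalakeConfig config = MetalakeConfig.fromEnv(System.getenv());
        System.out.println("metalake enabled: " + config.isEnabled());
    }
}
```

Failing fast when the URL is missing surfaces a misconfigured environment at startup rather than at the first task that needs a credential.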

  2. Source/Sink Configuration Refactoring:

Add a sourceId field to the source/sink config and use placeholders for sensitive fields (e.g. username:${username}). At runtime, fetch the configuration for that sourceId from the metalake and replace the placeholders with the fetched values.
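A minimal sketch of the placeholder replacement, assuming the plugin config has been flattened to a Map<String, String>. The class PlaceholderResolver and the lookup function passed to it are hypothetical; in the proposed design the lookup would be backed by a MetalakeClient.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: if a source/sink config has a "sourceId", values such as
 * "${username}" are replaced with properties fetched for that ID. The lookup
 * function is a stand-in for a real metalake client.
 */
public class PlaceholderResolver {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    public static Map<String, String> resolve(
            Map<String, String> pluginConfig,
            Function<String, Map<String, String>> fetchBySourceId) {
        String sourceId = pluginConfig.get("sourceId");
        if (sourceId == null) {
            return pluginConfig; // no metalake lookup requested
        }
        Map<String, String> props = fetchBySourceId.apply(sourceId);
        Map<String, String> resolved = new HashMap<>();
        for (Map.Entry<String, String> e : pluginConfig.entrySet()) {
            resolved.put(e.getKey(), substitute(e.getValue(), props, sourceId));
        }
        return resolved;
    }

    /** Replaces every ${key} in value; fails if the metalake has no such key. */
    private static String substitute(String value, Map<String, String> props, String sourceId) {
        if (value == null) {
            return null;
        }
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String key = m.group(1);
            String replacement = props.get(key);
            if (replacement == null) {
                throw new IllegalArgumentException(
                        "No property '" + key + "' found for sourceId " + sourceId);
            }
            // quoteReplacement keeps '$' or '\' in passwords from being interpreted.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> sink = Map.of(
                "sourceId", "mysql-prod",
                "username", "${username}",
                "password", "${password}");
        Map<String, String> resolved = resolve(sink,
                id -> Map.of("username", "etl", "password", "secret"));
        System.out.println(resolved.get("username")); // etl
    }
}
```

Throwing on an unknown key, rather than leaving the literal placeholder in place, avoids a job silently connecting with a username of "${username}".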

  3. Plugin-Based Metalake Support:

Define a generic MetalakeClient interface and implement a GravitinoClient that fetches data source information over HTTP.
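One possible shape for the plugin interface, sketched with the JDK's java.net.http client (Java 11+). The method name getSourceProperties, the mapping of a sourceId to a Gravitino catalog, the request path, and the injected JSON parser are all assumptions for illustration; check them against the Gravitino REST API before relying on them.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Map;
import java.util.function.Function;

/** Generic, pluggable interface: one implementation per metadata service. */
interface MetalakeClient {
    /** Returns the connection properties registered for a data source ID (hypothetical method). */
    Map<String, String> getSourceProperties(String sourceId) throws IOException;
}

/**
 * Hypothetical Gravitino-backed client. Assumes each sourceId names a catalog
 * in the metalake; JSON parsing is injected (e.g. Jackson in a real build).
 */
class GravitinoClient implements MetalakeClient {
    private final HttpClient http = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(10))
            .build();
    private final String baseUrl;
    private final String metalake;
    private final Function<String, Map<String, String>> parseProperties;

    GravitinoClient(String baseUrl, String metalake,
                    Function<String, Map<String, String>> parseProperties) {
        this.baseUrl = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
        this.metalake = metalake;
        this.parseProperties = parseProperties;
    }

    /** Assumed path layout: /api/metalakes/{metalake}/catalogs/{sourceId}. */
    URI buildUri(String sourceId) {
        return URI.create(baseUrl + "/api/metalakes/" + encode(metalake)
                + "/catalogs/" + encode(sourceId));
    }

    @Override
    public Map<String, String> getSourceProperties(String sourceId) throws IOException {
        HttpRequest request = HttpRequest.newBuilder(buildUri(sourceId))
                .header("Accept", "application/vnd.gravitino.v1+json") // verify for your server version
                .GET()
                .build();
        try {
            HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() != 200) {
                throw new IOException("Metalake returned HTTP " + response.statusCode()
                        + " for sourceId " + sourceId);
            }
            return parseProperties.apply(response.body());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while fetching sourceId " + sourceId, e);
        }
    }

    /** Percent-encodes one path segment (URLEncoder emits '+' for spaces, so map it to %20). */
    private static String encode(String segment) {
        return URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20");
    }
}
```

Keeping transport and parsing behind the interface means a DataHub or Unity Catalog client could later be added as another implementation without changing the placeholder logic.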

Usage Scenario

No response

Related issues

#8502

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct
