Adding dbt-watsonx-presto setup and config files #6736

Merged
merged 37 commits into from
Jan 23, 2025
Changes from 36 commits
6b11480
Create watsonx-presto-config.md
KNagaVivek Jan 8, 2025
f0a7676
Create watsonx-presto-setup.md
KNagaVivek Jan 8, 2025
cbc72cb
Update sidebars
KNagaVivek Jan 9, 2025
bf2f45e
Merge branch 'current' into current
amychen1776 Jan 9, 2025
c06da2b
Merge branch 'dbt-labs:current' into current
KNagaVivek Jan 11, 2025
e50f831
Update watsonx-presto-setup
KNagaVivek Jan 11, 2025
f4ad42f
Update watsonx-presto-config
KNagaVivek Jan 15, 2025
0bfa824
Merge branch 'current' into current
amychen1776 Jan 16, 2025
206ced6
Remove OSS Presto content
KNagaVivek Jan 21, 2025
6446ed5
Update community-adapters
KNagaVivek Jan 22, 2025
aa735b7
Merge branch 'current' into current
amychen1776 Jan 22, 2025
76b13a9
Fix sentence-style issues
KNagaVivek Jan 23, 2025
af74b0c
Merge branch 'current' into current
nataliefiann Jan 23, 2025
c18fa63
Update website/sidebars.js
mirnawong1 Jan 23, 2025
28c4d79
Merge branch 'current' into current
mirnawong1 Jan 23, 2025
1d20f08
Update watsonx-presto-setup.md
KNagaVivek Jan 23, 2025
0bb9330
Update website/docs/docs/core/connect-data-platform/watsonx-presto-se…
nataliefiann Jan 23, 2025
94a0b76
Merge branch 'current' into current
nataliefiann Jan 23, 2025
1d7ffcf
Merge branch 'current' into current
mirnawong1 Jan 23, 2025
6ef063e
Merge branch 'current' into current
mirnawong1 Jan 23, 2025
33e039d
Update website/sidebars.js
nataliefiann Jan 23, 2025
2174984
Update website/docs/reference/resource-configs/watsonx-presto-config.md
nataliefiann Jan 23, 2025
334f7aa
Merge branch 'current' into current
amychen1776 Jan 23, 2025
d03fed1
Update website/docs/docs/core/connect-data-platform/watsonx-presto-se…
amychen1776 Jan 23, 2025
c55ec93
Update website/docs/docs/core/connect-data-platform/watsonx-presto-se…
amychen1776 Jan 23, 2025
bab0cff
Update website/docs/docs/core/connect-data-platform/watsonx-presto-se…
amychen1776 Jan 23, 2025
a3eeec2
Update website/docs/docs/core/connect-data-platform/watsonx-presto-se…
amychen1776 Jan 23, 2025
00ef1b0
Update website/docs/docs/core/connect-data-platform/watsonx-presto-se…
amychen1776 Jan 23, 2025
acfd48e
Update website/docs/docs/core/connect-data-platform/watsonx-presto-se…
amychen1776 Jan 23, 2025
7b74d0e
Update website/docs/docs/core/connect-data-platform/watsonx-presto-se…
amychen1776 Jan 23, 2025
f0c17e9
Update website/docs/docs/core/connect-data-platform/watsonx-presto-se…
amychen1776 Jan 23, 2025
c3884ba
Merge branch 'current' into current
amychen1776 Jan 23, 2025
ddad7be
Update watsonx-presto-setup.md
KNagaVivek Jan 23, 2025
0ed1a5e
Update website/docs/docs/core/connect-data-platform/watsonx-presto-se…
nataliefiann Jan 23, 2025
271b05f
Update website/docs/docs/core/connect-data-platform/watsonx-presto-se…
nataliefiann Jan 23, 2025
9ca4aa7
Update website/docs/docs/core/connect-data-platform/watsonx-presto-se…
nataliefiann Jan 23, 2025
219720a
Update website/docs/docs/core/connect-data-platform/watsonx-presto-se…
nataliefiann Jan 23, 2025
3 changes: 2 additions & 1 deletion website/docs/docs/community-adapters.md
@@ -15,4 +15,5 @@ Community adapters are adapter plugins contributed and maintained by members of
| [MySQL](/docs/core/connect-data-platform/mysql-setup) | [RisingWave](/docs/core/connect-data-platform/risingwave-setup) | [Rockset](/docs/core/connect-data-platform/rockset-setup) |
| [SingleStore](/docs/core/connect-data-platform/singlestore-setup)| [SQL Server & Azure SQL](/docs/core/connect-data-platform/mssql-setup) | [SQLite](/docs/core/connect-data-platform/sqlite-setup) |
| [Starrocks](/docs/core/connect-data-platform/starrocks-setup) | [TiDB](/docs/core/connect-data-platform/tidb-setup)| [TimescaleDB](https://dbt-timescaledb.debruyn.dev/) |
| [Upsolver](/docs/core/connect-data-platform/upsolver-setup) | [Vertica](/docs/core/connect-data-platform/vertica-setup) | [Yellowbrick](/docs/core/connect-data-platform/yellowbrick-setup) |
| [Upsolver](/docs/core/connect-data-platform/upsolver-setup) | [Vertica](/docs/core/connect-data-platform/vertica-setup) | [Watsonx-Presto](/docs/core/connect-data-platform/watsonx-presto-setup) |
| [Yellowbrick](/docs/core/connect-data-platform/yellowbrick-setup) |
103 changes: 103 additions & 0 deletions website/docs/docs/core/connect-data-platform/watsonx-presto-setup.md
@@ -0,0 +1,103 @@
---
title: "IBM watsonx.data Presto setup"
description: "Read this guide to learn about the IBM watsonx.data Presto setup in dbt."
id: "watsonx-presto-setup"
meta:
  maintained_by: IBM
  authors: Karnati Naga Vivek, Hariharan Ashokan, Biju Palliyath, Gopikrishnan Varadarajulu, Rohan Pednekar
  github_repo: 'IBM/dbt-watsonx-presto'
  pypi_package: 'dbt-watsonx-presto'
  min_core_version: v1.8.0
  cloud_support: 'Not Supported'
  min_supported_version: 'n/a'
  slack_channel_name:
  slack_channel_link:
  platform_name: IBM watsonx.data
  config_page: /reference/resource-configs/watsonx-presto-config
---

The `dbt-watsonx-presto` adapter lets you use dbt to transform and manage data on IBM watsonx.data Presto (Java), using its distributed SQL query engine. Before proceeding, ensure you have the following:
<ul>
<li>An active IBM watsonx.data Presto (Java) engine in a SaaS or Software instance, along with its connection details (host, port, catalog, schema).</li>
<li>Authentication credentials: a username and a password or API key.</li>
<li>SSL verification, which watsonx.data instances require for secure connections. If the instance host uses HTTPS, you don't need to specify the SSL certificate parameter. If the instance host uses an unsecured HTTP connection, provide the path to the SSL certificate file.</li>
</ul>
Refer to [Configuring dbt-watsonx-presto](https://www.ibm.com/docs/en/watsonx/watsonxdata/2.1.x?topic=presto-configuration-setting-up-your-profile) for guidance on obtaining and organizing these details.


import SetUpPages from '/snippets/_setup-pages-intro.md';

<SetUpPages meta={frontMatter.meta}/>


## Connecting to IBM watsonx.data Presto

To connect dbt to watsonx.data Presto (Java), configure a profile in the `profiles.yml` file located in the `.dbt/` directory of your home folder. The following example configures targets for IBM watsonx.data Software and SaaS instances:

<File name='~/.dbt/profiles.yml'>

```yaml
my_project:
  outputs:
    software:
      type: presto
      method: BasicAuth
      user: [user]
      password: [password]
      host: [hostname]
      database: [catalog name]
      schema: [your dbt schema]
      port: [port number]
      threads: [1 or more]
      ssl_verify: path/to/certificate

    saas:
      type: presto
      method: BasicAuth
      user: [user]
      password: [api_key]
      host: [hostname]
      database: [catalog name]
      schema: [your dbt schema]
      port: [port number]
      threads: [1 or more]

  target: software

```

</File>

## Host parameters

The following profile fields are required to configure watsonx.data Presto (Java) connections. For IBM watsonx.data SaaS or Software instances, you can get the `hostname` and `port` details by clicking **View connect details** on the Presto (Java) engine details page.

| Option | Required/Optional | Description | Example |
| --------- | ------- | ------- | ----------- |
| `method` | Required | Specifies the authentication method for secure connections. Use `BasicAuth` when connecting to IBM watsonx.data SaaS or Software instances. | `BasicAuth` |
| `user` | Required | Username or email address for authentication. | `user` |
| `password`| Required | Password or API key for authentication. | `password` |
| `host` | Required | Hostname for connecting to Presto. | `127.0.0.1` |
| `database`| Required | The catalog name in your Presto instance. | `Analytics` |
| `schema` | Required | The schema name within your Presto instance catalog. | `my_schema` |
| `port` | Required | The port for connecting to Presto. | `443` |
| `ssl_verify` | Optional (default: **true**) | Specifies the path to the SSL certificate or a boolean value. The SSL certificate path is required if the watsonx.data instance is not secure (HTTP).| `path/to/certificate` or `true` |


### Schemas and databases
When selecting the catalog and the schema, make sure the user has read and write access to both. This selection does not limit your ability to query the catalog. Instead, the catalog and schema serve as the default location where tables and views are materialized. You can change this default later from within your dbt project. In addition, the Presto connector used in the catalog must support creating tables.
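
As a minimal sketch of changing that default from within a project, dbt's standard `+database` and `+schema` configs can be set in `dbt_project.yml`. The project name `my_project`, the `marts` folder, and the catalog and schema names below are hypothetical placeholders, not values from the adapter documentation:

<File name='dbt_project.yml'>

```yaml
models:
  my_project:              # hypothetical project name
    marts:                 # hypothetical folder under models/
      +database: other_catalog   # assumed catalog name; overrides the profile's `database`
      +schema: reporting         # assumed custom schema name
```

</File>

Note that with dbt's default schema naming, a custom schema is appended to the target schema (for example, `my_schema_reporting`) rather than replacing it, unless the project overrides the `generate_schema_name` macro.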

### SSL verification
- If the Presto instance uses an unsecured HTTP connection, you must set `ssl_verify` to the path of the SSL certificate file.
- If the instance uses `HTTPS`, this parameter is not required and can be omitted.

## Additional parameters

The following profile fields are optional to set up. They let you configure your instance session and dbt for your connection.


| Profile field | Description | Example |
| ----------------------------- | ----------------------------------------------------------------------------------------------------------- | ------------------------------------ |
| `threads` | How many threads dbt should use (default is `1`) | `8` |
| `http_headers` | HTTP headers to send alongside requests to Presto, specified as a yaml dictionary of (header, value) pairs. | `X-Presto-Routing-Group: my-instance` |
| `http_scheme` | The HTTP scheme to use for requests to Presto (default: `http`, or `https` if `BasicAuth`) | `https` or `http` |
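
The following sketch shows how these optional fields might sit alongside the required ones in a single target. It reuses the example values from the table above; the bracketed entries are placeholders, as in the profile shown earlier:

<File name='~/.dbt/profiles.yml'>

```yaml
my_project:
  outputs:
    saas:
      type: presto
      method: BasicAuth
      user: [user]
      password: [api_key]
      host: [hostname]
      database: [catalog name]
      schema: [your dbt schema]
      port: [port number]
      threads: 8                  # optional; defaults to 1
      http_scheme: https          # optional
      http_headers:               # optional; a YAML dictionary of header: value pairs
        X-Presto-Routing-Group: my-instance

  target: saas
```

</File>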
116 changes: 116 additions & 0 deletions website/docs/reference/resource-configs/watsonx-presto-config.md
@@ -0,0 +1,116 @@
---
title: "IBM watsonx.data Presto configurations"
id: "watsonx-presto-config"
---

## Instance requirements

To use IBM watsonx.data Presto (Java) with the `dbt-watsonx-presto` adapter, ensure the instance has an attached catalog that supports creating, renaming, altering, and dropping objects such as tables and views. The user connecting to the instance through the `dbt-watsonx-presto` adapter must have the necessary permissions for the target catalog.

For detailed setup instructions, including setting up watsonx.data, adding the Presto (Java) engine, configuring storage, registering data sources, and managing permissions, refer to the official IBM documentation:
- watsonx.data Software Documentation: [IBM watsonx.data Software Guide](https://www.ibm.com/docs/en/watsonx/watsonxdata/2.1.x)
- watsonx.data SaaS Documentation: [IBM watsonx.data SaaS Guide](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-getting-started)



## Session properties

With an IBM watsonx.data SaaS or Software instance, you can [set session properties](https://prestodb.io/docs/current/sql/set-session.html) to modify the current configuration for your user session.

To temporarily adjust session properties for a specific dbt model or a group of models, use a [dbt hook](../../reference/resource-configs/pre-hook-post-hook). For example:

```sql
{{
  config(
    pre_hook="set session query_max_run_time='10m'"
  )
}}
```
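
To apply the same session property to a group of models rather than a single one, a hook can also be declared in `dbt_project.yml` using dbt's standard `+pre-hook` config. This is a sketch only; the project name `my_project` and the `long_running` folder are hypothetical:

<File name='dbt_project.yml'>

```yaml
models:
  my_project:              # hypothetical project name
    long_running:          # hypothetical folder under models/
      +pre-hook: "set session query_max_run_time='10m'"
```

</File>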

## Connector properties

IBM watsonx.data SaaS and Software instances support various connector properties to manage how your data is represented. These properties are particularly useful for file-based connectors like Hive.

For information on what is supported for each data source, refer to the following resources:
- [watsonx.data SaaS Catalog](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-reg_database)
- [watsonx.data Software Catalog](https://www.ibm.com/docs/en/watsonx/watsonxdata/2.1.x?topic=components-adding-data-source)


## File format configuration

File-based connectors, such as Hive and Iceberg, allow customization of table materialization, data formats, and partitioning strategies in dbt models. The following examples demonstrate how to configure these options for each connector.

### Hive configuration

Hive supports specifying file formats and partitioning strategies using the `properties` parameter in dbt models. The example below demonstrates how to create a partitioned table stored in Parquet format:

```sql
{# "format" specifies the file format; "partitioned_by" defines the partitioning column(s). #}
{{
  config(
    materialized='table',
    properties={
      "format": "'PARQUET'",
      "partitioned_by": "ARRAY['id']",
    }
  )
}}
```

For more details about Hive table creation and supported properties, refer to the [Hive connector documentation](https://prestodb.io/docs/current/connector/hive.html#create-a-managed-table).

### Iceberg configuration

Iceberg supports defining file formats and advanced partitioning strategies to optimize query performance. The following example demonstrates how to create an ORC table partitioned using a bucketing strategy:

```sql
{# "format" specifies the file format; "partitioning" defines the partitioning strategy. #}
{{
  config(
    materialized='table',
    properties={
      "format": "'ORC'",
      "partitioning": "ARRAY['bucket(id, 2)']",
    }
  )
}}
```
For more information about Iceberg table creation and supported configurations, refer to the [Iceberg connector documentation](https://prestodb.io/docs/current/connector/iceberg.html#create-table).


## Seeds and prepared statements
The `dbt-watsonx-presto` adapter supports all [watsonx.data Presto data types](https://www.ibm.com/support/pages/node/7157339) in seed files. To use this functionality, you must explicitly define the data type of each column.

You can configure column data types either in the `dbt_project.yml` file or in property files, as supported by dbt. For more details on seed configuration and best practices, refer to the [dbt seed configuration documentation](https://docs.getdbt.com/reference/seed-configs).
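
For illustration, a seed's column types can be declared with dbt's standard `+column_types` config. The seed name `country_codes`, its columns, and the chosen types below are hypothetical; substitute the Presto types that match your own CSV:

<File name='dbt_project.yml'>

```yaml
seeds:
  my_project:              # hypothetical project name
    country_codes:         # hypothetical seed file: seeds/country_codes.csv
      +column_types:
        country_code: varchar(2)
        country_name: varchar
        population: bigint
        updated_at: date
```

</File>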


## Materializations
The `dbt-watsonx-presto` adapter supports both table and view materializations, allowing you to manage how your data is stored and queried in watsonx.data Presto (Java).

For further information on configuring materializations, refer to the [dbt materializations documentation](https://docs.getdbt.com/reference/resource-configs/materialized).
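
As a brief sketch, the two supported materializations can be assigned per folder in `dbt_project.yml` with dbt's standard `+materialized` config. The project name and the `staging` and `marts` folders are hypothetical:

<File name='dbt_project.yml'>

```yaml
models:
  my_project:              # hypothetical project name
    staging:               # hypothetical folder; built as views
      +materialized: view
    marts:                 # hypothetical folder; built as tables
      +materialized: table
```

</File>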

### Table

The `dbt-watsonx-presto` adapter enables you to create and update tables through table materialization, making it easier to work with data in watsonx.data Presto.

#### Recommendations
- **Check permissions:** Ensure that the necessary permissions for table creation are enabled in the catalog or schema.
- **Check connector documentation:** Review the watsonx.data Presto [SQL statement support](https://www.ibm.com/support/pages/node/7157339) to confirm that your connector supports table creation and modification.

#### Limitations with some connectors
Certain watsonx.data Presto connectors, particularly read-only ones or those with restricted permissions, do not allow creating or modifying tables. If you attempt to use table materialization with these connectors, you may encounter an error like:

```sh
PrestoUserError(type=USER_ERROR, name=NOT_SUPPORTED, message="This connector does not support creating tables with data", query_id=20241206_071536_00026_am48r)
```

### View

The `dbt-watsonx-presto` adapter automatically creates views by default, as views are the standard materialization in dbt. If no materialization is explicitly specified, dbt will create a view in watsonx.data Presto.

To confirm whether your connector supports view creation, refer to the watsonx.data [SQL statement support](https://www.ibm.com/support/pages/node/7157339).


## Unsupported features
The following features are not supported by the `dbt-watsonx-presto` adapter:
- Incremental materialization
- Materialized views
- Snapshots
2 changes: 2 additions & 0 deletions website/sidebars.js
@@ -253,6 +253,7 @@ const sidebarSettings = {
"docs/core/connect-data-platform/tidb-setup",
"docs/core/connect-data-platform/upsolver-setup",
"docs/core/connect-data-platform/vertica-setup",
"docs/core/connect-data-platform/watsonx-presto-setup",
"docs/core/connect-data-platform/yellowbrick-setup",
],
},
@@ -899,6 +900,7 @@ const sidebarSettings = {
"reference/resource-configs/teradata-configs",
"reference/resource-configs/upsolver-configs",
"reference/resource-configs/vertica-configs",
"reference/resource-configs/watsonx-presto-config",
"reference/resource-configs/yellowbrick-configs",
],
},