Commit 73d15e3

committed
feat: add load balancing external functions
1 parent 90da3f4 commit 73d15e3

2 files changed

+112
-39
lines changed

docs/en/guides/54-query/03-udf.md

Lines changed: 13 additions & 12 deletions
Original file line number | Diff line number | Diff line change
@@ -1,6 +1,7 @@
11
---
22
title: User-Defined Function
33
---
4+
45
import IndexOverviewList from '@site/src/components/IndexOverviewList';
56

67
import EEFeature from '@site/src/components/EEFeature';
@@ -87,11 +88,11 @@ AS $$
8788
def remove_stop_words(text, stop_words):
8889
"""
8990
Removes common stop words from the text.
90-
91+
9192
Args:
9293
text (str): The input text.
9394
stop_words (set): A set of stop words to remove.
94-
95+
9596
Returns:
9697
str: Text with stop words removed.
9798
"""
@@ -100,12 +101,12 @@ def remove_stop_words(text, stop_words):
100101
def calculate_sentiment(text, positive_words, negative_words):
101102
"""
102103
Calculates the sentiment score of the text.
103-
104+
104105
Args:
105106
text (str): The input text.
106107
positive_words (set): A set of positive words.
107108
negative_words (set): A set of negative words.
108-
109+
109110
Returns:
110111
int: Sentiment score.
111112
"""
@@ -116,10 +117,10 @@ def calculate_sentiment(text, positive_words, negative_words):
116117
def get_sentiment_label(score):
117118
"""
118119
Determines the sentiment label based on the sentiment score.
119-
120+
120121
Args:
121122
score (int): The sentiment score.
122-
123+
123124
Returns:
124125
str: Sentiment label ('Positive', 'Negative', 'Neutral').
125126
"""
@@ -133,10 +134,10 @@ def get_sentiment_label(score):
133134
def sentiment_analysis_handler(text):
134135
"""
135136
Analyzes the sentiment of the input text.
136-
137+
137138
Args:
138139
text (str): The input text.
139-
140+
140141
Returns:
141142
str: Sentiment analysis result including the score and label.
142143
"""
@@ -147,7 +148,7 @@ def sentiment_analysis_handler(text):
147148
clean_text = remove_stop_words(text, stop_words)
148149
sentiment_score = calculate_sentiment(clean_text, positive_words, negative_words)
149150
sentiment_label = get_sentiment_label(sentiment_score)
150-
151+
151152
return f'Sentiment Score: {sentiment_score}; Sentiment Label: {sentiment_label}'
152153
$$;
153154
```
@@ -161,7 +162,7 @@ CREATE OR REPLACE TABLE texts (
161162

162163
-- Insert sample data
163164
INSERT INTO texts (original_text)
164-
VALUES
165+
VALUES
165166
('The quick brown fox feels happy and joyful'),
166167
('A hard journey, but it was painful and sad'),
167168
('Uncertain outcomes leave everyone unsure and hesitant'),
@@ -193,7 +194,7 @@ A JavaScript UDF allows you to invoke JavaScript code from a SQL query via Datab
193194
The following table shows the type mapping between Databend and JavaScript:
194195

195196
| Databend Type | JS Type |
196-
|-------------------|------------|
197+
| ----------------- | ---------- |
197198
| NULL | null |
198199
| BOOLEAN | Boolean |
199200
| TINYINT | Number |
@@ -266,4 +267,4 @@ ORDER BY 1;
266267

267268
## Managing UDFs
268269

269-
Databend provides a variety of commands to manage UDFs. For details, see [User-Defined Function](/sql/sql-commands/ddl/udf/).
270+
Databend provides a variety of commands to manage UDFs. For details, see [User-Defined Function](/sql/sql-commands/ddl/udf/).

docs/en/guides/54-query/04-external-function.md

Lines changed: 99 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
---
2-
title: 'External Functions in Databend Cloud'
3-
sidebar_label: 'External Function'
2+
title: "External Functions in Databend Cloud"
3+
sidebar_label: "External Function"
44
---
55

66
External functions in Databend allow you to define custom operations for processing data using external servers written in programming languages like Python. These functions enable you to extend Databend's capabilities by integrating custom logic, leveraging external libraries, and handling complex processing tasks. Key features of external functions include:
@@ -14,7 +14,7 @@ External functions in Databend allow you to define custom operations for process
1414
The following table lists the supported languages and the required libraries for creating external functions in Databend:
1515

1616
| Language | Required Library |
17-
|----------|-------------------------------------------------------|
17+
| -------- | ----------------------------------------------------- |
1818
| Python | [databend-udf](https://pypi.org/project/databend-udf) |
1919

2020
## Managing External Functions
@@ -69,33 +69,33 @@ if __name__ == '__main__':
6969

7070
**Explanation of `@udf` Decorator Parameters:**
7171

72-
| Parameter | Description |
73-
|--------------|------------------------------------------------------------------------------------------------------------------------------------------------------|
74-
| `input_types` | A list of strings specifying the input data types (e.g., `["INT", "VARCHAR"]`). |
75-
| `result_type` | A string specifying the return value type (e.g., `"INT"`). |
76-
| `name` | (Optional) Custom name for the function. If not provided, the original function name is used. |
77-
| `io_threads` | Number of I/O threads used per data chunk for I/O-bound functions. |
78-
| `skip_null` | If set to `True`, NULL values are not passed to the function, and the corresponding return value is set to NULL. Default is `False`. |
72+
| Parameter | Description |
73+
| ------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
74+
| `input_types` | A list of strings specifying the input data types (e.g., `["INT", "VARCHAR"]`). |
75+
| `result_type` | A string specifying the return value type (e.g., `"INT"`). |
76+
| `name` | (Optional) Custom name for the function. If not provided, the original function name is used. |
77+
| `io_threads` | Number of I/O threads used per data chunk for I/O-bound functions. |
78+
| `skip_null` | If set to `True`, NULL values are not passed to the function, and the corresponding return value is set to NULL. Default is `False`. |
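
The `skip_null` behavior described in the table can be illustrated with a small, standard-library-only sketch. The decorator `skip_null_sketch` below is a hypothetical stand-in written for this illustration; it is not the `@udf` decorator from `databend-udf`, whose internals may differ. It assumes that a row with any NULL (`None`) argument yields NULL without calling the function.

```python
from functools import wraps


def skip_null_sketch(func):
    """Hypothetical illustration of skip_null=True semantics (not the databend-udf API)."""

    @wraps(func)
    def wrapper(*args):
        # Assumption: if any argument is NULL (None), skip the call and return NULL.
        if any(arg is None for arg in args):
            return None
        return func(*args)

    return wrapper


@skip_null_sketch
def gcd(a, b):
    # Euclidean algorithm; only ever sees non-NULL inputs because of the wrapper.
    while b:
        a, b = b, a % b
    return a


if __name__ == "__main__":
    print(gcd(48, 18))
    print(gcd(None, 18))
```

With the default `skip_null=False`, the function body itself would have to handle `None` inputs.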
7979

8080
**Data Type Mappings Between Databend and Python:**
8181

82-
| Databend Type | Python Type |
83-
|-----------------------|----------------------|
84-
| BOOLEAN | `bool` |
85-
| TINYINT (UNSIGNED) | `int` |
86-
| SMALLINT (UNSIGNED) | `int` |
87-
| INT (UNSIGNED) | `int` |
88-
| BIGINT (UNSIGNED) | `int` |
89-
| FLOAT | `float` |
90-
| DOUBLE | `float` |
91-
| DECIMAL | `decimal.Decimal` |
92-
| DATE | `datetime.date` |
93-
| TIMESTAMP | `datetime.datetime` |
94-
| VARCHAR | `str` |
95-
| VARIANT | `any` |
96-
| MAP(K,V) | `dict` |
97-
| ARRAY(T) | `list[T]` |
98-
| TUPLE(T,...) | `tuple(T,...)` |
82+
| Databend Type | Python Type |
83+
| ------------------- | ------------------- |
84+
| BOOLEAN | `bool` |
85+
| TINYINT (UNSIGNED) | `int` |
86+
| SMALLINT (UNSIGNED) | `int` |
87+
| INT (UNSIGNED) | `int` |
88+
| BIGINT (UNSIGNED) | `int` |
89+
| FLOAT | `float` |
90+
| DOUBLE | `float` |
91+
| DECIMAL | `decimal.Decimal` |
92+
| DATE | `datetime.date` |
93+
| TIMESTAMP | `datetime.datetime` |
94+
| VARCHAR | `str` |
95+
| VARIANT | `any` |
96+
| MAP(K,V) | `dict` |
97+
| ARRAY(T) | `list[T]` |
98+
| TUPLE(T,...) | `tuple(T,...)` |
9999

100100
### 3. Run the External Server
101101

@@ -141,6 +141,78 @@ You can now use the external function `gcd` in your SQL queries:
141141
SELECT gcd(48, 18); -- Returns 6
142142
```
143143

144+
## Load Balancing External Functions
145+
146+
When deploying multiple external function servers, you can implement load balancing based on function names. Databend includes an `X-DATABEND-FUNCTION` header in each UDF request, which contains the name of the function being called. This header can be used to route requests to different backend servers.
147+
148+
### Using Nginx for Function-Based Routing
149+
150+
Here's an example of how to configure Nginx to route different UDF requests to specific backend servers:
151+
152+
```nginx
153+
# Define upstream servers for different UDF functions
154+
upstream udf_default {
155+
server 10.0.0.1:8080;
156+
server 10.0.0.2:8080 backup;
157+
}
158+
159+
upstream udf_math_functions {
160+
server 10.0.1.1:8080;
161+
server 10.0.1.2:8080 backup;
162+
}
163+
164+
upstream udf_string_functions {
165+
server 10.0.2.1:8080;
166+
server 10.0.2.2:8080 backup;
167+
}
168+
169+
# Map function names to backend servers
170+
map $http_x_databend_function $udf_backend {
171+
default "udf_default";
172+
"gcd" "udf_math_functions";
173+
"lcm" "udf_math_functions";
174+
"string_reverse" "udf_string_functions";
175+
"string_concat" "udf_string_functions";
176+
}
177+
178+
# Server configuration
179+
server {
180+
listen 443 ssl;
181+
server_name udf.example.com;
182+
183+
# SSL configuration
184+
ssl_certificate /etc/nginx/ssl/udf.example.com.crt;
185+
ssl_certificate_key /etc/nginx/ssl/udf.example.com.key;
186+
187+
# Security headers
188+
add_header Strict-Transport-Security "max-age=31536000" always;
189+
190+
location / {
191+
proxy_pass http://$udf_backend;
192+
proxy_http_version 1.1;
193+
proxy_set_header Host $host;
194+
proxy_set_header X-Real-IP $remote_addr;
195+
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
196+
proxy_set_header X-Forwarded-Proto $scheme;
197+
198+
# Timeouts
199+
proxy_connect_timeout 60s;
200+
proxy_send_timeout 60s;
201+
proxy_read_timeout 60s;
202+
}
203+
}
204+
```
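
To make the routing decision in the `map` block concrete, here is a minimal Python sketch of the same lookup. It only models the mapping from the `X-DATABEND-FUNCTION` header value to an upstream name and is not part of Databend or Nginx. The function `pick_upstream` and the constants are hypothetical names introduced for this illustration.

```python
# Hypothetical model of the Nginx `map` block above; for illustration only.
ROUTES = {
    "gcd": "udf_math_functions",
    "lcm": "udf_math_functions",
    "string_reverse": "udf_string_functions",
    "string_concat": "udf_string_functions",
}
DEFAULT_UPSTREAM = "udf_default"


def pick_upstream(headers):
    """Return the upstream name for a request, given its headers as a dict."""
    # HTTP header names are case-insensitive, so normalize them before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    function_name = normalized.get("x-databend-function", "")
    # Nginx matches plain `map` strings ignoring case; lowercasing mirrors that.
    return ROUTES.get(function_name.lower(), DEFAULT_UPSTREAM)


if __name__ == "__main__":
    print(pick_upstream({"X-DATABEND-FUNCTION": "gcd"}))
    print(pick_upstream({"X-DATABEND-FUNCTION": "unknown_fn"}))
```

Any function name that is not listed falls through to the default upstream, which matches the `default` entry in the `map` block.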
205+
206+
When registering your functions in Databend, use the Nginx server's domain:
207+
208+
```sql
209+
CREATE FUNCTION gcd (INT, INT)
210+
RETURNS INT
211+
LANGUAGE PYTHON
212+
HANDLER = 'gcd'
213+
ADDRESS = 'https://udf.example.com';
214+
```
215+
144216
## Conclusion
145217

146218
External functions in Databend Cloud provide a powerful way to extend the functionality of your data processing pipelines by integrating custom code written in languages like Python. By following the steps outlined above, you can create and use external functions to handle complex processing tasks, leverage external libraries, and implement advanced logic.
