-# Databend AI Capabilities
+# Databend AI and ML

-This guide introduces Databend's built-in AI functions that enable natural language processing tasks through SQL queries, including text understanding, generation, and more.
+Databend offers two approaches for AI and ML integration:

-:::warning
-Data Privacy and Security
-
-Databend uses Azure OpenAI Service for embeddings and text completions. Your data will be sent to Azure OpenAI when using these functions. These features are available by default on Databend Cloud.
-
-**By using these functions, you acknowledge that your data will be sent to Azure OpenAI Service** and agree to the [Azure OpenAI Data Privacy](https://learn.microsoft.com/en-us/legal/cognitive-services/openai/data-privacy) terms.
-:::
-
-## Key AI Functions
-
-| Function | Description | When to Use |
-|----------|-------------|------------|
-| [ai_text_completion](/sql/sql-functions/ai-functions/ai-text-completion) | Generates text based on a prompt | • Content generation<br/>• Question answering<br/>• Summarization<br/>• Text expansion |
-| [ai_embedding_vector](/sql/sql-functions/ai-functions/ai-embedding-vector) | Converts text into vector representations | • Semantic search<br/>• Document similarity<br/>• Content recommendation<br/>• Text classification |
-| [cosine_distance](/sql/sql-functions/vector-distance-functions/vector-cosine-distance) | Calculates similarity between vectors | • Finding similar documents<br/>• Ranking search results<br/>• Measuring text similarity |
-
-
-
-## What are Embeddings?
-
-Embeddings are vector representations of text that capture semantic meaning. Similar texts have closer vectors in the embedding space, enabling comparison and analysis for tasks like document similarity and clustering.
-
-## Vector Storage in Databend
-
-Databend can store embedding vectors using the `ARRAY(FLOAT NOT NULL)` data type and perform similarity calculations with the cosine_distance function directly in SQL.
-
-## Example: Document Similarity Search
-
-```sql
--- Create a table for documents
-CREATE TABLE articles (
-    id INT,
-    title VARCHAR,
-    content VARCHAR,
-    embedding ARRAY(FLOAT NOT NULL)
-);
-
--- Insert documents with embeddings
-INSERT INTO articles (id, title, content, embedding)
-VALUES
-    (1, 'Python for Data Science', 'Python is a versatile programming language...',
-        ai_embedding_vector('Python is a versatile programming language...')),
-    (2, 'Introduction to R', 'R is a popular programming language for statistics...',
-        ai_embedding_vector('R is a popular programming language for statistics...'));
-
--- Find similar documents to a query
-SELECT
-    id, title, content,
-    cosine_distance(embedding, ai_embedding_vector('How to use Python in data analysis?')) AS similarity
-FROM articles
-ORDER BY similarity ASC
-LIMIT 3;
-```
-
-## Example: Text Completion
-
-```sql
--- Generate a completion for a prompt
-SELECT ai_text_completion('Explain the benefits of cloud data warehouses in three points:') AS completion;
-
--- Result might be:
--- 1. Scalability: Cloud data warehouses can easily scale up or down based on demand,
---    eliminating the need for upfront capacity planning.
--- 2. Cost-efficiency: Pay-as-you-go pricing models reduce capital expenditure and
---    allow businesses to pay only for the resources they use.
--- 3. Accessibility: Cloud data warehouses enable teams to access data from anywhere,
---    facilitating remote work and global collaboration.
-```
-
-## Building an AI Q&A System
-
-You can create a simple Q&A system with Databend by:
-1. Storing documents with embeddings
-2. Finding relevant documents for a question
-3. Using text completion to generate answers
-
-Try these AI capabilities on [Databend Cloud](https://databend.com) with a free trial.
+| Approach | Features | Use Cases |
+|----------|----------|-----------|
+| **[External Functions](01-external-functions.md)** ✓ *Recommended* | • Custom models<br/>• GPU deployment<br/>• Custom pipelines<br/>• Data privacy | • Specialized domains<br/>• High performance<br/>• Privacy requirements |
+| **[Built-in Functions](02-built-in-functions.md)** | • Text completion<br/>• Embeddings<br/>• Vector operations<br/>• Zero setup | • Quick prototyping<br/>• General NLP<br/>• Simple implementation |