Commit f55ab8d

Merge branch 'main' into udpate-php-tutorial

2 parents 4d68cda + 90080da commit f55ab8d

28 files changed: +8206 −23 lines

test/test-markdown-frontmatter.js

Lines changed: 1 addition & 1 deletion

@@ -8,7 +8,7 @@ const sdk_languages = ['nodejs', 'scala', 'python', 'swift', 'csharp', 'objectiv
 const tags = ['Ottoman', 'Ktor', 'REST API', 'Express', 'Flask', 'TLS', 'Configuration', 'Next.js', 'iOS', 'Xcode', '.NET', 'Xamarin', 'Authentication', 'OpenID', 'Keycloak', 'Android', 'P2P', 'UIKit', 'Installation', 'Spring Boot', 'Spring Data', 'Transactions', 'SQL++ (N1QL)', 'Optimization', 'Community Edition', 'Docker', 'Data Modeling', 'Metadata', 'Best Practices', 'Data Ingestion', 'Kafka', 'Support', 'Customer', 'Prometheus', 'Monitoring', 'Observability', 'Metrics', 'Query Workbench', 'ASP.NET', 'linq', 'DBaaS', 'App Services', 'Flutter', 'Gin Gonic', 'FastAPI', 'Laravel', 'LangChain', 'OpenAI', 'Streamlit', 'Google Gemini', 'Nvidia NIM', 'LLama3', 'AWS', 'Artificial Intelligence', 'Cohere', 'Jina AI', 'Mistral AI', 'Ragas', 'Haystack', 'LangGraph', 'Amazon Bedrock', 'CrewAI']
-const technologies = ['connectors', 'kv', 'query', 'capella', 'server', 'index', 'mobile', 'fts', 'sync gateway', 'eventing', 'analytics', 'udf']
+const technologies = ['connectors', 'kv', 'query', 'capella', 'server', 'index', 'mobile', 'fts', 'sync gateway', 'eventing', 'analytics', 'udf', 'vector search']
 const content_types = ['quickstart', 'tutorial', 'learn']

tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_AzureOpenAI.md

Lines changed: 630 additions & 0 deletions
Large diffs are not rendered by default.

tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_Bedrock.md

Lines changed: 602 additions & 0 deletions
Large diffs are not rendered by default.

tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_Claude(by_Anthropic).md

Lines changed: 658 additions & 0 deletions
Large diffs are not rendered by default.

tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_Cohere.md

Lines changed: 655 additions & 0 deletions
Large diffs are not rendered by default.

tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_CrewAI.md

Lines changed: 1189 additions & 0 deletions
Large diffs are not rendered by default.

tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_Jina_AI.md

Lines changed: 623 additions & 0 deletions
Large diffs are not rendered by default.

tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_Voyage.md

Lines changed: 637 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 1 addition & 0 deletions

@@ -0,0 +1 @@
+## ALL FILES IN THIS DIRECTORY ARE GENERATED. TO EDIT, VISIT THE SOURCE REPO, MENTIONED IN THE COMMIT MESSAGE

tutorial/markdown/generated/vector-search-cookbook/couchbase_presistence_langgraph.md

Lines changed: 893 additions & 0 deletions
Large diffs are not rendered by default.

tutorial/markdown/generated/vector-search-cookbook/memGpt_letta.md

Lines changed: 468 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 232 additions & 0 deletions

@@ -0,0 +1,232 @@
---
# frontmatter
path: "/tutorial-mistralai-couchbase-vector-search"
title: Using Mistral AI Embeddings with Couchbase Vector Search
short_title: Mistral AI with Couchbase Vector Search
description:
  - Learn how to generate embeddings using Mistral AI and store them in Couchbase.
  - This tutorial demonstrates how to use Couchbase's vector search capabilities with Mistral AI embeddings.
  - You'll understand how to perform vector search to find relevant documents based on similarity.
content_type: tutorial
filter: sdk
technology:
  - vector search
tags:
  - Artificial Intelligence
  - Mistral AI
sdk_language:
  - python
length: 30 Mins
---

<!--- *** WARNING ***: Autogenerated markdown file from jupyter notebook. ***DO NOT EDIT THIS FILE***. Changes should be made to the original notebook file. See commit message for source repo. -->

[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/mistralai/mistralai.ipynb)

# Introduction

Couchbase is a NoSQL distributed document database (JSON) with many of the best features of a relational DBMS: SQL, distributed ACID transactions, and much more. [Couchbase Capella™](https://cloud.couchbase.com/sign-up) is the easiest way to get started, but you can also download and run [Couchbase Server](http://couchbase.com/downloads) on-premises.

Mistral AI is a research lab building some of the best open-source models in the world. La Plateforme enables developers and enterprises to build new products and applications powered by Mistral's open-source and commercial LLMs.

The [Mistral AI APIs](https://console.mistral.ai/) empower LLM applications via:

- [Text generation](https://docs.mistral.ai/capabilities/completion/), which supports streaming and can display partial model results in real time
- [Code generation](https://docs.mistral.ai/capabilities/code_generation/), which empowers code-generation tasks, including fill-in-the-middle and code completion
- [Embeddings](https://docs.mistral.ai/capabilities/embeddings/), useful for RAG, where the meaning of text is represented as a list of numbers
- [Function calling](https://docs.mistral.ai/capabilities/function_calling/), which enables Mistral models to connect to external tools
- [Fine-tuning](https://docs.mistral.ai/capabilities/finetuning/), which enables developers to create customized and specialized models
- [JSON mode](https://docs.mistral.ai/capabilities/json_mode/), which enables developers to set the response format to json_object
- [Guardrailing](https://docs.mistral.ai/capabilities/guardrailing/), which enables developers to enforce policies at the system level of Mistral models

# How to run this tutorial

This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/mistralai/mistralai.ipynb).

You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment.

# Before you start

## Get Credentials for Mistral AI

Please follow the [instructions](https://console.mistral.ai/api-keys/) to generate the Mistral AI credentials.

## Create and Deploy Your Free Tier Operational cluster on Capella

To get started with Couchbase Capella, create an account and use it to deploy a forever-free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.

To learn more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).

### Couchbase Capella Configuration

When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.

* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the bucket (Read and Write) used in the application.
* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.

# Install necessary libraries


```python
!pip install couchbase mistralai
```

# Imports


```python
from datetime import timedelta
import uuid

from mistralai import Mistral
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions, SearchOptions
import couchbase.search as search
from couchbase.vector_search import VectorQuery, VectorSearch
```

# Prerequisites


```python
import getpass

couchbase_cluster_url = input("Cluster URL:")
couchbase_username = input("Couchbase username:")
couchbase_password = getpass.getpass("Couchbase password:")
couchbase_bucket = input("Couchbase bucket:")
couchbase_scope = input("Couchbase scope:")
couchbase_collection = input("Couchbase collection:")
```

    Cluster URL: localhost
    Couchbase username: Administrator
    Couchbase password: ········
    Couchbase bucket: mistralai
    Couchbase scope: _default
    Couchbase collection: mistralai

# Couchbase Connection


```python
auth = PasswordAuthenticator(
    couchbase_username,
    couchbase_password
)
```


```python
cluster = Cluster(couchbase_cluster_url, ClusterOptions(auth))
cluster.wait_until_ready(timedelta(seconds=5))

bucket = cluster.bucket(couchbase_bucket)
scope = bucket.scope(couchbase_scope)
collection = scope.collection(couchbase_collection)
```

# Creating Couchbase Vector Search Index
In order to store Mistral embeddings in a Couchbase cluster, a vector search index needs to be created first. We included a sample index definition that will work with this tutorial in the `mistralai_index.json` file. The definition can be used to create a vector index using the Couchbase Server web console; for more information on vector indexes, please read [Create a Vector Search Index with the Server Web Console](https://docs.couchbase.com/server/current/vector-search/create-vector-search-index-ui.html).


```python
search_index_name = couchbase_bucket + "._default.vector_test"
search_index = cluster.search_indexes().get_index(search_index_name)
```
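
For orientation, the heart of such an index definition is a `vector`-typed field mapping whose dimension matches the embedding model (Mistral's `mistral-embed` produces 1024-dimensional vectors). The sketch below shows, as a Python dict, roughly what the relevant mapping portion might contain; it is illustrative only, not the exact contents of `mistralai_index.json`, and the `_default.mistralai` type name assumes the scope and collection chosen earlier:

```python
# Illustrative sketch of the field mapping inside a vector search index
# definition; the real mistralai_index.json may differ in detail.
index_params = {
    "mapping": {
        "types": {
            "_default.mistralai": {          # assumed scope.collection name
                "enabled": True,
                "properties": {
                    "vector": {
                        "fields": [{
                            "name": "vector",
                            "type": "vector",           # vector-typed field
                            "dims": 1024,               # mistral-embed output size
                            "similarity": "dot_product",
                        }],
                    },
                },
            },
        },
    },
}
```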

# Mistral Connection


```python
MISTRAL_API_KEY = getpass.getpass("Mistral API Key:")
mistral_client = Mistral(api_key=MISTRAL_API_KEY)
```

# Embedding Documents
The Mistral client can be used to generate vector embeddings for given text fragments. These embeddings represent the semantic meaning of the corresponding fragments and can be stored in Couchbase for later retrieval. A custom embedding text can also be added to the array of texts by running this code block:


```python
texts = [
    "Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable.",
    "It’s used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.",
    input("custom embedding text")
]
embeddings = mistral_client.embeddings.create(
    model="mistral-embed",
    inputs=texts,
)

print("Output embeddings: " + str(len(embeddings.data)))
```

The output `embeddings` is an EmbeddingResponse object with the embeddings and the token usage information:

```
EmbeddingResponse(
    id='eb4c2c739780415bb3af4e47580318cc', object='list', data=[
        Data(object='embedding', embedding=[-0.0165863037109375,...], index=0),
        Data(object='embedding', embedding=[-0.0234222412109375,...], index=1),
        Data(object='embedding', embedding=[-0.0466222735279375,...], index=2)],
    model='mistral-embed', usage=EmbeddingResponseUsage(prompt_tokens=15, total_tokens=15)
)
```
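
Conceptually, vector search ranks stored documents by how close their embedding vectors are to a query vector. One common closeness measure is cosine similarity; the dependency-free sketch below is purely illustrative (the actual scoring is performed by the Couchbase Search service, not by client-side code):

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 2-d vectors: parallel vectors score 1.0, orthogonal vectors 0.0.
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # → 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # → 0.0
```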

# Storing Embeddings in Couchbase
Each embedding needs to be stored as a Couchbase document. According to the provided search index definition, the embedding vector values need to be stored in the `vector` field. The original text of the embedding can be stored in the same document:


```python
for i in range(0, len(texts)):
    doc = {
        "id": str(uuid.uuid4()),
        "text": texts[i],
        "vector": embeddings.data[i].embedding,
    }
    collection.upsert(doc["id"], doc)
```

# Searching For Embeddings
Embeddings stored in Couchbase can later be searched using the vector index to, for example, find the text fragments that are most relevant to a user-entered prompt:


```python
search_embedding = mistral_client.embeddings.create(
    model="mistral-embed",
    inputs=["name a multipurpose database with distributed capability"],
).data[0]

search_req = search.SearchRequest.create(search.MatchNoneQuery()).with_vector_search(
    VectorSearch.from_vector_query(
        VectorQuery(
            "vector", search_embedding.embedding, num_candidates=1
        )
    )
)
result = scope.search(
    "vector_test",
    search_req,
    SearchOptions(
        limit=13,
        fields=["vector", "id", "text"]
    )
)
for row in result.rows():
    print("Found answer: " + row.id + "; score: " + str(row.score))
    doc = collection.get(row.id)
    print("Answer text: " + doc.value["text"])
```
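
In a full RAG pipeline, the retrieved fragments are then stitched into a prompt for a chat model. The tutorial itself stops at retrieval, but a minimal, hypothetical prompt-assembly helper (the `build_rag_prompt` name and prompt wording are illustrative, not part of any SDK) might look like this:

```python
def build_rag_prompt(question, fragments):
    # Stitch retrieved text fragments into a context block for the chat model.
    context = "\n\n".join("- " + fragment for fragment in fragments)
    return (
        "Answer the question using only the context below.\n\n"
        "Context:\n" + context + "\n\n"
        "Question: " + question
    )

prompt = build_rag_prompt(
    "Name a multipurpose database with distributed capability.",
    ["Couchbase Server is a multipurpose, distributed database."],
)
print(prompt)
```

The resulting string could be passed to `mistral_client.chat` together with the original question to generate a grounded answer.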

    Found answer: 7a4c24dd-393f-4f08-ae42-69ea7009dcda; score: 1.7320726542316662
    Answer text: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable.
