Commit
Merge branch 'main' into udpate-php-tutorial
Showing 28 changed files with 8,206 additions and 23 deletions.
630 changes: 630 additions & 0 deletions
630
...markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_AzureOpenAI.md
602 changes: 602 additions & 0 deletions
602
...ial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_Bedrock.md
658 changes: 658 additions & 0 deletions
658
...generated/vector-search-cookbook/RAG_with_Couchbase_and_Claude(by_Anthropic).md
655 changes: 655 additions & 0 deletions
655
...rial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_Cohere.md
1,189 changes: 1,189 additions & 0 deletions
1,189
...rial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_CrewAI.md
623 changes: 623 additions & 0 deletions
623
...ial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_Jina_AI.md
637 changes: 637 additions & 0 deletions
637
...rial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_Voyage.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
## ALL FILES IN THIS DIRECTORY ARE GENERATED. TO EDIT, VISIT THE SOURCE REPO, MENTIONED IN THE COMMIT MESSAGE |
893 changes: 893 additions & 0 deletions
893
...al/markdown/generated/vector-search-cookbook/couchbase_presistence_langgraph.md
468 changes: 468 additions & 0 deletions
468
tutorial/markdown/generated/vector-search-cookbook/memGpt_letta.md
232 changes: 232 additions & 0 deletions
232
tutorial/markdown/generated/vector-search-cookbook/mistralai.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,232 @@ | ||
--- | ||
# frontmatter | ||
path: "/tutorial-mistralai-couchbase-vector-search" | ||
title: Using Mistral AI Embeddings with Couchbase Vector Search | ||
short_title: Mistral AI with Couchbase Vector Search | ||
description: | ||
- Learn how to generate embeddings using Mistral AI and store them in Couchbase. | ||
- This tutorial demonstrates how to use Couchbase's vector search capabilities with Mistral AI embeddings. | ||
- You'll understand how to perform vector search to find relevant documents based on similarity. | ||
content_type: tutorial | ||
filter: sdk | ||
technology: | ||
- vector search | ||
tags: | ||
- Artificial Intelligence | ||
- Mistral AI | ||
sdk_language: | ||
- python | ||
length: 30 Mins | ||
--- | ||
|
||
|
||
<!--- *** WARNING ***: Autogenerated markdown file from jupyter notebook. ***DO NOT EDIT THIS FILE***. Changes should be made to the original notebook file. See commit message for source repo. --> | ||
|
||
|
||
[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/mistralai/mistralai.ipynb) | ||
|
||
# Introduction | ||
|
||
Couchbase is a NoSQL distributed document database (JSON) with many of the best features of a relational DBMS: SQL, distributed ACID transactions, and much more. [Couchbase Capella™](https://cloud.couchbase.com/sign-up) is the easiest way to get started, but you can also download and run [Couchbase Server](http://couchbase.com/downloads) on-premises. | ||
|
||
Mistral AI is a research lab building the best open source models in the world. La Plateforme enables developers and enterprises to build new products and applications, powered by Mistral’s open source and commercial LLMs. | ||
|
||
The [Mistral AI APIs](https://console.mistral.ai/) empower LLM applications via: | ||
|
||
- [Text generation](https://docs.mistral.ai/capabilities/completion/), enables streaming and provides the ability to display partial model results in real-time | ||
- [Code generation](https://docs.mistral.ai/capabilities/code_generation/), empowers code generation tasks, including fill-in-the-middle and code completion | ||
- [Embeddings](https://docs.mistral.ai/capabilities/embeddings/), useful for RAG where it represents the meaning of text as a list of numbers | ||
- [Function calling](https://docs.mistral.ai/capabilities/function_calling/), enables Mistral models to connect to external tools | ||
- [Fine-tuning](https://docs.mistral.ai/capabilities/finetuning/), enables developers to create customized and specialized models | ||
- [JSON mode](https://docs.mistral.ai/capabilities/json_mode/), enables developers to set the response format to json_object | ||
- [Guardrailing](https://docs.mistral.ai/capabilities/guardrailing/), enables developers to enforce policies at the system level of Mistral models | ||
|
||
|
||
# How to run this tutorial | ||
|
||
This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/mistralai/mistralai.ipynb). | ||
|
||
You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment. | ||
|
||
# Before you start | ||
|
||
## Get Credentials for Mistral AI | ||
|
||
Please follow the [instructions](https://console.mistral.ai/api-keys/) to generate the Mistral AI credentials. | ||
|
||
## Create and Deploy Your Free Tier Operational cluster on Capella | ||
|
||
To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint. | ||
|
||
To learn more, follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html). | ||
|
||
### Couchbase Capella Configuration | ||
|
||
When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met. | ||
|
||
* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) with Read and Write access to the bucket used in this tutorial. | ||
* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running. | ||
|
||
# Install necessary libraries | ||
|
||
|
||
```python | ||
!pip install couchbase mistralai | ||
``` | ||
|
||
# Imports | ||
|
||
|
||
```python | ||
from pathlib import Path | ||
from datetime import timedelta | ||
from mistralai import Mistral | ||
from couchbase.auth import PasswordAuthenticator | ||
from couchbase.cluster import Cluster | ||
from couchbase.options import (ClusterOptions, ClusterTimeoutOptions, | ||
QueryOptions) | ||
import couchbase.search as search | ||
from couchbase.options import SearchOptions | ||
from couchbase.vector_search import VectorQuery, VectorSearch | ||
import uuid | ||
``` | ||
|
||
# Prerequisites | ||
|
||
|
||
|
||
```python | ||
import getpass | ||
couchbase_cluster_url = input("Cluster URL:") | ||
couchbase_username = input("Couchbase username:") | ||
couchbase_password = getpass.getpass("Couchbase password:") | ||
couchbase_bucket = input("Couchbase bucket:") | ||
couchbase_scope = input("Couchbase scope:") | ||
couchbase_collection = input("Couchbase collection:") | ||
``` | ||
|
||
Cluster URL: localhost | ||
Couchbase username: Administrator | ||
Couchbase password: ········ | ||
Couchbase bucket: mistralai | ||
Couchbase scope: _default | ||
Couchbase collection: mistralai | ||
|
||
|
||
# Couchbase Connection | ||
|
||
|
||
```python | ||
auth = PasswordAuthenticator( | ||
couchbase_username, | ||
couchbase_password | ||
) | ||
``` | ||
|
||
|
||
```python | ||
cluster = Cluster(couchbase_cluster_url, ClusterOptions(auth)) | ||
cluster.wait_until_ready(timedelta(seconds=5)) | ||
|
||
bucket = cluster.bucket(couchbase_bucket) | ||
scope = bucket.scope(couchbase_scope) | ||
collection = scope.collection(couchbase_collection) | ||
``` | ||
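
The sample run above passes a bare hostname (`localhost`) as the cluster URL. Couchbase connection strings normally carry a scheme: `couchbase://` for plain connections and `couchbases://` for TLS, which Capella requires. The following is a minimal sketch of a helper that adds a scheme when one is missing. `normalize_connection_string` and its `use_tls` parameter are hypothetical names introduced for illustration and are not part of the Couchbase SDK.

```python
def normalize_connection_string(url: str, use_tls: bool = False) -> str:
    """Return `url` with a Couchbase scheme, adding one only if it is missing.

    Hypothetical helper for illustration; not part of the Couchbase SDK.
    """
    url = url.strip()
    # Leave the value untouched if the user already supplied a scheme.
    if url.startswith(("couchbase://", "couchbases://")):
        return url
    # couchbases:// requests a TLS connection (needed for Capella).
    scheme = "couchbases" if use_tls else "couchbase"
    return f"{scheme}://{url}"


local_url = normalize_connection_string("localhost")
```

The result could then be passed to `Cluster(...)` in place of the raw input, for example `Cluster(normalize_connection_string(couchbase_cluster_url), ClusterOptions(auth))`.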
|
||
# Creating Couchbase Vector Search Index | ||
To store Mistral embeddings in a Couchbase cluster, a vector search index needs to be created first. A sample index definition that works with this tutorial is included in the `mistralai_index.json` file. The definition can be used to create a vector index in the Couchbase Server Web Console. For more information on vector indexes, see [Create a Vector Search Index with the Server Web Console](https://docs.couchbase.com/server/current/vector-search/create-vector-search-index-ui.html). | ||
|
||
|
||
```python | ||
search_index_name = couchbase_bucket + "._default.vector_test" | ||
search_index = cluster.search_indexes().get_index(search_index_name) | ||
``` | ||
|
||
# Mistral Connection | ||
|
||
|
||
```python | ||
MISTRAL_API_KEY = getpass.getpass("Mistral API Key:") | ||
mistral_client = Mistral(api_key=MISTRAL_API_KEY) | ||
``` | ||
|
||
# Embedding Documents | ||
The Mistral client can be used to generate vector embeddings for given text fragments. These embeddings represent the semantic meaning of the corresponding fragments and can be stored in Couchbase for later retrieval. The code block below also prompts for a custom text, which is added to the array of texts to embed: | ||
|
||
|
||
```python | ||
texts = [ | ||
"Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable.", | ||
"It’s used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.", | ||
input("custom embedding text") | ||
] | ||
embeddings = mistral_client.embeddings.create( | ||
model="mistral-embed", | ||
inputs=texts, | ||
) | ||
|
||
print("Output embeddings: " + str(len(embeddings.data))) | ||
``` | ||
|
||
The output `embeddings` is an EmbeddingResponse object with the embeddings and the token usage information: | ||
|
||
``` | ||
EmbeddingResponse( | ||
id='eb4c2c739780415bb3af4e47580318cc', object='list', data=[ | ||
Data(object='embedding', embedding=[-0.0165863037109375,...], index=0), | ||
Data(object='embedding', embedding=[-0.0234222412109375,...], index=1), | ||
Data(object='embedding', embedding=[-0.0466222735279375,...], index=2)], | ||
model='mistral-embed', usage=EmbeddingResponseUsage(prompt_tokens=15, total_tokens=15) | ||
) | ||
``` | ||
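
Each item in `data` carries an `index` field that identifies which input text it belongs to. The sketch below shows one way to read the vectors back in input order by sorting on that field. It runs against a stand-in object built with `SimpleNamespace` rather than a real API response, and `vectors_in_input_order` is a hypothetical helper name. The sort is a defensive assumption, since the sample output above already lists items in input order.

```python
from types import SimpleNamespace


def vectors_in_input_order(response):
    """Return the embedding vectors sorted by each item's `index` field."""
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]


# Stand-in for an EmbeddingResponse, deliberately shuffled (toy values).
fake_response = SimpleNamespace(
    data=[
        SimpleNamespace(index=1, embedding=[0.2, 0.2]),
        SimpleNamespace(index=0, embedding=[0.1, 0.1]),
    ]
)

vectors = vectors_in_input_order(fake_response)
```

With a real response, `vectors[i]` would then correspond to `texts[i]`.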
|
||
# Storing Embeddings in Couchbase | ||
Each embedding needs to be stored as a Couchbase document. As required by the provided search index definition, the embedding vector values need to be stored in the `vector` field. The original text of the embedding can be stored in the same document: | ||
|
||
|
||
```python | ||
for i in range(0, len(texts)): | ||
doc = { | ||
"id": str(uuid.uuid4()), | ||
"text": texts[i], | ||
"vector": embeddings.data[i].embedding, | ||
} | ||
collection.upsert(doc["id"], doc) | ||
``` | ||
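
The loop above assumes that `texts` and `embeddings.data` line up one to one. As a hedged sketch, the document-building step can be separated from the database write so that mismatched lengths or unexpected vector sizes are caught before anything is stored. `build_documents` and `expected_dims` are hypothetical names introduced here, and the vectors in the example are toy values.

```python
import uuid
from typing import Dict, List, Optional, Sequence


def build_documents(
    texts: Sequence[str],
    vectors: Sequence[Sequence[float]],
    expected_dims: Optional[int] = None,
) -> Dict[str, dict]:
    """Pair each text with its vector and return documents keyed by a new UUID."""
    if len(texts) != len(vectors):
        raise ValueError("texts and vectors must have the same length")
    documents: Dict[str, dict] = {}
    for text, vector in zip(texts, vectors):
        # Optionally reject vectors whose size differs from what the index expects.
        if expected_dims is not None and len(vector) != expected_dims:
            raise ValueError(
                f"expected {expected_dims} dimensions, got {len(vector)}"
            )
        doc_id = str(uuid.uuid4())
        documents[doc_id] = {"id": doc_id, "text": text, "vector": list(vector)}
    return documents


docs = build_documents(["first text", "second text"], [[0.1, 0.2], [0.3, 0.4]], expected_dims=2)
```

Each entry could then be written with `collection.upsert(doc_id, doc)` as in the cell above. For real `mistral-embed` output, `expected_dims=1024` would be a reasonable value, assuming the index is defined with the same dimension.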
|
||
# Searching For Embeddings | ||
Embeddings stored in Couchbase can later be searched using the vector index, for example to find the text fragments most relevant to a user-entered prompt: | ||
|
||
|
||
```python | ||
search_embedding = mistral_client.embeddings.create( | ||
model="mistral-embed", | ||
inputs=["name a multipurpose database with distributed capability"], | ||
).data[0] | ||
|
||
search_req = search.SearchRequest.create(search.MatchNoneQuery()).with_vector_search( | ||
VectorSearch.from_vector_query( | ||
VectorQuery( | ||
"vector", search_embedding.embedding, num_candidates=1 | ||
) | ||
) | ||
) | ||
result = scope.search( | ||
"vector_test", | ||
search_req, | ||
SearchOptions( | ||
limit=13, | ||
fields=["vector", "id", "text"] | ||
) | ||
) | ||
for row in result.rows(): | ||
print("Found answer: " + row.id + "; score: " + str(row.score)) | ||
doc = collection.get(row.id) | ||
print("Answer text: " + doc.value["text"]) | ||
|
||
|
||
``` | ||
|
||
Found answer: 7a4c24dd-393f-4f08-ae42-69ea7009dcda; score: 1.7320726542316662 | ||
Answer text: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable. | ||
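
To build intuition for what a vector index does, the sketch below ranks a few toy vectors against a query vector by dot product and returns the best matches. This is a brute-force illustration on made-up data, not how Couchbase implements search, and the similarity metric actually used depends on the index definition. The names `dot`, `top_k`, and `toy_docs` are introduced here for illustration only.

```python
from typing import Dict, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def top_k(
    query: Sequence[float], docs: Dict[str, List[float]], k: int = 1
) -> List[Tuple[str, float]]:
    """Score every document against the query and return the k highest scores."""
    scored = [(doc_id, dot(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy three-dimensional "embeddings"; real mistral-embed vectors are much longer.
toy_docs = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.0, 1.0, 0.0],
    "doc-c": [0.5, 0.5, 0.0],
}

best = top_k([0.9, 0.1, 0.0], toy_docs, k=2)
```

Here `best` holds the two document IDs whose vectors point most nearly in the same direction as the query, which is the same idea the search above applies to the stored embeddings.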
|