Commit

Merge branch 'main' into udpate-php-tutorial
teetangh authored Jan 23, 2025
2 parents 4d68cda + 90080da commit f55ab8d
Showing 28 changed files with 8,206 additions and 23 deletions.
2 changes: 1 addition & 1 deletion test/test-markdown-frontmatter.js
@@ -8,7 +8,7 @@ const sdk_languages = ['nodejs', 'scala', 'python', 'swift', 'csharp', 'objectiv

const tags = ['Ottoman', 'Ktor', 'REST API', 'Express', 'Flask', 'TLS', 'Configuration', 'Next.js', 'iOS', 'Xcode', '.NET', 'Xamarin', 'Authentication', 'OpenID', 'Keycloak', 'Android', 'P2P', 'UIKit', 'Installation', 'Spring Boot', 'Spring Data', 'Transactions', 'SQL++ (N1QL)', 'Optimization', 'Community Edition', 'Docker', 'Data Modeling', 'Metadata', 'Best Practices', 'Data Ingestion', 'Kafka', 'Support', 'Customer', 'Prometheus', 'Monitoring', 'Observability', 'Metrics', 'Query Workbench', 'ASP.NET', 'linq', 'DBaaS', 'App Services', 'Flutter', 'Gin Gonic', 'FastAPI', 'Laravel', 'LangChain', 'OpenAI', 'Streamlit', 'Google Gemini', 'Nvidia NIM', 'LLama3', 'AWS', 'Artificial Intelligence', 'Cohere', 'Jina AI', 'Mistral AI', 'Ragas', 'Haystack', 'LangGraph', 'Amazon Bedrock', 'CrewAI']

-const technologies = ['connectors', 'kv', 'query', 'capella', 'server', 'index', 'mobile', 'fts', 'sync gateway', 'eventing', 'analytics', 'udf']
+const technologies = ['connectors', 'kv', 'query', 'capella', 'server', 'index', 'mobile', 'fts', 'sync gateway', 'eventing', 'analytics', 'udf', 'vector search']

const content_types = ['quickstart', 'tutorial', 'learn']


1 change: 1 addition & 0 deletions tutorial/markdown/generated/vector-search-cookbook/README
@@ -0,0 +1 @@
## ALL FILES IN THIS DIRECTORY ARE GENERATED. TO EDIT, VISIT THE SOURCE REPO, MENTIONED IN THE COMMIT MESSAGE


468 changes: 468 additions & 0 deletions tutorial/markdown/generated/vector-search-cookbook/memGpt_letta.md


232 changes: 232 additions & 0 deletions tutorial/markdown/generated/vector-search-cookbook/mistralai.md
@@ -0,0 +1,232 @@
---
# frontmatter
path: "/tutorial-mistralai-couchbase-vector-search"
title: Using Mistral AI Embeddings with Couchbase Vector Search
short_title: Mistral AI with Couchbase Vector Search
description:
- Learn how to generate embeddings using Mistral AI and store them in Couchbase.
- This tutorial demonstrates how to use Couchbase's vector search capabilities with Mistral AI embeddings.
- You'll understand how to perform vector search to find relevant documents based on similarity.
content_type: tutorial
filter: sdk
technology:
- vector search
tags:
- Artificial Intelligence
- Mistral AI
sdk_language:
- python
length: 30 Mins
---


<!--- *** WARNING ***: Autogenerated markdown file from jupyter notebook. ***DO NOT EDIT THIS FILE***. Changes should be made to the original notebook file. See commit message for source repo. -->


[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/mistralai/mistralai.ipynb)

# Introduction

Couchbase is a NoSQL distributed document database (JSON) with many of the best features of a relational DBMS: SQL, distributed ACID transactions, and much more. [Couchbase Capella™](https://cloud.couchbase.com/sign-up) is the easiest way to get started, but you can also download and run [Couchbase Server](http://couchbase.com/downloads) on-premises.

Mistral AI is a research lab building the best open source models in the world. La Plateforme enables developers and enterprises to build new products and applications, powered by Mistral’s open source and commercial LLMs.

The [Mistral AI APIs](https://console.mistral.ai/) empower LLM applications via:

- [Text generation](https://docs.mistral.ai/capabilities/completion/), enables streaming and provides the ability to display partial model results in real time
- [Code generation](https://docs.mistral.ai/capabilities/code_generation/), empowers code generation tasks, including fill-in-the-middle and code completion
- [Embeddings](https://docs.mistral.ai/capabilities/embeddings/), useful for RAG, where the meaning of text is represented as a list of numbers
- [Function calling](https://docs.mistral.ai/capabilities/function_calling/), enables Mistral models to connect to external tools
- [Fine-tuning](https://docs.mistral.ai/capabilities/finetuning/), enables developers to create customized and specialized models
- [JSON mode](https://docs.mistral.ai/capabilities/json_mode/), enables developers to set the response format to json_object
- [Guardrailing](https://docs.mistral.ai/capabilities/guardrailing/), enables developers to enforce policies at the system level of Mistral models
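
As a taste of the text-generation API (which is not used in the rest of this tutorial), a minimal sketch with the Python SDK might look like this; the model name and prompt are illustrative:

```python
from mistralai import Mistral

client = Mistral(api_key="YOUR_MISTRAL_API_KEY")  # placeholder key

# "mistral-small-latest" is an illustrative model name.
response = client.chat.complete(
    model="mistral-small-latest",
    messages=[{"role": "user", "content": "What is a vector database?"}],
)
print(response.choices[0].message.content)
```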


# How to run this tutorial

This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/mistralai/mistralai.ipynb).

You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment.

# Before you start

## Get Credentials for Mistral AI

Please follow the [instructions](https://console.mistral.ai/api-keys/) to generate the Mistral AI credentials.

## Create and Deploy Your Free Tier Operational cluster on Capella

To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.

To learn more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).

### Couchbase Capella Configuration

When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.

* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the bucket (Read and Write) used in the application.
* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.

# Install necessary libraries


```python
!pip install couchbase mistralai
```

# Imports


```python
from pathlib import Path
from datetime import timedelta
from mistralai import Mistral
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import (ClusterOptions, ClusterTimeoutOptions,
                               QueryOptions)
import couchbase.search as search
from couchbase.options import SearchOptions
from couchbase.vector_search import VectorQuery, VectorSearch
import uuid
```

# Prerequisites



```python
import getpass
couchbase_cluster_url = input("Cluster URL:")
couchbase_username = input("Couchbase username:")
couchbase_password = getpass.getpass("Couchbase password:")
couchbase_bucket = input("Couchbase bucket:")
couchbase_scope = input("Couchbase scope:")
couchbase_collection = input("Couchbase collection:")
```

Cluster URL: localhost
Couchbase username: Administrator
Couchbase password: ········
Couchbase bucket: mistralai
Couchbase scope: _default
Couchbase collection: mistralai


# Couchbase Connection


```python
auth = PasswordAuthenticator(
    couchbase_username,
    couchbase_password
)
```


```python
cluster = Cluster(couchbase_cluster_url, ClusterOptions(auth))
cluster.wait_until_ready(timedelta(seconds=5))

bucket = cluster.bucket(couchbase_bucket)
scope = bucket.scope(couchbase_scope)
collection = scope.collection(couchbase_collection)
```
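
Note that the sample input above connects to a local cluster (`localhost`). When connecting to Capella instead, the connection string must use the TLS scheme `couchbases://`. A minimal sketch, where the hostname is a placeholder to copy from your cluster's Connect tab:

```python
# Capella requires TLS: note the "couchbases://" scheme.
# Replace the hostname with the one shown in the Capella UI.
cluster = Cluster(
    "couchbases://cb.example.cloud.couchbase.com",
    ClusterOptions(auth),
)
cluster.wait_until_ready(timedelta(seconds=5))
```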

# Creating Couchbase Vector Search Index
In order to store Mistral embeddings in a Couchbase cluster, a vector search index needs to be created first. We included a sample index definition that will work with this tutorial in the `mistralai_index.json` file. The definition can be used to create a vector index using the Couchbase Server web console; for more information on vector indexes, please read [Create a Vector Search Index with the Server Web Console](https://docs.couchbase.com/server/current/vector-search/create-vector-search-index-ui.html).


```python
search_index_name = couchbase_bucket + "._default.vector_test"
search_index = cluster.search_indexes().get_index(search_index_name)
```
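
If you would rather create the index from code than through the web console, the SDK's scoped search index manager can upsert a definition. The sketch below is illustrative and is not the exact contents of `mistralai_index.json`; it assumes the field layout used later in this tutorial and a 1024-dimensional `vector` field, matching the size of `mistral-embed` embeddings:

```python
from couchbase.management.search import SearchIndex

# Illustrative index definition (an assumption, not the exact contents of
# mistralai_index.json): a scoped FTS index with a vector field.
index_definition = SearchIndex(
    name="vector_test",
    source_name=couchbase_bucket,
    params={
        "doc_config": {"mode": "scope.collection.type_field"},
        "mapping": {
            "default_mapping": {"enabled": False},
            "types": {
                f"{couchbase_scope}.{couchbase_collection}": {
                    "enabled": True,
                    "properties": {
                        "vector": {
                            "fields": [{
                                "name": "vector",
                                "type": "vector",
                                "dims": 1024,  # must match the embedding size
                                "similarity": "dot_product",
                                "index": True,
                            }]
                        },
                        "text": {
                            "fields": [{"name": "text", "type": "text", "store": True}]
                        },
                    },
                }
            },
        },
    },
)
scope.search_indexes().upsert_index(index_definition)
```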

# Mistral Connection


```python
MISTRAL_API_KEY = getpass.getpass("Mistral API Key:")
mistral_client = Mistral(api_key=MISTRAL_API_KEY)
```

# Embedding Documents
The Mistral client can be used to generate vector embeddings for given text fragments. These embeddings represent the semantic meaning of the corresponding fragments and can be stored in Couchbase for later retrieval. A custom text can also be added to the embedding texts array by running this code block:


```python
texts = [
    "Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable.",
    "It’s used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.",
    input("custom embedding text")
]
embeddings = mistral_client.embeddings.create(
    model="mistral-embed",
    inputs=texts,
)

print("Output embeddings: " + str(len(embeddings.data)))
```

The output `embeddings` is an EmbeddingResponse object with the embeddings and the token usage information:

```
EmbeddingResponse(
    id='eb4c2c739780415bb3af4e47580318cc', object='list', data=[
        Data(object='embedding', embedding=[-0.0165863037109375,...], index=0),
        Data(object='embedding', embedding=[-0.0234222412109375,...], index=1),
        Data(object='embedding', embedding=[-0.0466222735279375,...], index=2)],
    model='mistral-embed', usage=EmbeddingResponseUsage(prompt_tokens=15, total_tokens=15)
)
```
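
Each embedding is a plain list of floats. `mistral-embed` produces 1024-dimensional vectors, and the `dims` setting of the vector search index must match this value. A quick check:

```python
# Every embedding has the same dimensionality (1024 for mistral-embed).
print(len(embeddings.data[0].embedding))
```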

# Storing Embeddings in Couchbase
Each embedding needs to be stored as a Couchbase document. According to the provided search index definition, the embedding vector values need to be stored in the `vector` field. The original text of the embedding can be stored in the same document:


```python
for i in range(len(texts)):
    doc = {
        "id": str(uuid.uuid4()),
        "text": texts[i],
        "vector": embeddings.data[i].embedding,
    }
    collection.upsert(doc["id"], doc)
```
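
For larger batches, the SDK also offers a bulk API. As a minimal sketch (assuming the same `texts` and `embeddings` as above), `upsert_multi` writes all documents in a single call:

```python
# Build a dict of key -> document, then write them all at once.
docs = {}
for text, data in zip(texts, embeddings.data):
    key = str(uuid.uuid4())
    docs[key] = {"id": key, "text": text, "vector": data.embedding}

result = collection.upsert_multi(docs)
print("All documents stored: " + str(result.all_ok))
```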

# Searching For Embeddings
Embeddings stored in Couchbase can later be searched using the vector index, for example to find the text fragments most relevant to a user-entered prompt:


```python
search_embedding = mistral_client.embeddings.create(
    model="mistral-embed",
    inputs=["name a multipurpose database with distributed capability"],
).data[0]

search_req = search.SearchRequest.create(search.MatchNoneQuery()).with_vector_search(
    VectorSearch.from_vector_query(
        VectorQuery(
            "vector", search_embedding.embedding, num_candidates=1
        )
    )
)
result = scope.search(
    "vector_test",
    search_req,
    SearchOptions(
        limit=13,
        fields=["vector", "id", "text"]
    )
)
for row in result.rows():
    print("Found answer: " + row.id + "; score: " + str(row.score))
    doc = collection.get(row.id)
    print("Answer text: " + doc.value["text"])
```

Found answer: 7a4c24dd-393f-4f08-ae42-69ea7009dcda; score: 1.7320726542316662
Answer text: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable.
