Commit 90080da

langchain4j tutorial (#33)
* langchain4j tutorial * touch for actions
1 parent 1a0a438 commit 90080da

1 file changed

+182
-0
lines changed

tutorial/markdown/java/langchain4j.md

Lines changed: 182 additions & 0 deletions
@@ -0,0 +1,182 @@
---
# frontmatter
path: "/tutorial-java-langchain4j"
title: Langchain4j Vector Storage
short_title: Transactions w/ Java SDK
description:
  - Learn how to configure and use Couchbase vector search with LangChain4j
  - Learn how to vectorize data with LangChain4j
  - Learn how to retrieve vector data from Couchbase
content_type: tutorial
filter: sdk
technology:
  - connectors
  - vector search
tags:
  - LangChain
  - Artificial Intelligence
  - Data Ingestion
sdk_language:
  - java
length: 10 Mins
---

## Prerequisites

To run this example project, you will need:

- [Couchbase Capella](https://docs.couchbase.com/cloud/get-started/create-account.html) account or a locally installed [Couchbase Server](/tutorial-couchbase-installation-options)
- Git
- Java Development Kit (JDK) 8+
- Code Editor

## About This Tutorial
This tutorial shows how to use a Couchbase database cluster as a Langchain4j embedding store.

## Example Source code
The example source code for this tutorial is available in the [Langchain4j examples GitHub project](https://github.com/langchain4j/langchain4j-examples/tree/main/couchbase-example).
To obtain it, clone the repository using git:
```shell
git clone https://github.com/langchain4j/langchain4j-examples.git
cd langchain4j-examples/couchbase-example
```

## What Is Langchain4j
Langchain4j is a framework that simplifies integrating LLM-based services into Java applications.
Additional information about the framework and its usage is available on the [Langchain4j documentation website](https://docs.langchain4j.dev/intro/).

## What Is An Embedding Store
In Langchain4j, [embedding stores](https://docs.langchain4j.dev/integrations/embedding-stores/) are used to store
vector embeddings, which represent coordinates in an embedding space. The topology and dimensionality of the embedding
space are defined by the selected language model. Each coordinate in the space represents some kind of sentiment or
idea, and the closer any two embedding vectors are to each other, the closer the ideas that they represent. By storing
embeddings acquired from a pretrained model in a dedicated store, developers can reuse them instead of recomputing them
and greatly improve the performance of their AI-based applications.
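
To make the notion of closeness concrete, here is a minimal sketch of cosine similarity, one common way to compare embedding vectors. It is plain Java and not part of Langchain4j or the Couchbase integration; the class name `CosineSimilaritySketch` and the three-dimensional vectors are made up for illustration (real models produce vectors with hundreds of dimensions).

```java
// Illustrative sketch only: not part of Langchain4j or the Couchbase integration.
final class CosineSimilaritySketch {

    /** Returns the cosine of the angle between two non-zero vectors of equal length. */
    static double cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimensionality");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Hypothetical embeddings for three phrases.
        float[] football = {0.9f, 0.1f, 0.0f};
        float[] sport = {0.8f, 0.2f, 0.1f};
        float[] weather = {0.0f, 0.2f, 0.9f};

        // Related ideas point in similar directions and score closer to 1.
        System.out.println("football vs sport:   " + cosineSimilarity(football, sport));
        System.out.println("football vs weather: " + cosineSimilarity(football, weather));
    }
}
```

In this toy data, the first pair scores noticeably higher than the second, which is the property a vector search relies on.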

## Couchbase Embedding Store
The Couchbase Langchain4j integration stores each embedding in a separate document and uses an FTS vector index to
query the stored vectors. Currently, it supports storing embeddings and their metadata, as well as removing
embeddings. Filtering the embeddings returned by a vector search by their metadata was not supported at the time of
writing. Please note that the embedding store integration is still under active development and that its default
configuration is not recommended for production use.

### Connecting To Couchbase Cluster
A builder class can be used to initialize the Couchbase embedding store. The following parameters are required for
initialization:
- cluster connection string
- cluster username
- cluster password
- name of the bucket in which embeddings should be stored
- name of the scope in which embeddings should be stored
- name of the collection in which embeddings should be stored
- name of an FTS vector index to be used by the embedding store
- dimensionality (length) of vectors to be stored

The following sample code illustrates how to initialize an embedding store that connects to a locally running Couchbase
server:

```java
CouchbaseEmbeddingStore embeddingStore = new CouchbaseEmbeddingStore.Builder("localhost:8091")
        .username("Administrator")
        .password("password")
        .bucketName("langchain4j")
        .scopeName("_default")
        .collectionName("_default")
        .searchIndexName("test")   // FTS vector index used for similarity lookups
        .dimensions(512)           // must match the length of the vectors produced by the embedding model
        .build();
```

The sample source code provided with this tutorial takes a different approach and starts a dedicated Couchbase
server using the `testcontainers` library:

```java
// testBucketDefinition is a testcontainers BucketDefinition, assumed to be declared elsewhere in the sample
CouchbaseContainer couchbaseContainer =
        new CouchbaseContainer(DockerImageName.parse("couchbase:enterprise").asCompatibleSubstituteFor("couchbase/server"))
                .withCredentials("Administrator", "password")
                .withBucket(testBucketDefinition)
                .withStartupTimeout(Duration.ofMinutes(1));
couchbaseContainer.start(); // the container must be running before its connection string is read

CouchbaseEmbeddingStore embeddingStore = new CouchbaseEmbeddingStore.Builder(couchbaseContainer.getConnectionString())
        .username(couchbaseContainer.getUsername())
        .password(couchbaseContainer.getPassword())
        .bucketName(testBucketDefinition.getName())
        .scopeName("_default")
        .collectionName("_default")
        .searchIndexName("test")
        .dimensions(384)
        .build();
```

### Vector Index
The embedding store uses an FTS vector index to perform vector similarity lookups. If the provided vector index name
does not exist on the cluster, the store attempts to create a new index with a default configuration based on the
provided initialization settings. It is recommended to manually review the settings of the created index and adjust
them to the specific use case. More information about vector search and FTS index configuration can be found in the
[Couchbase documentation](https://docs.couchbase.com/server/current/vector-search/vector-search.html).

### Embedding Documents
The integration automatically assigns unique `UUID`-based identifiers to all stored embeddings. Here is
an example embedding document (with vector field values truncated for readability):

```json
{
  "id": "f4831648-07ca-4c77-a031-75acb6c1cf2f",
  "vector": [
    ...
    0.037255168,
    -0.001608681
  ],
  "text": "text",
  "metadata": {
    "some": "value"
  },
  "score": 0
}
```

These embeddings are generated with an LLM selected by the developer, and the resulting vector values are model-specific.
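
As a rough illustration of how such a document could be assembled, the sketch below builds a map with a random `UUID` identifier and the same field names as the example above. It is an assumption-laden sketch, not the integration's actual code: `EmbeddingDocumentSketch` and `toDocument` are hypothetical names, and the `score` field is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Illustrative sketch only: mirrors the field names of the example document,
// not the actual implementation of the Couchbase integration.
final class EmbeddingDocumentSketch {

    /** Builds a document-like map with a freshly generated UUID-based identifier. */
    static Map<String, Object> toDocument(float[] vector, String text, Map<String, String> metadata) {
        List<Float> values = new ArrayList<>(vector.length);
        for (float value : vector) {
            values.add(value);
        }

        Map<String, Object> document = new LinkedHashMap<>();
        document.put("id", UUID.randomUUID().toString());
        document.put("vector", values);
        document.put("text", text);
        document.put("metadata", new LinkedHashMap<String, String>(metadata));
        return document;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("some", "value");

        // The identifier differs on every run because it is randomly generated.
        System.out.println(toDocument(new float[] {0.037255168f, -0.001608681f}, "text", metadata));
    }
}
```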

## Storing Embeddings in Couchbase
Embeddings generated with a language model can be stored in Couchbase using the `add` method of a
`CouchbaseEmbeddingStore` instance:
```java
EmbeddingModel embeddingModel = new AllMiniLmL6V2EmbeddingModel();

// Embed each text segment and store the vector together with its source text
TextSegment segment1 = TextSegment.from("I like football.");
Embedding embedding1 = embeddingModel.embed(segment1).content();
embeddingStore.add(embedding1, segment1);

TextSegment segment2 = TextSegment.from("The weather is good today.");
Embedding embedding2 = embeddingModel.embed(segment2).content();
embeddingStore.add(embedding2, segment2);

Thread.sleep(1000); // to be sure that embeddings were persisted
```
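
A fixed `Thread.sleep` may wait longer than needed or not long enough. One possible alternative, sketched here under assumptions, is to poll a condition until it holds or a timeout expires. The helper `waitUntil` and the class `PollingSketch` are hypothetical names; they are not part of Langchain4j or the Couchbase SDK.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Illustrative sketch only: a generic polling helper, not a Langchain4j or Couchbase API.
final class PollingSketch {

    /**
     * Re-checks the condition every `interval` until it returns true or `timeout` elapses.
     * Returns true if the condition was met, false on timeout or interruption.
     */
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in condition that becomes true on the third check.
        int[] checks = {0};
        boolean ready = waitUntil(() -> ++checks[0] >= 3, Duration.ofSeconds(2), Duration.ofMillis(10));
        System.out.println("ready=" + ready + " after " + checks[0] + " checks");
    }
}
```

With the store from this tutorial, the condition could be something like `() -> !embeddingStore.findRelevant(queryEmbedding, 1).isEmpty()`, assuming a query embedding is already available.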

## Querying Relevant Embeddings
After adding some embeddings to the store, a query vector can be used to find the stored embeddings most relevant to it.
Here, we use the embedding model to generate a vector for the phrase "What is your favourite sport?". The obtained
vector is then used to find the most relevant answer in the database:
```java
Embedding queryEmbedding = embeddingModel.embed("What is your favourite sport?").content();
// Ask for the single closest match
List<EmbeddingMatch<TextSegment>> relevant = embeddingStore.findRelevant(queryEmbedding, 1);
EmbeddingMatch<TextSegment> embeddingMatch = relevant.get(0);
```

The relevance score and text of the selected answer can then be printed to the application output:
```java
System.out.println(embeddingMatch.score()); // 0.81442887
System.out.println(embeddingMatch.embedded().text()); // I like football.
```
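
Conceptually, `findRelevant(queryEmbedding, 1)` returns the stored entry whose vector is closest to the query. The brute-force, in-memory sketch below illustrates that idea only: Couchbase answers such queries through its FTS vector index rather than by scanning every document, and the scores it reports need not match the cosine values used here. `TopKSearchSketch`, `Entry`, `findTopK`, and the three-dimensional vectors are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only: a brute-force top-k lookup, not how Couchbase executes vector search.
final class TopKSearchSketch {

    /** A stored text together with its (hypothetical) embedding vector. */
    static final class Entry {
        final String text;
        final float[] vector;

        Entry(String text, float[] vector) {
            this.text = text;
            this.vector = vector;
        }
    }

    /** Cosine similarity of two non-zero vectors of equal length. */
    static double cosine(float[] a, float[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns up to maxResults entries, ordered from most to least similar to the query. */
    static List<Entry> findTopK(List<Entry> entries, float[] query, int maxResults) {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparingDouble((Entry e) -> cosine(e.vector, query)).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(maxResults, sorted.size())));
    }

    public static void main(String[] args) {
        List<Entry> store = Arrays.asList(
                new Entry("I like football.", new float[] {0.9f, 0.1f, 0.0f}),
                new Entry("The weather is good today.", new float[] {0.0f, 0.2f, 0.9f}));

        float[] query = {0.8f, 0.2f, 0.1f}; // stand-in for the embedded question about sport
        System.out.println(findTopK(store, query, 1).get(0).text); // prints: I like football.
    }
}
```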

## Deleting Embeddings
The Couchbase embedding store also supports removing embeddings by their identifiers, for example:
```java
embeddingStore.remove(embeddingMatch.embeddingId());
```

Or, to remove all embeddings:
```java
embeddingStore.removeAll();
```
