2 files changed: +15 -17 lines changed
@@ -4,14 +4,6 @@ title: Embedding Examples
nav_order: 5
---

- ## Table of contents
- {: .no_toc .text-delta }
-
- - TOC
- {: toc }
-
- ## Adding embeddings with langchain4j
-
The vector queries shown in the [langchain](../rag-langchain-python/README.md),
[langchain4j](../rag-langchain-java), and [langchain.js](../rag-langchain-js/README.md) RAG examples
depend on embeddings - vector representations of text - being added to documents in MarkLogic. Vector queries can
@@ -21,6 +13,12 @@ This project demonstrates the use of a
the [MarkLogic Data Movement SDK](https://docs.marklogic.com/guide/java/data-movement) for adding embeddings to
documents in MarkLogic.

+ ## Table of contents
+ {: .no_toc .text-delta }
+
+ - TOC
+ {: toc }
+
## Setup

This example depends both on the [main setup for all examples](../setup/README.md) and also on having run the
@@ -29,7 +27,7 @@ This example depends both on the [main setup for all examples](../setup/README.m
the text in Enron email documents and write each chunk of text to a separate document. This example will then use
langchain4j to generate an embedding for the chunk of text and add it to each chunk document.

- ## Add embeddings example
+ ## Adding embeddings to documents

To try the embedding example, run the following Gradle task:

@@ -4,21 +4,21 @@ title: Splitting Examples
nav_order: 4
---

- ## Table of contents
- {: .no_toc .text-delta }
-
- - TOC
- {: toc }
-
- ## Splitting documents with langchain4j
-
A RAG approach typically benefits from sending multiple smaller segments or "chunks" of text to an LLM. While MarkLogic
can efficiently ingest and index large documents, sending all the text in even a single document may either exceed
the number of tokens allowed by your LLM or may result in slower and more expensive responses from the LLM. Thus,
when importing or reprocessing documents in MarkLogic, your RAG approach may benefit from splitting the searchable
text in a document into smaller segments or "chunks" that allow for much smaller and more relevant segments of text
to be sent to the LLM.

+ ## Table of contents
+ {: .no_toc .text-delta }
+
+ - TOC
+ {: toc }
+
+ ## Overview
+
This project demonstrates two different approaches to splitting documents:

1. Splitting the text in a document and storing each chunk in a new separate document.