Commit edf6796

Merge pull request #26 from BillFarber/task/addSemaphoreInfo
First pass at adding some Semaphore information to the documents.
2 parents 0a82736 + e05ec16 commit edf6796

1 file changed: +52 -0 lines changed
docs/rag-examples/rag-python.md

Lines changed: 52 additions & 0 deletions
@@ -124,6 +124,58 @@ For more information, please see the following code files:

For an example of how to add embeddings to your data, please see [this embeddings example](../embedding.md).

## RAG with Semaphore Models

[Progress Semaphore](https://www.progress.com/semaphore/platform) is a modular semantic AI platform that provides the
semantic layer of your digital ecosystem. With it, you can manage knowledge models, extract facts, classify the context
and meaning of structured and unstructured information, and generate rich semantic metadata.

Details for classifying text are specific to your Semaphore installation. For a Progress Data Cloud installation, see
the
[Classification and Language Service Developer's Guide](https://portal.smartlogic.com/docs/5.6/classification_server_-_developers_guide/welcome).

Once you have [classified](https://www.progress.com/semaphore/platform/semantic-knowledge-classification) your documents
and stored the extracted concepts on them, you can also search for those concepts as part of the RAG retriever. A
typical strategy is to use your custom model and the Semaphore Classifier to extract concepts from the user's question.
With that list of concepts, you can search your target documents for those with matching concepts, and then include the
matching documents in the list of documents returned by the retriever.
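
The strategy described above can be sketched in Python as a thin wrapper around an existing retriever. This is only an
illustration under stated assumptions: `classify_question`, `search_by_concepts`, and `base_retrieve` are hypothetical
callables standing in for your installation-specific Semaphore classification call, a concept search against your
documents, and your existing retriever. Documents are assumed to be plain dicts with a `uri` key.

```python
from typing import Callable, Dict, List

# Hypothetical document shape used only for this sketch: {"uri": ..., "content": ...}
Document = Dict[str, str]


def retrieve_with_concepts(
    question: str,
    classify_question: Callable[[str], List[str]],
    search_by_concepts: Callable[[List[str]], List[Document]],
    base_retrieve: Callable[[str], List[Document]],
) -> List[Document]:
    """Combine the base retriever's results with concept-matched documents."""
    # Start with whatever the existing retriever returns.
    documents = list(base_retrieve(question))
    seen = {doc["uri"] for doc in documents}

    # Extract concepts from the question (stand-in for the Semaphore Classifier).
    concepts = classify_question(question)
    if concepts:
        # Append concept matches that the base retriever did not already return.
        for doc in search_by_concepts(concepts):
            if doc["uri"] not in seen:
                seen.add(doc["uri"])
                documents.append(doc)
    return documents
```

Passing the three steps in as callables keeps the sketch independent of any particular Semaphore or database client.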

For instance, assume that you have extracted the concepts from a document and stored them in a new JSON block in the
document that looks something like this:
```
"concepts": [
  {
    "CrimeReportsModel-Crimes": "Public Order Crime"
  },
  {
    "CrimeReportsModel-Crimes": "Disturbing the Peace"
  },
  ...
]
```
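
To make the shape of that block concrete, here is a minimal Python sketch that groups the stored concept values by
property name. The function name `concepts_by_property` and the sample document are hypothetical; the sample assumes the
`concepts` array sits at the top level of the document and contains only the two entries shown above.

```python
from collections import defaultdict
from typing import Any, Dict, List


def concepts_by_property(document: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group the values in a document's "concepts" array by property name."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entry in document.get("concepts", []):
        for property_name, value in entry.items():
            grouped[property_name].append(value)
    return dict(grouped)


# Sample document mirroring the JSON block above (all other fields omitted).
sample = {
    "concepts": [
        {"CrimeReportsModel-Crimes": "Public Order Crime"},
        {"CrimeReportsModel-Crimes": "Disturbing the Peace"},
    ]
}
print(concepts_by_property(sample))
# {'CrimeReportsModel-Crimes': ['Public Order Crime', 'Disturbing the Peace']}
```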

You can search for all documents that have been classified with the `Crimes` concept in the `CrimeReportsModel` model
using a CTS query:
```
cts.jsonPropertyValueQuery('CrimeReportsModel-Crimes', 'Crimes')
```
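
When the concepts come from the user's question there are often several of them. As a hedged sketch, the Python helper
below builds the text of such a query for a list of values, relying on `cts.jsonPropertyValueQuery` accepting an array of
values and matching any of them. The helper name `concept_query` is hypothetical, and how the resulting text is sent to
the server is left out.

```python
import json
from typing import List


def concept_query(property_name: str, values: List[str]) -> str:
    """Return a cts.jsonPropertyValueQuery expression as JavaScript source text."""
    if not values:
        raise ValueError("at least one concept value is required")
    # json.dumps yields double-quoted, escaped literals that are also valid JavaScript.
    return "cts.jsonPropertyValueQuery({}, {})".format(
        json.dumps(property_name), json.dumps(values)
    )


print(concept_query("CrimeReportsModel-Crimes", ["Public Order Crime", "Disturbing the Peace"]))
# cts.jsonPropertyValueQuery("CrimeReportsModel-Crimes", ["Public Order Crime", "Disturbing the Peace"])
```

Serializing the values with `json.dumps` rather than plain string concatenation avoids broken queries when a concept
label contains quotes.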

That query can be used on its own or as part of a more complex query that retrieves the documents that provide the best
context information to your LLM. One possibility is to adapt the vector retriever to use that query in the initial
documents query. For example, this adaptation of the query in `vector_query_retriever.py` uses `jsonPropertyValueQuery`
instead of `wordQuery`:
```
op.fromSearchDocs(
  cts.andQuery([
    cts.jsonPropertyValueQuery('CrimeReportsModel-Crimes', 'Crimes'),
    cts.collectionQuery('events')
  ]),
  null,
  {
    'scoreMethod': 'score-bm25',
    'bm25LengthWeight': 0.5
  }
)
```
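
Because the retriever itself is written in Python, the query text above would typically be assembled there. The sketch
below shows one possible way to fill in the property name, concept, and collection. The function
`build_search_docs_query` and its parameters are hypothetical, and how `vector_query_retriever.py` actually builds and
submits its query is not shown here.

```python
import json


def build_search_docs_query(property_name, concept, collection, length_weight=0.5):
    """Return the op.fromSearchDocs call shown above as text, with values filled in."""
    # Same scoring options as the example above; the weight is configurable.
    options = {"scoreMethod": "score-bm25", "bm25LengthWeight": length_weight}
    return (
        "op.fromSearchDocs("
        "cts.andQuery(["
        f"cts.jsonPropertyValueQuery({json.dumps(property_name)}, {json.dumps(concept)}), "
        f"cts.collectionQuery({json.dumps(collection)})"
        "]), "
        "null, "
        f"{json.dumps(options)}"
        ")"
    )


query_text = build_search_docs_query("CrimeReportsModel-Crimes", "Crimes", "events")
print(query_text)
```

The generated text uses double-quoted strings, which JavaScript treats the same as the single-quoted strings in the
example above.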

## Summary

The three RAG approaches shown above - a simple word query, a contextual query, and a vector query - demonstrate how
