Note: If you want to set up and launch the Docs Agent chat app on your host machine, see the Set up Docs Agent section in README.
This page describes the architecture and features of Docs Agent.
The Docs Agent chat app is designed to be easily set up and configured in a Linux environment, and it requires access to Google’s Gemini API.
Docs Agent uses a technique known as Retrieval Augmented Generation (RAG), which allows you to bring your own documents as knowledge sources to AI language models. This approach helps the AI language models to generate relevant and accurate responses that are grounded in the information that you provide and control.
Figure 1. Docs Agent uses a vector database to retrieve context for augmenting prompts.
The key features of the Docs Agent chat app are:
- Add contextual information to user questions to augment prompts for AI language models.
- Process documents into embeddings and store them in a vector database for semantic retrieval.
Figure 2. A user question is augmented by the Docs Agent server and passed to an LLM.
For the moment, the Docs Agent project focuses on providing Python scripts that make it easy to process Markdown files into embeddings. However, there is no hard requirement that the source documents must exist in Markdown format. What’s important is that the processed content is available as embeddings in the vector database.
To enable an LLM to answer questions that are not part of the public knowledge (which the LLM is likely trained on), the Docs Agent project applies a mixture of prompt engineering and embeddings techniques. That is, we process a set of documents (which contain domain specific knowledge) into embeddings and store them in a vector database. This vector database allows the Docs Agent server to perform semantic search on stored embeddings to find the most relevant content from the source documents given user questions.
Once the most relevant content is returned, the Docs Agent server uses the prompt structure shown in Figure 3 to augment the user question with a preset condition and a list of context. (When the Docs Agent server starts, the condition value is read from the config.yaml file.) Then the Docs Agent server sends this prompt to a language model using the Gemini API and receives a response generated by the model.
Figure 3. Prompt structure for augmenting a user question with related context (Context source: eventhorizontelescope.org)
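As a rough illustration, the augmented prompt in Figure 3 can be pictured as simple string composition. The following sketch uses hypothetical variable names and an abbreviated condition; it is not the actual Docs Agent implementation:

```python
# Illustrative sketch only: assemble a prompt from a preset condition,
# retrieved context, and the user question (names are hypothetical).
condition = (
    "You are a helpful chatbot answering questions from users. "
    "Read the following context first and answer the question at the end:"
)
context_chunks = [
    "<text chunk 1 returned from the vector database>",
    "<text chunk 2 returned from the vector database>",
]
user_question = "What is the Event Horizon Telescope?"

prompt = "\n\n".join(
    [condition, "\n\n".join(context_chunks), f"Question: {user_question}"]
)
```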
To process information into embeddings using the Python scripts in the project, the information needs to be stored in Markdown format. Once you have a set of Markdown files stored in a directory on your host machine, you can run the files_to_plain_text.py script to process those Markdown files into small plain text files. The script splits the content by the top three Markdown headers (#, ##, and ###).
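The splitting step can be pictured as follows. This is a simplified sketch of the idea, not the actual files_to_plain_text.py implementation, which handles additional cases:

```python
import re

def split_markdown_by_headers(markdown_text: str) -> list[str]:
    """Split Markdown content into chunks at #, ##, and ### headers (sketch)."""
    chunks: list[str] = []
    current: list[str] = []
    for line in markdown_text.splitlines():
        # Start a new chunk whenever a top-three-level header is found.
        if re.match(r"^#{1,3}\s", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return chunks
```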
Once Markdown files are processed into small plain text files, you can run the populate_vector_database.py script to generate embeddings for each text file and store those embeddings into a Chroma vector database running on the host machine.
The embeddings in this vector database enable the Docs Agent server to perform semantic search and retrieve context related to user questions for augmenting prompts.
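Conceptually, the populate step looks roughly like the sketch below, which assumes the chromadb and google-generativeai Python packages; the collection name, paths, and sample data are illustrative, not the script's actual values:

```python
import chromadb
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")  # placeholder

client = chromadb.PersistentClient(path="vector_stores/chroma")  # local store
collection = client.get_or_create_collection(name="docs_collection")

# filename -> plain text chunk produced by the splitting step (sample data)
chunks = {"flutter_faq_001.txt": "Flutter uses Dart because ..."}

for doc_id, text in chunks.items():
    embedding = genai.embed_content(
        model="models/embedding-001",
        content=text,
        task_type="retrieval_document",
    )["embedding"]
    collection.add(
        ids=[doc_id],
        documents=[text],
        embeddings=[embedding],
        metadatas=[{"url": "https://docs.flutter.dev/resources/faq"}],
    )
```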
For more information on the processing of Markdown files, see the README file in the scripts directory.
Figure 4. A document is split into small semantic chunks, which are then used to generate embeddings.
Figure 5. A Markdown page is split by headers and processed into embeddings.
The following list summarizes the tasks and features of the Docs Agent chat app:
- Process Markdown: Split Markdown files into small plain text files. (See the Python scripts in the preprocess directory.)
- Generate embeddings: Use an embedding model to process small plain text files into embeddings, and store them in a vector database. (See the populate_vector_database.py script.)
- Perform semantic search: Compare embeddings in the vector database to retrieve the most relevant content given user questions.
- Add context to a user question: Add a list of text chunks returned from a semantic search as context in a prompt.
- (Experimental) “Fact-check” responses: This experimental feature composes a follow-up prompt and asks the language model to “fact-check” its own previous response. (See the Using a language model to fact-check its own response section.)
- Generate related questions: In addition to displaying a response to the user question, the web UI displays 5 questions generated by the language model based on the context of the user question. (See the Using a language model to suggest related questions section.)
- Return URLs of documentation sources: Docs Agent's vector database stores URLs as metadata next to embeddings. Whenever the vector database is used to retrieve text chunks for context, the database can also return the URLs of the sources used to generate the embeddings.
- Collect feedback from users: Docs Agent's chatbot web UI includes buttons that allow users to like generated responses or submit rewrites.
- Convert Google Docs, PDF, and Gmail into Markdown files: This feature uses Apps Script to convert Google Docs, PDF, and Gmail into Markdown files, which can then be used as input datasets for Docs Agent. (See the apps_script directory.)
- Run benchmark tests to monitor the quality of AI-generated responses: Using Docs Agent, you can run benchmark tests to measure and compare the quality of text chunks, embeddings, and AI-generated responses.
- Use the Semantic Retrieval API and AQA model: You can use Gemini's Semantic Retrieval API to upload source documents to an online corpus and use the AQA model that is specifically created for answering questions using an online corpus.
The following events take place in the Docs Agent chat app:
1. The files_to_plain_text.py script converts input Markdown documents into small plain text files, split by Markdown headings (#, ##, and ###).
2. The populate_vector_database.py script generates embeddings from the small plain text files and populates a vector database.
3. When the agent chatbot command is run, it starts the Docs Agent server and vector database, which loads generated embeddings and metadata (URLs and filenames) stored in the vector_store directory.
4. When the user asks a question, the Docs Agent server uses the vector database to perform semantic search on embeddings, which represent content in the source documents.
5. Using this semantic search capability, the Docs Agent server finds a list of text chunks that are most relevant to the user question.
6. The Docs Agent server adds this list of text chunks as context (plus a condition for responses) to the user question and constructs them into a prompt.
7. The system sends the prompt to a language model via the Gemini API (see the sketch after this list).
8. The language model generates a response and the Docs Agent server renders it on the chat UI.
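The retrieval and generation steps (4 through 8) could be sketched as follows, again assuming the chromadb and google-generativeai packages and the illustrative names used in the earlier sketches; the actual Docs Agent server code differs:

```python
import chromadb
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")  # placeholder

collection = chromadb.PersistentClient(path="vector_stores/chroma").get_collection(
    name="docs_collection"
)

user_question = "Why does Flutter use Dart?"

# Steps 4-5: semantic search for the most relevant text chunks.
query_embedding = genai.embed_content(
    model="models/embedding-001",
    content=user_question,
    task_type="retrieval_query",
)["embedding"]
results = collection.query(query_embeddings=[query_embedding], n_results=4)
context_chunks = results["documents"][0]

# Step 6: augment the user question with a condition and the retrieved context.
condition = (
    "You are a helpful chatbot answering questions from users. "
    "Read the following context first and answer the question at the end:"
)
prompt = "\n\n".join(
    [condition, "\n\n".join(context_chunks), f"Question: {user_question}"]
)

# Steps 7-8: send the prompt to a Gemini model and render its response.
model = genai.GenerativeModel("gemini-pro")
response = model.generate_content(prompt)
print(response.text)
```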
Additional events for “fact-checking” a generated response:
- The Docs Agent server prepares another prompt that compares the generated response (in step 8) to the context (in step 6) and asks the language model to look for a discrepancy in the response.
- The language model generates a response that points out one major discrepancy (if it exists) between its previous response and the context.
- The Docs Agent server renders this response on the chat UI as a call-out note.
- The Docs Agent server passes this second response to the vector database to perform semantic search.
- The vector database returns a list of relevant content (that is closely related to the second response).
- The Docs Agent server renders the top URL of this list on the chat UI and suggests that the user checks out this URL for fact-checking.
Additional events for suggesting 5 questions related to the user question:
- The Docs Agent server prepares another prompt that asks the language model to generate 5 questions based on the context (in step 6).
- The language model generates a response that contains a list of questions related to the context.
- The Docs Agent server renders the questions on the chat UI.
This section describes additional features implemented on the Docs Agent chat app for enhancing the usability of the Q&A experience powered by generative AI.
Figure 6. A screenshot of the Docs Agent chat UI showing the sections generated by three distinct prompts.
In addition to using the prompt structure above (shown in Figure 3), we're currently experimenting with the following prompt setup for “fact-checking” responses generated by the language model:
- Condition: You are a helpful chatbot answering questions from users. Read the following context first and answer the question at the end:
- Context: <CONTEXT_USED_IN_THE_PREVIOUS_PROMPT>
- Additional condition (for fact-checking): Can you compare the text below to the information provided in this prompt above and write a short message that warns the readers about which part of the text they should consider fact-checking? (Please keep your response concise and focus on only one important item.)
- Previously generated response: Text: <RESPONSE_RETURNED_FROM_THE_PREVIOUS_PROMPT>
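For illustration, this follow-up prompt might be composed roughly as follows; the variable names are hypothetical and the placeholder strings stand in for the context and response from the previous prompt:

```python
# Illustrative sketch only: build the "fact-checking" follow-up prompt.
condition = (
    "You are a helpful chatbot answering questions from users. "
    "Read the following context first and answer the question at the end:"
)
context = "<CONTEXT_USED_IN_THE_PREVIOUS_PROMPT>"
previous_response = "<RESPONSE_RETURNED_FROM_THE_PREVIOUS_PROMPT>"
fact_check_condition = (
    "Can you compare the text below to the information provided in this prompt "
    "above and write a short message that warns the readers about which part of "
    "the text they should consider fact-checking? (Please keep your response "
    "concise and focus on only one important item.)"
)

fact_check_prompt = "\n\n".join(
    [condition, context, fact_check_condition, f"Text: {previous_response}"]
)
```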
This "fact-checking" prompt returns a response similar to the following example:
The text states that Flutter chose to use Dart because it is a fast, productive, object-oriented
language that is well-suited for building user interfaces. However, the context provided in the
prompt states that Flutter chose Dart because it is a fast, productive language that is well-suited
for Flutter's problem domain: creating visual user experiences. Therefore, readers should consider
fact-checking the claim that Dart is well-suited for building user interfaces.
After the second response, notice that the Docs Agent chat UI also suggests a URL to visit for fact-checking (see Figure 6), which looks similar to the following example:
To verify this information, please check out:
https://docs.flutter.dev/resources/faq
To identify this URL, the Docs Agent server takes the second response (which is the paragraph that begins with “The text states that ...” in the example above) and uses it to query the vector database. Once the vector database returns a list of the most relevant content to this response, the UI only displays the top URL to the user.
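A rough sketch of that lookup, assuming a Chroma collection whose metadata stores source URLs (as in the earlier population sketch); the names and sample text are illustrative:

```python
import chromadb
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")  # placeholder

collection = chromadb.PersistentClient(path="vector_stores/chroma").get_collection(
    name="docs_collection"
)
fact_check_response = "The text states that Flutter chose to use Dart because ..."

# Query the vector database with the second ("fact-checking") response.
query_embedding = genai.embed_content(
    model="models/embedding-001",
    content=fact_check_response,
    task_type="retrieval_query",
)["embedding"]
results = collection.query(query_embeddings=[query_embedding], n_results=5)

# Display only the top URL from the returned metadata.
top_url = results["metadatas"][0][0]["url"]
print(f"To verify this information, please check out:\n{top_url}")
```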
Keep in mind that this "fact-checking" prompt setup is currently considered experimental because we've seen cases where a language model ends up adding incorrect information to its second response as well. However, this second response, which draws attention to the language model's possible hallucinations, seems to improve the usability of the system: it reminds users that the language model's response can be imperfect and encourages them to take additional steps to validate generated responses for themselves.
The project's latest web UI includes the “Related questions” section, which displays five questions that are related to the user question (see Figure 6). These five questions are also generated by a language model (via the Gemini API). Using the list of contents returned from the vector database as context, the system prepares another prompt asking the language model to generate five questions from the included context.
The following is the exact structure of this prompt:
- Condition: Read the context below and answer the question at the end:
- Context: <CONTEXT_USED_IN_THE_PREVIOUS_PROMPT>
- Question: What are 5 questions developers might ask after reading the context?
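A minimal sketch of assembling this prompt and pulling the generated questions out of the model's reply; the parsing is illustrative only, since the output format can vary:

```python
# Illustrative sketch only: ask the model for five related questions.
condition = "Read the context below and answer the question at the end:"
context = "<CONTEXT_USED_IN_THE_PREVIOUS_PROMPT>"
question = "What are 5 questions developers might ask after reading the context?"

related_questions_prompt = "\n\n".join([condition, context, f"Question: {question}"])

# Example of parsing a numbered-list response (format can vary).
response_text = "1. How does ...?\n2. What is ...?\n3. ...\n4. ...\n5. ..."
related_questions = [
    line.split(".", 1)[1].strip()
    for line in response_text.splitlines()
    if line.strip() and line.strip()[0].isdigit()
]
```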
The project's latest web UI includes the Rewrite this response button at the bottom of the panel (see Figure 6). When this button is clicked, a widget opens up, expanding the main UI panel, and reveals a textarea containing the generated response to the user's question. The user can then edit this response in the textarea and click the Submit button to submit the updated response to the system.
The system stores the submitted response as a Markdown file in the project's local rewrites directory. The user may re-click the Submit button to update the submitted rewrite multiple times.
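For example, persisting a submitted rewrite could look roughly like the following; the filename scheme is made up for illustration and is not the project's actual convention:

```python
from datetime import datetime
from pathlib import Path

# Illustrative sketch only: store a user-submitted rewrite as a Markdown file.
submitted_rewrite = "Docs Agent uses retrieval augmented generation to ..."

rewrites_dir = Path("rewrites")
rewrites_dir.mkdir(exist_ok=True)

timestamp = datetime.now().strftime("%Y-%m-%d-%H%M%S")
(rewrites_dir / f"rewrite-{timestamp}.md").write_text(submitted_rewrite, encoding="utf-8")
```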
The project's latest web UI includes the Like this response button at the bottom of the panel (see Figure 6). When this button is clicked, the server logs a "like" event for the response. Clicking the Liked button again resets the button, and the server logs this reset event as well. The user may click the like button multiple times to toggle its state, but when examining the logs, only the final state of the button is considered for the response.
The project includes Apps Script files that allow you to convert various sources of content (including Google Docs and PDF) from your Google Drive and Gmail into Markdown files. You can then use these Markdown files as additional input sources for Docs Agent. For more information, see the README file in the apps_script directory.
Figure 7. Docs Agent's pre-processing flow for various doc types.
Docs Agent provides options to use Gemini's Semantic Retrieval API for storing text chunks in Google Cloud's online storage (and using this online storage for context retrieval), in combination with using the AQA model for question-answering.
To use the Semantic Retrieval API, update the config.yaml file to the following settings:
```
models:
  - language_model: "models/aqa"
...
db_type: "google_semantic_retriever"
```
The setup above uses both the Semantic Retrieval API and the AQA model.
Note: At the moment, when db_type is set to google_semantic_retriever, running the populate_vector_database.py script creates and populates both a local vector database using Chroma and an online corpus using the Semantic Retrieval API.
However, if you want to use only the AQA model without using an online corpus, update the config.yaml file to the following settings instead:
```
models:
  - language_model: "models/aqa"
...
db_type: "chroma"
```
The setup above uses the AQA model with your local Chroma vector database. For more information, see the More Options: AQA Using Inline Passages section on the Semantic Retriever Quickstart page.
Note: To use the Semantic Retrieval API, you need to complete the OAuth setup for your Google Cloud project from your host machine. For detailed instructions, see the Authentication with OAuth quickstart page.