---
title: Get better quality
description: Get better answers by using state-of-the-art components
---
import InstallSage from '/snippets/install-sage.mdx';
Our Quickstart guide is meant to get you up and running as quickly as possible. This page, in contrast, focuses on more accurate answers, faster responses, and a generally more reliable pipeline. Since there's no free lunch, the setup is more involved.
In summary, this section walks you through the following steps:
- Switch from LLM-based retrieval to vector-based retrieval.
- Set up API keys for the embedder, vector store and reranker.
- Index the codebase.
- Enjoy the improved chat experience.
Responding to a user query involves two steps: (1) figuring out which files are the most relevant to the user query and (2) passing the content of these files together with the user query to an LLM.
By default, we simply list all the file paths in the codebase and ask an LLM to identify which ones are most relevant to the user query, expecting it to make a decision based solely on the paths (see the sketch after this list). This is suboptimal for multiple reasons:
- Enumerating all the file paths might exceed the LLM's context window.
- The LLM has no visibility into the actual content of the files.
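To make the default behavior concrete, here is a minimal sketch of path-only retrieval using the OpenAI chat API. The prompt wording, model choice, and helper names are illustrative and not sage's actual implementation:

```python
from pathlib import Path

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def retrieve_by_path(repo_root: str, query: str) -> str:
    """Ask an LLM to pick relevant files based on their paths alone (illustrative only)."""
    paths = [str(p.relative_to(repo_root)) for p in Path(repo_root).rglob("*") if p.is_file()]
    prompt = (
        "Here are the file paths in a repository:\n"
        + "\n".join(paths)  # this alone can blow past the context window
        + f"\n\nList the paths most relevant to this question: {query}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```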
A more principled way of retrieving relevant files involves the following steps (sketched in code after this list):
- Chunk all the files into relatively equally-sized text snippets.
- Embed the chunks (turn them into float vectors).
- Store the chunks in a vector database.
- At inference time, embed the user query and find its nearest neighbors in the vector database.
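As a rough illustration of these four steps, the sketch below chunks a file by character count, embeds the chunks with OpenAI's `text-embedding-3-small`, keeps the vectors in an in-memory NumPy matrix, and answers a query via cosine similarity. sage uses a proper vector database instead (see below), and the chunking strategy here is deliberately naive:

```python
import numpy as np
from openai import OpenAI  # pip install openai numpy

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def chunk(text: str, size: int = 1000) -> list[str]:
    # 1. Naive fixed-size chunking; sage's actual chunker is more careful.
    return [text[i : i + size] for i in range(0, len(text), size)]


def embed(texts: list[str]) -> np.ndarray:
    # 2. Turn text snippets into float vectors.
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in response.data])


# 3. "Store" the embeddings; a real setup would use a vector database.
chunks = chunk(open("README.md").read())
matrix = embed(chunks)

# 4. At inference time, embed the query and find its nearest neighbors.
query_vec = embed(["How do I install this project?"])[0]
scores = matrix @ query_vec / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query_vec))
top_chunks = [chunks[i] for i in np.argsort(scores)[::-1][:5]]
```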
There are multiple components we need to set up:
- An embedder, which converts text into float vectors.
- A vector store, which stores the embeddings and performs nearest-neighbor search.
- A reranker, which takes the top N nearest neighbors retrieved from the vector store and re-orders them based on their relevance to the user query (see the sketch below).
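For instance, with Cohere (one of the reranker providers supported below), re-ordering candidates looks roughly like this; the model name, `top_n`, and placeholder documents are just examples:

```python
import os

import cohere  # pip install cohere

co = cohere.Client(api_key=os.environ["COHERE_API_KEY"])

query = "How do I install this project?"
# Placeholder candidates; in practice these are the nearest neighbors
# returned by the vector store.
candidates = ["chunk about installation", "chunk about testing", "chunk about licensing"]

# Score each candidate against the query and keep the most relevant ones.
response = co.rerank(model="rerank-english-v3.0", query=query, documents=candidates, top_n=2)
reranked = [candidates[result.index] for result in response.results]
```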
In the following section, we will set up API keys for these components.
There are multiple third-party providers that offer batch embedding APIs. The plot below shows how well they perform on our benchmark:
Overall, we recommend using OpenAI's `text-embedding-3-small` model, which achieves the highest quality and has the fastest batch embedding API. Below you will find instructions for each provider:
1. Create an API key here.
2. Export an environment variable:
export OPENAI_API_KEY=...
1. Create an API key following these instructions.
2. Export it as an environment variable:
export GOOGLE_API_KEY=...
1. Create an API key following these instructions.
2. Export it as an environment variable:
export VOYAGE_API_KEY=...
Currently, we only support Pinecone as a third-party managed vector database. We are actively working on adding more providers.
Here is how you can get it set up:
1. Create an API key following these instructions.
2. Export it as an environment variable:
export PINECONE_API_KEY=...
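For context, the vector store boils down to an upsert operation at indexing time and a nearest-neighbor query at inference time. A rough sketch with the Pinecone Python client (the index name, dimension, and vectors are illustrative, not what sage creates for you):

```python
import os

from pinecone import Pinecone, ServerlessSpec  # pip install pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create an index sized for text-embedding-3-small vectors (1536 dimensions).
pc.create_index(
    name="my-codebase",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
index = pc.Index("my-codebase")

# Upsert embedded chunks, keeping the original text as metadata.
index.upsert(vectors=[{"id": "chunk-0", "values": [0.1] * 1536, "metadata": {"text": "def main(): ..."}}])

# Nearest-neighbor search against a query embedding.
matches = index.query(vector=[0.1] * 1536, top_k=5, include_metadata=True)
```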
There are multiple third-party providers that offer reranking APIs. The plot below shows how well they perform on our benchmark:
We recommend using NVIDIA. Here are instructions for all the providers:
1. Create an API key by following these instructions. Note that API keys are model-specific. We recommend using `nvidia/nv-rerankqa-mistral-4b-v3`.
2. Export an environment variable:
export NVIDIA_API_KEY=...
1. Create an API key following these instructions.
2. Export it as an environment variable:
export VOYAGE_API_KEY=...
1. Create an API key following these instructions.
2. Export it as an environment variable:
export COHERE_API_KEY=...
1. Create or get your API key following these instructions.
2. Export it as an environment variable:
export JINA_API_KEY=...
Now that we have set up all the necessary keys, we are ready to index our codebase.
Index the codebase. For instance, this is how you would index Hugging Face's Transformers library using OpenAI embeddings:
sage-index huggingface/transformers --no-llm-retriever --embedding-provider=openai
Once the codebase is indexed, the last piece to configure is the LLM that ingests the relevant files together with the user query to produce a response. We support OpenAI and Anthropic:
1. Create an API key here.
2. Export an environment variable:
export OPENAI_API_KEY=...
1. Create an API key following these instructions.
2. Export it as an environment variable:
export ANTHROPIC_API_KEY=...
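Conceptually, this last step just stuffs the retrieved file contents and the user question into a chat completion request. A hedged sketch with the OpenAI client (the prompt format is illustrative, not sage's exact prompt):

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = "How do I install this project?"
# Placeholder for the relevant files identified by the retrieval step.
relevant_files = {"README.md": "pip install ..."}

# Concatenate the file contents and pass them to the LLM alongside the question.
context = "\n\n".join(f"### {path}\n{content}" for path, content in relevant_files.items())
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Answer questions about the codebase using the provided files."},
        {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
    ],
)
print(response.choices[0].message.content)
```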
Now we are finally ready to chat with the codebase and get much higher-quality answers:
sage-chat huggingface/transformers \
--llm-provider=openai \
--llm-model=gpt-4o \
--reranker-provider=nvidia
Happy chatting!
Once you select the desired providers for embedding and reranking, we will use reasonable default models from each. For instance, we default to the `text-embedding-3-small` model from OpenAI. However, you can override these defaults via command-line flags:
You can also customize the interaction with the Pinecone vector store by passing any of the following flags to both `sage-chat` and `sage-index`: