A vector embedding encodes an input as a list of floating-point numbers.
"dog" → [0.017198, -0.007493, -0.057982, 0.054051, -0.028336, 0.019245,…]
Different models output different embeddings, with varying lengths.
Model | Encodes | Vector length |
---|---|---|
word2vec | words | 300 |
SBERT (Sentence-Transformers) | text (up to ~400 words) | 768 |
OpenAI text-embedding-ada-002 | text (up to 8191 tokens) | 1536 |
OpenAI text-embedding-3-small | text (up to 8191 tokens) | 256-1536 |
OpenAI text-embedding-3-large | text (up to 8191 tokens) | 256-3072 |
Azure AI Vision | image or text | 1024 |
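
The ranges listed for the text-embedding-3 models indicate that their vectors can be shortened. Here is a minimal sketch of one client-side way to do that, assuming the model concentrates the most useful information in the leading dimensions (the idea behind MRL-style training): keep the first `k` values and rescale the result to unit length. The function name `shorten_embedding` is illustrative, and the input is a random stand-in vector, not real model output.

```python
import math
import random


def shorten_embedding(vector, k):
    """Keep the first k dimensions and rescale the result to unit length."""
    truncated = list(vector[:k])
    norm = math.sqrt(sum(x * x for x in truncated))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in truncated]


# Random stand-in for a 3072-dimensional embedding (not real model output).
random.seed(0)
full = [random.gauss(0, 1) for _ in range(3072)]

short = shorten_embedding(full, 256)
print(len(short))  # 256
```

Re-normalizing matters because similarity measures such as the dot product assume unit-length vectors, and a truncated vector is generally no longer unit length.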
Vector embeddings are commonly used for similarity search, fraud detection, recommendation systems, and RAG (Retrieval-Augmented Generation).
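
To make the similarity search use case concrete, here is a small sketch that ranks items by cosine similarity, one common measure for comparing embeddings. The three-dimensional vectors are made up for illustration and do not come from any model; the helper names `cosine_similarity` and `most_similar` are likewise just for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def most_similar(query, candidates):
    """Return candidate names sorted from most to least similar to the query."""
    return sorted(
        candidates,
        key=lambda name: cosine_similarity(query, candidates[name]),
        reverse=True,
    )


# Made-up toy vectors; real embeddings have hundreds or thousands of dimensions.
vectors = {
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

others = {name: vec for name, vec in vectors.items() if name != "dog"}
ranked = most_similar(vectors["dog"], others)
print(ranked)  # ['puppy', 'car']
```

A brute-force scan like this is fine for small collections; larger ones usually rely on an approximate nearest-neighbor index instead.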
This repository contains a visual exploration of vectors, using several embedding models.
Before running the notebooks, install the requirements:
pip install -r requirements.txt
Then explore these notebooks:
- Generate new OpenAI text embeddings
- Compare OpenAI and Word2Vec embeddings
- Vector similarity
- Vector search
- Generate multimodal vectors for dataset
- Explore multimodal vectors
- Vector distance metrics
- Vector quantization
- Vector dimension reduction (MRL)
These notebooks are also provided, but aren't necessary unless you're generating new embeddings data.
If you need to generate new OpenAI embeddings, you'll need access to the embedding models via the API. This project includes infrastructure as code (IaC) to provision an Azure OpenAI deployment of "text-embedding-3-large". The IaC is defined in the infra directory and uses the Azure Developer CLI to provision the resources.
- Make sure the Azure Developer CLI (azd) is installed.
- Log in to Azure:
      azd auth login
  For GitHub Codespaces users, if the previous command fails, try:
      azd auth login --use-device-code
- Provision the OpenAI account:
      azd provision
  It will prompt you to provide an azd environment name (like "vector-demos"), select a subscription from your Azure account, and select a location. Then it will provision the resources in your account.
- Once the resources are provisioned, you should now see a local .env file with all the environment variables needed to run the scripts.
- To delete the resources, run:
      azd down
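
Once the .env file exists, a script could read it and request an embedding roughly as sketched below. This is not the repository's own code. The variable names `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_KEY`, `AZURE_OPENAI_DEPLOYMENT`, and `AZURE_OPENAI_API_VERSION` are hypothetical, so check the generated file for the real ones. The project may also authenticate differently, for example with Azure credentials rather than an API key. The request itself assumes version 1 or later of the `openai` package.

```python
def load_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and comments."""
    values = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Values may or may not be wrapped in quotes.
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def embed_text(text, env):
    """Request one embedding. Needs the `openai` package and network access."""
    # Imported here so load_env_file works even without the package installed.
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint=env["AZURE_OPENAI_ENDPOINT"],  # hypothetical name
        api_key=env["AZURE_OPENAI_KEY"],              # hypothetical name
        api_version=env.get("AZURE_OPENAI_API_VERSION", "2024-02-01"),
    )
    response = client.embeddings.create(
        model=env["AZURE_OPENAI_DEPLOYMENT"],         # hypothetical name
        input=text,
    )
    return response.data[0].embedding


# Example usage (requires provisioned resources):
#   env = load_env_file(".env")
#   vector = embed_text("dog", env)
#   print(len(vector))
```

Keeping the file parsing separate from the API call makes it easy to confirm which variables were written before sending any requests.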
Each notebook has resources at the bottom to dig further into that topic. Here are some additional general resources: