First, install the llm-sentence-transformers plugin so embeddings can run locally, then register the model. We’ll use the all-MiniLM-L6-v2 model for local embeddings.
llm install llm-sentence-transformers
llm sentence-transformers register all-MiniLM-L6-v2
Let’s start with some simple embedding examples.
Create an embedding for a basic string.
llm embed -m sentence-transformers/all-MiniLM-L6-v2 -c "Hello world"
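The command above prints a plain list of floating-point numbers, and similarity between such vectors is typically measured with cosine similarity. A minimal sketch of that comparison, using toy 3-dimensional vectors rather than real 384-dimensional MiniLM output:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot product over magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors for illustration (real MiniLM embeddings have 384 dimensions)
v1 = [1.0, 0.0, 1.0]
v2 = [1.0, 0.5, 1.0]
print(round(cosine_similarity(v1, v2), 3))  # close to 1.0: the vectors point the same way
```

Identical vectors score 1.0 and orthogonal vectors score 0.0, which is why nearby scores indicate semantically similar text.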
Store embeddings in a named collection.
# Store first phrase
llm embed phrases hello -m sentence-transformers/all-MiniLM-L6-v2 -c "Hello world"
# Store second phrase
llm embed phrases goodbye -c "Goodbye world"
# View collections
llm embed-db collections
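Collections live in an SQLite database that llm manages for you. As a sketch of what the listing command reads, here is a throwaway in-memory database; the schema (a collections table with name and model columns) is an assumption for illustration, and the real database location is printed by llm collections path:

```python
import sqlite3

# Throwaway in-memory database mimicking llm's collections table
# (schema assumed for illustration; the real file is found via `llm collections path`)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE collections (id INTEGER PRIMARY KEY, name TEXT, model TEXT)"
)
conn.execute(
    "INSERT INTO collections (name, model) VALUES (?, ?)",
    ("phrases", "sentence-transformers/all-MiniLM-L6-v2"),
)
# Rough equivalent of listing collections: each name with its bound model
for name, model in conn.execute("SELECT name, model FROM collections"):
    print(f"{name}: {model}")
```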
Process multiple files and create embeddings.
Embed every Org-format file in the repository (the pattern matches any .org file, README.org included).
llm embed-multi readmes \
--model sentence-transformers/all-MiniLM-L6-v2 \
--files . '**/*.org'
Find similar content in our embedded documents.
llm similar readmes -c "llm commands"
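llm similar emits one JSON object per matching item. A sketch of post-processing that output in Python; the ids and scores below are made up for illustration, and the exact fields are an assumption:

```python
import json

# Made-up sample of `llm similar` output: one JSON object per line
raw_output = "\n".join([
    '{"id": "README.org", "score": 0.81}',
    '{"id": "docs/usage.org", "score": 0.74}',
])

# Parse each line and report matches in descending score order
matches = [json.loads(line) for line in raw_output.splitlines()]
for match in sorted(matches, key=lambda m: m["score"], reverse=True):
    print(f'{match["id"]}: {match["score"]:.2f}')
```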
Demonstrate clustering capabilities with embeddings.
First, install the clustering plugin.
llm install llm-cluster
Get and embed GitHub issues.
curl -s "https://api.github.com/repos/defrecord/llm-lab/issues" | \
jq '[.[] | {id: .id, title: .title}]' | \
llm embed-multi llm-lab-issues - \
--database data/embeddings/issues.db \
--model sentence-transformers/all-MiniLM-L6-v2 \
--store
# Run clustering analysis
llm cluster llm-lab-issues 5 --database data/embeddings/issues.db --summary
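Under the hood, clustering groups nearby embedding vectors into k groups (llm-cluster applies k-means-style clustering). A toy k-means sketch on 1-D points, just to illustrate the idea; real embeddings are 384-dimensional, and this is not the plugin's actual implementation:

```python
import random

def kmeans_1d(points, k, iterations=10, seed=42):
    """Toy k-means on 1-D points: assign each point to the nearest centroid,
    then move each centroid to the mean of its cluster, and repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        # Empty clusters keep their previous centroid
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return sorted(centroids)

# Two obvious groups: values near 1 and values near 10
print(kmeans_1d([1.0, 1.2, 0.8, 10.0, 10.5, 9.5], 2))
```

The centroids settle near 1.0 and 10.0, one per natural group, which is the same idea the issue clustering applies to 384-dimensional vectors.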
Example of using embeddings in Python code.
#!/usr/bin/env python3
import llm


def embed_text():
    """Example of embedding text with Python."""
    model = llm.get_embedding_model("sentence-transformers/all-MiniLM-L6-v2")
    vector = model.embed("This is text to embed")
    print(f"Embedding vector: {vector[:5]}...")  # Show first 5 elements


def work_with_collections():
    """Example of working with collections."""
    collection = llm.Collection(
        "entries", model_id="sentence-transformers/all-MiniLM-L6-v2"
    )
    # Store items along with their original text (store=True)
    collection.embed_multi(
        [
            ("code", "Python implementation details"),
            ("docs", "Documentation and examples"),
            ("test", "Test suite and coverage"),
        ],
        store=True,
    )
    # Find similar items
    results = collection.similar("implementation guide")
    for result in results:
        print(f"Match: {result.id} - Score: {result.score}")


if __name__ == "__main__":
    embed_text()
    work_with_collections()
Process embedding output as JSON. By default llm embed emits the vector as a bare JSON array, so jq can measure its length directly (384 dimensions for all-MiniLM-L6-v2).
llm embed -m sentence-transformers/all-MiniLM-L6-v2 -c "Advanced example" | \
jq 'length'
Run clustering on a collection.
llm cluster entries 3 --database data/embeddings/vector.db --summary
Export embeddings for external use by querying the embeddings database directly; llm collections path prints its location.
sqlite-utils "$(llm collections path)" "select id, content from embeddings"
- All outputs are stored in data/embeddings/
- Using sentence-transformers/all-MiniLM-L6-v2 for local embeddings
- Python examples are tangled to data/embeddings/
- Clustering requires the llm-cluster plugin