[Feat] Add LiteLLMEmbeddings - Support SemanticChunking through LiteLLM #154

Open · wants to merge 3 commits into base: development

Changes from all commits

138 changes: 108 additions & 30 deletions benchmarks/README.md
@@ -4,71 +4,149 @@

Ever wondered how much CHONKier other text splitting libraries are? Well, wonder no more! We've put Chonkie up against some of the most popular RAG libraries out there, and the results are... well, let's just say Moto Moto might need to revise his famous quote!

## ⚡ Speed Benchmarks

> ZOOOOOM! Watch Chonkie run! 🏃‍♂️💨

### 100K Wikipedia Articles

The following benchmarks were run on 100,000 Wikipedia articles from the
[`chonkie-ai/wikipedia-100k`](https://huggingface.co/datasets/chonkie-ai/wikipedia-100k) dataset.

All tests were run on a Google Colab A100 instance.
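
For context, a timing run in this spirit can be reproduced with a few lines of Python (a rough sketch, not the exact benchmark script; the `train` split, the `text` column name, and `TokenChunker`'s default settings are assumptions):

```python
# Rough timing sketch -- assumes the dataset exposes a "train" split with a
# "text" column and that TokenChunker's defaults match the benchmark setup.
import time

from chonkie import TokenChunker
from datasets import load_dataset

articles = load_dataset("chonkie-ai/wikipedia-100k", split="train")
chunker = TokenChunker(tokenizer="gpt2")

start = time.perf_counter()
for article in articles:
    chunker.chunk(article["text"])
elapsed = time.perf_counter() - start
print(f"Chunked {len(articles)} articles in {elapsed:.1f} s")
```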

#### Token Chunking

| Library | Time | Speed Factor |
|---------|-----------|--------------|
| 🦛 Chonkie | 58 sec | 1x |
| 🔗 LangChain | 1 min 10 sec | 1.21x slower |
| 📚 LlamaIndex | 50 min | 51.7x slower |

#### Sentence Chunking

| Library | Time | Speed Factor |
|---------|-----------|--------------|
| 🦛 Chonkie | 59 sec | 1x |
| 📚 LlamaIndex | 3 min 59 sec | 4.05x slower |
| 🔗 LangChain | N/A | Doesn't exist |

#### Recursive Chunking

| Library | Time | Speed Factor |
|---------|-----------|--------------|
| 🦛 Chonkie | 1 min 19 sec | 1x |
| 🔗 LangChain | 2 min 45 sec | 2.09x slower |
| 📚 LlamaIndex | N/A | Doesn't exist |

#### Semantic Chunking

Tested with `sentence-transformers/all-minilm-l6-v2` model unless specified otherwise.

| Library | Time | Speed Factor |
|---------|-----------|--------------|
| 🦛 Chonkie (with default settings) | 13 min 59 sec | 1x |
| 🦛 Chonkie | 1 hour 8 min 53 sec | 4.92x slower |
| 🔗 LangChain | 1 hour 13 sec | 4.35x slower |
| 📚 LlamaIndex | 1 hour 24 min 15 sec | 6.07x slower |
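
For reference, a semantic-chunking run along the lines of the rows above might be set up roughly like this (a sketch, not the benchmark script; the `embedding_model` and `chunk_size` parameter names are assumptions):

```python
from chonkie import SemanticChunker

# Assumed parameter names; the second Chonkie row above swaps the default
# embedding model for all-MiniLM-L6-v2.
chunker = SemanticChunker(
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",
    chunk_size=512,
)
chunks = chunker.chunk("One Wikipedia article's worth of text goes here...")
```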

### 500K Wikipedia Articles

The following benchmarks were run on 500,000 Wikipedia articles from the
[`chonkie-ai/wikipedia-500k`](https://huggingface.co/datasets/chonkie-ai/wikipedia-500k) dataset.

All tests were run on a `c3-highmem-4` VM from Google Cloud with 32 GB RAM and a 200 GB SSD Persistent Disk attachment.

#### Token Chunking

| Library | Time | Speed Factor |
|---------|-----------|--------------|
| 🦛 Chonkie | 2 min 17 sec | 1x |
| 🔗 LangChain | 2 min 42 sec | 1.18x slower |
| 📚 LlamaIndex | 50 min | 21.9x slower |

#### Sentence Chunking

| Library | Time | Speed Factor |
|---------|-----------|--------------|
| 🦛 Chonkie | 7 min 16 sec | 1x |
| 📚 LlamaIndex | 10 min 55 sec | 1.5x slower |
| 🔗 LangChain | N/A | Doesn't exist |

#### Recursive Chunking

| Library | Time | Speed Factor |
|---------|-----------|--------------|
| 🦛 Chonkie | 3 min 42 sec | 1x |
| 🔗 LangChain | 7 min 36 sec | 2.05x slower |
| 📚 LlamaIndex | N/A | Doesn't exist |

### Paul Graham Essays Dataset

The following benchmarks were run on the Paul Graham Essays dataset using the GPT-2 tokenizer.

#### Token Chunking

| Library | Time (ms) | Speed Factor |
|---------|-----------|--------------|
| 🦛 Chonkie | 8.18 | 1x |
| 🔗 LangChain | 8.68 | 1.06x slower |
| 📚 LlamaIndex | 272 | 33.25x slower |

#### Sentence Chunking

| Library | Time (ms) | Speed Factor |
|---------|-----------|--------------|
| 🦛 Chonkie | 52.6 | 1x |
| 📚 LlamaIndex | 91.2 | 1.73x slower |
| 🔗 LangChain | N/A | Doesn't exist |

#### Semantic Chunking

| Library | Time | Speed Factor |
|---------|------|--------------|
| 🦛 Chonkie | 482ms | 1x |
| 🔗 LangChain | 899ms | 1.86x slower |
| 📚 LlamaIndex | 1.2s | 2.49x slower |

## 📊 Size Comparison (Package Size)

### Default Installation (Basic Chunking)

| Library | Size | Chonk Factor |
|---------|------|--------------|
| 🦛 Chonkie | 11.2 MiB | 1x |
| 🔗 LangChain | 80 MiB | ~7.1x CHONKier |
| 📚 LlamaIndex | 171 MiB | ~15.3x CHONKier |

### With Semantic Features

| Library | Size | Chonk Factor |
|---------|------|--------------|
| 🦛 Chonkie | 62 MiB | 1x |
| 🔗 LangChain | 625 MiB | ~10x CHONKier |
| 📚 LlamaIndex | 678 MiB | ~11x CHONKier |

## 💡 Why These Numbers Matter

### Speed Benefits

1. **Faster Processing**: Chonkie leads in all chunking methods!
2. **Production Ready**: Optimized for real-world usage
3. **Consistent Performance**: Fast across all chunking types
4. **Scale Friendly**: Process more text in less time

### Size Benefits

1. **Faster Installation**: Less to download = faster to get started
2. **Lower Memory Footprint**: Lighter package = less RAM usage
3. **Cleaner Dependencies**: Only install what you actually need
4. **CI/CD Friendly**: Faster builds and deployments

Remember what Chonkie always says:
> "I may be a hippo, but I don't have to be heavy... and I can still run fast!" 🦛✨
> "I may be a hippo, but I'm still light and fast!" 🦛✨

---

*Note: All measurements were taken using Python 3.8+ on a clean virtual environment. Your actual mileage may vary slightly depending on your specific setup and dependencies.*
5 changes: 3 additions & 2 deletions pyproject.toml
@@ -43,8 +43,9 @@ Documentation = "https://docs.chonkie.ai"
model2vec = ["model2vec>=0.3.0", "numpy>=1.23.0, <2.2"]
st = ["sentence-transformers>=3.0.0", "numpy>=1.23.0, <2.2"]
openai = ["openai>=1.0.0", "numpy>=1.23.0, <2.2"]
semantic = ["model2vec>=0.3.0", "numpy>=1.23.0, <2.2"]
all = ["sentence-transformers>=3.0.0", "numpy>=1.23.0, <2.2", "openai>=1.0.0", "model2vec>=0.3.0"]
semantic = ["model2vec>=0.1.0", "numpy>=1.23.0, <2.2"]
litellm = ["litellm>=1.57.10", "numpy>=1.23.0, <2.2"]
all = ["sentence-transformers>=3.0.0", "numpy>=1.23.0, <2.2", "openai>=1.0.0", "model2vec>=0.3.0", "litellm>=1.57.10"]
dev = [
"pytest>=6.2.0",
"pytest-cov>=4.0.0",
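
The new `litellm` extra sits alongside the existing optional backends, so installing it (e.g. `pip install "chonkie[litellm]"`) is what makes the new class importable. A quick pre-flight check mirroring the `is_available()` guard in the new module might look like this (a sketch; the install command is an assumption about how you manage your environment):

```python
import importlib.util

# Same check the new LiteLLMEmbeddings.is_available() performs.
if importlib.util.find_spec("litellm") is None:
    raise ImportError('Missing optional dependency; install with: pip install "chonkie[litellm]"')

from chonkie import LiteLLMEmbeddings  # safe to import once litellm is present

embeddings = LiteLLMEmbeddings(model="huggingface/microsoft/codebert-base")
```
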
2 changes: 2 additions & 0 deletions src/chonkie/__init__.py
@@ -16,6 +16,7 @@
Model2VecEmbeddings,
OpenAIEmbeddings,
SentenceTransformerEmbeddings,
LiteLLMEmbeddings,
)
from .refinery import (
BaseRefinery,
@@ -78,6 +79,7 @@
"SentenceTransformerEmbeddings",
"OpenAIEmbeddings",
"AutoEmbeddings",
"LiteLLMEmbeddings",
]

# Add all refinery classes to __all__
2 changes: 2 additions & 0 deletions src/chonkie/embeddings/__init__.py
@@ -3,6 +3,7 @@
from .model2vec import Model2VecEmbeddings
from .openai import OpenAIEmbeddings
from .sentence_transformer import SentenceTransformerEmbeddings
from .litellm import LiteLLMEmbeddings

# Add all embeddings classes to __all__
__all__ = [
@@ -11,4 +12,5 @@
"SentenceTransformerEmbeddings",
"OpenAIEmbeddings",
"AutoEmbeddings",
"LiteLLMEmbeddings",
]
6 changes: 6 additions & 0 deletions src/chonkie/embeddings/auto.py
@@ -24,6 +24,9 @@ class AutoEmbeddings:
# Get Anthropic embeddings
embeddings = AutoEmbeddings.get_embeddings("anthropic://claude-v1", api_key="...")

# Get LiteLLM embeddings
embeddings = AutoEmbeddings.get_embeddings("huggingface/microsoft/codebert-base", api_key="...")

"""

@classmethod
@@ -52,6 +55,9 @@ def get_embeddings(
# Get Anthropic embeddings
embeddings = AutoEmbeddings.get_embeddings("anthropic://claude-v1", api_key="...")

# Get LiteLLM embeddings
embeddings = AutoEmbeddings.get_embeddings("huggingface/microsoft/codebert-base", api_key="...")

"""
# Load embeddings instance if already provided
if isinstance(model, BaseEmbeddings):
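
Per the docstring example added above, a `huggingface/...` model string is the route into the new backend. Assuming the (collapsed) routing in `get_embeddings` resolves that string to a `LiteLLMEmbeddings` instance, downstream usage matches the other embeddings classes (a sketch; the API key value is a placeholder):

```python
from chonkie import AutoEmbeddings

embeddings = AutoEmbeddings.get_embeddings(
    "huggingface/microsoft/codebert-base", api_key="..."
)

vector = embeddings.embed("Chonkie chunks text.")                  # np.ndarray
vectors = embeddings.embed_batch(["First text.", "Second text."])  # list of np.ndarray
n_tokens = embeddings.count_tokens("Chonkie chunks text.")         # int
```
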
148 changes: 148 additions & 0 deletions src/chonkie/embeddings/litellm.py
@@ -0,0 +1,148 @@
import importlib.util
import os
import time
from typing import Callable, List, Optional

import numpy as np
from litellm import embedding, token_counter

from .base import BaseEmbeddings


class LiteLLMEmbeddings(BaseEmbeddings):
    """Embeddings backend that calls LiteLLM's unified `embedding()` API."""

    def __init__(
        self,
        model: str = 'huggingface/microsoft/codebert-base',
        input: str = "Hello, my dog is cute",
Collaborator:

Hey @Dhan996!

I believe this `input` is just for checking that the embedding response comes through correctly? We don't have to offer the user the option to change this as part of the signature; we can keep it fixed inside `__init__`. It would be a good idea to offer as minimal an interface to the user as possible.

Thanks!

        user: Optional[str] = None,
        dimensions: Optional[int] = None,
        api_key: Optional[str] = None,
        api_type: Optional[str] = None,
        api_version: Optional[str] = None,
        api_base: Optional[str] = None,
        encoding_format: Optional[str] = None,
        timeout: Optional[int] = 300,
        input_type: Optional[str] = "feature-extraction",
    ):
"""Initialize LiteLLM embeddings.

Args:
model: Name of the LiteLLM embedding model to use
input: Text to embed
user: User ID for API requests
dimensions: Number of dimensions for the embedding model
api_key: API key for the model
api_type: Type of API to use
api_version: Version of the API to use
api_base: Base URL for the API
encoding_format: Encoding format for the input text
timeout: Timeout in seconds for API requests

"""
        super().__init__()
        if not self.is_available():
            raise ImportError(
                "LiteLLM package is not available. Please install it via pip."
            )

        # Check that LiteLLM works with the given parameters
        try:
            # Fall back to the HuggingFace key when no api_key is given (default provider)
            api_key = api_key if api_key is not None else os.environ.get("HUGGINGFACE_API_KEY")
            response = embedding(
                model=model,
                input=[input],
                user=user,
                dimensions=dimensions,
                api_key=api_key,
                api_type=api_type,
                api_version=api_version,
                api_base=api_base,
                encoding_format=encoding_format,
                timeout=timeout,
            )
        except Exception as e:
            raise ValueError(f"LiteLLM failed to initialize with the given parameters: {e}")

        self.kwargs = {
            "user": user,
            "dimensions": dimensions,
            "api_key": api_key,
            "api_type": api_type,
            "api_version": api_version,
            "api_base": api_base,
            "encoding_format": encoding_format,
            "timeout": timeout,
        }
        self.model = model
        if dimensions is None:
            self._dimension = len(response.data[0]['embedding'])
        else:
            self._dimension = dimensions

    @property
    def dimension(self) -> int:
        return self._dimension

    def embed(self, text: str) -> "np.ndarray":
        """Embed a single text string and return its embedding vector."""
        if isinstance(text, str):
            text = [text]
        retries = 5  # Number of retry attempts
        wait_time = 10  # Seconds to wait between retries
        response = None
        for i in range(retries):
Collaborator:

Hey @Dhan996!

Just a doubt, but does LiteLLM do any retries internally?

If they handle it, then we can push any API retries to their end; otherwise we should offer retries as a parameter during init.

Thanks!

            try:
                response = embedding(model=self.model, input=text, **self.kwargs)
            except Exception as e:
                print(f"Attempt {i+1}/{retries} failed ({e}); retrying in {wait_time} seconds...")
                time.sleep(wait_time)
            else:
                break
        if response is None:
            raise RuntimeError(f"Embedding request failed after {retries} attempts.")
        embeddings = response.data[0]['embedding']
        return np.array(embeddings)

    def embed_batch(self, texts: List[str]) -> List["np.ndarray"]:
        """Embed a batch of texts and return a list of embedding vectors."""
        if isinstance(texts, str):
            texts = [texts]
        retries = 5  # Number of retry attempts
        wait_time = 10  # Seconds to wait between retries
        responses = None
        for i in range(retries):
            try:
                responses = embedding(
                    model=self.model,
                    input=texts,
                    **self.kwargs,
                )
            except Exception as e:
                print(f"Attempt {i+1}/{retries} failed ({e}); retrying in {wait_time} seconds...")
                time.sleep(wait_time)
            else:
                # Exit the loop on success
                break
        if responses is None:
            raise RuntimeError(f"Embedding request failed after {retries} attempts.")
        return [np.array(entry['embedding']) for entry in responses.data]

    def count_tokens(self, text: str) -> int:
        """Count tokens in a single text with LiteLLM's token_counter."""
        return token_counter(model=self.model, text=text)

    def count_tokens_batch(self, texts: List[str]) -> List[int]:
        """Count tokens for each text in a batch."""
        return [token_counter(model=self.model, text=text) for text in texts]

    def _tokenizer_helper(self, text: str) -> int:
        return token_counter(model=self.model, text=text)

    def get_tokenizer_or_token_counter(self) -> "Callable[[str], int]":
        return self._tokenizer_helper


    def similarity(self, u: np.ndarray, v: np.ndarray) -> float:
        """Compute cosine similarity between two embeddings."""
        return np.divide(
            np.dot(u, v), np.linalg.norm(u) * np.linalg.norm(v), dtype=float
        )

    @classmethod
    def is_available(cls) -> bool:
        """Return True if the litellm package can be imported."""
        return importlib.util.find_spec("litellm") is not None

    def __repr__(self) -> str:
        return f"LiteLLMEmbeddings(model={self.model})"