High-performance LLM API gateway in Rust, built by crabtalk.
Crabllm sits between your application and LLM providers. It exposes an OpenAI-compatible API and routes requests to the configured provider — OpenAI, Anthropic, Azure, Ollama, and any OpenAI-compatible service.
One API format. Many providers. Low overhead.
Inspired by LiteLLM. Built in Rust for minimal overhead. See the docs for providers, routing, extensions, and configuration.
```sh
cargo install crabllm
```

Create `crabllm.toml`:

```toml
listen = "0.0.0.0:8080"
[providers.openai]
kind = "openai"
api_key = "${OPENAI_API_KEY}"
models = ["gpt-4o"]
[providers.anthropic]
kind = "anthropic"
api_key = "${ANTHROPIC_API_KEY}"
models = ["claude-sonnet-4-20250514"]
```
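Ollama and other OpenAI-compatible services slot in the same way. A minimal sketch, assuming the config supports a `base_url` key for custom endpoints (the key name is an assumption; check the configuration docs for the exact schema):

```toml
# Hypothetical provider block: "base_url" is an assumed key name,
# not confirmed crabllm config schema.
[providers.ollama]
kind = "openai"                         # Ollama speaks the OpenAI wire format
base_url = "http://localhost:11434/v1"  # assumed key for a custom endpoint
models = ["llama3.1"]
```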
Run:

```sh
crabllm --config crabllm.toml
```

Send requests using the OpenAI format:
```sh
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

Why Rust:

- Performance: Sub-millisecond proxy overhead. No GC pauses.
- Safety: Memory safety without runtime cost.
- Concurrency: Tokio async runtime handles thousands of concurrent streaming connections (see the streaming sketch below).
- Deployment: Single static binary. No interpreter, no virtualenv.
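Streaming uses the same endpoint. The sketch below assumes crabllm passes the standard OpenAI `stream` flag through to the upstream; per the `models` lists in the config above, this request would be routed to the Anthropic provider:

```sh
# -N disables curl buffering; tokens arrive as server-sent events
# ("data: ..." chunks), assuming standard OpenAI-style streaming.
curl -N http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "messages": [{"role": "user", "content": "Write a haiku about crabs."}],
    "stream": true
  }'
```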
Gateway overhead measured against a mock LLM server with instant responses. Numbers show proxy cost only (gateway latency minus direct baseline). See full results for all scenarios and memory usage.
Streaming P50 overhead (ms) — the metric that matters for LLMs:
| RPS | crabllm | bifrost | litellm |
|---|---|---|---|
| 500 | +0.02 | +0.16 | +593.20 |
| 1000 | +0.09 | +0.18 | +596.90 |
| 5000 | +0.23 | +0.47 | +593.85 |
Chat completions P50 overhead (ms):
| RPS | crabllm | bifrost | litellm |
|---|---|---|---|
| 500 | +0.41 | +0.07 | +151.42 |
| 1000 | +0.30 | +0.10 | +160.12 |
| 5000 | +0.16 | +0.14 | +159.68 |
Peak memory: crabllm 37MB · bifrost 169MB · litellm 541MB
```sh
# requires linux + docker
make bench # full competitive benchmark
make bench-debug GW=crabllm # quick single-gateway debug run
make bench-chart # render terminal charts from results
make bench-json # dump summary JSON to stdout
make summary # generate docs/src/benchmarks.md
```

| Crate | Description |
|---|---|
| `crabllm` | Binary entry point (CLI + server startup) |
| `crabllm-core` | Shared types: config, OpenAI-format request/response, errors |
| `crabllm-provider` | Provider enum, registry, and upstream HTTP dispatch |
| `crabllm-proxy` | Axum HTTP server, route handlers, auth middleware |
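To see how the crates fit together, here is a self-contained sketch of the request path: `crabllm-proxy` receives an OpenAI-format request, `crabllm-provider` resolves the model to a provider and dispatches upstream, and `crabllm-core` supplies the shared types. Every name in the sketch is a hypothetical stand-in, not the actual crabllm API:

```rust
use std::collections::HashMap;

// crabllm-core (stand-in types): shared OpenAI-format request/response.
struct ChatRequest { model: String, user_message: String }
struct ChatResponse { content: String }

// crabllm-provider (stand-in): provider enum plus a model -> provider registry.
#[derive(Clone, Copy)]
enum Provider { OpenAi, Anthropic }

struct Registry { by_model: HashMap<String, Provider> }

impl Registry {
    fn provider_for(&self, model: &str) -> Option<Provider> {
        self.by_model.get(model).copied()
    }
}

impl Provider {
    // Stands in for the real upstream HTTP dispatch.
    fn dispatch(&self, req: &ChatRequest) -> ChatResponse {
        let upstream = match self {
            Provider::OpenAi => "api.openai.com",
            Provider::Anthropic => "api.anthropic.com",
        };
        ChatResponse { content: format!("[{upstream}] would answer: {}", req.user_message) }
    }
}

// crabllm-proxy (stand-in handler): look up the provider, forward the request.
fn handle(registry: &Registry, req: ChatRequest) -> Result<ChatResponse, String> {
    registry
        .provider_for(&req.model)
        .map(|p| p.dispatch(&req))
        .ok_or_else(|| format!("unknown model: {}", req.model))
}

fn main() {
    let registry = Registry {
        by_model: HashMap::from([
            ("gpt-4o".to_string(), Provider::OpenAi),
            ("claude-sonnet-4-20250514".to_string(), Provider::Anthropic),
        ]),
    };
    let req = ChatRequest { model: "gpt-4o".into(), user_message: "Hello!".into() };
    match handle(&registry, req) {
        Ok(resp) => println!("{}", resp.content),
        Err(e) => eprintln!("{e}"),
    }
}
```

In the real gateway the handler is an Axum route and dispatch is an async HTTP call, but the layering is the same.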
License: MIT OR Apache-2.0