llm-cost-chain 💰 — 6-tier fallback across Azure, Bedrock, Vertex, Foundry, Agent SDK, and Anthropic — cut costs 88%.
A 6-tier LLM routing system that cut daily AI costs from $25 to $3 (an 88% reduction). Routes requests through Azure OpenAI, Anthropic Agent SDK, AWS Bedrock, Azure AI Foundry, Vertex AI, and Anthropic Direct, falling back automatically on errors, rate limits, or quota exhaustion.
This is a closed-source project. The README documents the architecture and learnings.
- Routes LLM requests through 6 providers in priority order
- Automatic fallback on errors, rate limits, or quota exhaustion
- Cut daily LLM spending from $25/day to $3/day (88% reduction)
- Tracks cost per request, per tier, per day
- Azure credits ($200/month) are consumed before any paid tier
- Fire-and-forget observability via Langfuse (zero blocking overhead)
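Per-request, per-tier, per-day cost tracking can be sketched as a small in-memory aggregator (the real system persists rows to PostgreSQL; the `CostTracker` name and its methods are illustrative, not the actual API):

```javascript
// Illustrative in-memory cost aggregator. The real system persists each
// row to PostgreSQL; all names here (CostTracker, record, totals) are
// hypothetical stand-ins for the production schema and queries.
class CostTracker {
  constructor() {
    this.rows = []; // each row: { tier, costUsd, date }
  }

  // Record the cost of a single request against a tier.
  record(tier, costUsd, date = new Date().toISOString().slice(0, 10)) {
    this.rows.push({ tier, costUsd, date });
  }

  // Aggregate spend per tier and per day for analytics.
  totals() {
    const perTier = {};
    const perDay = {};
    for (const { tier, costUsd, date } of this.rows) {
      perTier[tier] = (perTier[tier] ?? 0) + costUsd;
      perDay[date] = (perDay[date] ?? 0) + costUsd;
    }
    return { perTier, perDay };
  }
}
```

Keeping one row per request (rather than pre-aggregated counters) is what makes the per-tier and per-day breakdowns cheap to compute after the fact.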
```
Request enters chain
  → Tier 0:   Azure OpenAI (free credits)
  → Tier 1:   Anthropic Agent SDK
  → Tier 2:   AWS Bedrock
  → Tier 2.5: Azure AI Foundry
  → Tier 3:   Google Vertex AI
  → Tier 4:   Anthropic Direct (most expensive)

Each tier:
  Try request → Check response quality
    → If error / rate limit / timeout → Log to Langfuse → Fall to next tier
```
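The per-tier walk can be sketched as a loop over tiers in priority order. This is a minimal sketch, not the production code: `tiers`, `tier.call`, and the error-collection shape are all assumptions, and real provider clients plus Langfuse logging are stubbed out:

```javascript
// Hypothetical sketch of the 6-tier fallback walk. `tiers` is an array
// of { name, call } objects, cheapest provider first. Any throw (error,
// rate limit, timeout) drops through to the next, more expensive tier.
async function routeWithFallback(tiers, request) {
  const errors = [];
  for (const tier of tiers) {
    try {
      const response = await tier.call(request);
      return { tier: tier.name, response };
    } catch (err) {
      // In the real system this is also where the failure is logged
      // to Langfuse before falling through.
      errors.push({ tier: tier.name, message: err.message });
    }
  }
  throw new Error(`all tiers failed: ${JSON.stringify(errors)}`);
}
```

Because the loop only advances on a throw, a healthy Tier 0 absorbs everything and the expensive tiers are never touched.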
- Runtime: Node.js
- Providers: Azure OpenAI, Anthropic Agent SDK, AWS Bedrock, Azure AI Foundry, Google Vertex AI, Anthropic Direct
- Observability: Self-hosted Langfuse (fire-and-forget ingestion)
- Database: PostgreSQL for cost tracking and tier analytics
- Infra: AWS Lightsail
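The fire-and-forget ingestion pattern amounts to starting the trace call without awaiting it and swallowing its failures, so a slow or unreachable Langfuse instance can never add latency to the request path. `logTrace` below is a hypothetical stand-in for the Langfuse SDK call, not its real API:

```javascript
// Fire-and-forget observability sketch: the log call runs in the
// background and its errors are swallowed, so observability adds zero
// blocking overhead to the request path. `logTrace` is a stand-in for
// the real Langfuse client call.
function logFireAndForget(logTrace, event) {
  // Intentionally not awaited; failures are caught and dropped.
  Promise.resolve()
    .then(() => logTrace(event))
    .catch(() => { /* never let observability errors surface */ });
}
```

The `.catch(() => {})` is load-bearing: without it, a rejected background promise would raise an unhandled-rejection error even though nothing awaited it.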
- The biggest cost savings came from tier ordering, not from switching models — Azure's free credits handle 60% of requests
- Fire-and-forget logging was essential — blocking on observability added 200ms latency that cascaded across all scrapers
- Rate limit detection needs to be provider-specific — Azure returns 429, Bedrock returns ThrottlingException, Vertex returns RESOURCE_EXHAUSTED
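The third learning can be sketched as a normalizer over provider-specific error shapes. The signals (429, `ThrottlingException`, `RESOURCE_EXHAUSTED`) come from the learning above; the exact field names (`status`, `name`) are assumptions about how each SDK surfaces them:

```javascript
// Normalize provider-specific rate-limit signals into one boolean so
// the fallback loop can treat throttling uniformly. Field shapes are
// assumptions: Azure OpenAI surfaces HTTP 429, Bedrock an error named
// ThrottlingException, Vertex AI a RESOURCE_EXHAUSTED status.
function isRateLimited(provider, err) {
  switch (provider) {
    case 'azure-openai':
      return err.status === 429;
    case 'bedrock':
      return err.name === 'ThrottlingException';
    case 'vertex':
      return err.status === 'RESOURCE_EXHAUSTED';
    default:
      return false; // unknown providers are treated as hard errors
  }
}
```

Centralizing this mapping keeps the chain loop provider-agnostic: adding a seventh provider means one new `case`, not new logic in the fallback path.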
