Skip to content

kevinastuhuaman/llm-cost-chain

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 

Repository files navigation

llm-cost-chain 💰 — 6-tier fallback across Azure, Bedrock, Vertex, Foundry, Agent SDK, and Anthropic — cut costs 88%.

LLM Cost Chain hero

TypeScript Node.js Azure AWS Bedrock Claude

A 6-tier LLM routing system that cut daily AI costs from $25 to $3 (88% reduction). Routes requests through Azure OpenAI, Anthropic Agent SDK, AWS Bedrock, Azure AI Foundry, Vertex AI, and Anthropic direct, falling back automatically on errors, rate limits, or quota exhaustion.

This is a closed-source project. The README documents the architecture and learnings.

What it does

  • Routes LLM requests through 6 providers in priority order
  • Automatic fallback on errors, rate limits, or quota exhaustion
  • Cut daily LLM spending from $25/day to $3/day (88% reduction)
  • Tracks cost per request, per tier, per day
  • Azure credits ($200/month) consumed first before paid tiers
  • Fire-and-forget observability via Langfuse (zero blocking overhead)

How it works

Request enters chain
  → Tier 0: Azure OpenAI (free credits)
  → Tier 1: Anthropic Agent SDK
  → Tier 2: AWS Bedrock
  → Tier 2.5: Azure AI Foundry
  → Tier 3: Google Vertex AI
  → Tier 4: Anthropic Direct (most expensive)

Each tier:
  Try request → Check response quality
  → If error/rate-limit/timeout → Log to Langfuse → Fall to next tier

Tech stack

  • Runtime: Node.js
  • Providers: Azure OpenAI, Anthropic Agent SDK, AWS Bedrock, Azure AI Foundry, Google Vertex AI, Anthropic Direct
  • Observability: Self-hosted Langfuse (fire-and-forget ingestion)
  • Database: PostgreSQL for cost tracking and tier analytics
  • Infra: AWS Lightsail

What I learned

  • The biggest cost savings came from tier ordering, not from switching models — Azure's free credits handle 60% of requests
  • Fire-and-forget logging was essential — blocking on observability added 200ms latency that cascaded across all scrapers
  • Rate limit detection needs to be provider-specific — Azure returns 429, Bedrock returns ThrottlingException, Vertex returns RESOURCE_EXHAUSTED

About

6-tier fallback across Azure, Bedrock, Vertex, Foundry, Agent SDK, and Anthropic — cut costs 88%

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors