Rust control plane for HTTP, model, tool, and agent traffic.
Ferrum Gateway is a standalone service gateway with a generic HTTP core and optional agent-native modules. A normal Web2 service can use it for routing, auth, rate limits, body limits, timeouts, retries, circuit breakers, streaming proxying, JSON logs, tracing, and Prometheus metrics. Agent workloads can add LLM provider routing, token budgets, usage metering, MCP/tool governance, and Ferrum adapters on top.
The core gateway does not depend on LLM, MCP, or Ferrum crates. That boundary is intentional: Ferrum integrations are adapters, not prerequisites.
Established gateways are stronger general-purpose options today. Ferrum Gateway is narrower: it is a portfolio-grade Rust gateway that treats agent traffic as a first-class operations problem. The generic core keeps it credible as infrastructure; the agent layer adds controls that are awkward to bolt onto a classic gateway:
- policy-driven LLM provider fallback with streaming preserved;
- per-key, per-session, and per-run token budgets;
- OpenAI-compatible usage extraction and tokengate-shaped events;
- MCP / JSON-RPC tool policy with allow, deny, approval-required, and path guards;
- one correlated run trail across HTTP, model, and tool calls.
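To make the per-key, per-session, and per-run budgets concrete, here is a minimal sketch. The `Scope` enum, the `BudgetTracker` type, and its methods are hypothetical names chosen for illustration, not the gateway's actual API. The sketch assumes a request is admitted only when every limited scope it belongs to still has room, and that a rejected request charges nothing.

```rust
use std::collections::HashMap;

/// Hypothetical budget scopes; the names are illustrative only.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum Scope {
    ApiKey(String),
    Session(String),
    Run(String),
}

/// Tracks token usage against an optional limit for each scope.
#[derive(Default)]
pub struct BudgetTracker {
    limits: HashMap<Scope, u64>,
    used: HashMap<Scope, u64>,
}

impl BudgetTracker {
    /// Sets the maximum number of tokens a scope may consume.
    pub fn set_limit(&mut self, scope: Scope, limit: u64) {
        self.limits.insert(scope, limit);
    }

    /// Charges `tokens` to every scope, or to none of them.
    /// Returns the first scope that would exceed its limit.
    /// Scopes without a configured limit are treated as unlimited (an assumption).
    pub fn try_charge(&mut self, scopes: &[Scope], tokens: u64) -> Result<(), Scope> {
        // First pass: check every scope before mutating anything.
        for scope in scopes {
            if let Some(limit) = self.limits.get(scope) {
                let used = self.used.get(scope).copied().unwrap_or(0);
                if used.saturating_add(tokens) > *limit {
                    return Err(scope.clone());
                }
            }
        }
        // Second pass: all scopes have room, so record the usage.
        for scope in scopes {
            let entry = self.used.entry(scope.clone()).or_insert(0);
            *entry = entry.saturating_add(tokens);
        }
        Ok(())
    }

    /// Returns the tokens charged to a scope so far.
    pub fn used(&self, scope: &Scope) -> u64 {
        self.used.get(scope).copied().unwrap_or(0)
    }
}
```

Checking all scopes before charging any of them keeps the counters consistent: a run that is over budget does not silently consume its API key's budget as well.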
Phase 10 is a presentable public slice:
- Pingora-backed data plane plus axum admin plane.
- Static TOML config with strict validation.
- Local mocks for HTTP, LLM, MCP, and tokengate-compatible metering.
- End-to-end tests for auth, rate limits, body limits, streaming, timeouts, retries, circuit breakers, LLM fallback, budgets, metering, and MCP policy.
- Dockerfile, Docker Compose demo, CI workflow, and a local load-test script.
Normal verification is local and deterministic. No external providers, network services, or secrets are required.
Prerequisites:
- Rust stable.
- cmake for Pingora's zlib-ng build path (brew install cmake on macOS).
Run the generic HTTP gateway in two shells:
# shell 1
cargo run -p mock-upstream
# shell 2
cargo run -p ferrum-gateway-server -- --config config/gateway.example.toml
Exercise it:
curl -i http://127.0.0.1:8081/healthz
curl -i http://127.0.0.1:8080/api/echo
curl -N "http://127.0.0.1:8080/sse?count=5&interval_ms=80"
curl -s http://127.0.0.1:8081/metrics | grep gateway_
Run the full mock-first agent demo:
bash examples/ferrum-demo/run.sh
That script starts mock LLM primary/fallback providers, mock tokengate, mock MCP, and the gateway. It sends one correlated run through model routing, usage metering, and MCP allow/approval/protected-path decisions, then prints a compact run trail.
Build and run the gateway plus all local mocks:
docker compose up --build
Then, in another shell:
curl -i http://127.0.0.1:8081/healthz
curl -i http://127.0.0.1:8080/api/echo
curl -sS -X POST http://127.0.0.1:8080/v1/chat/completions \
-H 'content-type: application/json' \
-H 'x-ferrum-run-id: compose-run-1' \
-d '{"model":"axon-sim","messages":[{"role":"user","content":"hello"}]}'
curl -sS -X POST http://127.0.0.1:8080/mcp \
-H 'content-type: application/json' \
-H 'x-ferrum-run-id: compose-run-1' \
-d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"coding/read_file","arguments":{"path":"src/lib.rs"}}}'
The Compose config is generated inside the gateway container so checked-in
local configs can keep their 127.0.0.1 development defaults.
The bundled smoke load test needs only bash, cargo, and curl:
bash load-tests/http-smoke.sh
Tune it with environment variables:
FGW_LOAD_REQUESTS=500 \
FGW_LOAD_CONCURRENCY=25 \
bash load-tests/http-smoke.sh
The script starts mock-upstream and the gateway, waits for health, sends
concurrent requests to /api/echo, prints status counts and duration, then
cleans up its child processes.
Generic gateway core:
- path, method, host, and header route matching;
- upstream pools with round-robin selection;
- API-key auth with sensitive header redaction;
- token-bucket rate limits;
- request body size limits;
- streaming-safe SSE and chunked proxying;
- per-route timeouts;
- bounded retries for idempotent, non-streaming requests;
- passive circuit breakers per (route, upstream);
- Prometheus metrics and structured logs.
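As an illustration of the token-bucket rate limits in the list above, here is a minimal sketch of the general technique. The `TokenBucket` type and its fields are hypothetical and not taken from the gateway's source. The clock is passed in as milliseconds (an assumption made to keep the example deterministic), and each request costs one token.

```rust
/// Token bucket with an injected clock so behaviour is deterministic.
/// Hypothetical type for illustration; not the gateway's implementation.
#[derive(Debug)]
pub struct TokenBucket {
    capacity: f64,
    refill_per_sec: f64,
    tokens: f64,
    last_refill_ms: u64,
}

impl TokenBucket {
    /// Creates a full bucket that holds `capacity` tokens and refills
    /// at `refill_per_sec` tokens per second.
    pub fn new(capacity: u32, refill_per_sec: f64, now_ms: u64) -> Self {
        Self {
            capacity: f64::from(capacity),
            refill_per_sec,
            tokens: f64::from(capacity),
            last_refill_ms: now_ms,
        }
    }

    /// Refills based on elapsed time, then tries to take one token.
    /// Returns true if the request may proceed.
    pub fn try_acquire(&mut self, now_ms: u64) -> bool {
        // A clock that goes backwards counts as zero elapsed time.
        let elapsed_ms = now_ms.saturating_sub(self.last_refill_ms);
        let refill = elapsed_ms as f64 / 1000.0 * self.refill_per_sec;
        // Never store more than the bucket's capacity.
        self.tokens = (self.tokens + refill).min(self.capacity);
        self.last_refill_ms = self.last_refill_ms.max(now_ms);

        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

The capacity bounds the burst size and the refill rate bounds the sustained request rate, which is why the two are usually configured separately.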
Agent-native modules:
- OpenAI-compatible chat completions proxy;
- primary/fallback provider policies;
- streaming and non-streaming usage extraction;
- run/session/API-key token budgets;
- tokengate-compatible usage sink;
- MCP / JSON-RPC inspection and forwarding;
- tool allow, deny, approval-required, and protected-path policy.
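The tool policy in the list above can be pictured with a small evaluator. This is a sketch under stated assumptions: the `ToolPolicy` and `Decision` types are hypothetical, the precedence (deny, then protected paths, then approval, then allow, with deny as the default) is assumed rather than taken from the gateway, and paths are matched by plain prefix without normalization.

```rust
/// Possible outcomes for a tool call; names are illustrative.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Deny,
    ApprovalRequired,
}

/// Hypothetical policy shape, not the gateway's config schema.
pub struct ToolPolicy {
    pub allow: Vec<String>,
    pub deny: Vec<String>,
    pub approval_required: Vec<String>,
    pub protected_path_prefixes: Vec<String>,
}

impl ToolPolicy {
    /// Evaluates one tool call by tool name and optional path argument.
    /// Assumed precedence: deny list, protected paths, approval list,
    /// allow list; anything unmatched is denied.
    pub fn evaluate(&self, tool: &str, path: Option<&str>) -> Decision {
        if self.deny.iter().any(|t| t == tool) {
            return Decision::Deny;
        }
        if let Some(p) = path {
            // Plain prefix match; a real guard would normalize the path
            // first (for example, resolving `..` segments).
            let protected = self
                .protected_path_prefixes
                .iter()
                .any(|prefix| p.starts_with(prefix.as_str()));
            if protected {
                return Decision::Deny;
            }
        }
        if self.approval_required.iter().any(|t| t == tool) {
            return Decision::ApprovalRequired;
        }
        if self.allow.iter().any(|t| t == tool) {
            return Decision::Allow;
        }
        Decision::Deny
    }
}
```

With `coding/read_file` on the allow list and `secrets/` as a protected prefix, a read of `src/lib.rs` is allowed while a read of `secrets/key.pem` is denied, even though the tool itself is permitted.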
Dependency direction is one-way:
ferrum-gateway-server -> ferrum-gateway-core
ferrum-gateway-llm -> ferrum-gateway-core
ferrum-gateway-mcp -> ferrum-gateway-core
ferrum-gateway-ferrum -> ferrum-gateway-core + optional adapter surfaces
ferrum-gateway-core declares zero dependencies on LLM, MCP, Ferrum, or any
other ferrum-gateway-* crate.
crates/ferrum-gateway-core/ generic config, routing, limits, metrics, events
crates/ferrum-gateway-server/ Pingora data plane and axum admin plane
crates/ferrum-gateway-llm/ provider policies, budgets, usage extraction
crates/ferrum-gateway-mcp/ JSON-RPC inspection and tool governance
crates/ferrum-gateway-ferrum/ optional Ferrum-stack adapters
examples/ local mocks and the Phase 9 demo script
docs/ architecture, config, agent control, failures
load-tests/ local smoke load tests
cargo fmt --all
cargo test --workspace
cargo clippy --workspace --all-targets --no-deps
cargo run -p ferrum-gateway-server -- --config config/gateway.example.toml --validate-only
cargo run -p ferrum-gateway-server -- --config config/ferrum-demo.local.toml --validate-only
docker build -t ferrum-gateway:local .
docker compose up --build
Strict clippy -D warnings currently reports cleanup lints in existing Rust
source; the Phase 10 CI runs clippy without turning warnings into build
failures.
- A full Envoy replacement.
- A dynamic plugin runtime.
- A UI dashboard.
- A database-backed control plane.
- A Kubernetes operator.
- A complete MCP server.
- A billing platform.
Apache-2.0.