The Operating System for AI Agents
Build, Test, Deploy, Monitor, and Govern AI agents — from prototype to production.
🌐 Live Demo · 🚀 Quick Start · 📋 Issues
For teams who need to deploy AI agents with testing, governance, and monitoring built in — not bolted on.
- 🧪 Test: Run scenario-based simulation before deploy, with quality and cost scoring.
- 🛡️ Govern: Enforce budgets, permissions, and kill-switch policies with auditability.
- 📊 Monitor: Observe live agent runs, tool usage, latency, and spend in one dashboard.
pip install agentos-platform
10-line example:
from agentos.governed_agent import GovernedAgent
from agentos.core.tool import tool
@tool(description="Add two numbers")
def add(a: float, b: float) -> float:
    return a + b

agent = GovernedAgent(name="demo", model="gpt-4o-mini", tools=[add])
print(agent.run("What is 12.5 + 7.5?"))
Demo mode:
AGENTOS_DEMO_MODE=true python examples/run_web_builder.py
Install the MCP extra:
pip install 'agentos-platform[mcp]'
Expose built-in AgentOS tools (stdio transport is the safest choice for MCP clients like Claude Desktop and Cursor):
agentos mcp serve --transport stdio
Expose tools from a specific agent module (example ./my_agent/agent.py):
agentos mcp serve --transport stdio --agent ./my_agent
Optional: run the HTTP SSE transport for clients that support it:
agentos mcp serve --transport sse --host 127.0.0.1 --port 8080
Add the following snippet to your claude_desktop_config.json (restart Claude Desktop after editing):
{
  "mcpServers": {
    "agentos": {
      "command": "agentos",
      "args": ["mcp", "serve", "--transport", "stdio"]
    }
  }
}
If you want a specific agent module:
{
  "mcpServers": {
    "agentos": {
      "command": "agentos",
      "args": ["mcp", "serve", "--transport", "stdio", "--agent", "/absolute/path/to/agent.py"]
    }
  }
}
Add to Cursor .cursor/mcp.json:
{
  "mcpServers": {
    "agentos": {
      "command": "agentos",
      "args": ["mcp", "serve", "--transport", "stdio"]
    }
  }
}
AgentOS includes a structured delegation system that lets a “parent” agent offload subtasks to “child” agents while propagating rich context through a shared, in-memory key/value store.
Key pieces:
- `delegate_subtask` tool: LLM-facing tool that accepts structured fields like `task`, `context_json`, `constraints_json`, `expected_output_schema_json`, and `timeout`.
- `SharedContext`: a key/value store child agents can read/write during the delegation chain (avoids lossy prompt compression).
- Delegation chaining: if a child agent delegates again, the same shared context key is reused automatically.
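To make the structured fields above more concrete, here is a rough sketch of an argument payload of the kind `delegate_subtask` might receive. Only the field names come from the list above; the helper `build_delegation_args`, the sample values, the choice to serialize the `*_json` fields as JSON strings, and the reading of `timeout` as seconds are all assumptions made for illustration.

```python
import json


def build_delegation_args(task, context, constraints, expected_output_schema, timeout):
    """Assemble delegate_subtask-style arguments (hypothetical helper).

    The *_json fields are serialized to strings, on the assumption that the
    tool accepts JSON text rather than nested objects.
    """
    return {
        "task": task,
        "context_json": json.dumps(context, sort_keys=True),
        "constraints_json": json.dumps(constraints, sort_keys=True),
        "expected_output_schema_json": json.dumps(expected_output_schema, sort_keys=True),
        "timeout": timeout,  # assumed to be seconds
    }


args = build_delegation_args(
    task="Summarize the attached changelog",
    context={"repo": "example/repo", "audience": "release managers"},
    constraints={"max_words": 150},
    expected_output_schema={"type": "object", "required": ["summary"]},
    timeout=60,
)
print(json.dumps(args, indent=2))
```

Keeping context and constraints in separate structured fields, rather than folding them into the task text, is what lets a child agent read them without re-parsing a prompt.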
Minimal wiring example:
from agentos.core.agent import Agent
from agentos.core.delegation import DelegationManager
# Define your child agents however you like.
child_agent_a = Agent(name="child-a", model="gpt-4o-mini", tools=[])
child_agent_b = Agent(name="child-b", model="gpt-4o-mini", tools=[])
manager = DelegationManager()
manager.register_agent("child-a", child_agent_a)
manager.register_agent("child-b", child_agent_b)
# Create your parent agent and attach the delegate tool.
parent = Agent(name="parent", model="gpt-4o-mini", tools=[])
manager.attach_delegate_tool(parent) # adds `delegate_subtask` to the toolset
# Now the parent agent can call `delegate_subtask`.
parent.run("Delegate a subtask and use shared context for details.")
SharedContext tools available to delegated agents:
- `shared_context_key()`
- `shared_context_get(key)`
- `shared_context_set(key, value_json)`
- `shared_context_dump()`
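The tool names above suggest a small key/value interface. The stand-alone sketch below shows how such a store could behave when a parent and its children share one context key. It does not use AgentOS internals: the class `InMemorySharedContext` and its methods are hypothetical, and storing values as decoded JSON is an assumption based on the `value_json` parameter name.

```python
import json
import uuid


class InMemorySharedContext:
    """Hypothetical stand-in for a shared key/value store; not the AgentOS class."""

    def __init__(self):
        self._stores = {}

    def new_context(self):
        # One context key per delegation chain; children reuse it.
        context_key = uuid.uuid4().hex
        self._stores[context_key] = {}
        return context_key

    def set(self, context_key, key, value_json):
        # Values arrive as JSON text and are stored decoded.
        self._stores[context_key][key] = json.loads(value_json)

    def get(self, context_key, key):
        # Missing keys come back as JSON null.
        return json.dumps(self._stores[context_key].get(key))

    def dump(self, context_key):
        return json.dumps(self._stores[context_key], sort_keys=True)


store = InMemorySharedContext()
ctx = store.new_context()  # the parent creates the context once

# The parent writes details that would otherwise be squeezed into a prompt.
store.set(ctx, "customer", '{"id": 42, "tier": "pro"}')

# A child (or a grandchild it delegates to) reuses the same context key.
store.set(ctx, "draft", '"first pass"')

print(store.get(ctx, "customer"))
print(store.dump(ctx))
```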
| Module | What it does |
|---|---|
| Agent SDK | Define agents and tools with provider-agnostic model routing |
| Simulation Sandbox | Test scenarios with LLM-as-judge quality and pass/fail scoring |
| Governance Engine | Budget controls, permissions, kill switch, and audit logging |
| Live Dashboard | Real-time traces for prompts, tool calls, latency, and spend |
| RAG Pipeline | Ingest, chunk, embed, and retrieve knowledge sources |
| Workflow Engine | Compose repeatable multi-step agent workflows |
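As a rough illustration of what the Governance Engine row describes (budget controls, permissions, kill switch, audit logging), here is a self-contained sketch of a guard that checks each tool call before it runs. It is not the AgentOS implementation: `GovernanceGuard`, `BudgetExceeded`, `KillSwitchEngaged`, and the per-call cost estimates are hypothetical names and values.

```python
import time


class BudgetExceeded(RuntimeError):
    """Raised when a call would push spend past the cap (hypothetical)."""


class KillSwitchEngaged(RuntimeError):
    """Raised for any call made after the kill switch is set (hypothetical)."""


class GovernanceGuard:
    """Hypothetical guard: budget cap, tool allow-list, kill switch, audit log."""

    def __init__(self, budget_usd, allowed_tools):
        self.budget_usd = budget_usd
        self.allowed_tools = set(allowed_tools)
        self.spent_usd = 0.0
        self.killed = False
        self.audit_log = []

    def _record(self, event, **details):
        self.audit_log.append({"ts": time.time(), "event": event, **details})

    def kill(self, reason):
        self.killed = True
        self._record("kill_switch", reason=reason)

    def authorize(self, tool_name, estimated_cost_usd):
        """Raise if the call is not allowed; otherwise charge its cost and log it."""
        if self.killed:
            self._record("denied", tool=tool_name, why="kill_switch")
            raise KillSwitchEngaged(tool_name)
        if tool_name not in self.allowed_tools:
            self._record("denied", tool=tool_name, why="permission")
            raise PermissionError(tool_name)
        if self.spent_usd + estimated_cost_usd > self.budget_usd:
            self._record("denied", tool=tool_name, why="budget")
            raise BudgetExceeded(tool_name)
        self.spent_usd += estimated_cost_usd
        self._record("allowed", tool=tool_name, cost_usd=estimated_cost_usd)


guard = GovernanceGuard(budget_usd=0.05, allowed_tools=["add"])
guard.authorize("add", 0.03)  # within budget
try:
    guard.authorize("add", 0.03)  # would exceed the 0.05 cap
except BudgetExceeded:
    print("budget cap hit")
print([entry["event"] for entry in guard.audit_log])
```

Every decision, allowed or denied, lands in the audit log, so the log alone is enough to reconstruct why a run stopped.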
📋 Full 15-module list
| Module | Description |
|---|---|
| Agent SDK | Core governed agent runtime and tool-calling loop |
| WebSocket Streaming | Token streaming and low-latency interactive sessions |
| RAG Pipeline | Ingestion, chunking, embeddings, retrieval, and reranking |
| Simulation Sandbox | Scenario simulation, scoring, and comparison reports |
| Live Dashboard | Event stream, usage analytics, and operational visibility |
| Governance Engine | Guardrails, budget caps, permission checks, and audits |
| Agent Scheduler | Interval and cron scheduling with execution history |
| Event Bus | Trigger-driven orchestration via internal and external events |
| Plugin System | Runtime-extensible tools, providers, and adapters |
| Authentication | API key auth, org and user usage tracking, and middleware |
| A/B Testing | Side-by-side evaluation for variants and prompt changes |
| Workflow Engine | DAG-based execution with retries and branching |
| Multimodal | Vision and document flows for image and file-aware agents |
| Marketplace | Template registry for reusable agents and workflows |
| Embed SDK | Embeddable widget and integration surface for web apps |
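The Workflow Engine row mentions DAG-based execution with retries. The sketch below shows that general idea using only the Python standard library; it is not the AgentOS Workflow Engine, and `run_workflow`, its parameters, and the sample steps are hypothetical. Branching is left out to keep the example short.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(steps, dependencies, max_retries=2):
    """Run callables in dependency order, retrying each failed step.

    steps: maps a step name to a callable that receives the results so far.
    dependencies: maps a step name to the set of names that must run first.
    """
    results = {}
    graph = {name: dependencies.get(name, set()) for name in steps}
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(max_retries + 1):
            try:
                results[name] = steps[name](results)
                break
            except Exception:
                if attempt == max_retries:
                    raise  # out of retries: surface the failure
    return results


attempts = {"fetch": 0}


def fetch(_results):
    # Fails once, then succeeds, to exercise the retry path.
    attempts["fetch"] += 1
    if attempts["fetch"] < 2:
        raise ConnectionError("transient failure")
    return ["doc-1", "doc-2"]


def summarize(results):
    return f"{len(results['fetch'])} documents summarized"


out = run_workflow(
    steps={"fetch": fetch, "summarize": summarize},
    dependencies={"summarize": {"fetch"}},
)
print(out["summarize"])
```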
| Capability | AgentOS | LangChain | CrewAI | AutoGen |
|---|---|---|---|---|
| Built-in testing sandbox | ✅ Native | ❌ External setup | ❌ External setup | ❌ External setup |
| Governance (budget/kill switch) | ✅ Native | | | |
| Real-time ops dashboard | ✅ Native | ❌ | ❌ | |
| Batteries-included platform | ✅ Yes | | | |
| Ecosystem maturity | 🌱 Growing | ✅ Very mature | ✅ Mature | ✅ Mature |
See full benchmark results. Key findings:
- Our weighted evaluation ensemble correlates 0.91 with human judgment
- Local embeddings achieve 95% of OpenAI quality at zero cost
- Governance adds <5ms overhead to any query
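For readers unfamiliar with the first finding, the sketch below shows one way a weighted evaluation ensemble can be scored and compared with human ratings using a Pearson correlation. The evaluator names, weights, and all numbers are toy values invented for illustration; they are not taken from the benchmark and do not reproduce the reported figures.

```python
import math


def weighted_score(scores, weights):
    """Combine per-evaluator scores into one weighted average."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical evaluators and weights (toy values).
weights = {"llm_judge": 0.6, "embedding_similarity": 0.3, "keyword_match": 0.1}

# Toy per-sample evaluator scores and human ratings on a 0..1 scale.
samples = [
    {"llm_judge": 0.9, "embedding_similarity": 0.8, "keyword_match": 1.0},
    {"llm_judge": 0.4, "embedding_similarity": 0.6, "keyword_match": 0.0},
    {"llm_judge": 0.7, "embedding_similarity": 0.7, "keyword_match": 1.0},
    {"llm_judge": 0.2, "embedding_similarity": 0.3, "keyword_match": 0.0},
]
human_ratings = [0.95, 0.40, 0.75, 0.10]

ensemble = [weighted_score(sample, weights) for sample in samples]
print(f"correlation with human ratings: {pearson(ensemble, human_ratings):.2f}")
```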
See the architecture diagram above and docs/ for component-level details and ADRs.
agentos/
├── src/agentos/ # Core platform modules
├── frontend/ # React frontend
├── dashboard/ # Web dashboard UI
├── deploy/helm/ # Helm charts
├── examples/ # Runnable examples
├── tests/ # Unit and integration tests
└── docs/ # Docs and ADRs
Contributions are welcome: CONTRIBUTING.md
Roadmap and upcoming work are tracked in GitHub Issues.
- Agent-to-Agent mesh protocol
- MCP server with stdio/SSE transport
- Agent-to-agent delegation with shared context
