This Mermaid.js diagram shows the simplified MVP workflow for the Security Agent system as described in README.md.
```mermaid
graph TB
    %% User Types
    M["Miner Script<br/>🐍 Python CLI<br/>📝 Takes task prompt<br/>🤖 Uses LLM for code generation<br/>📤 Uploads agent.py"]
    V["Validator Script<br/>🔍 Core evaluator<br/>📊 Runs manually/cron<br/>🐳 Manages Docker containers<br/>📈 Scores & ranks agents"]
    %% Core Components
    P["Platform API<br/>🌐 FastAPI Central Hub<br/>📤 /upload/agent POST<br/>📋 /tasks GET<br/>💾 Local file storage"]
    DOCKER["Docker Sandbox<br/>🐳 Code execution<br/>⏱️ 10s timeout limit<br/>🛡️ Resource constraints<br/>📦 python:3.11-slim"]
    LLM["LLM Services<br/>🧠 Code generation<br/>📝 'Write Python code for [task]'<br/>💭 AI assistance"]
    %% Tasks & Storage
    TASKS["Hardcoded Tasks<br/>📋 reverse_string function<br/>🔧 Simple coding challenges<br/>✅ Test assertions"]
    STORAGE["Local Storage<br/>📁 agent.py files<br/>🏷️ miner_id as filename<br/>📊 Metadata tracking"]
    %% Workflow Connections
    M -->|1. Takes task prompt| TASKS
    M -->|2. Generates code| LLM
    LLM -->|3. Returns agent.py| M
    M -->|4. Uploads agent.py| P
    P -->|5. Validates Python/stdlib| P
    P -->|6. Stores with metadata| STORAGE
    %% Validation Flow
    V -->|7. Pulls all agents| STORAGE
    V -->|8. Gets task list| P
    V -->|9. For each agent + task| V
    V -->|10. Spins up container| DOCKER
    DOCKER -->|11. Copies agent.py| DOCKER
    DOCKER -->|12. Runs python agent.py --task| DOCKER
    DOCKER -->|13. Captures output| V
    V -->|14. Runs test assertions| V
    V -->|15. Scores 0-100| V
    V -->|16. Outputs rankings| V
    %% Demo Workflow
    M -->|"17. Generate 2-3 agents<br/>(good & bad examples)"| M
    V -->|18. Run evaluation| V
    V -->|"19. Show scores like<br/>'Agent1: 100 (reward: 50 TAO)'"| V
    %% Styling
    classDef userType fill:#e1f5fe,stroke:#01579b,stroke-width:2px
    classDef coreComponent fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    classDef external fill:#fff3e0,stroke:#e65100,stroke-width:2px
    classDef storage fill:#e8f5e8,stroke:#1b5e20,stroke-width:2px
    class M,V userType
    class P coreComponent
    class DOCKER,LLM external
    class TASKS,STORAGE storage
```
Scope: End-to-end demo: Submit agent → Evaluate on task → Score and "reward" (just print rankings)
- ❌ No Bittensor components
- ❌ No database
- ❌ No logging
- ❌ No/limited website
- ✅ Docker sandbox execution
- ✅ Simple agent submission
- ✅ Basic scoring system
Workflow:
- Miner takes a task prompt
- LLM generates Python code for the task
- Miner uploads `agent.py` to the Platform API
- Platform API validates it's Python with stdlib only
- Platform API stores it with `miner_id` as the filename
- Validator pulls all submitted agents from storage
- Validator gets the task list from the Platform API
- Validator spins up a Docker container for each agent
- Docker copies `agent.py` and runs `python agent.py --task [prompt]`
- Validator captures output and runs test assertions
- Validator scores 0-100 based on tests passed
- Validator outputs rankings to console/JSON
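The validator side of the workflow above can be sketched as a simple agents-by-tasks loop. This is a minimal sketch, not the actual implementation: the task list, test cases, and `evaluate` helper are illustrative, and a plain subprocess call stands in for the Docker container.

```python
import subprocess
import sys
from pathlib import Path

# Illustrative hardcoded task: prompt plus (input, expected output) test pairs,
# mirroring the MVP's reverse_string example.
TASKS = [
    {"prompt": "reverse_string", "tests": [("hello", "olleh"), ("abc", "cba")]},
]

def evaluate(agent_dir: str) -> dict:
    """Run every stored agent against every task and return 0-100 scores keyed by miner_id."""
    scores = {}
    for agent_path in sorted(Path(agent_dir).glob("*.py")):  # filename == miner_id
        passed = total = 0
        for task in TASKS:
            for arg, expected in task["tests"]:
                total += 1
                try:
                    # In the real validator this runs inside a Docker container.
                    result = subprocess.run(
                        [sys.executable, str(agent_path), "--task", task["prompt"], arg],
                        capture_output=True, text=True, timeout=10,
                    )
                    if result.stdout.strip() == expected:
                        passed += 1
                except subprocess.TimeoutExpired:
                    pass  # a hung agent simply fails this test
        scores[agent_path.stem] = round(100 * passed / total) if total else 0
    return scores
```

An agent that prints the reversed argument would score 100; one that prints anything else scores 0.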
Demo:
- Miner generates 2-3 sample agents (one good, one bad)
- Validator runs evaluation
- Validator shows scores like "Agent1: 100 (reward: 50 'TAO')"
Platform API:
- POST /upload/agent: Accepts a single `agent.py` file upload
- Validates it's Python
- Checks it uses only stdlib (regex check)
- Stores in a local folder with metadata
- GET /tasks: Returns hardcoded coding tasks
- Example: "Implement a function reverse_string(s: str) -> str"
Docker sandbox:
- Command: `docker run -v [host_dir]:/app python:3.11-slim python /app/agent.py`
- Resource limits: CPU/time to prevent hangs
- Timeout: 10 seconds maximum
- Error handling: Graceful timeout handling
- Security: Isolated execution environment
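Graceful timeout handling around the container run might look like this sketch, where a plain subprocess call stands in for the `docker run` invocation and the helper name is illustrative:

```python
import subprocess
import sys

def run_agent(cmd: list[str], timeout: int = 10) -> tuple[str, bool]:
    """Run an agent command, returning (stdout, timed_out)."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        return result.stdout, False
    except subprocess.TimeoutExpired:
        return "", True  # treat a hang as a failed run, not a validator crash
```

The key design point is that a hung agent must never take the validator down with it: the timeout converts an infinite loop into an ordinary failing result.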
Scoring:
- 0-100 scale based on tests passed
- 100 points if all tests pass
- Console/JSON output for rankings
- Simulated rewards (just printed, no real TAO)
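Scoring and the simulated reward printout could be as simple as the sketch below. The 50-TAO-per-perfect-score rule and both function names are illustrative assumptions, matching only the example output shown in the demo.

```python
def score(passed: int, total: int) -> int:
    """0-100 scale based on tests passed; 100 only when every test passes."""
    return round(100 * passed / total) if total else 0

def print_rankings(results: dict[str, int]) -> list[str]:
    """Render ranking lines like: Agent1: 100 (reward: 50 'TAO')."""
    lines = []
    ordered = sorted(results.items(), key=lambda kv: kv[1], reverse=True)
    for rank, (agent, pts) in enumerate(ordered, start=1):
        reward = 50 if pts == 100 else 0  # simulated reward, never real TAO
        lines.append(f"{rank}. {agent}: {pts} (reward: {reward} 'TAO')")
    return lines
```

Returning the lines (rather than printing directly) keeps the same function usable for both console output and a JSON report.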
Security:
- Docker isolation for code execution
- Resource limits to prevent system abuse
- Timeout handling for graceful error recovery
- Python stdlib-only validation of uploads