🎨 AI Creative Director System - Intelligent Poster Generation Powered by Multi-Agent Architecture
PosterAgent is an AI Agent-based intelligent poster generation system that goes beyond simple text-to-image generation. It understands user intent, retrieves brand and design knowledge, and iteratively optimizes outputs based on feedback.
| Feature | Description |
|---|---|
| Natural Language Understanding | Parse complex user requests and extract structured requirements |
| Auto-completion | Intelligent follow-up questions for incomplete requirements |
| Brand Knowledge Retrieval | LLM-driven brand tone and visual style synthesis (vector RAG planned) |
| Design Knowledge Base | LLM-driven design principles synthesis (corpus-backed RAG planned) |
| Prompt Engineering | Auto-generated layered prompts (base, style, lighting, composition) |
| Feedback Iteration | Human-in-the-loop optimization for continuous improvement |
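The layered prompts mentioned in the table (base, style, lighting, composition) can be pictured with a minimal sketch. The `LayeredPrompt` class and its `render` method are hypothetical names used for illustration only; the project's actual prompt structure may differ.

```python
from dataclasses import dataclass


@dataclass
class LayeredPrompt:
    """Hypothetical container for the four prompt layers named in the table."""

    base: str
    style: str = ""
    lighting: str = ""
    composition: str = ""

    def render(self) -> str:
        # Join the non-empty layers in a fixed order, so a later change to
        # one layer leaves the others untouched.
        layers = [self.base, self.style, self.lighting, self.composition]
        return ", ".join(layer.strip() for layer in layers if layer.strip())


prompt = LayeredPrompt(
    base="launch poster for a smartphone",
    style="futuristic, clean",
    lighting="soft rim light",
    composition="centered product, generous whitespace",
)
print(prompt.render())
```

Keeping the layers separate until render time is one way to let feedback such as "less blue" touch only the lighting layer.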
flowchart TB
subgraph Frontend["Frontend Layer"]
direction LR
Web[Web Interface]
App[Mobile App]
Chat[Chat UI]
end
subgraph Gateway["API Gateway"]
API[REST / WebSocket]
end
subgraph Orchestrator["Agent Orchestrator"]
direction LR
LG[LangGraph]
end
subgraph Agents["Agent Layer"]
direction LR
Intent[Intent Agent]
Retrieval[Retrieval Agent]
Prompt[Prompt Agent]
Critic[Image Critic Agent]
end
subgraph Services["Backend Services"]
direction LR
LLM[LLM Models]
RAG["Vector DB + RAG (planned)"]
Image[Image Models]
end
Frontend --> Gateway
Gateway --> Orchestrator
Orchestrator --> Intent
Orchestrator --> Retrieval
Orchestrator --> Prompt
Orchestrator --> Critic
Intent --> LLM
Retrieval --> RAG
Prompt --> LLM
Prompt --> Image
Critic --> LLM
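PosterAgent uses a dual-loop architecture: a requirement loop that asks follow-up questions until the request is complete, and a feedback loop that refines the prompt until the user is satisfied: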
flowchart TD
A[User Input] --> B[Intent Recognition]
B --> C{Valid Poster Request?}
C -->|No| Z[Reject / Redirect]
C -->|Yes| D[Requirement Extraction]
D --> E[Completeness Check]
E --> F{Information Complete?}
F -->|No| G[Ask Follow-up Questions]
G --> H[User Provides Additional Info]
H --> E
F -->|Yes| I[Structured Requirement JSON]
I --> J[Knowledge Retrieval]
J --> K[Brand / Product / Design Knowledge Fusion]
K --> L[Prompt Generation]
L --> M[Image Generation]
M --> N[User Review]
N --> O{Satisfied?}
O -->|No| P[Parse Feedback]
P --> Q[Refine Prompt]
Q --> M
O -->|Yes| R[Final Poster Output]
style E fill:#f9f,stroke:#333,stroke-width:2px
style F fill:#f9f,stroke:#333,stroke-width:2px
style N fill:#9f9,stroke:#333,stroke-width:2px
style O fill:#9f9,stroke:#333,stroke-width:2px
stateDiagram-v2
[*] --> Input
Input --> Extract
Extract --> Check
Check --> Incomplete
Incomplete --> Ask
Ask --> Input
Check --> Complete
Complete --> [*]
stateDiagram-v2
[*] --> Generate
Generate --> Review
Review --> Feedback
Feedback --> Refine
Refine --> Generate
Review --> Satisfied
Satisfied --> [*]
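One possible reading of the `Feedback --> Refine` step is a merge of parsed feedback into per-layer prompt text. The sketch below assumes feedback arrives as a `{"modify": {...}}` object keyed by layer name; `refine_layers` is a hypothetical helper, not the project's implementation.

```python
def refine_layers(layers: dict[str, str], feedback: dict) -> dict[str, str]:
    """Return a copy of the prompt layers with feedback adjustments appended."""
    refined = dict(layers)  # copy, so the previous iteration stays intact
    for layer, change in feedback.get("modify", {}).items():
        current = refined.get(layer, "")
        # Append the adjustment to an existing layer, or create the layer.
        refined[layer] = f"{current}, {change}" if current else change
    return refined


layers = {"lighting": "neon blue glow", "composition": "centered product"}
feedback = {"modify": {"lighting": "less blue glow", "tone": "more premium"}}
print(refine_layers(layers, feedback))
```

Returning a new dictionary instead of mutating the old one keeps every iteration reproducible, which matters when each iteration maps to a cached image.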
| Layer | Choice |
|---|---|
| Orchestration | LangGraph (single state graph, dispatch entry node for HTTP pause/resume) |
| LLM Gateway | LiteLLM (provider-agnostic; DeepSeek / Anthropic / OpenAI / Google) |
| Default LLM | deepseek/deepseek-chat (override via DEFAULT_LLM_MODEL) |
| Image Generation | Volcengine Ark: Doubao Seedream (doubao-seedream-5-0-260128), with local PNG cache |
| API | FastAPI + Pydantic v2 |
| Sessions | In-memory store with per-session asyncio.Lock |
| Knowledge Agent | LLM-only synthesis (no vector DB / corpus retrieval in this MVP) |
- Orchestration alternatives: Mastra, CrewAI
- Additional image providers: Ideogram (adapter shipped but disabled by default), FLUX, SDXL, Midjourney
- RAG infrastructure: OpenAI / Voyage / Jina embeddings + Pinecone / Milvus / Weaviate
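The "in-memory store with per-session asyncio.Lock" row in the stack table can be illustrated with a small sketch. `InMemorySessionStore` and its `update` method are hypothetical names; the sketch only shows the locking pattern, not the project's actual store.

```python
import asyncio
from collections import defaultdict


class InMemorySessionStore:
    """Hypothetical session store: one dict of states, one lock per session."""

    def __init__(self) -> None:
        self._states: dict[str, dict] = {}
        # A lock is created lazily the first time a session id is seen.
        self._locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    async def update(self, session_id: str, **changes) -> dict:
        # Serialize turns for the same session; other sessions are not blocked.
        async with self._locks[session_id]:
            state = self._states.setdefault(session_id, {})
            state.update(changes)
            return dict(state)


async def main() -> dict:
    store = InMemorySessionStore()
    await asyncio.gather(
        store.update("s1", status="awaiting_user"),
        store.update("s1", iteration=0),
    )
    return await store.update("s1")


final_state = asyncio.run(main())
print(final_state)
```

A per-session lock prevents two concurrent HTTP turns from interleaving writes to the same state, while leaving unrelated sessions free to proceed.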
{
  "brand": "Apple",
  "product": "iPhone 17 Pro",
  "poster_goal": "new product launch",
  "style": "futuristic tech",
  "tone": "premium",
  "language": "Chinese",
  "size": "1024x1536",
  "target_audience": "young tech users",
  "slogan": "Redefine speed"
}

{
  "modify": {
    "lighting": "less blue glow",
    "tone": "more premium",
    "composition": "more whitespace"
  }
}

stateDiagram-v2
[*] --> INPUT
INPUT --> REQUIREMENT_LOOP
REQUIREMENT_LOOP --> RETRIEVAL
RETRIEVAL --> PROMPT_GENERATION
PROMPT_GENERATION --> IMAGE_GENERATION
IMAGE_GENERATION --> FEEDBACK_LOOP
FEEDBACK_LOOP --> FINISHED
FINISHED --> [*]
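The top-level state machine above is linear, so it can be sketched as an enum plus a transition table. The `Stage` enum mirrors the state names in the diagram; `TRANSITIONS` and `walk` are illustrative names, and the inner loops (follow-up questions, feedback rounds) are deliberately collapsed into single stages here.

```python
from enum import Enum


class Stage(str, Enum):
    INPUT = "INPUT"
    REQUIREMENT_LOOP = "REQUIREMENT_LOOP"
    RETRIEVAL = "RETRIEVAL"
    PROMPT_GENERATION = "PROMPT_GENERATION"
    IMAGE_GENERATION = "IMAGE_GENERATION"
    FEEDBACK_LOOP = "FEEDBACK_LOOP"
    FINISHED = "FINISHED"


# Each stage maps to its single successor; FINISHED has none.
TRANSITIONS = {
    Stage.INPUT: Stage.REQUIREMENT_LOOP,
    Stage.REQUIREMENT_LOOP: Stage.RETRIEVAL,
    Stage.RETRIEVAL: Stage.PROMPT_GENERATION,
    Stage.PROMPT_GENERATION: Stage.IMAGE_GENERATION,
    Stage.IMAGE_GENERATION: Stage.FEEDBACK_LOOP,
    Stage.FEEDBACK_LOOP: Stage.FINISHED,
}


def walk(start: Stage = Stage.INPUT) -> list[Stage]:
    """Follow the transition table from `start` until a terminal stage."""
    path = [start]
    while (nxt := TRANSITIONS.get(path[-1])) is not None:
        path.append(nxt)
    return path


print(" -> ".join(stage.value for stage in walk()))
```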
Agent ≠ One-shot Generation
PosterAgent is a pipeline:
Understand → Complete → Retrieve → Plan → Generate → Reflect → Iterate
LLM + Workflow + Human Feedback + Generative Models (+ RAG, planned)
= AI Creative Director System
- Python 3.10+
- A DeepSeek API key (or another LLM provider supported by LiteLLM)
- A Volcengine Ark API key for image generation
# Clone the repository
git clone https://github.com/yourusername/poster-agent.git
cd poster-agent
# Create a virtual environment and install
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -e .
# Configure environment
cp .env.example .env
# Edit .env: set DEEPSEEK_API_KEY and ARK_API_KEY (or swap providers)

# Terminal 1: start the server
uvicorn poster_agent.main:app --reload
# Terminal 2: interactive CLI
python scripts/cli.py

Then chat naturally:
you> make me an Apple iPhone 17 Pro launch poster, futuristic style, English
agent> [may ask follow-up questions to complete requirements]
you> [answer the questions]
agent> image ready at http://localhost:8000/sessions/.../images/0 (iteration 0)
you> less blue, more whitespace
agent> image ready at http://localhost:8000/sessions/.../images/1 (iteration 1)
you> perfect, use this
agent> Done. Final poster delivered.
Generated PNGs are cached under .cache/images/{session_id}/{iteration}.png.
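The cache layout above maps a session and iteration to a file path. A minimal sketch, assuming the cache root is relative to the working directory; `cached_image_path` is a hypothetical helper name.

```python
from pathlib import Path

# Cache root as stated in the text; resolving it relative to the working
# directory is an assumption.
CACHE_ROOT = Path(".cache/images")


def cached_image_path(session_id: str, iteration: int, root: Path = CACHE_ROOT) -> Path:
    """Return the expected PNG path for one iteration of one session."""
    return root / session_id / f"{iteration}.png"


print(cached_image_path("demo-session", 1).as_posix())
```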
| Method | Path | Description |
|---|---|---|
| GET | /health | Liveness probe; returns model + image provider |
| POST | /sessions | Create a new session, returns { session_id } |
| GET | /sessions/{session_id} | Inspect full PosterState |
| POST | /sessions/{session_id}/messages | Send a user turn; returns status + assistant message + image URL |
| GET | /sessions/{session_id}/images/{iteration} | Stream the cached PNG for that iteration |
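To make the message endpoint concrete, the sketch below builds (but does not send) a request for one user turn using only the standard library. The request body schema is not documented here, so the `content` field is purely an assumption, as are the helper name `build_message_request` and the `demo-session` id.

```python
import json
from urllib.request import Request


def build_message_request(base_url: str, session_id: str, text: str) -> Request:
    """Build a POST request for one user turn without sending it."""
    # ASSUMPTION: the body field name "content" is illustrative only.
    body = json.dumps({"content": text}).encode("utf-8")
    return Request(
        url=f"{base_url}/sessions/{session_id}/messages",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_message_request(
    "http://localhost:8000", "demo-session", "less blue, more whitespace"
)
print(request.get_method(), request.full_url)
```

Passing the object to `urllib.request.urlopen` would send it to a running server; check the server's Pydantic models for the real body schema before doing so.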
POST /sessions/{session_id}/messages response statuses:
- awaiting_user: the agent asked a follow-up question
- image_ready: a new poster iteration is available at image_url
- done: the user accepted the poster; the session is terminal
- rejected: the request was not a poster ask
- image_failed: provider error on this turn (you can retry)
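A client can branch on these statuses to decide what to do next. This is a sketch under assumptions: the `status` and `image_url` keys come from the text above, while the `message` key and the `next_action` helper are hypothetical.

```python
def next_action(response: dict) -> str:
    """Map a message response to a short description of the client's next step."""
    status = response.get("status")
    if status == "awaiting_user":
        # ASSUMPTION: the assistant's question is carried in a "message" field.
        return f"answer the question: {response.get('message', '')}"
    if status == "image_ready":
        return f"review the image at {response.get('image_url', '')}"
    if status == "done":
        return "stop: session is terminal"
    if status == "rejected":
        return "rephrase: not a poster request"
    if status == "image_failed":
        return "retry the same turn"
    return f"unknown status: {status!r}"


print(next_action({"status": "image_failed"}))
```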
- Product design document
- Core agent implementations (intent, requirement extraction/QA, follow-up, knowledge, prompt engineer, image generation, feedback parse, image critic stub)
- LangGraph dual-loop orchestrator with HTTP pause/resume
- FastAPI backend + interactive CLI
- Local image cache
- Persistent session store (replace in-memory dict)
- Real RAG knowledge base (brand / design principles)
- Web interface
- Multimodal image critic (wire image_critic into the graph)
- Additional image providers (FLUX, SDXL, Midjourney)
- Auto A/B testing
- Marketing analytics integration
Contributions are welcome! Please read our contributing guidelines before submitting pull requests.
This project is licensed under the MIT License - see the LICENSE file for details.
Built with ❤️ for AI-powered creative design