awspace/poster-agent

PosterAgent

🎨 AI Creative Director System - Intelligent Poster Generation Powered by Multi-Agent Architecture

PosterAgent is an agent-based poster generation system that goes beyond one-shot text-to-image generation. It interprets user intent, gathers brand and design knowledge, and iteratively refines its output based on user feedback.

✨ Core Features

| Feature | Description |
|---|---|
| Natural Language Understanding | Parse complex user requests and extract structured requirements |
| Auto-completion | Intelligent follow-up questions for incomplete requirements |
| Brand Knowledge Retrieval | LLM-driven brand tone and visual style synthesis (vector RAG planned) |
| Design Knowledge Base | LLM-driven design principles synthesis (corpus-backed RAG planned) |
| Prompt Engineering | Auto-generated layered prompts (base, style, lighting, composition) |
| Feedback Iteration | Human-in-the-loop optimization for continuous improvement |
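
To make the "layered prompts" idea concrete, here is a minimal sketch of how four layers might be kept separate and joined into one prompt string. The `PromptLayers` class and its `render` method are hypothetical names chosen for illustration; the project's actual prompt agent may structure this differently.

```python
from dataclasses import dataclass


@dataclass
class PromptLayers:
    """Hypothetical container for the four layers named in the feature list."""

    base: str
    style: str
    lighting: str
    composition: str

    def render(self) -> str:
        # Join non-empty layers in a fixed order so that editing one layer
        # leaves the wording of the others untouched.
        parts = [self.base, self.style, self.lighting, self.composition]
        return ", ".join(p.strip() for p in parts if p.strip())


layers = PromptLayers(
    base="launch poster for iPhone 17 Pro",
    style="futuristic, premium",
    lighting="soft rim light",
    composition="centered product, generous whitespace",
)
prompt = layers.render()
```

Keeping the layers apart until the last moment is what lets a later feedback turn target, say, only the lighting.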

🏗️ System Architecture

flowchart TB
    subgraph Frontend["Frontend Layer"]
        direction LR
        Web[Web Interface]
        App[Mobile App]
        Chat[Chat UI]
    end

    subgraph Gateway["API Gateway"]
        API[REST / WebSocket]
    end

    subgraph Orchestrator["Agent Orchestrator"]
        direction LR
        LG[LangGraph]
    end

    subgraph Agents["Agent Layer"]
        direction LR
        Intent[Intent Agent]
        Retrieval[Retrieval Agent]
        Prompt[Prompt Agent]
        Critic[Image Critic Agent]
    end

    subgraph Services["Backend Services"]
        direction LR
        LLM[LLM Models]
        RAG[Vector DB + RAG]
        Image[Image Models]
    end

    Frontend --> Gateway
    Gateway --> Orchestrator
    Orchestrator --> Intent
    Orchestrator --> Retrieval
    Orchestrator --> Prompt
    Orchestrator --> Critic

    Intent --> LLM
    Retrieval --> RAG
    Prompt --> LLM
    Prompt --> Image
    Critic --> LLM

🔄 Core Workflow

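PosterAgent runs two loops: a requirement completion loop (pink nodes) that asks follow-up questions until the brief is complete, and a generation optimization loop (green nodes) that refines the prompt until the user is satisfied: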

flowchart TD
    A[User Input] --> B[Intent Recognition]
    B --> C{Valid Poster Request?}
    C -->|No| Z[Reject / Redirect]
    C -->|Yes| D[Requirement Extraction]

    D --> E[Completeness Check]
    E --> F{Information Complete?}

    F -->|No| G[Ask Follow-up Questions]
    G --> H[User Provides Additional Info]
    H --> E

    F -->|Yes| I[Structured Requirement JSON]
    I --> J[Knowledge Retrieval]
    J --> K[Brand / Product / Design Knowledge Fusion]
    K --> L[Prompt Generation]
    L --> M[Image Generation]
    M --> N[User Review]

    N --> O{Satisfied?}
    O -->|No| P[Parse Feedback]
    P --> Q[Refine Prompt]
    Q --> M

    O -->|Yes| R[Final Poster Output]

    style E fill:#f9f,stroke:#333,stroke-width:2px
    style F fill:#f9f,stroke:#333,stroke-width:2px
    style N fill:#9f9,stroke:#333,stroke-width:2px
    style O fill:#9f9,stroke:#333,stroke-width:2px
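
The decision points in the flowchart can be read as a routing function over the session state. The sketch below uses plain Python rather than LangGraph so that it stays dependency-free; the state keys (`is_poster_request`, `missing_fields`, `image_url`, `feedback`, `satisfied`) and the returned node names are assumptions made for illustration, not the project's actual schema.

```python
def next_step(state: dict) -> str:
    """Return the next workflow node for a (hypothetical) state dict."""
    # Gate 1: is this a poster request at all?
    if not state.get("is_poster_request", False):
        return "reject"
    # Gate 2: requirement completion loop.
    if state.get("missing_fields"):
        return "ask_follow_up"
    # No image yet: proceed through retrieval, prompt, and generation.
    if not state.get("image_url"):
        return "knowledge_retrieval"
    # Gate 3: generation optimization loop.
    if state.get("satisfied"):
        return "final_output"
    if state.get("feedback"):
        return "refine_prompt"
    return "user_review"
```

In a graph framework, a function of this shape would typically back the conditional edges leaving each decision node.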

🔌 Agent Modules

Loop 1: Requirement Completion Loop

stateDiagram-v2
    [*] --> Input
    Input --> Extract
    Extract --> Check
    Check --> Incomplete
    Incomplete --> Ask
    Ask --> Input
    Check --> Complete
    Complete --> [*]
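
A rough sketch of Loop 1 follows, assuming a fixed set of required fields and a callable that stands in for the user answering a follow-up question. `REQUIRED_FIELDS`, `missing_fields`, and `complete_requirement` are hypothetical names; in the real system the loop spans several HTTP turns and the questions are written by an LLM.

```python
# Assumed minimum set of fields; the project may require more or fewer.
REQUIRED_FIELDS = ("brand", "product", "poster_goal", "style", "size")


def missing_fields(requirement: dict) -> list[str]:
    """List required fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if not requirement.get(f)]


def complete_requirement(requirement: dict, ask, max_rounds: int = 5) -> dict:
    """Ask about one missing field per round until nothing is missing.

    `ask` takes a field name and returns the user's answer.
    """
    current = dict(requirement)  # do not mutate the caller's dict
    for _ in range(max_rounds):
        gaps = missing_fields(current)
        if not gaps:
            return current
        current[gaps[0]] = ask(gaps[0])
    if missing_fields(current):
        raise ValueError(f"still missing: {missing_fields(current)}")
    return current


# Usage: canned answers stand in for a live user.
answers = {"style": "futuristic", "size": "1024x1536"}
partial = {"brand": "Apple", "product": "iPhone 17 Pro", "poster_goal": "launch"}
result = complete_requirement(partial, answers.__getitem__)
```

The round limit is a design choice in this sketch so that an unhelpful sequence of answers cannot loop forever.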

Loop 2: Generation Optimization Loop

stateDiagram-v2
    [*] --> Generate
    Generate --> Review
    Review --> Feedback
    Feedback --> Refine
    Refine --> Generate
    Review --> Satisfied
    Satisfied --> [*]
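
Loop 2 can be sketched the same way. The three callables below (`generate`, `review`, `refine`) are placeholders for the image model, the human reviewer, and the prompt refinement step; their names and signatures are assumptions for illustration only.

```python
def optimize(prompt: str, generate, review, refine, max_iterations: int = 3) -> list:
    """Generate, review, and refine until `review` returns None (satisfied).

    Returns a history of (iteration, prompt, image) tuples.
    """
    history = []
    for iteration in range(max_iterations):
        image = generate(prompt)
        history.append((iteration, prompt, image))
        feedback = review(image)
        if feedback is None:  # reviewer is satisfied
            break
        prompt = refine(prompt, feedback)
    return history


# Usage with stubs: the reviewer complains once, then accepts.
feedbacks = iter(["less blue", None])
history = optimize(
    "poster",
    generate=lambda p: f"img<{p}>",
    review=lambda img: next(feedbacks),
    refine=lambda p, fb: f"{p}; {fb}",
)
```

Each history entry corresponds to one iteration number, which mirrors how images are addressed per iteration elsewhere in this README.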

🛠️ Tech Stack

Current MVP

| Layer | Choice |
|---|---|
| Orchestration | LangGraph (single state graph, dispatch entry node for HTTP pause/resume) |
| LLM Gateway | LiteLLM (provider-agnostic; DeepSeek / Anthropic / OpenAI / Google) |
| Default LLM | deepseek/deepseek-chat (override via DEFAULT_LLM_MODEL) |
| Image Generation | Volcengine Ark: Doubao Seedream (doubao-seedream-5-0-260128), with local PNG cache |
| API | FastAPI + Pydantic v2 |
| Sessions | In-memory store with per-session asyncio.Lock |
| Knowledge Agent | LLM-only synthesis (no vector DB / corpus retrieval in this MVP) |
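
The "in-memory store with per-session asyncio.Lock" row might look roughly like the sketch below. `SessionStore` and its methods are hypothetical and only illustrate the locking pattern: turns within one session are serialized, while different sessions do not block each other.

```python
import asyncio
import uuid


class SessionStore:
    """Hypothetical in-memory store; all state is lost on restart."""

    def __init__(self) -> None:
        self._states: dict[str, dict] = {}
        self._locks: dict[str, asyncio.Lock] = {}

    def create(self) -> str:
        session_id = uuid.uuid4().hex
        self._states[session_id] = {"messages": []}
        self._locks[session_id] = asyncio.Lock()
        return session_id

    async def append_message(self, session_id: str, text: str) -> int:
        # Only one turn per session runs at a time; other sessions have
        # their own locks and proceed concurrently.
        async with self._locks[session_id]:
            messages = self._states[session_id]["messages"]
            await asyncio.sleep(0)  # stand-in for awaiting the agent graph
            messages.append(text)
            return len(messages)


async def demo() -> list[int]:
    store = SessionStore()
    sid = store.create()
    # Two concurrent turns on the same session are processed one by one.
    return await asyncio.gather(
        store.append_message(sid, "first"),
        store.append_message(sid, "second"),
    )


counts = asyncio.run(demo())
```

Without the lock, two overlapping requests for the same session could interleave while awaiting the model and overwrite each other's state.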

Considered for Later Phases

  • Orchestration alternatives: Mastra, CrewAI
  • Additional image providers: Ideogram (adapter shipped but disabled by default), FLUX, SDXL, Midjourney
  • RAG infrastructure: OpenAI / Voyage / Jina embeddings + Pinecone / Milvus / Weaviate

📊 Data Structures

Requirement Schema

{
  "brand": "Apple",
  "product": "iPhone 17 Pro",
  "poster_goal": "New product launch",
  "style": "Futuristic, high-tech",
  "tone": "Premium",
  "language": "Chinese",
  "size": "1024x1536",
  "target_audience": "Young tech users",
  "slogan": "Redefining Speed"
}
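
For readers who prefer types to raw JSON, the schema could be mirrored as follows. The project lists Pydantic v2 in its stack; this sketch deliberately uses a standard-library dataclass to stay dependency-free, and it assumes every field is required, which the source does not state. `Requirement`, `from_json`, and `dimensions` are hypothetical names.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Requirement:
    brand: str
    product: str
    poster_goal: str
    style: str
    tone: str
    language: str
    size: str
    target_audience: str
    slogan: str

    @classmethod
    def from_json(cls, raw: str) -> "Requirement":
        data = json.loads(raw)
        unknown = set(data) - {f.name for f in fields(cls)}
        if unknown:
            raise ValueError(f"unknown fields: {sorted(unknown)}")
        return cls(**data)  # raises TypeError if a field is missing

    def dimensions(self) -> tuple[int, int]:
        # "1024x1536" -> (1024, 1536)
        width, height = self.size.split("x")
        return int(width), int(height)


raw = json.dumps({
    "brand": "Apple", "product": "iPhone 17 Pro", "poster_goal": "launch",
    "style": "futuristic", "tone": "premium", "language": "English",
    "size": "1024x1536", "target_audience": "young tech users",
    "slogan": "Redefining Speed",
})
req = Requirement.from_json(raw)
```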

Feedback Schema

{
  "modify": {
    "lighting": "less blue glow",
    "tone": "more premium",
    "composition": "more whitespace"
  }
}
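
One naive way to apply such a feedback object is to append each instruction to the matching prompt layer. This is only a sketch: `apply_feedback` is a hypothetical helper, and the actual system parses feedback and refines prompts with an LLM rather than by string concatenation.

```python
def apply_feedback(layers: dict[str, str], feedback: dict) -> dict[str, str]:
    """Return new layers with each `modify` instruction appended to its layer."""
    updated = dict(layers)  # leave the input untouched
    for layer, instruction in feedback.get("modify", {}).items():
        existing = updated.get(layer, "")
        updated[layer] = f"{existing}, {instruction}" if existing else instruction
    return updated


layers = {"lighting": "neon glow", "composition": "centered product"}
feedback = {
    "modify": {
        "lighting": "less blue glow",
        "tone": "more premium",
        "composition": "more whitespace",
    }
}
refined = apply_feedback(layers, feedback)
```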

State Machine

stateDiagram-v2
    [*] --> INPUT
    INPUT --> REQUIREMENT_LOOP
    REQUIREMENT_LOOP --> RETRIEVAL
    RETRIEVAL --> PROMPT_GENERATION
    PROMPT_GENERATION --> IMAGE_GENERATION
    IMAGE_GENERATION --> FEEDBACK_LOOP
    FEEDBACK_LOOP --> FINISHED
    FINISHED --> [*]
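
The top-level states in the diagram map naturally onto an enum plus a transition table. The sketch below is an illustration under that assumption; `Stage`, `TRANSITIONS`, `advance`, and `walk` are hypothetical names, and the two loop states are treated as opaque steps here.

```python
from enum import Enum


class Stage(Enum):
    INPUT = "INPUT"
    REQUIREMENT_LOOP = "REQUIREMENT_LOOP"
    RETRIEVAL = "RETRIEVAL"
    PROMPT_GENERATION = "PROMPT_GENERATION"
    IMAGE_GENERATION = "IMAGE_GENERATION"
    FEEDBACK_LOOP = "FEEDBACK_LOOP"
    FINISHED = "FINISHED"


# One outgoing edge per state, exactly as drawn in the diagram.
TRANSITIONS = {
    Stage.INPUT: Stage.REQUIREMENT_LOOP,
    Stage.REQUIREMENT_LOOP: Stage.RETRIEVAL,
    Stage.RETRIEVAL: Stage.PROMPT_GENERATION,
    Stage.PROMPT_GENERATION: Stage.IMAGE_GENERATION,
    Stage.IMAGE_GENERATION: Stage.FEEDBACK_LOOP,
    Stage.FEEDBACK_LOOP: Stage.FINISHED,
}


def advance(stage: Stage) -> Stage:
    """Move to the next stage; FINISHED has no successor."""
    if stage not in TRANSITIONS:
        raise ValueError(f"{stage.name} is terminal")
    return TRANSITIONS[stage]


def walk(start: Stage = Stage.INPUT) -> list[Stage]:
    """Follow transitions from `start` until a terminal stage is reached."""
    path = [start]
    while path[-1] in TRANSITIONS:
        path.append(TRANSITIONS[path[-1]])
    return path
```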

🚀 System Philosophy

The Agent Difference

Agent ≠ One-shot Generation

PosterAgent is a pipeline:

Understand → Complete → Retrieve → Plan → Generate → Reflect → Iterate

Core Essence

LLM + Workflow + Human Feedback + Generative Models  (+ RAG, planned)
= AI Creative Director System

🚧 Getting Started

Prerequisites

  • Python 3.10+
  • A DeepSeek API key (or another LLM provider supported by LiteLLM)
  • A Volcengine Ark API key for image generation

Installation

# Clone the repository
git clone https://github.com/yourusername/poster-agent.git
cd poster-agent

# Create a virtual environment and install
python -m venv .venv
source .venv/bin/activate         # Windows: .venv\Scripts\activate
pip install -e .

# Configure environment
cp .env.example .env
# Edit .env: set DEEPSEEK_API_KEY and ARK_API_KEY (or swap providers)

Run

# Terminal 1: start the server
uvicorn poster_agent.main:app --reload

# Terminal 2: interactive CLI
python scripts/cli.py

Then chat naturally:

you> make me an Apple iPhone 17 Pro launch poster, futuristic style, English
agent> [may ask follow-up questions to complete requirements]
you> [answer the questions]
agent> image ready at http://localhost:8000/sessions/.../images/0 (iteration 0)
you> less blue, more whitespace
agent> image ready at http://localhost:8000/sessions/.../images/1 (iteration 1)
you> perfect, use this
agent> Done. Final poster delivered.

Generated PNGs are cached under .cache/images/{session_id}/{iteration}.png.
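
As an illustration of that layout, a helper that builds the cache path might look like this. `cached_image_path` is a hypothetical function, and the restriction on session ID characters is an assumption added to keep IDs from escaping the cache directory.

```python
import re
from pathlib import Path

CACHE_ROOT = Path(".cache/images")


def cached_image_path(session_id: str, iteration: int, root: Path = CACHE_ROOT) -> Path:
    """Build {root}/{session_id}/{iteration}.png for one poster iteration."""
    # Assumed ID format: letters, digits, underscore, hyphen only.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", session_id):
        raise ValueError(f"invalid session id: {session_id!r}")
    if iteration < 0:
        raise ValueError("iteration must be non-negative")
    return root / session_id / f"{iteration}.png"


path = cached_image_path("abc123", 1)
```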

🌐 HTTP API

| Method | Path | Description |
|---|---|---|
| GET | /health | Liveness probe; returns model + image provider |
| POST | /sessions | Create a new session, returns { session_id } |
| GET | /sessions/{session_id} | Inspect full PosterState |
| POST | /sessions/{session_id}/messages | Send a user turn; returns status + assistant message + image URL |
| GET | /sessions/{session_id}/images/{iteration} | Stream the cached PNG for that iteration |

POST /sessions/{session_id}/messages response statuses:

  • awaiting_user: the agent asked a follow-up question
  • image_ready: a new poster iteration is available at image_url
  • done: the user accepted the poster; the session is terminal
  • rejected: the request was not a poster request
  • image_failed: the provider returned an error on this turn (you can retry)
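
A client can branch on these statuses along the following lines. This is a sketch only: the `status` and `message` keys are assumed names for the response fields (only `image_url` is named above), and `handle_response` is a hypothetical helper rather than part of the shipped CLI.

```python
def handle_response(response: dict) -> tuple[str, bool]:
    """Return (text to show the user, whether the chat should continue)."""
    status = response.get("status")
    if status == "awaiting_user":
        return response.get("message", ""), True
    if status == "image_ready":
        return f"image ready at {response['image_url']}", True
    if status == "done":
        # The session is terminal once the user accepts the poster.
        return "Done. Final poster delivered.", False
    if status == "rejected":
        return "That does not look like a poster request; try rephrasing.", True
    if status == "image_failed":
        return "Image generation failed on this turn; send the message again to retry.", True
    raise ValueError(f"unexpected status: {status!r}")


text, keep_going = handle_response(
    {"status": "image_ready", "image_url": "http://localhost:8000/sessions/abc/images/0"}
)
```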

🗺️ Roadmap

  • Product design document
  • Core agent implementations (intent, requirement extraction/QA, follow-up, knowledge, prompt engineer, image generation, feedback parse, image critic stub)
  • LangGraph dual-loop orchestrator with HTTP pause/resume
  • FastAPI backend + interactive CLI
  • Local image cache
  • Persistent session store (replace in-memory dict)
  • Real RAG knowledge base (brand / design principles)
  • Web interface
  • Multimodal image critic (wire image_critic into the graph)
  • Additional image providers (FLUX, SDXL, Midjourney)
  • Auto A/B testing
  • Marketing analytics integration

🤝 Contributing

Contributions are welcome! Please read our contributing guidelines before submitting pull requests.

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.


Built with ❤️ for AI-powered creative design
