![]() |
|
- Mar. 18, 2024. Release the inference code.
- Mar. 10, 2025. Rep initialization (No code).
Existing long-form video generation frameworks lack automated planning and often rely on manual intervention for storyline development, scene composition, cinematography design, and character interaction coordination, leading to high production costs and inefficiencies. To address these challenges, we present MovieAgent, an automated movie generation via multi-agent Chain of Thought (CoT) planning. MovieAgent offers two key advantages: 1) We firstly explore and define the paradigm of automated movie/long-video generation. Given a script and character bank, our MovieAgent can generates multi-scene, multi-shot long-form videos with a coherent narrative, while ensuring character consistency, synchronized subtitles, and stable audio throughout the film. 2) MovieAgent introduces a hierarchical CoT-based reasoning process to automatically structure scenes, camera settings, and cinematography, significantly reducing human effort. By employing multiple LLM agents to simulate the roles of a director, screenwriter, storyboard artist, and location manager, MovieAgent streamlines the production pipeline. Our framework represents a significant step toward fully automated movie production, bridging the gap between AI-driven video generation and high-quality, narrative-consistent filmmaking.
- Clone the repository.
git clone
cd MovieAgent
- Install the environment.
conda create -n MovieAgent python=3.8
conda activate MovieAgent
pip install -r requirements.txt
First, you need to prepare the script synopsis of movie
and photo,audio of character
as follow:
dataset/
movie name/
script_synopsis.json
character_list/
character 1/
photo_1.jpg
photo_2.jpg
audio.wav
character 2/
...
Then, you need to configure the open_api_key
and model name
in movie_agent/script/run.sh
.
The reasoning process using agents may involve various image and video generation models. Most models can be automatically downloaded, while a few require manual configuration, such as StoryDiffusion and ROICtrl for character customization:
LLM (Language Model) | Image Gen. (Image Generation) | Video Gen. (Video Generation) |
---|---|---|
GPT4-o | ROICtrl | SVD/HunyuanVideo_I2V |
You can train by yourself following the Guidance.
Or download our weight directly:
# download the ED lora for movie: Frozen II
cd movie_agent/weight
git lfs install
git clone https://huggingface.co/weijiawu/MovieAgent-ROICtrl-Frozen
# download the ROICtrl adapter
wget -P ROICtrl_sdv14_30K https://huggingface.co/guyuchao/ROICtrl/resolve/main/ROICtrl_sdv14_30K.safetensors
ROICtrl requires an environment with CUDA 12.1 and PyTorch 2.4.
Step 1: Download HunyuanVideo-I2V model
python -m pip install "huggingface_hub[cli]"
# Use the huggingface-cli tool to download HunyuanVideo-I2V model in HunyuanVideo-I2V/ckpts dir.
cd movie_agent/weight
mkdir HunyuanVideo_I2V
huggingface-cli download tencent/HunyuanVideo-I2V --local-dir ./HunyuanVideo_I2V
# Download Text Encoder.
cd HunyuanVideo_I2V
huggingface-cli download xtuner/llava-llama-3-8b-v1_1-transformers --local-dir ./text_encoder_i2v
huggingface-cli download openai/clip-vit-large-patch14 --local-dir ./text_encoder_2
Gnerate the long video with MovieDirector:
cd movie_agent
sh script/run.sh
If you find our repo useful for your research, please consider citing our paper:
@misc{wu2025movieagent,
title={Automated Movie Generation via Multi-Agent CoT Planning},
author={Weijia Wu, Zeyu Zhu, Mike Zheng Shou},
year={2025},
eprint={2503.07314},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
- Thanks to Diffusers for the wonderful work and codebase.
- Thanks to Evaluation-Agent for the wonderful work and codebase.