High-Fidelity Relightable Monocular Portrait Animation with Lighting-Controllable Video Diffusion Model
Mingtao Guo1† Guanyu Xing2‡ Yanli Liu1,3▽
1 National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu, China
2 School of Cyber Science and Engineering, Sichuan University, Chengdu, China
3 College of Computer Science, Sichuan University, Chengdu, China
† First author ‡ Second author ▽ Corresponding author
- [2025.03.02] Our pre-trained model is out on HuggingFace!
- [2025.02.27] ⭐ Exciting News! Relightable-Portrait-Animation got accepted by CVPR 2025!
We are going to make all of the following available:
- Model inference code
- Model checkpoint
- Training code
- Data processing code
- Clone this repo locally:
git clone https://github.com/MingtaoGuo/Relightable-Portrait-Animation
cd Relightable-Portrait-Animation
- Install the dependencies:
conda create -n relipa python=3.8
conda activate relipa
- Install packages for inference:
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu121
pip install --extra-index-url https://miropsota.github.io/torch_packages_builder pytorch3d==0.7.7+pt2.2.2cu121
pip install -r requirements.txt
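After installing, a quick sanity check (a minimal sketch, not part of this repository; the file name sanity_check.py is ours) can confirm that PyTorch sees the GPU and that pytorch3d imports cleanly:

```python
# sanity_check.py -- optional, not part of this repo: verify the inference environment.
import torch
import pytorch3d

print("torch:", torch.__version__)          # expected: 2.2.2+cu121
print("pytorch3d:", pytorch3d.__version__)  # expected: 0.7.7
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```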
- Download the pretrained weights:
mkdir pretrained_weights
mkdir pretrained_weights/relipa
git-lfs install
git clone https://huggingface.co/MartinGuo/Relightable-Portrait-Animation
mv Relightable-Portrait-Animation/ref_embedder.pth pretrained_weights/relipa
mv Relightable-Portrait-Animation/light_embedder.pth pretrained_weights/relipa
mv Relightable-Portrait-Animation/head_embedder.pth pretrained_weights/relipa
mv Relightable-Portrait-Animation/unet.pth pretrained_weights/relipa
mv Relightable-Portrait-Animation/data src/decalib
mv Relightable-Portrait-Animation/u2net_human_seg.pth src/facematting
git clone https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt
mv stable-video-diffusion-img2vid-xt pretrained_weights
git clone https://huggingface.co/stabilityai/sd-vae-ft-mse
mv sd-vae-ft-mse pretrained_weights/stable-video-diffusion-img2vid-xt
The weights will be placed in the ./pretrained_weights directory. Heads up: the whole download can take quite a long time. Finally, the weights should be organized as follows:
./pretrained_weights/
|-- relipa
| |-- unet.pth
| |-- ref_embedder.pth
| |-- light_embedder.pth
| |-- head_embedder.pth
|-- stable-video-diffusion-img2vid-xt
| |-- sd-vae-ft-mse
| | |-- config.json
| | |-- diffusion_pytorch_model.bin
| |-- feature_extractor
| | |-- preprocessor_config.json
| |-- scheduler
| | |-- scheduler_config.json
| |-- model_index.json
| |-- unet
| | |-- config.json
| | |-- diffusion_pytorch_model.safetensors
| | |-- diffusion_pytorch_model.fp16.safetensors
| |-- image_encoder
| | |-- config.json
| | |-- model.safetensors
| | |-- model.fp16.safetensors
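Before running inference, a quick check such as the sketch below (based only on the tree above; the script name is ours, not part of the repo) can confirm the key files are in place:

```python
# check_weights.py -- optional sketch: verify the layout shown above.
import os

expected = [
    "relipa/unet.pth",
    "relipa/ref_embedder.pth",
    "relipa/light_embedder.pth",
    "relipa/head_embedder.pth",
    "stable-video-diffusion-img2vid-xt/model_index.json",
    "stable-video-diffusion-img2vid-xt/sd-vae-ft-mse/diffusion_pytorch_model.bin",
    "stable-video-diffusion-img2vid-xt/unet/diffusion_pytorch_model.safetensors",
    "stable-video-diffusion-img2vid-xt/image_encoder/model.safetensors",
]
missing = [p for p in expected
           if not os.path.exists(os.path.join("pretrained_weights", p))]
print("All expected weights found." if not missing else f"Missing files: {missing}")
```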
Here's the command to run the preprocessing script. It uses DECA to extract the pose from the driving video and the mesh from the reference portrait, then renders shading hints by combining the driving video's pose, the reference portrait's mesh, and the target lighting.
python preprocess.py --video_path resources/WDA_DebbieDingell1_000.mp4 --source_path resources/reference.png --light_path resources/target_lighting1.png --save_path resources/shading.mp4 --motion_align relative
After running preprocess.py, you'll get the following results:
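For intuition: the shading hints combine the mesh geometry (posed by the driving video) with the target lighting. One common way to render such hints is Lambertian shading from second-order spherical harmonics (the Ramamoorthi and Hanrahan irradiance approximation). The sketch below illustrates that idea only; the exact rendering in preprocess.py may differ.

```python
# sh_shading_sketch.py -- illustrative only; not the repo's exact rendering code.
# Lambertian irradiance from 9 spherical-harmonics lighting coefficients and
# per-pixel unit surface normals (H x W x 3).
import numpy as np

def sh_irradiance(normals, sh):
    # sh: (9,) coefficients ordered L00, L1-1, L10, L11, L2-2, L2-1, L20, L21, L22.
    c1, c2, c3, c4, c5 = 0.429043, 0.511664, 0.743125, 0.886227, 0.247708
    x, y, z = normals[..., 0], normals[..., 1], normals[..., 2]
    return (c4 * sh[0]
            + 2 * c2 * (sh[1] * y + sh[2] * z + sh[3] * x)
            + 2 * c1 * (sh[4] * x * y + sh[5] * y * z + sh[7] * x * z)
            + c3 * sh[6] * z * z - c5 * sh[6]
            + c1 * sh[8] * (x * x - y * y))

# Toy example: camera-facing normals lit by an ambient term plus a frontal (z) band.
normals = np.zeros((4, 4, 3)); normals[..., 2] = 1.0
sh = np.array([0.5, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
shading = np.clip(sh_irradiance(normals, sh), 0.0, 1.0)  # grayscale shading hint
```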
Here's the command to run the inference script. It guides our model with the shading hints obtained from preprocessing to generate results in which the pose matches the driving video, the identity matches the reference image, and the lighting matches the target lighting.
python inference.py --pretrained_model_name_or_path pretrained_weights/stable-video-diffusion-img2vid-xt --checkpoint_path pretrained_weights/relipa/ --video_path resources/shading.mp4 --save_path result.mp4 --guidance 4.5 --inference_steps 25 --driving_mode relighting
After running inference.py, you'll get the following results:
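The --guidance flag sets the guidance scale. In Stable Video Diffusion-style pipelines this is classifier-free guidance: each denoising step blends an unconditional prediction with one conditioned on the shading hints and reference image. The sketch below shows the general rule, not the exact code in inference.py (which may, for example, ramp the scale across frames as the SVD pipeline does).

```python
# cfg_sketch.py -- illustrative only: how a guidance scale such as 4.5 is typically applied.
import torch

def guided_prediction(pred_uncond, pred_cond, guidance_scale=4.5):
    # Move the denoiser output away from the unconditional prediction and
    # toward the conditioned one; larger scales follow the conditions more strictly.
    return pred_uncond + guidance_scale * (pred_cond - pred_uncond)

# Dummy tensors shaped like (batch, frames, latent channels, H, W).
pred_uncond = torch.randn(1, 16, 4, 64, 64)
pred_cond = torch.randn(1, 16, 4, 64, 64)
pred = guided_prediction(pred_uncond, pred_cond, guidance_scale=4.5)
```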
Here's the command to run the training script:
python train.py --pretrained_model_name_or_path pretrained_weights/stable-video-diffusion-img2vid-xt \
--height 512 --width 512 --num_frames 16 --validation_steps 100 --max_train_steps 30000 \
--gradient_accumulation_steps 8 --gradient_checkpointing True --learning_rate 1e-5 --use_8bit_adam True \
--sample_rate 4 --num_workers 2 --checkpointing_steps 1000 --checkpoints_total_limit 2 \
--data_meta_path TalkingHeadVideo/VFHQ/VFHQ-data-consistent.json
| VFHQ-video | VFHQ-kpmap | VFHQ-mesh | VFHQ-mask |
| --- | --- | --- | --- |
| ![]() | ![]() | ![]() | ![]() |
Training dataset
./TalkingHeadVideo/
|-- VFHQ
| |-- VFHQ-mask
| | |-- Clip+zZEv-ATOpoY+P0+C2+F3168-3532_10369.mp4
| | ...
| |-- VFHQ-kpmap
| |-- VFHQ-video
| |-- VFHQ-mesh
| |-- VFHQ-data-consistent.json
|-- CelebV-HQ
| |-- CelebV-HQ-mask
| | |-- __lRwnjxeCg_1.mp4
| | ...
| |-- CelebV-HQ-kpmap
| |-- CelebV-HQ-video
| |-- CelebV-HQ-mesh
| |-- CelebV-HQ-data-consistent.json
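Each clip is expected to appear under the same file name in the video, kpmap, mesh, and mask folders. The hypothetical sketch below shows one way to pair those quadruples by name; the actual dataloader in the training code may differ.

```python
# pair_vfhq_sketch.py -- hypothetical sketch; the repo's dataloader may differ.
import os

root = "TalkingHeadVideo/VFHQ"
samples = []
for name in sorted(os.listdir(os.path.join(root, "VFHQ-video"))):
    paths = {k: os.path.join(root, f"VFHQ-{k}", name)
             for k in ("video", "kpmap", "mesh", "mask")}
    if all(os.path.exists(p) for p in paths.values()):  # keep only complete quadruples
        samples.append(paths)
print(f"Found {len(samples)} complete VFHQ samples.")
```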
We first thank the contributors to the StableVideoDiffusion, Echomimic, and MimicMotion repositories for their open research and exploration. Furthermore, our repo incorporates some code from DECA, MediaPipe, and U2Net, and we extend our thanks to them as well.
If you use this model in your research, please consider citing:
@misc{guo2025highfidelityrelightablemonocularportrait,
title={High-Fidelity Relightable Monocular Portrait Animation with Lighting-Controllable Video Diffusion Model},
author={Mingtao Guo and Guanyu Xing and Yanli Liu},
year={2025},
eprint={2502.19894},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2502.19894},
}
This project is licensed under the MIT License. See the LICENSE file for details.