This repository is a reference implementation of ViVid-1-to-3. It combines a video diffusion model with a novel-view synthesis diffusion model for improved pose and appearance consistency.
Install the dependencies:

```bash
pip install torch "diffusers==0.24" transformers accelerate einops kornia "imageio[ffmpeg]" opencv-python pydantic scikit-image lpips
```

Put the reference image at $IMAGE_PATH and set input_image_path in scripts/task_example.yaml to it. Then run

```bash
python run_generation.py --task_yaml_path=scripts/task_example.yaml
```
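If you prefer to script this step, the following is a minimal sketch that sets input_image_path from the $IMAGE_PATH environment variable. It assumes scripts/task_example.yaml is a flat YAML mapping and that PyYAML is installed; editing the file by hand works just as well.

```python
# Minimal sketch: point the task config at your reference image.
# Assumes scripts/task_example.yaml is a flat YAML mapping with an
# input_image_path key, as described above.
import os
import yaml  # pip install pyyaml

task_yaml = "scripts/task_example.yaml"
with open(task_yaml) as f:
    cfg = yaml.safe_load(f)

cfg["input_image_path"] = os.environ["IMAGE_PATH"]  # path to your reference image

with open(task_yaml, "w") as f:
    yaml.safe_dump(cfg, f)
```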
Batch generation tasks are supported on both PCs and SLURM clusters.

We tested our method on 100 GSO objects. The list of objects is in scripts/gso_metadata_object_prompt_100.csv, along with our labeled text prompts if you would like to try prompt-based generation yourself. We have rendered the 100 objects beforehand; the renderings can be downloaded here. Decompress the content into gso-100, then run the following to prepare a batch generation job on a PC:
```bash
python -m scripts.job_config_yaml_generation
```

Or run the following to prepare a batch generation job on a SLURM cluster, which moves temporary files to $SLURM_TMPDIR on your cluster:

```bash
python -m scripts.job_config_yaml_generation --run_on_slurm
```
All the YAML files will be generated in a new folder called tasks_gso.
If you want to run a customized batch generation, simply add an entry to the job_specs list at the beginning of scripts/job_config_yaml_generation.py and run it with the same command. A commented-out example is provided in the file, and an illustrative sketch is shown below.
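As an illustration only, a custom entry might look like the sketch below. Everything except the job_specs name itself is hypothetical; the real field names are defined by the commented-out example in scripts/job_config_yaml_generation.py.

```python
# Hypothetical sketch of a custom entry in the job_specs list at the top of
# scripts/job_config_yaml_generation.py. Field names below are placeholders;
# follow the commented-out example in that file for the actual schema.
job_specs = [
    {
        "exp_name": "my_custom_run",                        # reused later as the evaluation exp_name
        "task_yaml_template": "scripts/task_example.yaml",  # hypothetical: base config to copy per object
        "output_subdir": "my_custom_run",                   # hypothetical: where the task yamls are written
    },
    # ... existing job specs ...
]
```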
For batch generation, run
```bash
python run_batch_generation.py --task_yamls_dir=tasks_gso --dataset_dir=gso-100 --output_dir=outputs --obj_csv_file=scripts/gso_metadata_object_prompt_100.csv
```

One generation takes about 1 minute 30 seconds on a V100 GPU. If there are too many generations for a single job you can schedule on a SLURM cluster, you can split the dataset across jobs using the --run_from_obj_index and --run_to_obj_index options. For example:
```bash
python run_batch_generation.py --task_yamls_dir=tasks_gso --dataset_dir=gso-100 --output_dir=outputs --obj_csv_file=scripts/gso_metadata_object_prompt_100.csv --run_from_obj_index=0 --run_to_obj_index=50
```

To run evaluation for a batch generation, put the experiments you want to evaluate in the eval_specs list in run_evaluation.py. Make sure the exp_name key has the same value as in your batch generation, and modify expdir and savedir in run_evaluation.py accordingly; see the sketch below.
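A rough sketch of that configuration is shown here. eval_specs, exp_name, expdir, and savedir are the names referenced above; the paths are placeholders for your own setup.

```python
# Sketch of the evaluation configuration inside run_evaluation.py.
# eval_specs, exp_name, expdir, and savedir are referenced in the text above;
# the concrete paths are placeholders.
expdir = "outputs"        # directory written by run_batch_generation.py
savedir = "eval_results"  # where per-object intermediate metrics will go

eval_specs = [
    {"exp_name": "my_custom_run"},  # must match the exp_name used for batch generation
]
```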
Suppose you want to run the $EXP_ID-th experiment in the list; then run

```bash
python run_evaluation.py --exp_id $EXP_ID
```

After the evaluation finishes, intermediate results for PSNR, SSIM, LPIPS, FOR_8, and FOR_16 for each object are written to savedir.
Finally, you can use run_calculate_stats.py to compute the PSNR, SSIM, LPIPS, FOR_8, and FOR_16 statistics for this experiment over the whole dataset. Make sure to modify psnr_save_dir, lpips_save_dir, ssim_save_dir, for_8_save_dir, and for_16_save_dir in run_calculate_stats.py to match the folders storing the intermediate results from the previous step.
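For example, a sketch of those settings is below. The variable names come from run_calculate_stats.py as described above; the directory layout is only a guess and should point to wherever run_evaluation.py actually wrote its intermediate results.

```python
# Sketch: point run_calculate_stats.py at the intermediate results produced by
# run_evaluation.py. Variable names are from the step above; the paths shown
# are placeholders for your own savedir layout.
psnr_save_dir   = "eval_results/psnr"
ssim_save_dir   = "eval_results/ssim"
lpips_save_dir  = "eval_results/lpips"
for_8_save_dir  = "eval_results/for_8"
for_16_save_dir = "eval_results/for_16"
```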
Then run:

```bash
python run_calculate_stats.py
```

This repo is based on the Hugging Face community implementation and converted weights of Zero-1-to-3, as well as the Hugging Face community text-to-video model Zeroscope v2. Thanks for their awesome work.
If you use this code in your research, please cite our paper:
```bibtex
@inproceedings{kwak2024vivid,
  title={Vivid-1-to-3: Novel view synthesis with video diffusion models},
  author={Kwak, Jeong-gi and Dong, Erqun and Jin, Yuhe and Ko, Hanseok and Mahajan, Shweta and Yi, Kwang Moo},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={6775--6785},
  year={2024}
}
```