
Commit 46a9db0

[Community Pipeline] FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation (huggingface#8239)
* code and doc
* update paper link
* remove redundant codes
* add example video

Co-authored-by: Sayak Paul <[email protected]>
1 parent 370146e commit 46a9db0

2 files changed (+2599, -0 lines)

examples/community/README.md

Lines changed: 88 additions & 0 deletions
@@ -69,6 +69,7 @@ Please also check out our [Community Scripts](https://github.com/huggingface/dif
| UFOGen Scheduler | Scheduler for UFOGen Model (compatible with Stable Diffusion pipelines) | [UFOGen Scheduler](#ufogen-scheduler) | - | [dg845](https://github.com/dg845) |
| Stable Diffusion XL IPEX Pipeline | Accelerate Stable Diffusion XL inference pipeline with BF16/FP32 precision on Intel Xeon CPUs with [IPEX](https://github.com/intel/intel-extension-for-pytorch) | [Stable Diffusion XL on IPEX](#stable-diffusion-xl-on-ipex) | - | [Dan Li](https://github.com/ustcuna/) |
| Stable Diffusion BoxDiff Pipeline | Training-free controlled generation with bounding boxes using [BoxDiff](https://github.com/showlab/BoxDiff) | [Stable Diffusion BoxDiff Pipeline](#stable-diffusion-boxdiff) | - | [Jingyang Zhang](https://github.com/zjysteven/) |
| FRESCO V2V Pipeline | Implementation of [[CVPR 2024] FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation](https://arxiv.org/abs/2403.12962) | [FRESCO V2V Pipeline](#fresco) | - | [Yifan Zhou](https://github.com/SingleZombie) |

To load a custom pipeline you just need to pass the `custom_pipeline` argument to `DiffusionPipeline`, as one of the files in `diffusers/examples/community`. Feel free to send a PR with your own pipelines, we will merge them quickly.
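For instance, here is a minimal loading sketch for the FRESCO pipeline added in this commit; the checkpoint, ControlNet, and gmflow path are the same ones used in the full example in the [FRESCO](#fresco) section below, which also covers the gmflow prerequisite:

```py
import sys

from diffusers import ControlNetModel, DiffusionPipeline

# The FRESCO example below puts gmflow on sys.path before loading the custom
# pipeline, so the same is done here.
sys.path.insert(0, "/path/to/gmflow")

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny")
pipe = DiffusionPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V2.0",  # any compatible Stable Diffusion checkpoint
    controlnet=controlnet,
    custom_pipeline="fresco_v2v",      # file name from diffusers/examples/community
)
```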

@@ -4035,6 +4036,93 @@ onestep_image = pipe(prompt, num_inference_steps=1).images[0]
multistep_image = pipe(prompt, num_inference_steps=4).images[0]
```

### FRESCO

This is the Diffusers implementation of the zero-shot video-to-video translation pipeline [FRESCO](https://github.com/williamyang1991/FRESCO) (without Ebsynth postprocessing and background smoothing). To run the code, please install gmflow, then set `gmflow_dir` below to its local path. After that, you can run the pipeline with:

```py
from PIL import Image
import cv2
import torch
import numpy as np

from diffusers import ControlNetModel, DDIMScheduler, DiffusionPipeline
import sys

# make gmflow importable before loading the custom pipeline
gmflow_dir = "/path/to/gmflow"
sys.path.insert(0, gmflow_dir)


def video_to_frame(video_path: str, interval: int):
    # sample one frame every `interval` frames, up to 8 frames in total
    vidcap = cv2.VideoCapture(video_path)
    success = True

    count = 0
    res = []
    while success:
        count += 1
        success, image = vidcap.read()
        if count % interval != 1:
            continue
        if image is not None:
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            res.append(image)
        if len(res) >= 8:
            break

    vidcap.release()
    return res


input_video_path = 'https://github.com/williamyang1991/FRESCO/raw/main/data/car-turn.mp4'
output_video_path = 'car.gif'

# You can use any fine-tuned SD here
model_path = 'SG161222/Realistic_Vision_V2.0'

prompt = 'a red car turns in the winter'
a_prompt = ', RAW photo, subject, (high detailed skin:1.2), 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3, '
n_prompt = '(deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, mutated hands and fingers:1.4), (deformed, distorted, disfigured:1.3), poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, amputation'

input_interval = 5
frames = video_to_frame(
    input_video_path, input_interval)

control_frames = []
# get Canny edge maps to condition the ControlNet
for frame in frames:
    image = cv2.Canny(frame, 50, 100)
    np_image = np.array(image)
    np_image = np_image[:, :, None]
    np_image = np.concatenate([np_image, np_image, np_image], axis=2)
    canny_image = Image.fromarray(np_image)
    control_frames.append(canny_image)

# You can use any ControlNet here
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny").to('cuda')

pipe = DiffusionPipeline.from_pretrained(
    model_path, controlnet=controlnet, custom_pipeline='fresco_v2v').to('cuda')
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

generator = torch.manual_seed(0)
frames = [Image.fromarray(frame) for frame in frames]

output_frames = pipe(
    prompt + a_prompt,
    frames,
    control_frames,
    num_inference_steps=20,
    strength=0.75,
    controlnet_conditioning_scale=0.7,
    generator=generator,
    negative_prompt=n_prompt
).images

# save the translated frames as an animated GIF (duration is ms per frame)
output_frames[0].save(output_video_path, save_all=True,
                      append_images=output_frames[1:], duration=100, loop=0)
```
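As an optional follow-up (not part of the original example): `duration=100` in the GIF export above means 100 ms per frame, i.e. roughly 10 fps. If individual image files are preferred instead of a GIF, the translated frames can simply be written out with PIL; the directory name below is just a placeholder:

```py
import os

# Assumes `output_frames` from the FRESCO example above is still in scope.
os.makedirs("fresco_frames", exist_ok=True)
for i, frame in enumerate(output_frames):
    frame.save(os.path.join("fresco_frames", f"frame_{i:03d}.png"))
```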

# Perturbed-Attention Guidance

[Project](https://ku-cvlab.github.io/Perturbed-Attention-Guidance/) / [arXiv](https://arxiv.org/abs/2403.17377) / [GitHub](https://github.com/KU-CVLAB/Perturbed-Attention-Guidance)
