
Commit 3a7e481

[docs] Video generation (huggingface#6701)
* first draft
* fix path
* fix path
* i2vgen-xl
* review
* modelscopet2v
* feedback
1 parent d649d6c commit 3a7e481

File tree: 5 files changed (+569, -16 lines)


Diff for: docs/source/en/_toctree.yml

+4
@@ -52,6 +52,8 @@
     title: Image-to-image
   - local: using-diffusers/inpaint
     title: Inpainting
+  - local: using-diffusers/text-img2vid
+    title: Text or image-to-video
   - local: using-diffusers/depth2img
     title: Depth-to-image
   title: Tasks
@@ -323,6 +325,8 @@
     title: Text-to-image
   - local: api/pipelines/stable_diffusion/img2img
     title: Image-to-image
+  - local: api/pipelines/stable_diffusion/svd
+    title: Image-to-video
   - local: api/pipelines/stable_diffusion/inpaint
     title: Inpainting
   - local: api/pipelines/stable_diffusion/depth2img

Diff for: docs/source/en/api/attnprocessor.md

+19-16
@@ -20,41 +20,44 @@ An attention processor is a class for applying different types of attention mechanisms.
 ## AttnProcessor2_0
 [[autodoc]] models.attention_processor.AttnProcessor2_0
 
-## FusedAttnProcessor2_0
-[[autodoc]] models.attention_processor.FusedAttnProcessor2_0
+## AttnAddedKVProcessor
+[[autodoc]] models.attention_processor.AttnAddedKVProcessor
 
-## LoRAAttnProcessor
-[[autodoc]] models.attention_processor.LoRAAttnProcessor
+## AttnAddedKVProcessor2_0
+[[autodoc]] models.attention_processor.AttnAddedKVProcessor2_0
 
-## LoRAAttnProcessor2_0
-[[autodoc]] models.attention_processor.LoRAAttnProcessor2_0
+## CrossFrameAttnProcessor
+[[autodoc]] pipelines.text_to_video_synthesis.pipeline_text_to_video_zero.CrossFrameAttnProcessor
 
 ## CustomDiffusionAttnProcessor
 [[autodoc]] models.attention_processor.CustomDiffusionAttnProcessor
 
 ## CustomDiffusionAttnProcessor2_0
 [[autodoc]] models.attention_processor.CustomDiffusionAttnProcessor2_0
 
-## AttnAddedKVProcessor
-[[autodoc]] models.attention_processor.AttnAddedKVProcessor
+## CustomDiffusionXFormersAttnProcessor
+[[autodoc]] models.attention_processor.CustomDiffusionXFormersAttnProcessor
 
-## AttnAddedKVProcessor2_0
-[[autodoc]] models.attention_processor.AttnAddedKVProcessor2_0
+## FusedAttnProcessor2_0
+[[autodoc]] models.attention_processor.FusedAttnProcessor2_0
+
+## LoRAAttnProcessor
+[[autodoc]] models.attention_processor.LoRAAttnProcessor
+
+## LoRAAttnProcessor2_0
+[[autodoc]] models.attention_processor.LoRAAttnProcessor2_0
 
 ## LoRAAttnAddedKVProcessor
 [[autodoc]] models.attention_processor.LoRAAttnAddedKVProcessor
 
-## XFormersAttnProcessor
-[[autodoc]] models.attention_processor.XFormersAttnProcessor
-
 ## LoRAXFormersAttnProcessor
 [[autodoc]] models.attention_processor.LoRAXFormersAttnProcessor
 
-## CustomDiffusionXFormersAttnProcessor
-[[autodoc]] models.attention_processor.CustomDiffusionXFormersAttnProcessor
-
 ## SlicedAttnProcessor
 [[autodoc]] models.attention_processor.SlicedAttnProcessor
 
 ## SlicedAttnAddedKVProcessor
 [[autodoc]] models.attention_processor.SlicedAttnAddedKVProcessor
+
+## XFormersAttnProcessor
+[[autodoc]] models.attention_processor.XFormersAttnProcessor
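
The entries above are autodoc stubs that this commit reorders alphabetically (and extends with CrossFrameAttnProcessor). As a hedged illustration of how one of the listed processors is attached to a model, a minimal sketch follows; the checkpoint name and prompt are assumptions for the example, not taken from the commit:

```python
import torch
from diffusers import StableDiffusionPipeline
from diffusers.models.attention_processor import AttnProcessor2_0

# Any Stable Diffusion checkpoint works; this one is only an example choice.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Swap every attention module in the UNet to the PyTorch 2.0
# scaled-dot-product-attention processor listed above.
pipe.unet.set_attn_processor(AttnProcessor2_0())

image = pipe("an astronaut riding a horse").images[0]
```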

Diff for: docs/source/en/api/pipelines/stable_diffusion/svd.md

+43
@@ -0,0 +1,43 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# Stable Video Diffusion
+
+Stable Video Diffusion was proposed in [Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets](https://hf.co/papers/2311.15127) by Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti, Adam Letts, Varun Jampani, Robin Rombach.
+
+The abstract from the paper is:
+
+*We present Stable Video Diffusion - a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation. Recently, latent diffusion models trained for 2D image synthesis have been turned into generative video models by inserting temporal layers and finetuning them on small, high-quality video datasets. However, training methods in the literature vary widely, and the field has yet to agree on a unified strategy for curating video data. In this paper, we identify and evaluate three different stages for successful training of video LDMs: text-to-image pretraining, video pretraining, and high-quality video finetuning. Furthermore, we demonstrate the necessity of a well-curated pretraining dataset for generating high-quality videos and present a systematic curation process to train a strong base model, including captioning and filtering strategies. We then explore the impact of finetuning our base model on high-quality data and train a text-to-video model that is competitive with closed-source video generation. We also show that our base model provides a powerful motion representation for downstream tasks such as image-to-video generation and adaptability to camera motion-specific LoRA modules. Finally, we demonstrate that our model provides a strong multi-view 3D-prior and can serve as a base to finetune a multi-view diffusion model that jointly generates multiple views of objects in a feedforward fashion, outperforming image-based methods at a fraction of their compute budget. We release code and model weights at this https URL.*
+
+<Tip>
+
+To learn how to use Stable Video Diffusion, take a look at the [Stable Video Diffusion](../../../using-diffusers/svd) guide.
+
+<br>
+
+Check out the [Stability AI](https://huggingface.co/stabilityai) Hub organization for the [base](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid) and [extended frame](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt) checkpoints!
+
+</Tip>
+
+## Tips
+
+Video generation is memory-intensive and one way to reduce your memory usage is to set `enable_forward_chunking` on the pipeline's UNet so you don't run the entire feedforward layer at once. Breaking it up into chunks in a loop is more efficient.
+
+Check out the [Text or image-to-video](text-img2vid) guide for more details about how certain parameters can affect video generation and how to optimize inference by reducing memory usage.
+
+## StableVideoDiffusionPipeline
+
+[[autodoc]] StableVideoDiffusionPipeline
+
+## StableVideoDiffusionPipelineOutput
+
+[[autodoc]] pipelines.stable_video_diffusion.StableVideoDiffusionPipelineOutput
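
The `enable_forward_chunking` tip added above maps onto the pipeline roughly as follows. This is a minimal sketch, assuming the `stable-video-diffusion-img2vid-xt` checkpoint linked in the Tip and a placeholder conditioning image; it is not part of the committed file:

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16, variant="fp16"
)
pipe.enable_model_cpu_offload()

# Run the UNet's feed-forward layers in chunks instead of all at once,
# trading a little speed for a lower peak memory footprint.
pipe.unet.enable_forward_chunking()

image = load_image("conditioning_frame.png")  # placeholder path
frames = pipe(image, decode_chunk_size=2, generator=torch.manual_seed(0)).frames[0]
export_to_video(frames, "generated.mp4", fps=7)
```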

Diff for: docs/source/en/api/pipelines/text_to_video.md

+6
@@ -167,6 +167,12 @@ Here are some sample outputs:
   </tr>
 </table>
 
+## Tips
+
+Video generation is memory-intensive and one way to reduce your memory usage is to set `enable_forward_chunking` on the pipeline's UNet so you don't run the entire feedforward layer at once. Breaking it up into chunks in a loop is more efficient.
+
+Check out the [Text or image-to-video](text-img2vid) guide for more details about how certain parameters can affect video generation and how to optimize inference by reducing memory usage.
+
 <Tip>
 
 Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
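
The same chunking advice applies to the ModelScope text-to-video pipeline documented in this file. A minimal sketch, assuming the `damo-vilab/text-to-video-ms-1.7b` checkpoint and a recent diffusers release (again, not part of the committed change):

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16"
)
pipe.enable_model_cpu_offload()

# Chunk the UNet's feed-forward computation to keep memory usage down.
pipe.unet.enable_forward_chunking()

video_frames = pipe("an astronaut riding a horse", num_frames=16).frames[0]
export_to_video(video_frames, "astronaut.mp4")
```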
