
Commit 98730c5

* Fix typos
* Trim trailing whitespaces
* Remove a trailing whitespace
* chore: Update MarigoldDepthPipeline checkpoint to prs-eth/marigold-lcm-v1-0
* Revert "chore: Update MarigoldDepthPipeline checkpoint to prs-eth/marigold-lcm-v1-0" (this reverts commit fd742b3)
* pokemon -> naruto
* `DPMSolverMultistep` -> `DPMSolverMultistepScheduler`
* Improve Markdown stylization
* Improve style
* Improve style
* Refactor pipeline variable names for consistency
* up style
1 parent 7ebd359 commit 98730c5

35 files changed (+186, -191 lines)


docs/source/en/api/pipelines/amused.md (+1, -1)
@@ -16,7 +16,7 @@ aMUSEd was introduced in [aMUSEd: An Open MUSE Reproduction](https://huggingface
 
 Amused is a lightweight text to image model based off of the [MUSE](https://arxiv.org/abs/2301.00704) architecture. Amused is particularly useful in applications that require a lightweight and fast model such as generating many images quickly at once.
 
-Amused is a vqvae token based transformer that can generate an image in fewer forward passes than many diffusion models. In contrast with muse, it uses the smaller text encoder CLIP-L/14 instead of t5-xxl. Due to its small parameter count and few forward pass generation process, amused can generate many images quickly. This benefit is seen particularly at larger batch sizes. 
+Amused is a vqvae token based transformer that can generate an image in fewer forward passes than many diffusion models. In contrast with muse, it uses the smaller text encoder CLIP-L/14 instead of t5-xxl. Due to its small parameter count and few forward pass generation process, amused can generate many images quickly. This benefit is seen particularly at larger batch sizes.
 
 The abstract from the paper is:
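
Since the paragraph above turns on aMUSEd needing only a few forward passes, a minimal usage sketch may help; it assumes the `amused/amused-256` Hub checkpoint and is illustrative, not part of this commit.

```python
# Minimal sketch: text-to-image with aMUSEd.
# Assumption: the amused/amused-256 checkpoint (use amused/amused-512 for 512px).
import torch
from diffusers import AmusedPipeline

pipe = AmusedPipeline.from_pretrained(
    "amused/amused-256", variant="fp16", torch_dtype=torch.float16
).to("cuda")

# Token-based generation completes in few steps compared to most diffusion models.
image = pipe("a cowboy riding a horse", num_inference_steps=12).images[0]
image.save("amused_256.png")
```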

docs/source/en/api/pipelines/kandinsky3.md (+2, -2)
@@ -11,12 +11,12 @@ specific language governing permissions and limitations under the License.
 
 Kandinsky 3 is created by [Vladimir Arkhipkin](https://github.com/oriBetelgeuse),[Anastasia Maltseva](https://github.com/NastyaMittseva),[Igor Pavlov](https://github.com/boomb0om),[Andrei Filatov](https://github.com/anvilarth),[Arseniy Shakhmatov](https://github.com/cene555),[Andrey Kuznetsov](https://github.com/kuznetsoffandrey),[Denis Dimitrov](https://github.com/denndimitrov), [Zein Shaheen](https://github.com/zeinsh)
 
-The description from it's Github page: 
+The description from it's Github page:
 
 *Kandinsky 3.0 is an open-source text-to-image diffusion model built upon the Kandinsky2-x model family. In comparison to its predecessors, enhancements have been made to the text understanding and visual quality of the model, achieved by increasing the size of the text encoder and Diffusion U-Net models, respectively.*
 
 Its architecture includes 3 main components:
-1. [FLAN-UL2](https://huggingface.co/google/flan-ul2), which is an encoder decoder model based on the T5 architecture. 
+1. [FLAN-UL2](https://huggingface.co/google/flan-ul2), which is an encoder decoder model based on the T5 architecture.
 2. New U-Net architecture featuring BigGAN-deep blocks doubles depth while maintaining the same number of parameters.
 3. Sber-MoVQGAN is a decoder proven to have superior results in image restoration.
 
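
As a companion to the architecture summary above, a short text-to-image sketch, assuming the `kandinsky-community/kandinsky-3` checkpoint and the `AutoPipelineForText2Image` entry point; it is illustrative, not part of the diff.

```python
# Sketch: Kandinsky 3 text-to-image via the auto pipeline.
# Assumption: the kandinsky-community/kandinsky-3 checkpoint.
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()  # the FLAN-UL2 text encoder is large

prompt = "a photograph of the inside of a subway train filled with raccoons"
image = pipe(prompt, num_inference_steps=25).images[0]
```
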
docs/source/en/api/pipelines/ledits_pp.md (+3, -3)
@@ -25,11 +25,11 @@ You can find additional information about LEDITS++ on the [project page](https:/
 </Tip>
 
 <Tip warning={true}>
-Due to some backward compatability issues with the current diffusers implementation of [`~schedulers.DPMSolverMultistepScheduler`] this implementation of LEdits++ can no longer guarantee perfect inversion. 
-This issue is unlikely to have any noticeable effects on applied use-cases. However, we provide an alternative implementation that guarantees perfect inversion in a dedicated [GitHub repo](https://github.com/ml-research/ledits_pp). 
+Due to some backward compatability issues with the current diffusers implementation of [`~schedulers.DPMSolverMultistepScheduler`] this implementation of LEdits++ can no longer guarantee perfect inversion.
+This issue is unlikely to have any noticeable effects on applied use-cases. However, we provide an alternative implementation that guarantees perfect inversion in a dedicated [GitHub repo](https://github.com/ml-research/ledits_pp).
 </Tip>
 
-We provide two distinct pipelines based on different pre-trained models. 
+We provide two distinct pipelines based on different pre-trained models.
 
 ## LEditsPPPipelineStableDiffusion
 [[autodoc]] pipelines.ledits_pp.LEditsPPPipelineStableDiffusion
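
The inversion caveat above is easier to see against the pipeline's two-stage workflow; a sketch, assuming a Stable Diffusion v1.5 checkpoint and a placeholder image URL.

```python
# Sketch of the LEdits++ workflow: invert the input image, then edit it.
import torch
from diffusers import LEditsPPPipelineStableDiffusion
from diffusers.utils import load_image

pipe = LEditsPPPipelineStableDiffusion.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = load_image("https://example.com/input.jpg")  # placeholder URL

# Inversion step; per the warning above, reconstruction is close but
# not guaranteed to be perfect with DPMSolverMultistepScheduler.
_ = pipe.invert(image=image, num_inversion_steps=50, skip=0.1)

edited = pipe(
    editing_prompt=["cherry blossom"],
    edit_guidance_scale=10.0,
    edit_threshold=0.75,
).images[0]
```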

docs/source/en/api/pipelines/marigold.md (+10, -10)
@@ -14,10 +14,10 @@ specific language governing permissions and limitations under the License.
 
 ![marigold](https://marigoldmonodepth.github.io/images/teaser_collage_compressed.jpg)
 
-Marigold was proposed in [Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation](https://huggingface.co/papers/2312.02145), a CVPR 2024 Oral paper by [Bingxin Ke](http://www.kebingxin.com/), [Anton Obukhov](https://www.obukhov.ai/), [Shengyu Huang](https://shengyuh.github.io/), [Nando Metzger](https://nandometzger.github.io/), [Rodrigo Caye Daudt](https://rcdaudt.github.io/), and [Konrad Schindler](https://scholar.google.com/citations?user=FZuNgqIAAAAJ&hl=en). 
-The idea is to repurpose the rich generative prior of Text-to-Image Latent Diffusion Models (LDMs) for traditional computer vision tasks. 
-Initially, this idea was explored to fine-tune Stable Diffusion for Monocular Depth Estimation, as shown in the teaser above. 
-Later, 
+Marigold was proposed in [Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation](https://huggingface.co/papers/2312.02145), a CVPR 2024 Oral paper by [Bingxin Ke](http://www.kebingxin.com/), [Anton Obukhov](https://www.obukhov.ai/), [Shengyu Huang](https://shengyuh.github.io/), [Nando Metzger](https://nandometzger.github.io/), [Rodrigo Caye Daudt](https://rcdaudt.github.io/), and [Konrad Schindler](https://scholar.google.com/citations?user=FZuNgqIAAAAJ&hl=en).
+The idea is to repurpose the rich generative prior of Text-to-Image Latent Diffusion Models (LDMs) for traditional computer vision tasks.
+Initially, this idea was explored to fine-tune Stable Diffusion for Monocular Depth Estimation, as shown in the teaser above.
+Later,
 - [Tianfu Wang](https://tianfwang.github.io/) trained the first Latent Consistency Model (LCM) of Marigold, which unlocked fast single-step inference;
 - [Kevin Qu](https://www.linkedin.com/in/kevin-qu-b3417621b/?locale=en_US) extended the approach to Surface Normals Estimation;
 - [Anton Obukhov](https://www.obukhov.ai/) contributed the pipelines and documentation into diffusers (enabled and supported by [YiYi Xu](https://yiyixuxu.github.io/) and [Sayak Paul](https://sayak.dev/)).
@@ -28,7 +28,7 @@ The abstract from the paper is:
 
 ## Available Pipelines
 
-Each pipeline supports one Computer Vision task, which takes an input RGB image as input and produces a *prediction* of the modality of interest, such as a depth map of the input image. 
+Each pipeline supports one Computer Vision task, which takes an input RGB image as input and produces a *prediction* of the modality of interest, such as a depth map of the input image.
 Currently, the following tasks are implemented:
 
 | Pipeline | Predicted Modalities | Demos |
@@ -39,7 +39,7 @@ Currently, the following tasks are implemented:
 
 ## Available Checkpoints
 
-The original checkpoints can be found under the [PRS-ETH](https://huggingface.co/prs-eth/) Hugging Face organization. 
+The original checkpoints can be found under the [PRS-ETH](https://huggingface.co/prs-eth/) Hugging Face organization.
 
 <Tip>
 
@@ -49,11 +49,11 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers)
 
 <Tip warning={true}>
 
-Marigold pipelines were designed and tested only with `DDIMScheduler` and `LCMScheduler`. 
+Marigold pipelines were designed and tested only with `DDIMScheduler` and `LCMScheduler`.
 Depending on the scheduler, the number of inference steps required to get reliable predictions varies, and there is no universal value that works best across schedulers.
-Because of that, the default value of `num_inference_steps` in the `__call__` method of the pipeline is set to `None` (see the API reference). 
-Unless set explicitly, its value will be taken from the checkpoint configuration `model_index.json`. 
-This is done to ensure high-quality predictions when calling the pipeline with just the `image` argument. 
+Because of that, the default value of `num_inference_steps` in the `__call__` method of the pipeline is set to `None` (see the API reference).
+Unless set explicitly, its value will be taken from the checkpoint configuration `model_index.json`.
+This is done to ensure high-quality predictions when calling the pipeline with just the `image` argument.
 
 </Tip>
 
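
To make the `num_inference_steps=None` behavior concrete, a minimal call that leaves the step count to the checkpoint's `model_index.json` (a sketch assembled from the usage guide below, not part of the diff).

```python
# Sketch: depth estimation with only the `image` argument; leaving
# num_inference_steps unset defers to the checkpoint's model_index.json.
import torch
import diffusers

pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-depth-lcm-v1-0", variant="fp16", torch_dtype=torch.float16
).to("cuda")

image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")
depth = pipe(image)  # num_inference_steps=None -> checkpoint default

vis = pipe.image_processor.visualize_depth(depth.prediction)
vis[0].save("einstein_depth.png")
```
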
docs/source/en/api/pipelines/pixart.md (+4, -5)
@@ -37,7 +37,7 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.m
 
 ## Inference with under 8GB GPU VRAM
 
-Run the [`PixArtAlphaPipeline`] with under 8GB GPU VRAM by loading the text encoder in 8-bit precision. Let's walk through a full-fledged example. 
+Run the [`PixArtAlphaPipeline`] with under 8GB GPU VRAM by loading the text encoder in 8-bit precision. Let's walk through a full-fledged example.
 
 First, install the [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) library:
 
@@ -75,10 +75,10 @@ with torch.no_grad():
     prompt_embeds, prompt_attention_mask, negative_embeds, negative_prompt_attention_mask = pipe.encode_prompt(prompt)
 ```
 
-Since text embeddings have been computed, remove the `text_encoder` and `pipe` from the memory, and free up som GPU VRAM:
+Since text embeddings have been computed, remove the `text_encoder` and `pipe` from the memory, and free up some GPU VRAM:
 
 ```python
-import gc 
+import gc
 
 def flush():
     gc.collect()
@@ -99,7 +99,7 @@ pipe = PixArtAlphaPipeline.from_pretrained(
 ).to("cuda")
 
 latents = pipe(
-    negative_prompt=None, 
+    negative_prompt=None,
     prompt_embeds=prompt_embeds,
     negative_prompt_embeds=negative_embeds,
     prompt_attention_mask=prompt_attention_mask,
@@ -146,4 +146,3 @@ While loading the `text_encoder`, you set `load_in_8bit` to `True`. You could al
 [[autodoc]] PixArtAlphaPipeline
 - all
 - __call__
-
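
The `flush` helper in the hunk above is only partially visible; a hedged completion of the memory-freeing step, following the pattern the surrounding text describes (delete the 8-bit `text_encoder` and the temporary `pipe`, then flush).

```python
# Hedged completion: only `import gc` through `gc.collect()` appears in the hunk.
import gc

import torch

def flush():
    gc.collect()
    torch.cuda.empty_cache()

# `text_encoder` and `pipe` come from the preceding snippet in the doc.
del text_encoder
del pipe
flush()
```
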
docs/source/en/api/pipelines/pixart_sigma.md (+4, -6)
@@ -39,7 +39,7 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers)
 
 ## Inference with under 8GB GPU VRAM
 
-Run the [`PixArtSigmaPipeline`] with under 8GB GPU VRAM by loading the text encoder in 8-bit precision. Let's walk through a full-fledged example. 
+Run the [`PixArtSigmaPipeline`] with under 8GB GPU VRAM by loading the text encoder in 8-bit precision. Let's walk through a full-fledged example.
 
 First, install the [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) library:
 
@@ -59,7 +59,6 @@ text_encoder = T5EncoderModel.from_pretrained(
     subfolder="text_encoder",
     load_in_8bit=True,
     device_map="auto",
-
 )
 pipe = PixArtSigmaPipeline.from_pretrained(
     "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
@@ -77,10 +76,10 @@ with torch.no_grad():
     prompt_embeds, prompt_attention_mask, negative_embeds, negative_prompt_attention_mask = pipe.encode_prompt(prompt)
 ```
 
-Since text embeddings have been computed, remove the `text_encoder` and `pipe` from the memory, and free up som GPU VRAM:
+Since text embeddings have been computed, remove the `text_encoder` and `pipe` from the memory, and free up some GPU VRAM:
 
 ```python
-import gc 
+import gc
 
 def flush():
     gc.collect()
@@ -101,7 +100,7 @@ pipe = PixArtSigmaPipeline.from_pretrained(
 ).to("cuda")
 
 latents = pipe(
-    negative_prompt=None, 
+    negative_prompt=None,
     prompt_embeds=prompt_embeds,
     negative_prompt_embeds=negative_embeds,
     prompt_attention_mask=prompt_attention_mask,
@@ -148,4 +147,3 @@ While loading the `text_encoder`, you set `load_in_8bit` to `True`. You could al
 [[autodoc]] PixArtSigmaPipeline
 - all
 - __call__
-
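
For context, the truncated `PixArtSigmaPipeline.from_pretrained(` call in the hunk above plausibly continues as in the PixArt docs, reusing the 8-bit text encoder and deferring the transformer load; that continuation is an assumption, not visible in the diff.

```python
# Hedged sketch of the 8-bit text-encoder setup around this hunk.
# Assumption: the `text_encoder=...`, `transformer=None` continuation.
import torch
from transformers import T5EncoderModel
from diffusers import PixArtSigmaPipeline

text_encoder = T5EncoderModel.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
    subfolder="text_encoder",
    load_in_8bit=True,
    device_map="auto",
)
pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
    text_encoder=text_encoder,
    transformer=None,
)

with torch.no_grad():
    prompt = "cute cat"
    prompt_embeds, prompt_attention_mask, negative_embeds, negative_prompt_attention_mask = pipe.encode_prompt(prompt)
```
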
docs/source/en/api/pipelines/stable_diffusion/overview.md (+2, -2)
@@ -177,7 +177,7 @@ inpaint = StableDiffusionInpaintPipeline(**text2img.components)
 
 The Stable Diffusion pipelines are automatically supported in [Gradio](https://github.com/gradio-app/gradio/), a library that makes creating beautiful and user-friendly machine learning apps on the web a breeze. First, make sure you have Gradio installed:
 
-```
+```sh
 pip install -U gradio
 ```
 
@@ -209,4 +209,4 @@ gr.Interface.from_pipeline(pipe).launch()
 ```
 
 By default, the web demo runs on a local server. If you'd like to share it with others, you can generate a temporary public
-link by setting `share=True` in `launch()`. Or, you can host your demo on [Hugging Face Spaces](https://huggingface.co/spaces)https://huggingface.co/spaces for a permanent link. 
+link by setting `share=True` in `launch()`. Or, you can host your demo on [Hugging Face Spaces](https://huggingface.co/spaces)https://huggingface.co/spaces for a permanent link.
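
The `gr.Interface.from_pipeline(pipe).launch()` context in the hunk header comes from a complete demo; a minimal sketch of it, assuming a Stable Diffusion v1.5 checkpoint.

```python
# Sketch: wrapping a Stable Diffusion pipeline in a Gradio web demo.
import gradio as gr
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Passing share=True to launch() would create the temporary public link
# mentioned in the doc text above.
gr.Interface.from_pipeline(pipe).launch()
```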

docs/source/en/api/schedulers/edm_multistep_dpm_solver.md (+1, -1)
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
 
 # EDMDPMSolverMultistepScheduler
 
-`EDMDPMSolverMultistepScheduler` is a [Karras formulation](https://huggingface.co/papers/2206.00364) of `DPMSolverMultistep`, a multistep scheduler from [DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps](https://huggingface.co/papers/2206.00927) and [DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models](https://huggingface.co/papers/2211.01095) by Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu.
+`EDMDPMSolverMultistepScheduler` is a [Karras formulation](https://huggingface.co/papers/2206.00364) of `DPMSolverMultistepScheduler`, a multistep scheduler from [DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps](https://huggingface.co/papers/2206.00927) and [DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models](https://huggingface.co/papers/2211.01095) by Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu.
 
 DPMSolver (and the improved version DPMSolver++) is a fast dedicated high-order solver for diffusion ODEs with convergence order guarantee. Empirically, DPMSolver sampling with only 20 steps can generate high-quality
 samples, and it can generate quite good samples even in 10 steps.

docs/source/en/api/schedulers/multistep_dpm_solver.md (+1, -1)
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
 
 # DPMSolverMultistepScheduler
 
-`DPMSolverMultistep` is a multistep scheduler from [DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps](https://huggingface.co/papers/2206.00927) and [DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models](https://huggingface.co/papers/2211.01095) by Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu.
+`DPMSolverMultistepScheduler` is a multistep scheduler from [DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps](https://huggingface.co/papers/2206.00927) and [DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models](https://huggingface.co/papers/2211.01095) by Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu.
 
 DPMSolver (and the improved version DPMSolver++) is a fast dedicated high-order solver for diffusion ODEs with convergence order guarantee. Empirically, DPMSolver sampling with only 20 steps can generate high-quality
 samples, and it can generate quite good samples even in 10 steps.
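
Both scheduler pages describe sampling in roughly 10 to 20 steps; a standard swap-in sketch (the common diffusers pattern, not part of the diff).

```python
# Sketch: using DPMSolverMultistepScheduler for ~20-step sampling.
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Reuse the existing scheduler config so the noise schedule stays consistent.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

image = pipe("an astronaut riding a horse", num_inference_steps=20).images[0]
```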

docs/source/en/optimization/deepcache.md (+1, -1)
@@ -36,7 +36,7 @@ Then load and enable the [`DeepCacheSDHelper`](https://github.com/horseee/DeepCa
 image = pipe("a photo of an astronaut on a moon").images[0]
 ```
 
-The `set_params` method accepts two arguments: `cache_interval` and `cache_branch_id`. `cache_interval` means the frequency of feature caching, specified as the number of steps between each cache operation. `cache_branch_id` identifies which branch of the network (ordered from the shallowest to the deepest layer) is responsible for executing the caching processes. 
+The `set_params` method accepts two arguments: `cache_interval` and `cache_branch_id`. `cache_interval` means the frequency of feature caching, specified as the number of steps between each cache operation. `cache_branch_id` identifies which branch of the network (ordered from the shallowest to the deepest layer) is responsible for executing the caching processes.
 Opting for a lower `cache_branch_id` or a larger `cache_interval` can lead to faster inference speed at the expense of reduced image quality (ablation experiments of these two hyperparameters can be found in the [paper](https://arxiv.org/abs/2312.00858)). Once those arguments are set, use the `enable` or `disable` methods to activate or deactivate the `DeepCacheSDHelper`.
 
 <div class="flex justify-center">
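
A compact sketch of the `set_params`/`enable` flow described above, assuming the `DeepCache` package and a Stable Diffusion v1.5 checkpoint.

```python
# Sketch: enabling DeepCache feature caching on a Stable Diffusion pipeline.
import torch
from diffusers import StableDiffusionPipeline
from DeepCache import DeepCacheSDHelper

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

helper = DeepCacheSDHelper(pipe=pipe)
# Cache every 3rd step on the shallowest branch (id 0); a larger interval
# or lower branch id speeds inference at some cost to image quality.
helper.set_params(cache_interval=3, cache_branch_id=0)
helper.enable()

image = pipe("a photo of an astronaut on a moon").images[0]
helper.disable()
```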

docs/source/en/using-diffusers/callback.md (+1, -1)
@@ -188,7 +188,7 @@ def latents_to_rgb(latents):
 ```py
 def decode_tensors(pipe, step, timestep, callback_kwargs):
     latents = callback_kwargs["latents"]
-    
+
     image = latents_to_rgb(latents)
     image.save(f"{step}.png")
 
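
To show how `decode_tensors` gets wired in, a usage sketch with the pipeline-level callback API; `pipe` and `latents_to_rgb` are defined earlier on that page, and the function's `return callback_kwargs` line falls outside the hunk.

```python
# Sketch: run decode_tensors after every denoising step so intermediate
# latents are converted to RGB and saved as {step}.png.
image = pipe(
    prompt="a photo of an astronaut on a moon",
    callback_on_step_end=decode_tensors,
    callback_on_step_end_tensor_inputs=["latents"],
).images[0]
```
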
docs/source/en/using-diffusers/marigold_usage.md (+11, -11)
@@ -138,15 +138,15 @@ Because Marigold's latent space is compatible with the base Stable Diffusion, it
 ```diff
 import diffusers
 import torch
-
+
 pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
     "prs-eth/marigold-depth-lcm-v1-0", variant="fp16", torch_dtype=torch.float16
 ).to("cuda")
-
+
 + pipe.vae = diffusers.AutoencoderTiny.from_pretrained(
 +     "madebyollin/taesd", torch_dtype=torch.float16
 + ).cuda()
-
+
 image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")
 depth = pipe(image)
 ```
@@ -156,13 +156,13 @@ As suggested in [Optimizations](../optimization/torch2.0#torch.compile), adding
 ```diff
 import diffusers
 import torch
-
+
 pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
     "prs-eth/marigold-depth-lcm-v1-0", variant="fp16", torch_dtype=torch.float16
 ).to("cuda")
-
+
 + pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
-
+
 image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")
 depth = pipe(image)
 ```
@@ -208,7 +208,7 @@ model_paper_kwargs = {
     diffusers.schedulers.LCMScheduler: {
         "num_inference_steps": 4,
         "ensemble_size": 5,
-    }, 
+    },
 }
 
 image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")
@@ -261,7 +261,7 @@ model_paper_kwargs = {
     diffusers.schedulers.LCMScheduler: {
         "num_inference_steps": 4,
         "ensemble_size": 10,
-    }, 
+    },
 }
 
 image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")
@@ -415,18 +415,18 @@ image = diffusers.utils.load_image(
 
 pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
     "prs-eth/marigold-depth-lcm-v1-0", torch_dtype=torch.float16, variant="fp16"
-).to("cuda")
+).to(device)
 
 depth_image = pipe(image, generator=generator).prediction
 depth_image = pipe.image_processor.visualize_depth(depth_image, color_map="binary")
 depth_image[0].save("motorcycle_controlnet_depth.png")
 
 controlnet = diffusers.ControlNetModel.from_pretrained(
     "diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16, variant="fp16"
-).to("cuda")
+).to(device)
 pipe = diffusers.StableDiffusionXLControlNetPipeline.from_pretrained(
     "SG161222/RealVisXL_V4.0", torch_dtype=torch.float16, variant="fp16", controlnet=controlnet
-).to("cuda")
+).to(device)
 pipe.scheduler = diffusers.DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, use_karras_sigmas=True)
 
 controlnet_out = pipe(

docs/source/en/using-diffusers/scheduler_features.md (+1, -1)
@@ -134,7 +134,7 @@ sigmas = [14.615, 6.315, 3.771, 2.181, 1.342, 0.862, 0.555, 0.380, 0.234, 0.113,
 prompt = "anthropomorphic capybara wearing a suit and working with a computer"
 generator = torch.Generator(device='cuda').manual_seed(123)
 image = pipeline(
-    prompt=prompt, 
+    prompt=prompt,
     num_inference_steps=10,
     sigmas=sigmas,
     generator=generator
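
For context, a hedged reconstruction of the custom-sigmas snippet this hunk trims; the sigma list is truncated in the hunk header and assumed to end at 0.0, and the SDXL checkpoint is taken from the surrounding page.

```python
# Hedged reconstruction: passing a custom sigma schedule to a pipeline.
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# 10-step schedule; the trailing 0.0 is assumed (truncated in the hunk header).
sigmas = [14.615, 6.315, 3.771, 2.181, 1.342, 0.862, 0.555, 0.380, 0.234, 0.113, 0.0]
prompt = "anthropomorphic capybara wearing a suit and working with a computer"
generator = torch.Generator(device="cuda").manual_seed(123)
image = pipeline(
    prompt=prompt,
    num_inference_steps=10,
    sigmas=sigmas,
    generator=generator,
).images[0]
```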
