This repository contains the source code for the blog series "Optimizing Diffusion Inference for Production-Ready Speeds". The series walks through simple inference optimizations for FLUX.1 and Wan T2V.
We will cover the following topics:
- How text-to-image diffusion models work and their computational challenges
- Standard optimizations for transformer-based diffusion models
- Going deep: using faster kernels, non-trivial fusions, precomputations
- Context parallelism
- Quantization
- Caching
- LoRA
- Training
- Practice: Wan text-to-video
- Optimizing inference for uncommon deployment environments using Triton
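For orientation, an unoptimized FLUX.1 run with diffusers looks roughly like the sketch below. It is illustrative only: the prompt, resolution, and sampler settings are placeholders, and the scripts in this repository may set things up differently.

import torch
from diffusers import FluxPipeline

# Baseline: load FLUX.1-dev in bfloat16 and generate a single image.
# Assumes a GPU with enough memory; later posts cover quantization,
# caching, and other ways to speed this up and reduce the footprint.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    prompt="A cat holding a sign that says hello world",
    height=1024,
    width=1024,
    num_inference_steps=28,
    guidance_scale=3.5,
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("flux_baseline.png")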
git clone https://github.com/a-r-r-o-w/productionizing-diffusion
cd productionizing-diffusion/
uv venv venv
source venv/bin/activate
uv pip install torch==2.6 torchvision --index-url https://download.pytorch.org/whl/cu124 --verbose
uv pip install -r requirements.txt
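# Optional: quick sanity check of the environment (illustrative snippet,
# not part of this repo's scripts)
import torch

print("torch:", torch.__version__)
print("cuda runtime:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))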
# Make sure to have CUDA 12.4 or 12.8 (these are the only versions I've tested, so you
# might have to do things differently for other versions when setting up FA2)
# https://developer.nvidia.com/cuda-12-4-0-download-archive
# Flash Attention 2 (optional; FA3 is recommended and much faster on H100, while PyTorch's cuDNN backend is
# good on both A100 and H100)
# For Python 3.10, use the pre-built wheel below or build from source
MAX_JOBS=4 uv pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl --no-build-isolation --verbose
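# Optional: smoke test for the FA2 install (illustrative; needs a CUDA GPU
# and fp16/bf16 inputs)
import torch
from flash_attn import flash_attn_func

# flash_attn_func expects (batch, seqlen, num_heads, head_dim) tensors
q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn_like(q)
v = torch.randn_like(q)
out = flash_attn_func(q, k, v, causal=False)
print(out.shape)  # torch.Size([2, 1024, 8, 64])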
# Flash Attention 3
# Make sure you have at least 64 GB of CPU RAM when building from source, otherwise
# the installation will crash
git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention/hopper
# We check out v2.7.4.post1 because the latest release (2.8.x) might cause
# some installation issues that are hard to debug
# Update: 2.8.3 seems to install without any problems on CUDA 12.8 and PyTorch 2.10 nightly.
git checkout v2.7.4.post1
python setup.py install
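# Optional: smoke test for the FA3 build (illustrative; FA3 targets Hopper
# GPUs such as H100, so this is expected to fail on older architectures)
import torch
import flash_attn_interface  # module name installed by the hopper/ build above

q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn_like(q)
v = torch.randn_like(q)
# Depending on the FA3 version, flash_attn_func may return the output alone
# or an (output, lse) tuple, so we avoid unpacking here.
result = flash_attn_interface.flash_attn_func(q, k, v)
print(type(result))

Related works referenced by this project: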
@article{fang2024xdit,
title={xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism},
author={Fang, Jiarui and Pan, Jinzhe and Sun, Xibo and Li, Aoyu and Wang, Jiannan},
journal={arXiv preprint arXiv:2411.01738},
year={2024}
}
@misc{paraattention-2025,
author = {Cheng, Zeyi},
title = {ParaAttention: Context Parallel Attention for Diffusion Transformers},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/chengzeyi/ParaAttention}}
}
If you use this project, please cite it as:
@misc{avs2025optdiff,
author = {Aryan V S},
title = {Optimizing Diffusion Inference for Production-Ready Speeds},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/a-r-r-o-w/productionizing-diffusion}},
url = {https://a-r-r-o-w.github.io/blog/3_blossom/00001_productionizing_diffusion-1/index.html}
}