Add Z-Image Text-to-Image Generation Support #3261
Open
Summary
This PR introduces support for Z-Image, Alibaba's ~24B parameter text-to-image generation model using Flow Matching. The implementation follows Candle's architecture conventions and includes the full inference pipeline.
Model Overview
Z-Image is a state-of-the-art text-to-image model featuring:
Model Links:
🔧 Usage Examples
Basic Usage (CUDA)
cargo run --features cuda --example z_image --release -- \
    --model-path weights/Z-Image-Turbo \
    --prompt "A beautiful landscape with mountains and a lake" \
    --width 1024 --height 768 \
    --num-steps 8
Using Metal (macOS)
cargo run --features metal --example z_image --release -- \
    --model-path weights/Z-Image-Turbo \
    --prompt "A futuristic city at night with neon lights" \
    --width 1024 --height 1024 \
    --num-steps 9
Files Changed
New Files
candle-transformers/src/models/z_image/mod.rs
candle-transformers/src/models/z_image/transformer.rs
candle-transformers/src/models/z_image/text_encoder.rs
candle-transformers/src/models/z_image/vae.rs
candle-transformers/src/models/z_image/scheduler.rs
candle-transformers/src/models/z_image/sampling.rs
candle-transformers/src/models/z_image/preprocess.rs
candle-examples/examples/z_image/main.rs
candle-examples/examples/z_image/README.md
Modified Files
candle-transformers/src/models/mod.rs
pub mod z_image;
Implementation Highlights
1. Optimized Patchify/Unpatchify
The implementation uses optimized 6D tensor operations for the F=1 (single frame) case, avoiding Candle's 7D+ dimension limitations:
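As a rough illustration of what patchify and unpatchify do in the single-frame case, here is a minimal sketch on flat slices. It does not use Candle tensors and is not the code from this PR: the function names, the row-major [C, H, W] layout, and the (patch row, patch column, channel) feature order inside each token are all assumptions made for the example.

```rust
/// Patchify a single-frame latent stored row-major as [C, H, W] into
/// tokens of shape [(H/p) * (W/p), p * p * C].
/// Assumed feature order inside a token: (patch_row, patch_col, channel).
fn patchify(latent: &[f32], c: usize, h: usize, w: usize, p: usize) -> Vec<f32> {
    assert_eq!(latent.len(), c * h * w);
    assert!(h % p == 0 && w % p == 0);
    let wt = w / p; // tokens per row
    let dim = p * p * c; // features per token
    let mut tokens = vec![0f32; (h / p) * wt * dim];
    for ch in 0..c {
        for y in 0..h {
            for x in 0..w {
                let token = (y / p) * wt + (x / p);
                let feat = ((y % p) * p + (x % p)) * c + ch;
                tokens[token * dim + feat] = latent[(ch * h + y) * w + x];
            }
        }
    }
    tokens
}

/// Inverse of `patchify`: rebuild the [C, H, W] latent from the tokens.
fn unpatchify(tokens: &[f32], c: usize, h: usize, w: usize, p: usize) -> Vec<f32> {
    assert_eq!(tokens.len(), c * h * w);
    assert!(h % p == 0 && w % p == 0);
    let wt = w / p;
    let dim = p * p * c;
    let mut latent = vec![0f32; c * h * w];
    for ch in 0..c {
        for y in 0..h {
            for x in 0..w {
                let token = (y / p) * wt + (x / p);
                let feat = ((y % p) * p + (x % p)) * c + ch;
                latent[(ch * h + y) * w + x] = tokens[token * dim + feat];
            }
        }
    }
    latent
}
```

In tensor terms the same mapping is a reshape of [B, C, H, W] to [B, C, H/p, p, W/p, p], a permute, and a final reshape, which needs only six dimensions once the frame axis is dropped.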
2. 3D RoPE Position Encoding
Implements 3D Rotary Position Embeddings with pre-computed sin/cos caches:
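The idea can be sketched without Candle as follows. This is an illustrative stand-in, not the PR's implementation: `AxisCache`, `apply_rope_3d`, the interleaved pair layout, and the per-axis dimensions and `theta` passed by the caller are hypothetical choices. Each of the three position axes gets its own precomputed cos/sin table, and each axis rotates its own slice of the head dimension.

```rust
/// Precomputed cos/sin table for one position axis, shape [max_pos, dim / 2].
/// (Hypothetical helper type for this sketch.)
struct AxisCache {
    half: usize,
    cos: Vec<f32>,
    sin: Vec<f32>,
}

impl AxisCache {
    fn new(max_pos: usize, dim: usize, theta: f32) -> Self {
        assert!(dim % 2 == 0);
        let half = dim / 2;
        let mut cos = Vec::with_capacity(max_pos * half);
        let mut sin = Vec::with_capacity(max_pos * half);
        for pos in 0..max_pos {
            for i in 0..half {
                // Standard RoPE frequency: theta^(-2i / dim).
                let freq = theta.powf(-((2 * i) as f32) / dim as f32);
                let angle = pos as f32 * freq;
                cos.push(angle.cos());
                sin.push(angle.sin());
            }
        }
        Self { half, cos, sin }
    }
}

/// Rotate one head vector in place. `pos` holds the token's position on
/// each of the three axes; axis k rotates its own contiguous slice of `x`,
/// treating (x[2i], x[2i + 1]) as a pair (an assumed layout).
fn apply_rope_3d(x: &mut [f32], pos: [usize; 3], caches: &[AxisCache; 3]) {
    let total: usize = caches.iter().map(|c| 2 * c.half).sum();
    assert_eq!(x.len(), total);
    let mut offset = 0;
    for (axis, cache) in caches.iter().enumerate() {
        let row = pos[axis] * cache.half;
        for i in 0..cache.half {
            let (c, s) = (cache.cos[row + i], cache.sin[row + i]);
            let (a, b) = (x[offset + 2 * i], x[offset + 2 * i + 1]);
            x[offset + 2 * i] = a * c - b * s;
            x[offset + 2 * i + 1] = a * s + b * c;
        }
        offset += 2 * cache.half;
    }
}
```

Building the tables once and indexing them per token avoids recomputing trigonometric functions at every denoising step, which is the point of a precomputed cache.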
3. AdaLN Modulation with Tanh Gate
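A minimal sketch of the general pattern, under stated assumptions: the input is normalized, scaled by (1 + scale), passed through a sub-layer, and added back through a tanh-bounded gate. The helper names, the use of RMS normalization, and the exact placement of the norm are assumptions for illustration; the PR's transformer block may arrange these steps differently.

```rust
/// RMS-normalize a vector (no learned weight, for brevity).
fn rms_norm(x: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv = 1.0 / (mean_sq + eps).sqrt();
    x.iter().map(|v| v * inv).collect()
}

/// out = x + tanh(gate) * sublayer(norm(x) * (1 + scale))
/// `scale` and `gate` stand in for per-channel values that would be
/// predicted from the timestep embedding (hypothetical inputs here).
fn modulated_residual<F>(x: &[f32], scale: &[f32], gate: &[f32], sublayer: F) -> Vec<f32>
where
    F: Fn(&[f32]) -> Vec<f32>,
{
    assert_eq!(x.len(), scale.len());
    assert_eq!(x.len(), gate.len());
    let normed = rms_norm(x, 1e-5);
    let modulated: Vec<f32> = normed
        .iter()
        .zip(scale)
        .map(|(n, s)| n * (1.0 + s))
        .collect();
    let out = sublayer(&modulated);
    assert_eq!(out.len(), x.len());
    x.iter()
        .zip(out.iter())
        .zip(gate)
        .map(|((xi, oi), g)| xi + g.tanh() * oi)
        .collect()
}
```

Because tanh keeps the gate in (-1, 1), a zero gate leaves the residual stream untouched and a large gate cannot amplify the sub-layer output beyond its own magnitude.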
4. Dynamic Timestep Shifting
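For readers unfamiliar with the technique, the sketch below follows the resolution-dependent shift used by flow-matching schedulers in Diffusers: a shift value `mu` is interpolated linearly from the image token count, then applied to each sigma. The function names mirror that convention, but the parameter values a caller passes (base and maximum sequence lengths and shifts) are assumptions, and the scheduler in this PR may differ in detail.

```rust
/// Linearly interpolate the shift `mu` from the number of image tokens.
fn calculate_shift(
    image_seq_len: usize,
    base_seq_len: usize,
    max_seq_len: usize,
    base_shift: f64,
    max_shift: f64,
) -> f64 {
    let m = (max_shift - base_shift) / (max_seq_len as f64 - base_seq_len as f64);
    let b = base_shift - m * base_seq_len as f64;
    image_seq_len as f64 * m + b
}

/// Shift a timestep t in (0, 1]: exp(mu) / (exp(mu) + (1 / t - 1)).
fn time_shift(mu: f64, t: f64) -> f64 {
    let e = mu.exp();
    e / (e + (1.0 / t - 1.0))
}

/// Sigmas for `num_steps` steps, running from 1.0 down towards 0.
fn shifted_sigmas(num_steps: usize, mu: f64) -> Vec<f64> {
    (0..num_steps)
        .map(|i| {
            let t = 1.0 - i as f64 / num_steps as f64;
            time_shift(mu, t)
        })
        .collect()
}
```

A positive `mu` pushes every intermediate sigma above its unshifted value, so larger images spend more of the schedule at high noise levels.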
Image Size Requirements
Image dimensions must be divisible by 16:
Latent size formula:
latent = 2 × (image_size ÷ 16)
For example, by this formula a 1024 × 768 image maps to a 128 × 96 latent.
📝 Testing Status
cargo check --features metal
cargo clippy --workspace --tests --examples --benches -- -D warnings
cargo fmt --all -- --check
Sample Output
Metal
CUDA
Checklist
cargo clippy --workspace --tests --examples --benches -- -D warnings
cargo fmt --all -- --check
References
Z-Image
Diffusers
Additional Fix: Clippy Warning in candle-nn
While implementing SDPA support for Z-Image, I discovered a minor clippy warning in candle-nn/src/ops.rs:1040, introduced by PR #3196. @EricLBuehler
Issue: clippy::nonminimal_bool warning