
[ROCm] better AMD CDNA4 and RDNA4 support for VAE#13411

Open
Apophis3158 wants to merge 1 commit into Comfy-Org:master from Apophis3158:master/rocm


@Apophis3158

Better support for AMD's latest GPU architectures, RDNA4 (gfx1200, gfx1201) and CDNA4 (gfx950), which are identified by their FP8 support.

Reference: support for __hip_fp8_e4m3 and __hip_fp8_e5m2 in the HIP C++ type implementation support table of AMD's "Data types and precision support" documentation.

@coderabbitai

coderabbitai bot commented Apr 14, 2026

📝 Walkthrough


The PR changes AMD-specific logic to depend on SUPPORT_FP8_OPS. In comfy/model_management.py, pytorch_attention_enabled_vae() now returns False for AMD only when SUPPORT_FP8_OPS is false; if SUPPORT_FP8_OPS is true, it defers to pytorch_attention_enabled(). In comfy/sd.py, VAE_KL_MEM_RATIO in VAE.__init__ is set to 2.73 only when is_amd() is true and SUPPORT_FP8_OPS is false; otherwise it uses the non-AMD value. No public signatures changed.
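The gating described in the walkthrough can be sketched as two small pure functions. This is an illustration only, not the PR's actual diff: the function names and boolean parameters below are hypothetical stand-ins for `is_amd()`, `SUPPORT_FP8_OPS`, and `pytorch_attention_enabled()` in ComfyUI, and the non-AMD value of `VAE_KL_MEM_RATIO` is deliberately left out because it is not stated here.

```python
# Hypothetical stand-alone sketch; parameters replace ComfyUI's module-level state.

def vae_attention_enabled(amd: bool, fp8_ops: bool, attention_enabled: bool) -> bool:
    """Mirror of the described pytorch_attention_enabled_vae() behaviour."""
    # AMD GPUs without FP8 ops keep PyTorch attention disabled for the VAE.
    if amd and not fp8_ops:
        return False
    # Everything else defers to the general attention setting.
    return attention_enabled


def uses_amd_vae_mem_ratio(amd: bool, fp8_ops: bool) -> bool:
    """True when the AMD-specific VAE_KL_MEM_RATIO (2.73) would apply."""
    return amd and not fp8_ops
```

Under these assumptions, an RDNA4 or CDNA4 card (AMD with FP8 ops) takes the same path as a non-AMD device, while older AMD architectures keep the previous behaviour.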

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description check | ✅ Passed | The description is directly related to the changeset, explaining the motivation for the changes and referencing AMD documentation about FP8 support. |
| Title check | ✅ Passed | The title accurately reflects the main objective of the PR: adding improved support for AMD GPU architectures CDNA4 and RDNA4 through conditional FP8 operations handling. |




@Apophis3158
Author

  • Updated inline comments.

These two improvements have already been tested by ROCm users on Windows and Linux for a long time:

and more.

Most of the feedback comes from gfx120x users, so SUPPORT_FP8_OPS is the better condition for the restriction.

if torch_version_numeric >= (2, 7) and rocm_version >= (6, 4):
    if any((a in arch) for a in ["gfx1200", "gfx1201", "gfx950"]):  # TODO: more arches, "gfx942" gives error on pytorch nightly 2.10 1013 rocm7.0
        SUPPORT_FP8_OPS = True
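
For readers who want to check which devices the snippet above would flag, the same condition can be written as a self-contained function. This is a sketch under assumptions: `supports_fp8_ops` and `FP8_ARCHES` are hypothetical names, the version arguments are plain tuples, and the architecture string is assumed to contain the gfx name as a substring (possibly with feature suffixes), which is why substring matching is used as in the original.

```python
# Hypothetical helper reproducing the version and architecture check quoted above.
FP8_ARCHES = ("gfx1200", "gfx1201", "gfx950")


def supports_fp8_ops(arch: str, torch_version: tuple, rocm_version: tuple) -> bool:
    """Return True when the quoted condition would set SUPPORT_FP8_OPS."""
    # Both minimum versions must be met before the architecture is considered.
    if torch_version >= (2, 7) and rocm_version >= (6, 4):
        # Substring match, so suffixed architecture strings still match.
        return any(a in arch for a in FP8_ARCHES)
    return False
```

With this helper, `supports_fp8_ops("gfx1201", (2, 7), (6, 4))` is true, while a gfx942 device or an older PyTorch or ROCm version yields false.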

@Apophis3158 Apophis3158 changed the title [ROCm] better AMD CDNA4 and RDNA4 support [ROCm] better AMD CDNA4 and RDNA4 support for VAE Apr 17, 2026