Re-enable FlashInfer for Llama4 on Blackwell in e2e fusion tests #28966
Purpose
Fix #28604 (the underlying issue was resolved by #28739): re-enable FlashInfer as the attention backend for Llama4 on Blackwell platforms in the e2e fusion tests.
Test Plan
Existing e2e fusion tests in `tests/compile/test_fusions_e2e.py` will validate the change:

- `test_attn_quant` - Tests attention+quant fusion with FlashInfer on Blackwell
- `test_tp2_attn_quant_allreduce_rmsnorm` - Tests multi-GPU fusion patterns
- `test_tp2_attn_quant_async_tp` - Tests async TP with FlashInfer
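These tests already exist; for reference, a minimal sketch of running just the relevant cases through pytest's Python API (the `-k` selection expression is inferred from the test names above, not taken from the PR, and the `tp2_*` cases need a 2-GPU Blackwell node):

```python
# Sketch only: invoke the relevant e2e fusion tests from Python.
# The -k expression is an assumption based on the test names above;
# the tp2_* cases additionally require 2 GPUs.
import pytest

pytest.main([
    "tests/compile/test_fusions_e2e.py",
    "-k", "test_attn_quant or tp2_attn_quant",
    "-v",
])
```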
Test Result

Changes are minimal and follow the existing pattern used for Llama3. The Llama4 model configuration now uses a capability-gated attention backend:
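A minimal sketch of that selection, assuming the `_Backend` enum and the `current_platform.is_device_capability()` helper used by the existing Llama3 entry (import paths vary between vLLM versions, and the actual diff may differ):

```python
# Sketch, not the literal diff: choose the attention backend for the Llama4
# e2e fusion test entry based on GPU compute capability.
# Import paths are assumptions and may differ across vLLM versions.
from vllm.attention.backends.registry import _Backend
from vllm.platforms import current_platform

# SM 10.0 (Blackwell): FlashInfer attn+quant fusion works after #28739.
# Other GPUs (e.g. Hopper) keep TRITON_ATTN while #28568 remains open.
llama4_attn_backend = (
    _Backend.FLASHINFER
    if current_platform.is_device_capability(100)
    else _Backend.TRITON_ATTN
)
```

This mirrors how the Llama3 entry gates its backend, so both models follow the same pattern in the test file.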
This enables FlashInfer on Blackwell (where #28739 fixed the issue) while preserving TRITON_ATTN on Hopper (where #28568 remains unresolved).
Changes: