
Conversation


Copilot AI commented Nov 18, 2025

Purpose

Re-enable FlashInfer as the attention backend for Llama4 on Blackwell platforms in the e2e fusion tests, now that #28739 has resolved #28604 ([Bug]: Llama4 on B200 flashinfer produces garbage).

Test Plan

Existing e2e fusion tests in tests/compile/test_fusions_e2e.py validate the change (a plausible local invocation is sketched after the list):

  • test_attn_quant - Tests attention+quant fusion with FlashInfer on Blackwell
  • test_tp2_attn_quant_allreduce_rmsnorm - Tests multi-GPU fusion patterns
  • test_tp2_attn_quant_async_tp - Tests async TP with FlashInfer

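For reference, one plausible way to run these locally (assuming a standard pytest setup; the -k filters are derived from the test names above, and the tp2 cases need at least two GPUs):

pytest tests/compile/test_fusions_e2e.py -k test_attn_quant
pytest tests/compile/test_fusions_e2e.py -k "test_tp2_attn_quant_allreduce_rmsnorm or test_tp2_attn_quant_async_tp"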
Test Result

Changes are minimal and follow the existing pattern used for Llama3. The Llama4 model configuration now uses:

backend=AttentionBackendEnum.FLASHINFER if is_blackwell() else AttentionBackendEnum.TRITON_ATTN

This enables FlashInfer on Blackwell (where #28739 fixed the issue) while preserving TRITON_ATTN on Hopper (where #28568 remains unresolved).
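As a self-contained sketch of this capability gate (AttentionBackendEnum and is_blackwell() are the names used in the snippet above; the compute-capability check below is an assumption about how such a helper could work, not vLLM's actual implementation):

from enum import Enum, auto

class AttentionBackendEnum(Enum):  # stand-in for vLLM's enum of the same name
    FLASHINFER = auto()
    TRITON_ATTN = auto()

def is_blackwell(compute_capability: tuple[int, int]) -> bool:
    # Blackwell parts (e.g. B200) report CUDA compute capability 10.x;
    # Hopper (e.g. H100) reports 9.0. This check is an assumption about
    # how the real helper is implemented.
    return compute_capability[0] == 10

def pick_backend(cc: tuple[int, int]) -> AttentionBackendEnum:
    # FlashInfer only on Blackwell (fixed by #28739); Hopper keeps
    # TRITON_ATTN while #28568 remains open.
    return (
        AttentionBackendEnum.FLASHINFER
        if is_blackwell(cc)
        else AttentionBackendEnum.TRITON_ATTN
    )

assert pick_backend((10, 0)) is AttentionBackendEnum.FLASHINFER  # B200
assert pick_backend((9, 0)) is AttentionBackendEnum.TRITON_ATTN  # H100

Gating on platform keeps the change local to the Llama4 test case, mirroring the existing Llama3 pattern.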

Changes:

  • tests/compile/test_fusions_e2e.py - switch the Llama4 case from TRITON_ATTN to FlashInfer on Blackwell
Original prompt

#28739 fixed an issue with llama4, can you please reenable flashinfer as the attention backend for llama4 on e2e fusion tests, and remove the issue links to #28604 which was fixed



@mergify mergify bot added the llama Related to Llama models label Nov 18, 2025
Copilot AI changed the title [WIP] Reenable flashinfer as attention backend for llama4 Re-enable FlashInfer for Llama4 on Blackwell in e2e fusion tests Nov 18, 2025
Copilot AI requested a review from ProExpertProg November 18, 2025 22:52
Copilot finished work on behalf of ProExpertProg November 18, 2025 22:52
Signed-off-by: Luka Govedič <[email protected]>
@ProExpertProg ProExpertProg added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 18, 2025
@ProExpertProg ProExpertProg marked this pull request as ready for review November 18, 2025 23:10