Fix attention bias broadcast #24017

tianleiwu · 2025-03-13T02:41:47Z

Description

Fix broadcast on attention bias dim 1.
Add test cases to test_mha.py in the pipeline to cover this scenario.

Motivation and Context

This feature was added in #21710.

There was a bug in computing the offset when the attention bias is broadcast on dim 1 only, in both the CUDA and CPU kernels.

It can be triggered when the attention bias shape is like [batch_size, 1, sequence_length, total_sequence_length] with batch_size > 1 and the unfused kernel is selected. Note that cuDNN flash attention and CUTLASS fused attention also support attention bias, so the bug in the unfused kernel was not discovered previously.
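For readers who want to see the indexing involved, here is a minimal NumPy sketch of how a per-(batch, head) offset into a flat attention bias buffer can be computed when dim 0 and/or dim 1 may be broadcast. The helper names `bias_offset` and `check_offsets` are hypothetical and are not taken from the onnxruntime kernels; this only illustrates the broadcast rule and is not the actual fix.

```python
import numpy as np


def bias_offset(b, n, bias_shape):
    """Element offset of the [S, T] slice used by batch b, head n.

    bias_shape is (dim0, dim1, S, T), where dim0 is batch_size or 1 and
    dim1 is num_heads or 1. Hypothetical helper for illustration only.
    """
    dim0, dim1, s, t = bias_shape
    bias_b = b if dim0 > 1 else 0  # broadcast on dim 0: every batch reads index 0
    bias_n = n if dim1 > 1 else 0  # broadcast on dim 1: every head reads index 0
    # The batch stride must use the bias's own dim 1 size (dim1),
    # not num_heads, otherwise the dim-1-only broadcast case reads
    # the wrong slice as soon as b > 0.
    return (bias_b * dim1 + bias_n) * s * t


def check_offsets(batch_size, num_heads, s, t, bias_shape):
    """Compare the manual offsets against NumPy broadcasting."""
    bias = np.arange(np.prod(bias_shape), dtype=np.float32).reshape(bias_shape)
    expected = np.broadcast_to(bias, (batch_size, num_heads, s, t))
    flat = bias.ravel()
    for b in range(batch_size):
        for n in range(num_heads):
            off = bias_offset(b, n, bias_shape)
            got = flat[off:off + s * t].reshape(s, t)
            if not np.array_equal(got, expected[b, n]):
                return False
    return True


if __name__ == "__main__":
    # The shape from this PR: broadcast on dim 1 only, batch_size > 1.
    print(check_offsets(2, 4, 3, 5, (2, 1, 3, 5)))
```

The case `(batch_size, 1, S, T)` with `batch_size > 1` is the one that distinguishes a stride based on the bias's dim 1 from a stride based on `num_heads`, which is why it needs its own test coverage.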

onnxruntime/test/python/transformers/test_mha.py

tianleiwu added 2 commits March 13, 2025 02:41

fix attention bias broadcast on dim 1

5226307

naming style

426b137

tianleiwu requested review from RyanUnderhill and kunal-vaishnavi March 13, 2025 03:44

RyanUnderhill reviewed Mar 13, 2025

onnxruntime/test/python/transformers/test_mha.py (outdated, resolved)

onnxruntime/test/python/transformers/test_mha.py (outdated, resolved)

review feedback

de9798a

tianleiwu requested a review from RyanUnderhill March 13, 2025 04:54

kunal-vaishnavi approved these changes Mar 13, 2025
