
[BMG][FlexAttn] Unbelievably high perf with DLE 2025.3 #5572

@whitneywhtsang


https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/19744824959/job/56576886398

    Z=1  H_q=128  H_kv=1  N_CTX_q=512  N_CTX_kv=1664  D_HEAD_qk=576  D_HEAD_v=512  MODE=fwd  (row index 4)

    Metric        Triton        Torch
    GB/s          307.910042    9.630185
    GB/s-min      18.468514     7.563575
    GB/s-max      309.257742    9.659267
    TFlops        461.752076    14.441744
    TFlops-min    27.695993     11.342587
    TFlops-max    463.773132    14.485356
    CV            0.001911      0.002321
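The "unbelievably high" number can be sanity-checked from this row alone. The sketch below is a hypothetical helper, not part of the benchmark suite. It assumes the dense attention FLOP count `2 * Z * H_q * N_CTX_q * N_CTX_kv * (D_HEAD_qk + D_HEAD_v)` (one Q @ K^T and one P @ V matmul, with no mask-based skipping). The benchmark's own FLOP accounting may differ, so treat the implied times as rough. Independent of that assumption, the TFlops/GB/s ratio is the same for Triton and Torch, which suggests both metrics are derived from one measured time per provider. The spread between the best and worst Triton samples is also much wider than the reported CV would suggest.

```python
# Sanity checks on the single FlexAttention benchmark row reported above.
# ASSUMPTION (hypothetical, not taken from the benchmark source): the FLOP count is
# 2 * Z * H_q * N_CTX_q * N_CTX_kv * (D_HEAD_qk + D_HEAD_v), i.e. dense attention.
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    z: int
    h_q: int
    n_ctx_q: int
    n_ctx_kv: int
    d_head_qk: int
    d_head_v: int
    triton_gbs: float
    torch_gbs: float
    triton_tflops: float
    torch_tflops: float
    triton_tflops_min: float
    triton_tflops_max: float


# Values copied from the reported row.
ROW = Row(z=1, h_q=128, n_ctx_q=512, n_ctx_kv=1664, d_head_qk=576, d_head_v=512,
          triton_gbs=307.910042, torch_gbs=9.630185,
          triton_tflops=461.752076, torch_tflops=14.441744,
          triton_tflops_min=27.695993, triton_tflops_max=463.773132)


def assumed_dense_flops(r: Row) -> int:
    """Hypothetical FLOP count: 2*M*N*K for each of the two attention matmuls."""
    return 2 * r.z * r.h_q * r.n_ctx_q * r.n_ctx_kv * (r.d_head_qk + r.d_head_v)


def implied_time_ms(flops: int, tflops: float) -> float:
    """Kernel time (ms) that would yield the reported TFLOP/s for this FLOP count."""
    return flops / (tflops * 1e12) * 1e3


def summarize(r: Row) -> dict:
    flops = assumed_dense_flops(r)
    return {
        "flops": flops,
        # (TFLOP/s) / (GB/s) * 1e3 = FLOP per byte; equal values for both providers
        # mean GB/s and TFlops are two views of the same timing.
        "triton_flop_per_byte": r.triton_tflops / r.triton_gbs * 1e3,
        "torch_flop_per_byte": r.torch_tflops / r.torch_gbs * 1e3,
        "speedup": r.triton_tflops / r.torch_tflops,
        "triton_ms": implied_time_ms(flops, r.triton_tflops),
        "torch_ms": implied_time_ms(flops, r.torch_tflops),
        # Best vs. worst Triton sample.
        "triton_max_over_min": r.triton_tflops_max / r.triton_tflops_min,
    }


if __name__ == "__main__":
    for key, value in summarize(ROW).items():
        print(f"{key:>22}: {value:,.4f}" if isinstance(value, float) else f"{key:>22}: {value:,}")
```

Under the assumed formula, the reported Triton TFlops corresponds to a kernel time of roughly 0.5 ms, versus roughly 16 ms for Torch, a speedup of about 32x. Comparing the implied time against the device's peak throughput is the quickest way to confirm whether the number is physically plausible.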
