diff --git a/_posts/2024-08-07-flexattention.md b/_posts/2024-08-07-flexattention.md
index 4c34879d33b6..4f31d1cbd6d1 100644
--- a/_posts/2024-08-07-flexattention.md
+++ b/_posts/2024-08-07-flexattention.md
@@ -1,7 +1,7 @@
 ---
 layout: blog_detail
 title: "FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention"
-author: "Team PyTorch: Horace He, Driss Guessous, Yanbo Liang, Joy Dong"
+author: "Team PyTorch: Driss Guessous, Yanbo Liang, Joy Dong, Horace He"
 ---

 ![a cartoon chart flexing his muscles](/assets/images/flexattention/fg1.jpg){:style="width:100%"}
@@ -439,6 +439,16 @@ FlexAttention achieves 90% of FlashAttention2's performance in the forward pass

 ![flexattention speed chart](/assets/images/flexattention/fg16.png){:style="width:100%"}

+FlexAttention shines on H100 GPUs, where it's not only natively supported but actually outperforms FlashAttention2! While it doesn't quite reach the heights of FlashAttention3, FlexAttention still packs a punch:
+
+- Forward pass: 85% of FlashAttention3's performance
+- Backward pass: 76% of FlashAttention3's performance
+
+![flexattention speed chart](/assets/images/flexattention/fg17.png){:style="width:100%"}
+![flexattention speed chart](/assets/images/flexattention/fg18.png){:style="width:100%"}
+
+
+
 ## Conclusion

 We hope you have as much fun using FlexAttention as we did developing it\! While working on this, we ended up finding way more applications of this API than we could have expected. We’ve already seen it accelerate torchtune’s [sample packing throughput by 71%](https://github.com/pytorch/torchtune/pull/1193), replace the need for a researcher to spend over a week writing their own custom Triton kernel, and deliver competitive performance with custom handwritten attention variants.
diff --git a/assets/images/flexattention/fg17.png b/assets/images/flexattention/fg17.png
new file mode 100644
index 000000000000..9ff13faa0052
Binary files /dev/null and b/assets/images/flexattention/fg17.png differ
diff --git a/assets/images/flexattention/fg18.png b/assets/images/flexattention/fg18.png
new file mode 100644
index 000000000000..e1060876b12f
Binary files /dev/null and b/assets/images/flexattention/fg18.png differ