1 parent a9592ca commit 63fee4f
python/sglang/srt/layers/attention/hip_radix_attention.py
@@ -1,10 +1,8 @@
 from __future__ import annotations
 
 """
-Support different attention backends.
-Now there are two backends: FlashInfer and Triton.
-FlashInfer is faster and Triton is easier to customize.
-Each backend supports two operators: extend (i.e. prefill with cached prefix) and decode.
+HiP Attention Backend for SGLang
+https://arxiv.org/pdf/2406.09827
 """
 
 import logging
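
For context, the docstring being replaced describes the contract every SGLang attention backend follows: an extend operator (prefill with a cached prefix) and a decode operator. A minimal sketch of that shape is given below; the class name, method names, and signatures are illustrative assumptions, not copied from this commit or from SGLang's base class.

import logging

logger = logging.getLogger(__name__)


class ExampleAttentionBackend:
    """Illustrative-only sketch of the two operators named in the old docstring."""

    def forward_extend(self, q, k, v, layer, forward_batch):
        # Extend (prefill with cached prefix): attend over the cached prefix
        # KV plus the newly appended tokens. (Hypothetical signature.)
        raise NotImplementedError

    def forward_decode(self, q, k, v, layer, forward_batch):
        # Decode: single-token step attending over the full cached KV.
        # (Hypothetical signature.)
        raise NotImplementedError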