Mla splitkv enhance split alg inte #1233
Open
valarLip wants to merge 130 commits into main from mla_splitkv_enhance_split_alg_inte
Commits
fa2c2d2
add num_kv_splits_indptr to mla for mtp<=4 case for now
valarLip 15f6155
update
valarLip 8dd5617
update new kernel
valarLip c871e8d
infrastructures
ruanjm 3750b5f
1st version of split kernel
ruanjm 7ca2598
Fix issues raised by Lingpeng and fix the issue on batch_size
ruanjm 7c5891c
update mla
valarLip 12def78
update mla_stage2
valarLip 5dc5a6d
Merge branch 'main' into mla_splitkv_enhance
valarLip eae14ae
Merge branch 'main' into mla_splitkv_enhance
valarLip f244f11
Merge branch 'mla_splitkv_enhance' into jruan/mla_splitkv_enhance_spl…
ruanjm 224f89f
1st draft of v1 split program
ruanjm ef442fd
add kv_offset
ruanjm f10235e
mla_splitkv_enhance_split_alg_inte
Zzz9990 600b5dd
splitkv debug
Zzz9990 5c58ae8
1st version of reduce kernel
ruanjm 9700bc5
metadata & kernel finish
Zzz9990 4a86304
Merge branch 'jruan/mla_splitkv_enhance_split_alg' into mla_splitkv_e…
Zzz9990 d49c0cd
add reduce
Zzz9990 e4bf891
final_lse is optional now.
ruanjm 7bf6aa4
update kernel
Zzz9990 2411f1f
bug fix
ruanjm e21600d
Merge branch 'jruan/mla_splitkv_enhance_split_alg' into mla_splitkv_e…
Zzz9990 ffcc113
bug fix 1
ruanjm 07e4ed1
modify reduce api
Zzz9990 3f2bf25
Merge branch 'jruan/mla_splitkv_enhance_split_alg' into mla_splitkv_e…
Zzz9990 7c877c4
update kernel
Zzz9990 d10cdab
fix max splits
Zzz9990 bac5750
bug fix 3
ruanjm f59a3e6
fix s80 early return
Zzz9990 1ae58d1
Merge branch 'jruan/mla_splitkv_enhance_split_alg' into mla_splitkv_e…
Zzz9990 5680c26
update calculation of partial_indx
ruanjm fa87c91
Merge branch 'jruan/mla_splitkv_enhance_split_alg' into mla_splitkv_e…
Zzz9990 0dad74c
add per split test
Zzz9990 a8fa0b1
make lse support by ref
ruanjm 56e964f
test split
Zzz9990 a76610a
fix redundant calculation of head offset in reduce kernel
ruanjm 4ffd393
add custom test
Zzz9990 b3747df
Merge branch 'jruan/mla_splitkv_enhance_split_alg' into mla_splitkv_e…
Zzz9990 ba36541
Add support of 128 head size
ruanjm e5a1b17
update comments
ruanjm a68879c
1. Let large work be assigned first.
ruanjm 7209c36
Merge branch 'jruan/mla_splitkv_enhance_split_alg' into mla_splitkv_e…
Zzz9990 4494b36
Calculate kv_limit dynamically
ruanjm 09c4ca8
Merge branch 'jruan/mla_splitkv_enhance_split_alg' into mla_splitkv_e…
Zzz9990 1e5e71a
Fix bug about difference in split_kv(bool)
ruanjm f35cf04
Merge branch 'jruan/mla_splitkv_enhance_split_alg' into mla_splitkv_e…
Zzz9990 f7cf2b9
add test
Zzz9990 5b91267
fix seed
Zzz9990 59af206
Add global tolerance 16 in kv seqlen because main kernel cannot handl…
ruanjm e1b9065
Fix warp=1 error
ruanjm 2adf050
Add redundant mode to make the size of output of metadata be fixed ad…
ruanjm c0df46b
Merge branch 'jruan/mla_splitkv_enhance_split_alg' into mla_splitkv_e…
Zzz9990 fbff664
fp8 setup
Zzz9990 1d36311
first version of device metadata
ruanjm 4212a41
Add work_ptrs
ruanjm 818229e
Compatibility to CUDA Graph
ruanjm 704324a
Refactor code. Merge 2 iterations of generate work together.
ruanjm 6be798a
Make sure that each batch of workload can never be split to more th…
ruanjm 1b0e26f
Adjust metadata. Get 1% perf gain.
ruanjm 36e9b53
Parallelize most of metadata kernel
ruanjm 4403c82
add scale
Zzz9990 fcb36f0
1. Use warp-level bitonic sort to sort batch idx based on their cost …
ruanjm 5dc1eb7
fp8 function pass
Zzz9990 b46a8e3
Fix issues:
ruanjm d8d92bc
fp8 ready
Zzz9990 ead163a
fix
Zzz9990 7fefc29
Merge remote-tracking branch 'origin/jruan/mla_splitkv_enhance_split_…
Zzz9990 cc7ffdc
persistent ready
Zzz9990 5e32d5d
add nv acc test
Zzz9990 a97fcf8
rename
Zzz9990 e0c72f8
update metashape
Zzz9990 7220b04
update reduce cu num
Zzz9990 07bf6bb
update optest for mla
Zzz9990 3a7bd04
fix cu num
Zzz9990 88c8a0d
Update metadata and reduce kernels.
ruanjm 7f86b0b
rename kernels
Zzz9990 018798d
Add new param kv_granularity to metadata kernel.
ruanjm 3bf1623
Introduce cal_workload_limit_global_v2
ruanjm 907dbed
Support qhead=128 cases.
ruanjm b2bed66
Change get_mla_metadata() api. Make some not important parameters be …
ruanjm a658ad8
fix potential problem on calculating tot_qo_tiles
ruanjm 325e03f
refactor metadata files
ruanjm 7072d90
update metadata v1_2
Zzz9990 851a888
update gqa_128 mla_ps & fix metadata v1_2
Zzz9990 b56eb25
Optimize mla metadata v1.2
ruanjm 8ea8f73
Optimize mla metadata v1.2 Part.2
ruanjm 9020ce8
Optimize mla metadata v1.2 Part.3
ruanjm 59d8e33
update qlen <=4
Zzz9990 b401744
fix mla qlen1
Zzz9990 3bf8b2b
Optimize mla metadata v1.2 Part.4
ruanjm 3f376b5
Make reduce_final_map be optional in mla_reduce_v1
ruanjm 7c865a5
Slightly increase reduce perf
ruanjm 8a17f56
Add persistent mode for mla reduce kernel
ruanjm 75ebf74
add mla_a16w8_qh16_m16x4_n16x1_coex0_mask1_ps.co
fangche123 3f67dbe
update deepseekv32 sparse mla metadata
Zzz9990 84e9616
update mla_a16w8_qh16_m16x4_n16x1_coex0_mask1_ps.co
fangche123 ce9096f
Adjust code for sparse attn
ruanjm 71abd03
Optimize the a16w8 kernel
fangche123 ebb2591
Improve metadata v1.1 perf
ruanjm 9afba8f
Make metadata v1.1 support sparse attn
ruanjm 2150d8f
Remove redundant code in mla_reduce
ruanjm 363707f
futile struggle
ruanjm 0b874cb
Merge branch 'main' into mla_splitkv_enhance_split_alg_inte
ruanjm ce9abd8
Fix issue after merge. aiter main branch is using torch.library.infer…
ruanjm 64c3e29
Adjust metadata v1.1 and make this branch be ready to be merged to ma…
ruanjm 57b9d57
Merge branch 'main' into mla_splitkv_enhance_split_alg_inte
ruanjm b70d8d4
remove invalid co kernel
Zzz9990 f668d60
Fix issue brought from f794ae4 which disabled hipify by default.
ruanjm 33ea0e8
support qolen>1 for sparse mla
Zzz9990 6e2c4ff
make code become prettier
ruanjm c3813fb
Fix issue in metadata v1.1
ruanjm bcd219a
Merge branch 'main' into mla_splitkv_enhance_split_alg_inte
ruanjm 33b0499
Fix issue in test_mla.py
ruanjm 53f5826
Fix lint fails
ruanjm 41576e1
Fix sub-test fails in op_test/test_mla.py
ruanjm 68ef089
Fix regression in test_mla.py where mtp>1
ruanjm f7efe97
Add head_dim=128 support to reduce
ruanjm 8440195
Merge branch 'main' into mla_splitkv_enhance_split_alg_inte
ruanjm 1c5b77b
Add nhead=8 for pa and add assert to make sure the input tensors are in
ruanjm 69d41a0
fix issue in vllm benchmark for deepseek: remove metadata v0 because …
ruanjm 0cf3db2
fix lint
ruanjm ae96787
Revert all the change about mi350 gemm.
ruanjm be55ef5
add a8w8 and a16w8 kernel in mla mi350
fangche123 600d993
add A8W8 Non-persistent mode kernel
fangche123 6c7f795
Fix issue reported by Copilot
ruanjm 573c3cd
add mla non-persistent test
fangche123 0cfc1a3
script: update a16w8 kernel
fangche123 0490f21
rm test_mla_persistent_mi350.py and support mi350 in test_mla_persist…
fangche123 8ca7679
Merge branch 'main' into mla_splitkv_enhance_split_alg_inte
valarLip