
[FEAT] [ROCm] [V1]: Add AITER biased group topk for DeepSeekV3 #17955


Merged: 6 commits from aiter-biased-group-topk merged into vllm-project:main on May 14, 2025

Conversation

@vllmellm (Contributor) commented May 11, 2025

This PR adds the use of the AITER biased_grouped_topk kernel for DeepSeekV3.
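
For context, a minimal eager-mode sketch of the biased grouped-topk routing that the AITER kernel accelerates; this follows the semantics of vLLM's grouped_topk with an e_score_correction_bias (the function name and masking details here are illustrative, not the kernel's actual implementation):

```python
import torch

def biased_grouped_topk_ref(gating_output: torch.Tensor,
                            e_score_correction_bias: torch.Tensor,
                            topk: int, num_expert_group: int,
                            topk_group: int, renormalize: bool = True):
    """Eager reference: bias-corrected group selection, then per-token top-k."""
    num_token = gating_output.shape[0]
    scores = gating_output.sigmoid()  # DeepSeekV3 uses sigmoid scoring
    biased = scores + e_score_correction_bias.unsqueeze(0)
    # Rank expert groups by the sum of their two highest biased scores.
    group_scores = (biased.view(num_token, num_expert_group, -1)
                    .topk(2, dim=-1).values.sum(dim=-1))
    group_idx = group_scores.topk(topk_group, dim=-1).indices
    group_mask = torch.zeros_like(group_scores).scatter_(1, group_idx, 1)
    score_mask = (group_mask.unsqueeze(-1)
                  .expand(num_token, num_expert_group,
                          biased.shape[-1] // num_expert_group)
                  .reshape(num_token, -1).bool())
    # Top-k experts come from the surviving groups; the routing weights use
    # the unbiased scores.
    topk_ids = (biased.masked_fill(~score_mask, float("-inf"))
                .topk(topk, dim=-1).indices)
    topk_weights = scores.gather(1, topk_ids)
    if renormalize:
        topk_weights = topk_weights / topk_weights.sum(dim=-1, keepdim=True)
    return topk_weights, topk_ids
```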

Performance Comparison Summary

| Metric | V0 Before | V0 After | Change (%) | V1 Before | V1 After | Change (%) |
|---|---:|---:|---:|---:|---:|---:|
| Benchmark Duration (s) | 168.90 | 163.00 | -3.5% | 144.65 | 137.63 | -4.9% |
| Request Throughput (req/s) | 2.96 | 3.07 | +3.7% | 3.46 | 3.63 | +4.9% |
| Output Token Throughput (tok/s) | 836.47 | 951.74 | +13.8% | 954.81 | 918.66 | -3.8% |
| Total Token Throughput (tok/s) | 3757.01 | 3977.94 | +5.9% | 4365.06 | 4502.65 | +3.2% |
| Mean TTFT (ms) | 23508.33 | 21869.51 | -7.0% | 24366.97 | 22886.85 | -6.1% |
| Median TTFT (ms) | 24786.55 | 21003.01 | -15.3% | 24778.90 | 23169.19 | -6.5% |
| P99 TTFT (ms) | 40458.17 | 37903.93 | -6.3% | 43829.83 | 42134.00 | -3.9% |
| Mean ITL (ms) | 111.66 | 104.64 | -6.3% | 86.47 | 86.98 | +0.6% |
| Median ITL (ms) | 77.75 | 75.21 | -3.3% | 57.22 | 54.72 | -4.4% |
| P99 ITL (ms) | 101.53 | 87.36 | -14.0% | 2635.71 | 2577.23 | -2.2% |

Key Observations:

  • Both V0 and V1 improve on most metrics after the change
  • V0 shows a significant improvement in output token throughput (+13.8%), while V1 regresses slightly there (-3.8%)
  • V1 has better overall throughput than V0 both before and after the change
  • Time to First Token (TTFT) improved in both versions, with V0 showing the larger median-TTFT improvement (-15.3%)
  • Inter-token Latency (ITL) improved on most metrics in both versions

Accuracy Validation

  • AITER V0

vllm (pretrained=deepseek-ai/DeepSeek-V3,tensor_parallel_size=8,max_model_len=32768,block_size=1,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---:|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.9447 | ± 0.0063 |
|  |  | strict-match | 5 | exact_match | 0.9447 | ± 0.0063 |
  • AITER V1

vllm (pretrained=deepseek-ai/DeepSeek-V3,tensor_parallel_size=8,max_model_len=32768,block_size=1,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---:|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.9484 | ± 0.0061 |
|  |  | strict-match | 5 | exact_match | 0.9462 | ± 0.0062 |
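
The headers above match lm-evaluation-harness output. A hedged sketch of how such a run could be reproduced through the harness's Python API, assuming lm_eval is installed with the vLLM backend (the exact invocation used for this PR is not shown in the thread):

```python
import lm_eval

# Mirror the model_args string reported above; GSM8K with 5-shot prompting.
results = lm_eval.simple_evaluate(
    model="vllm",
    model_args=("pretrained=deepseek-ai/DeepSeek-V3,"
                "tensor_parallel_size=8,max_model_len=32768,"
                "block_size=1,trust_remote_code=True"),
    tasks=["gsm8k"],
    num_fewshot=5,
    batch_size="auto",
)
print(results["results"]["gsm8k"])
```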

V1 custom op registration unit tests

The unit tests check the following criteria (a sketch of such a test follows the list):

  • the custom ops are registered successfully
  • the fake tensor definitions of the custom ops are correct
  • the outputs generated in non-torch.compile and torch.compile modes are the same
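
A minimal sketch of what such a test can look like, assuming the op is registered under the torch.ops.vllm namespace as in the review discussion below; the op's argument list here is hypothetical:

```python
import torch

def test_biased_grouped_topk_custom_op():
    # 1. Registration: the op resolves under the vllm namespace.
    assert hasattr(torch.ops.vllm, "rocm_aiter_biased_grouped_topk")
    op = torch.ops.vllm.rocm_aiter_biased_grouped_topk

    def routing(gating_output, bias):
        # Hypothetical argument order: (gating, bias, topk, groups, topk_group).
        return op(gating_output, bias, 8, 8, 4)

    gating = torch.randn(16, 256, device="cuda")
    bias = torch.randn(256, device="cuda")

    # 2. Fake-tensor definition: torch.compile can only trace the op if its
    #    fake (meta) implementation returns correctly shaped tensors.
    compiled = torch.compile(routing, fullgraph=True)

    # 3. Eager and compiled outputs must match.
    eager_out = routing(gating, bias)
    compiled_out = compiled(gating, bias)
    for eager, comp in zip(eager_out, compiled_out):
        torch.testing.assert_close(eager, comp)
```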


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@vllmellm changed the title from [FEAT] [ROCm]: Add AITER biased group topk for DeepSeekV3 to [FEAT] [ROCm] [V1]: Add AITER biased group topk for DeepSeekV3 on May 11, 2025
@houseroad added the rocm label May 12, 2025
@houseroad (Collaborator) commented

Fix the linter (maybe need rebase)?

tjtanaa added a commit: Signed-off-by: tjtanaa <[email protected]>
@tjtanaa (Contributor) commented May 13, 2025

> Fix the linter (maybe need rebase)?

@houseroad It has been fixed.

Review comment on this snippet:

```python
topk_group=topk_group,
scoring_func=scoring_func,
e_score_correction_bias=e_score_correction_bias)
if (rocm_aiter_biased_grouped_topk
```

A reviewer (Contributor) commented:

Can you add a rocm_aiter_biased_grouped_topk wrapper in rocm_aiter_fused_moe that has the same interface as grouped_topk? That way we can do something like:

```python
if is_rocm_aiter_moe_enabled():
    grouped_topk = torch.ops.vllm.rocm_aiter_biased_grouped_topk
else:
    from vllm.model_executor.layers.fused_moe.fused_moe import grouped_topk
```

At the top of the file? That would remove all of the dispatching logic here, AFAICT. It looks like the only dead argument you would have is scoring_func, which I guess is assumed by the AITER kernel? If so, asserting it's "sigmoid" inside the wrapper would be good.
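
For illustration, a hedged sketch of what the suggested wrapper could look like (the signature mirrors grouped_topk from vllm.model_executor.layers.fused_moe.fused_moe as described above; the custom op's actual argument order in the PR may differ):

```python
import torch

def rocm_aiter_biased_grouped_topk(
        hidden_states: torch.Tensor,
        gating_output: torch.Tensor,
        topk: int,
        renormalize: bool,
        num_expert_group: int = 0,
        topk_group: int = 0,
        scoring_func: str = "softmax",
        e_score_correction_bias: torch.Tensor | None = None):
    # The AITER kernel bakes in sigmoid scoring, so scoring_func is a dead
    # argument here; fail loudly if a caller expects anything else.
    assert scoring_func == "sigmoid", (
        "AITER biased grouped topk assumes sigmoid scoring")
    assert e_score_correction_bias is not None
    # Hypothetical dispatch to the registered custom op; the real op name and
    # argument order come from this PR's op registration.
    return torch.ops.vllm.rocm_aiter_biased_grouped_topk(
        gating_output, e_score_correction_bias,
        topk, num_expert_group, topk_group, renormalize)
```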

tjtanaa added 2 commits May 13, 2025 16:15
Signed-off-by: tjtanaa <[email protected]>
Signed-off-by: tjtanaa <[email protected]>
mergify bot commented May 13, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @vllmellm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label May 13, 2025
mergify bot removed the needs-rebase label May 13, 2025
@DarkLight1337 added the ready label May 14, 2025
vllm-bot merged commit 2d912fb into vllm-project:main May 14, 2025
84 of 88 checks passed
aarnphm pushed a commit to aarnphm/vllm that referenced this pull request May 15, 2025
gshtras pushed a commit to ROCm/vllm that referenced this pull request May 15, 2025
@tjtanaa tjtanaa deleted the aiter-biased-group-topk branch May 16, 2025 16:28
huachenheli pushed a commit to huachenheli/vllm that referenced this pull request May 22, 2025
zzzyq pushed a commit to zzzyq/vllm that referenced this pull request May 24, 2025
Labels: ready (ONLY add when PR is ready to merge/full CI is needed), rocm (Related to AMD ROCm)
6 participants