Problem Description
We encountered the following failure while running SGLang on ROCm: the aiter fused-MoE assembly kernel fails to launch with a HIP "invalid argument" error from hipModuleLaunchKernel when running the benchmark below.
python3 -m sglang.bench_one_batch --batch-size 256 --input-len 1024 --output-len 1024 --model-path deepseek-ai/DeepSeek-V3.1 --load-format dummy --tp 8
[2025-10-17 23:21:26 TP4] shape is M:262144, N:512, K:7168, not found tuned config in CKGEMM, will use default config!
[aiter] shape is M:262144, N:7168, K:256, not found tuned config in CKGEMM, will use default config!
[2025-10-17 23:21:26 TP4] shape is M:262144, N:7168, K:256, not found tuned config in CKGEMM, will use default config!
[aiter] type hints mismatch, override to --> moe_fused_gate(input: torch.Tensor, bias: torch.Tensor, topk_weights: torch.Tensor, topk_ids: torch.Tensor, num_expert_group: int, topk_group: int, topk: int, n_share_experts_fusion: int, routed_scaling_factor: float = 1.0) -> list[torch.Tensor]
[2025-10-17 23:21:26 TP6] type hints mismatch, override to --> moe_fused_gate(input: torch.Tensor, bias: torch.Tensor, topk_weights: torch.Tensor, topk_ids: torch.Tensor, num_expert_group: int, topk_group: int, topk: int, n_share_experts_fusion: int, routed_scaling_factor: float = 1.0) -> list[torch.Tensor]
[aiter] type hints mismatch, override to --> moe_fused_gate(input: torch.Tensor, bias: torch.Tensor, topk_weights: torch.Tensor, topk_ids: torch.Tensor, num_expert_group: int, topk_group: int, topk: int, n_share_experts_fusion: int, routed_scaling_factor: float = 1.0) -> list[torch.Tensor]
[2025-10-17 23:21:26 TP7] type hints mismatch, override to --> moe_fused_gate(input: torch.Tensor, bias: torch.Tensor, topk_weights: torch.Tensor, topk_ids: torch.Tensor, num_expert_group: int, topk_group: int, topk: int, n_share_experts_fusion: int, routed_scaling_factor: float = 1.0) -> list[torch.Tensor]
[aiter] shape is M:262144, N:512, K:7168, not found tuned config in CKGEMM, will use default config!
[2025-10-17 23:21:26 TP5] shape is M:262144, N:512, K:7168, not found tuned config in CKGEMM, will use default config!
[aiter] shape is M:262144, N:7168, K:256, not found tuned config in CKGEMM, will use default config!
[2025-10-17 23:21:26 TP5] shape is M:262144, N:7168, K:256, not found tuned config in CKGEMM, will use default config!
[aiter] type hints mismatch, override to --> moe_fused_gate(input: torch.Tensor, bias: torch.Tensor, topk_weights: torch.Tensor, topk_ids: torch.Tensor, num_expert_group: int, topk_group: int, topk: int, n_share_experts_fusion: int, routed_scaling_factor: float = 1.0) -> list[torch.Tensor]
[2025-10-17 23:21:26 TP4] type hints mismatch, override to --> moe_fused_gate(input: torch.Tensor, bias: torch.Tensor, topk_weights: torch.Tensor, topk_ids: torch.Tensor, num_expert_group: int, topk_group: int, topk: int, n_share_experts_fusion: int, routed_scaling_factor: float = 1.0) -> list[torch.Tensor]
[aiter] type hints mismatch, override to --> moe_fused_gate(input: torch.Tensor, bias: torch.Tensor, topk_weights: torch.Tensor, topk_ids: torch.Tensor, num_expert_group: int, topk_group: int, topk: int, n_share_experts_fusion: int, routed_scaling_factor: float = 1.0) -> list[torch.Tensor]
[2025-10-17 23:21:26 TP5] type hints mismatch, override to --> moe_fused_gate(input: torch.Tensor, bias: torch.Tensor, topk_weights: torch.Tensor, topk_ids: torch.Tensor, num_expert_group: int, topk_group: int, topk: int, n_share_experts_fusion: int, routed_scaling_factor: float = 1.0) -> list[torch.Tensor]
[aiter] [fused_moe] using 1stage (kernelName1='_ZN5aiter47fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_32x256E', kernelName2='Null') for (256, 1024, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fn', 'torch.float8_e4m3fn', 'QuantType.per_1x128', True, False)
[2025-10-17 23:21:26 TP6] [fused_moe] using 1stage (kernelName1='_ZN5aiter47fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_32x256E', kernelName2='Null') for (256, 1024, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fn', 'torch.float8_e4m3fn', 'QuantType.per_1x128', True, False)
[AITER] /sgl-workspace/aiter/aiter/jit/build/module_moe_asm/build/srcs/asm_fmoe.hip:250 fail to call hipModuleLaunchKernel( kernel_func, gdx, gdy, gdz, bdx, 1, 1, 0, stream, nullptr, (void**)&config) ---> [HIP error](invalid argument)
[aiter] [fused_moe] using 1stage (kernelName1='_ZN5aiter47fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_32x256E', kernelName2='Null') for (256, 1024, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fn', 'torch.float8_e4m3fn', 'QuantType.per_1x128', True, False)
[2025-10-17 23:21:26 TP4] [fused_moe] using 1stage (kernelName1='_ZN5aiter47fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_32x256E', kernelName2='Null') for (256, 1024, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fn', 'torch.float8_e4m3fn', 'QuantType.per_1x128', True, False)
[AITER] /sgl-workspace/aiter/aiter/jit/build/module_moe_asm/build/srcs/asm_fmoe.hip:250 fail to call hipModuleLaunchKernel( kernel_func, gdx, gdy, gdz, bdx, 1, 1, 0, stream, nullptr, (void**)&config) ---> [HIP error](invalid argument)
[aiter] [fused_moe] using 1stage (kernelName1='_ZN5aiter47fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_32x256E', kernelName2='Null') for (256, 1024, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fn', 'torch.float8_e4m3fn', 'QuantType.per_1x128', True, False)
[2025-10-17 23:21:26 TP5] [fused_moe] using 1stage (kernelName1='_ZN5aiter47fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_32x256E', kernelName2='Null') for (256, 1024, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fn', 'torch.float8_e4m3fn', 'QuantType.per_1x128', True, False)
[AITER] /sgl-workspace/aiter/aiter/jit/build/module_moe_asm/build/srcs/asm_fmoe.hip:250 fail to call hipModuleLaunchKernel( kernel_func, gdx, gdy, gdz, bdx, 1, 1, 0, stream, nullptr, (void**)&config) ---> [HIP error](invalid argument)
[aiter] [fused_moe] using 1stage (kernelName1='_ZN5aiter47fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_32x256E', kernelName2='Null') for (256, 1024, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fn', 'torch.float8_e4m3fn', 'QuantType.per_1x128', True, False)
[2025-10-17 23:21:26 TP7] [fused_moe] using 1stage (kernelName1='_ZN5aiter47fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_32x256E', kernelName2='Null') for (256, 1024, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fn', 'torch.float8_e4m3fn', 'QuantType.per_1x128', True, False)
[AITER] /sgl-workspace/aiter/aiter/jit/build/module_moe_asm/build/srcs/asm_fmoe.hip:250 fail to call hipModuleLaunchKernel( kernel_func, gdx, gdy, gdz, bdx, 1, 1, 0, stream, nullptr, (void**)&config) ---> [HIP error](invalid argument)
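For context, the M dimension reported by the CKGEMM warnings above appears to match the total number of prefill tokens for this command. A minimal sketch of that arithmetic, assuming the prefill GEMMs are flattened over batch × sequence length (this relationship is an assumption, not confirmed from the aiter source):

```python
# Assumed relationship: the prefill GEMM M dimension equals the number of
# prompt tokens processed in one batch.
batch_size = 256   # --batch-size from the command above
input_len = 1024   # --input-len from the command above
m = batch_size * input_len
print(m)  # 262144, which matches "shape is M:262144" in the log
```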
Operating System
NAME="Ubuntu" VERSION="22.04.5 LTS (Jammy Jellyfish)"
CPU
AMD EPYC 9575F 64-Core Processor
GPU
MI355X x 8
ROCm Version
7.0.0
ROCm Component
No response
Steps to Reproduce
Use the latest SGLang Docker image for ROCm. I found that this issue is encountered at BS > 256 (a batch-size sweep sketch is included after the command below).
python3 -m sglang.bench_one_batch --batch-size 256 --input-len 1024 --output-len 1024 --model-path deepseek-ai/DeepSeek-V3.1 --load-format dummy --tp 8
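If it helps to narrow down the threshold, here is a minimal sketch (not verified on this setup) that simply re-runs the same command at a few batch sizes; the flags are exactly the ones from the reproduction command, and the batch-size list is an arbitrary choice:

```python
# Hypothetical sweep over --batch-size to find where the HIP launch failure
# first appears. Everything else is kept identical to the reproduction command.
import subprocess

for bs in (128, 256, 320, 512):
    cmd = [
        "python3", "-m", "sglang.bench_one_batch",
        "--batch-size", str(bs),
        "--input-len", "1024",
        "--output-len", "1024",
        "--model-path", "deepseek-ai/DeepSeek-V3.1",
        "--load-format", "dummy",
        "--tp", "8",
    ]
    print(f"=== batch size {bs} ===")
    result = subprocess.run(cmd)
    print(f"batch size {bs} exited with code {result.returncode}")
```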
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response