
Conversation

@divakar-amd
Contributor

This PR cherry-picks this PR from the aiter main branch to update the custom_op logic in aiter. It resolves the following error in DeepSeek-R1, which was caused by improper handling of a quant op:

Run Cmd:

VLLM_ROCM_USE_AITER=1 VLLM_ROCM_USE_AITER_MOE=0 vllm bench latency --model /data/DeepSeek-R1-dontUseDebugOnly/ --dtype auto --batch-size 32 --input-len 128 --output-len 32 -tp 8 --compilation-config='{"pass_config":{"enable_attn_fusion":true,"enable_noop":true,"enable_fusion":true},"cudagraph_mode":"FULL","custom_ops":["+rms_norm","+silu_and_mul","+quant_fp8"],"splitting_ops":[]}' --trust-remote-code --max-model-len=32768 --block-size=1 --num-iters-warmup 1 --num-iters 3 --load-format dummy

Error

RuntimeError: Worker failed with error 'Attempted to call function marked as skipped

File "/Projects/VLLM_DIR/vllm_upstream/vllm/model_executor/layers/mla.py", line 126, in forward_native
qkv_lora = self.fused_qkv_a_proj(hidden_states)[0]
File "/Projects/VLLM_DIR/vllm_upstream/vllm/model_executor/layers/linear.py", line 565, in forward
output_parallel = self.quant_method.apply(self, input_, bias)
File "/Projects/VLLM_DIR/vllm_upstream/vllm/model_executor/layers/quantization/fp8.py", line 666, in apply
return self.w8a8_block_fp8_linear.apply(
File "/Projects/VLLM_DIR/vllm_upstream/vllm/model_executor/layers/quantization/utils/fp8_utils.py", line 296, in apply
output = self.w8a8_blockscale_op(input_2d, weight, weight_scale)
File "/Projects/VLLM_DIR/vllm_upstream/vllm/model_executor/layers/quantization/utils/fp8_utils.py", line 355, in_run_aiter
q_input, input_scale = aiter_per1x128_quant(
File "/Projects/VLLM_DIR/aiter_ssh/aiter/ops/quant.py", line 236, in per_group_quant_hip
dynamic_per_token_scaled_quant(
File "/Projects/VLLM_DIR/aiter_ssh/aiter/jit/core.py", line 513, in wrapper
module = get_module(md)
File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/polyfills/init.py", line 259, in getattr_and_trace
return fn(*args[2:], **kwargs)
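
The traceback ends in Dynamo's getattr_and_trace polyfill: with "+quant_fp8" kept as a custom op, the compiled graph reaches aiter's JIT wrapper (aiter/jit/core.py), whose module-loading code sits on Dynamo's skip list, hence "Attempted to call function marked as skipped". The cherry-picked aiter change registers the kernels through the torch custom-op machinery so the compiler sees an opaque, schema-annotated op instead of the skipped Python wrapper. As a rough illustration only (aiter_sketch::quant_fp8 and its body are made up for this sketch, not the real aiter registration), the pattern looks like this:

```python
# Minimal sketch, not the real aiter code: wrap a quant kernel as a registered
# torch custom op so torch.compile treats it as an opaque call instead of
# tracing (and skipping) the Python JIT wrapper.
import torch


@torch.library.custom_op("aiter_sketch::quant_fp8", mutates_args=())
def quant_fp8(x: torch.Tensor, scale: float) -> torch.Tensor:
    # Eager reference implementation; a real op would dispatch to the HIP kernel.
    return (x / scale).to(torch.float8_e4m3fn)


@torch.library.register_fake("aiter_sketch::quant_fp8")
def _(x: torch.Tensor, scale: float) -> torch.Tensor:
    # Fake (meta) implementation: shapes and dtypes only, so Dynamo/Inductor
    # can trace through the op without running the kernel.
    return torch.empty_like(x, dtype=torch.float8_e4m3fn)


@torch.compile(fullgraph=True)
def f(x):
    # The registered op is callable from inside a compiled region.
    return torch.ops.aiter_sketch.quant_fp8(x, 448.0)
```

Commits included in the cherry-pick: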

* first commit

* revert some conflict

* revert some conflict2

* support custom op define schema for some ops

* support some of op return None value

* support gemm return None

* support other op for custom

* commit on mha, gemm, moe

* fix pa test

* commit for enable op

* add mha op multi return support

* support reduce

* support mha fwd

* add support mha fwd and mha_v3

* support mhd bwd and reformat files

* fix ci error and support mha

* rewrite ops

* reformat

* fix ci

* fix ci

* skip three ops in custom

* add cpu backend

* support rms_norm op

* support hipb_mm and moe gate

* fix bug

* fix bug with comment

* support mha_v3_varlen

* use common func to reduce code in mha

* reformat

* fix some bug in ci

* fix some bug in ci

* fix rms norm bug

* fix ci

* fix ci

* fix moe bug
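
Several of the commits above ("support custom op define schema for some ops", "support some of op return None value", "add mha op multi return support") concern giving each kernel an explicit operator schema so that multi-output ops and void/in-place ops register cleanly. A hedged sketch of what such schema definitions can look like with torch.library; the op names and signatures below are illustrative, not the real aiter schemas:

```python
# Illustrative only: explicit schemas for a multi-return op and an in-place op
# that returns None, registered via torch.library. Not the real aiter ops.
import torch

# Multi-return op, e.g. an attention forward that yields output and LSE.
torch.library.define(
    "aiter_sketch::mha_fwd", "(Tensor q, Tensor k, Tensor v) -> (Tensor, Tensor)"
)

# Void op: writes into a preallocated buffer and returns nothing; the
# Tensor(a!) annotation marks the mutated argument.
torch.library.define(
    "aiter_sketch::rms_norm_inplace",
    "(Tensor(a!) out, Tensor x, Tensor weight, float eps) -> ()",
)


@torch.library.impl("aiter_sketch::mha_fwd", "CompositeExplicitAutograd")
def mha_fwd(q, k, v):
    # Reference math standing in for the HIP kernel.
    scores = (q @ k.transpose(-1, -2)) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v, torch.logsumexp(scores, dim=-1)


@torch.library.impl("aiter_sketch::rms_norm_inplace", "CompositeExplicitAutograd")
def rms_norm_inplace(out, x, weight, eps):
    # Writes the normalized result into `out` and returns nothing.
    out.copy_(x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps) * weight)


@torch.library.register_fake("aiter_sketch::mha_fwd")
def _(q, k, v):
    return torch.empty_like(q), q.new_empty(q.shape[:-1])


@torch.library.register_fake("aiter_sketch::rms_norm_inplace")
def _(out, x, weight, eps):
    return None
```

With explicit schemas like these, ops that return several tensors or nothing at all can still be registered and called through torch.ops inside a compiled region.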
@divakar-amd
Contributor Author

@gshtras
