Commit 0ad16d1

luoxiaolin712asinkLuno
authored and committed
[Refactor]Refactor of vllm_ascend/distributed module (vllm-project#5719)
### What this PR does / why we need it?

Based on the RFC vllm-project#5604, this PR refactors vllm_ascend/distributed, moving all kv_transfer-related code into a dedicated folder, as has already been done in vLLM.

### Does this PR introduce _any_ user-facing change?

NA

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@2f4e654

---------

Signed-off-by: lty <[email protected]>
1 parent 749a48b commit 0ad16d1
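
The documentation changes in this commit all remove the `kv_connector_module_path` key from the `--kv-transfer-config` JSON examples. The following is a minimal sketch of applying the same edit to an existing config string. The helper name `drop_module_path` is hypothetical and not part of vLLM or vllm-ascend, and the commit does not state whether the key is still accepted if left in place.

```python
import json

# Key that this commit drops from the documented --kv-transfer-config examples.
REMOVED_KEY = "kv_connector_module_path"


def drop_module_path(config_json: str) -> str:
    """Return the config as compact JSON with the top-level REMOVED_KEY removed."""
    config = json.loads(config_json)
    # pop() with a default is a no-op when the key is already absent.
    config.pop(REMOVED_KEY, None)
    return json.dumps(config, separators=(",", ":"))


old = (
    '{"kv_connector":"MooncakeConnectorV1","kv_role":"kv_producer",'
    '"kv_port":"30000","engine_id":"0",'
    '"kv_connector_module_path":"vllm_ascend.distributed.mooncake_connector",'
    '"kv_connector_extra_config":{"prefill":{"dp_size":2,"tp_size":8}}}'
)
new = drop_module_path(old)
print(new)
```

Only the top-level key is removed; nested settings such as `kv_connector_extra_config` are passed through unchanged and in their original order.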

56 files changed, +300 -293 lines changed

docs/source/developer_guide/contribution/multi_node_test.md

Lines changed: 1 addition & 1 deletion
@@ -295,7 +295,7 @@ This section assumes that you already have a [Kubernetes](https://kubernetes.io/
 [2025-12-30 11:01:01] INFO multi_node_config.py:348: Resolving cluster IPs via DNS...
 [2025-12-30 11:01:01] INFO multi_node_config.py:212: Node 0 envs: {'VLLM_USE_MODELSCOPE': 'True', 'OMP_PROC_BIND': 'False', 'OMP_NUM_THREADS': '100', 'HCCL_BUFFSIZE': '1024', 'SERVER_PORT': '8080', 'NUMEXPR_MAX_THREADS': '128', 'DISAGGREGATED_PREFILL_PROXY_SCRIPT': 'examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py', 'HCCL_IF_IP': '10.0.0.102', 'HCCL_SOCKET_IFNAME': 'eth0', 'GLOO_SOCKET_IFNAME': 'eth0', 'TP_SOCKET_IFNAME': 'eth0', 'LOCAL_IP': '10.0.0.102', 'NIC_NAME': 'eth0', 'MASTER_IP': '10.0.0.102'}
 [2025-12-30 11:01:01] INFO multi_node_config.py:159: Launching proxy: python examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py --host 10.0.0.102 --port 6000 --prefiller-hosts 10.0.0.102 --prefiller-ports 8080 --decoder-hosts 10.0.0.138 --decoder-ports 8080
-[2025-12-30 11:01:01] INFO conftest.py:107: Starting server with command: vllm serve vllm-ascend/DeepSeek-V3-W8A8 --host 0.0.0.0 --port 8080 --data-parallel-size 2 --data-parallel-size-local 2 --tensor-parallel-size 8 --seed 1024 --enforce-eager --enable-expert-parallel --max-num-seqs 16 --max-model-len 8192 --max-num-batched-tokens 8192 --quantization ascend --trust-remote-code --no-enable-prefix-caching --gpu-memory-utilization 0.9 --kv-transfer-config {"kv_connector": "MooncakeConnectorV1", "kv_role": "kv_producer", "kv_port": "30000", "engine_id": "0", "kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector", "kv_connector_extra_config": {
+[2025-12-30 11:01:01] INFO conftest.py:107: Starting server with command: vllm serve vllm-ascend/DeepSeek-V3-W8A8 --host 0.0.0.0 --port 8080 --data-parallel-size 2 --data-parallel-size-local 2 --tensor-parallel-size 8 --seed 1024 --enforce-eager --enable-expert-parallel --max-num-seqs 16 --max-model-len 8192 --max-num-batched-tokens 8192 --quantization ascend --trust-remote-code --no-enable-prefix-caching --gpu-memory-utilization 0.9 --kv-transfer-config {"kv_connector": "MooncakeConnectorV1", "kv_role": "kv_producer", "kv_port": "30000", "engine_id": "0", "kv_connector_extra_config": {
 "prefill": {
 "dp_size": 2,
 "tp_size": 8

docs/source/tutorials/DeepSeek-V3.1.md

Lines changed: 0 additions & 4 deletions
@@ -328,7 +328,6 @@ vllm serve /weights/DeepSeek-V3.1-w8a8-mtp-QuaRot \
 "kv_role": "kv_producer",
 "kv_port": "30000",
 "engine_id": "0",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
 "kv_connector_extra_config": {
 "prefill": {
 "dp_size": 2,
@@ -406,7 +405,6 @@ vllm serve /weights/DeepSeek-V3.1-w8a8-mtp-QuaRot \
 "kv_role": "kv_producer",
 "kv_port": "30100",
 "engine_id": "1",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
 "kv_connector_extra_config": {
 "prefill": {
 "dp_size": 2,
@@ -484,7 +482,6 @@ vllm serve /weights/DeepSeek-V3.1-w8a8-mtp-QuaRot \
 "kv_role": "kv_consumer",
 "kv_port": "30200",
 "engine_id": "2",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
 "kv_connector_extra_config": {
 "prefill": {
 "dp_size": 2,
@@ -562,7 +559,6 @@ vllm serve /weights/DeepSeek-V3.1-w8a8-mtp-QuaRot \
 "kv_role": "kv_consumer",
 "kv_port": "30300",
 "engine_id": "3",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
 "kv_connector_extra_config": {
 "prefill": {
 "dp_size": 2,

docs/source/tutorials/DeepSeek-V3.2.md

Lines changed: 0 additions & 4 deletions
@@ -294,7 +294,6 @@ Before you start, please
 "kv_role": "kv_producer",
 "kv_port": "30000",
 "engine_id": "0",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
 "kv_connector_extra_config": {
 "use_ascend_direct": true,
 "prefill": {
@@ -369,7 +368,6 @@ Before you start, please
 "kv_role": "kv_producer",
 "kv_port": "30000",
 "engine_id": "0",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
 "kv_connector_extra_config": {
 "use_ascend_direct": true,
 "prefill": {
@@ -447,7 +445,6 @@ Before you start, please
 "kv_role": "kv_consumer",
 "kv_port": "30100",
 "engine_id": "1",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
 "kv_connector_extra_config": {
 "use_ascend_direct": true,
 "prefill": {
@@ -526,7 +523,6 @@ Before you start, please
 "kv_role": "kv_consumer",
 "kv_port": "30100",
 "engine_id": "1",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
 "kv_connector_extra_config": {
 "use_ascend_direct": true,
 "prefill": {

docs/source/tutorials/Qwen3-235B-A22B.md

Lines changed: 0 additions & 3 deletions
@@ -442,7 +442,6 @@ vllm serve vllm-ascend/Qwen3-235B-A22B-w8a8 \
 "kv_role": "kv_producer",
 "kv_port": "30000",
 "engine_id": "0",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
 "kv_connector_extra_config": {
 "use_ascend_direct": true,
 "prefill": {
@@ -508,7 +507,6 @@ vllm serve vllm-ascend/Qwen3-235B-A22B-w8a8 \
 "kv_role": "kv_consumer",
 "kv_port": "30100",
 "engine_id": "1",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
 "kv_connector_extra_config": {
 "use_ascend_direct": true,
 "prefill": {
@@ -575,7 +573,6 @@ vllm serve vllm-ascend/Qwen3-235B-A22B-w8a8 \
 "kv_role": "kv_consumer",
 "kv_port": "30100",
 "engine_id": "1",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
 "kv_connector_extra_config": {
 "use_ascend_direct": true,
 "prefill": {

docs/source/tutorials/long_sequence_context_parallel_multi_node.md

Lines changed: 0 additions & 3 deletions
@@ -123,7 +123,6 @@ vllm serve /path_to_weight/DeepSeek-V3.1_w8a8mix_mtp \
 "kv_role": "kv_producer",
 "kv_port": "30000",
 "engine_id": "0",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
 "kv_connector_extra_config": {
 "use_ascend_direct": true,
 "prefill": {
@@ -192,7 +191,6 @@ vllm serve /path_to_weight/DeepSeek-V3.1_w8a8mix_mtp \
 "kv_role": "kv_producer",
 "kv_port": "30000",
 "engine_id": "1",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
 "kv_connector_extra_config": {
 "use_ascend_direct": true,
 "prefill": {
@@ -259,7 +257,6 @@ vllm serve /path_to_weight/DeepSeek-V3.1_w8a8mix_mtp \
 "kv_role": "kv_consumer",
 "kv_port": "30200",
 "engine_id": "3",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
 "kv_connector_extra_config": {
 "prefill": {
 "dp_size": 1,

docs/source/tutorials/pd_disaggregation_mooncake_multi_node.md

Lines changed: 0 additions & 8 deletions
@@ -280,7 +280,6 @@ vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
 "kv_role": "kv_producer",
 "kv_port": "30000",
 "engine_id": "0",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_layerwise_connector",
 "kv_connector_extra_config": {
 "prefill": {
 "dp_size": 2,
@@ -340,7 +339,6 @@ vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
 "kv_role": "kv_producer",
 "kv_port": "30100",
 "engine_id": "1",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_layerwise_connector",
 "kv_connector_extra_config": {
 "prefill": {
 "dp_size": 2,
@@ -401,7 +399,6 @@ vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
 "kv_role": "kv_consumer",
 "kv_port": "30200",
 "engine_id": "2",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_layerwise_connector",
 "kv_connector_extra_config": {
 "prefill": {
 "dp_size": 2,
@@ -461,7 +458,6 @@ vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
 "kv_role": "kv_consumer",
 "kv_port": "30200",
 "engine_id": "2",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_layerwise_connector",
 "kv_connector_extra_config": {
 
 "prefill": {
@@ -529,7 +525,6 @@ vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
 "kv_role": "kv_producer",
 "kv_port": "30000",
 "engine_id": "0",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
 "kv_connector_extra_config": {
 "prefill": {
 "dp_size": 2,
@@ -589,7 +584,6 @@ vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
 "kv_role": "kv_producer",
 "kv_port": "30100",
 "engine_id": "1",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
 "kv_connector_extra_config": {
 "prefill": {
 "dp_size": 2,
@@ -650,7 +644,6 @@ vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
 "kv_role": "kv_consumer",
 "kv_port": "30200",
 "engine_id": "2",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
 "kv_connector_extra_config": {
 "prefill": {
 "dp_size": 2,
@@ -710,7 +703,6 @@ vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
 "kv_role": "kv_consumer",
 "kv_port": "30200",
 "engine_id": "2",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
 "kv_connector_extra_config": {
 "prefill": {
 "dp_size": 2,

docs/source/tutorials/pd_disaggregation_mooncake_single_node.md

Lines changed: 0 additions & 2 deletions
@@ -173,7 +173,6 @@ vllm serve /model/Qwen2.5-VL-7B-Instruct \
 "kv_role": "kv_producer",
 "kv_port": "30000",
 "engine_id": "0",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
 "kv_connector_extra_config": {
 "prefill": {
 "dp_size": 1,
@@ -216,7 +215,6 @@ vllm serve /model/Qwen2.5-VL-7B-Instruct \
 "kv_role": "kv_consumer",
 "kv_port": "30100",
 "engine_id": "1",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
 "kv_connector_extra_config": {
 "prefill": {
 "dp_size": 1,

docs/source/user_guide/deployment_guide/using_volcano_kthena.md

Lines changed: 2 additions & 2 deletions
@@ -137,7 +137,7 @@ spec:
 - "--trust-remote-code"
 - "--enforce-eager"
 - "--kv-transfer-config"
-- '{"kv_connector":"MooncakeConnectorV1","kv_buffer_device":"npu","kv_role":"kv_producer","kv_parallel_size":1,"kv_port":"20001","engine_id":"0","kv_rank":0,"kv_connector_module_path":"vllm_ascend.distributed.mooncake_connector","kv_connector_extra_config":{"prefill":{"dp_size":2,"tp_size":2},"decode":{"dp_size":2,"tp_size":2}}}'
+- '{"kv_connector":"MooncakeConnectorV1","kv_buffer_device":"npu","kv_role":"kv_producer","kv_parallel_size":1,"kv_port":"20001","engine_id":"0","kv_rank":0,"kv_connector_extra_config":{"prefill":{"dp_size":2,"tp_size":2},"decode":{"dp_size":2,"tp_size":2}}}'
 imagePullPolicy: Always
 resources:
 limits:
@@ -240,7 +240,7 @@ spec:
240240
- "--no-enable-prefix-caching"
241241
- "--enforce-eager"
242242
- "--kv-transfer-config"
243-
- '{"kv_connector":"MooncakeConnectorV1","kv_buffer_device":"npu","kv_role":"kv_consumer","kv_parallel_size":1,"kv_port":"20002","engine_id":"1","kv_rank":1,"kv_connector_module_path":"vllm_ascend.distributed.mooncake_connector","kv_connector_extra_config":{"prefill":{"dp_size":2,"tp_size":2},"decode":{"dp_size":2,"tp_size":2}}}'
243+
- '{"kv_connector":"MooncakeConnectorV1","kv_buffer_device":"npu","kv_role":"kv_consumer","kv_parallel_size":1,"kv_port":"20002","engine_id":"1","kv_rank":1,"kv_connector_extra_config":{"prefill":{"dp_size":2,"tp_size":2},"decode":{"dp_size":2,"tp_size":2}}}'
244244
imagePullPolicy: Always
245245
resources:
246246
limits:

docs/source/user_guide/feature_guide/large_scale_ep.md

Lines changed: 4 additions & 8 deletions
@@ -163,8 +163,7 @@ vllm serve vllm-ascend/DeepSeek-R1-W8A8 \
 "kv_role": "kv_producer",
 "kv_parallel_size": "1",
 "kv_port": "20001",
-"engine_id": "0",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector"
+"engine_id": "0"
 }'
 --additional-config '{"enable_weight_nz_layout":true,"enable_prefill_optimizations":true}'
 ```
@@ -230,8 +229,7 @@ vllm serve vllm-ascend/DeepSeek-R1-W8A8 \
 "kv_role": "kv_consumer",
 "kv_parallel_size": "1",
 "kv_port": "20001",
-"engine_id": "0",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector"
+"engine_id": "0"
 }' \
 --additional-config '{"enable_weight_nz_layout":true}'
 ```
@@ -435,8 +433,7 @@ In the PD separation scenario, we provide a optimized configuration.
 "kv_role": "kv_producer",
 "kv_parallel_size": "1",
 "kv_port": "20001",
-"engine_id": "0",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector"
+"engine_id": "0"
 }'
 ```

@@ -458,8 +455,7 @@ In the PD separation scenario, we provide a optimized configuration.
 "kv_role": "kv_consumer",
 "kv_parallel_size": "1",
 "kv_port": "20001",
-"engine_id": "0",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector"
+"engine_id": "0"
 }'
 ```

examples/disaggregated_prefill_v1/mooncake_connector_deployment_guide.md

Lines changed: 0 additions & 2 deletions
@@ -55,7 +55,6 @@ vllm serve "/xxxxx/DeepSeek-V2-Lite-Chat" \
 "kv_port": "20001",
 "engine_id": "0",
 "kv_rank": 0,
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
 "kv_connector_extra_config": {
 "prefill": {
 "dp_size": 2,
@@ -123,7 +122,6 @@ vllm serve "/xxxxx/DeepSeek-V2-Lite-Chat" \
 "kv_port": "20002",
 "engine_id": "1",
 "kv_rank": 1,
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
 "kv_connector_extra_config": {
 "prefill": {
 "dp_size": 2,
