[BugFix] NetLoader: No backend type associated with device type npu #5700

destinysky · 2026-01-07T13:17:48Z

What this PR does / why we need it?
This PR fixes a bug in NetLoader (PR#2888). The bug was introduced by PR#3612 ([1/N][Refactor] Refactor code to adapt with vllm main), which removed the stateless_init_device_torch_dist_pg function from platform.py, so NetLoader's call to that function fails. This PR adds a self-contained way to create a stateless process group that does not depend on external code.
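One plausible reading of the error in the title is that a process group was created without any communication backend registered for the `npu` device type. The toy model below illustrates that failure mode in plain Python. It is not PyTorch or vllm-ascend code: `ToyProcessGroup`, `init_toy_group`, and the `"hccl"` backend string are hypothetical names chosen for illustration only.

```python
from typing import Dict, Optional


class ToyProcessGroup:
    """Toy stand-in for a process group (hypothetical, not PyTorch's class)."""

    def __init__(self, rank: int, world_size: int) -> None:
        self.rank = rank
        self.world_size = world_size
        # Maps a device type (e.g. "npu") to the backend that serves it.
        self._backends: Dict[str, str] = {}

    def register_backend(self, device_type: str, backend: str) -> None:
        self._backends[device_type] = backend

    def backend_for(self, device_type: str) -> str:
        # A lookup for an unregistered device type fails, mirroring the
        # wording of the error in this PR's title.
        if device_type not in self._backends:
            raise RuntimeError(
                f"No backend type associated with device type {device_type}"
            )
        return self._backends[device_type]


def init_toy_group(
    rank: int,
    world_size: int,
    device_type: Optional[str] = None,
    backend: Optional[str] = None,
) -> ToyProcessGroup:
    """Build a group and, if both are given, register a backend for a device."""
    group = ToyProcessGroup(rank, world_size)
    if device_type is not None and backend is not None:
        group.register_backend(device_type, backend)
    return group
```

In this sketch, a group built without a registered backend raises on `backend_for("npu")`, while a group built with `device_type="npu"` and `backend="hccl"` resolves the lookup.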

Does this PR introduce any user-facing change?
No

How was this patch tested?
Same as PR#2888

vLLM version: v0.13.0
vLLM main: vllm-project/vllm@2f4e654


github-actions · 2026-01-07T13:17:59Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing; smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.

gemini-code-assist

Code Review

This pull request addresses a bug in NetLoader caused by a previous refactoring that removed necessary process group initialization logic. The fix introduces a new file, vllm_ascend/model_loader/netloader/executor/netloader_pg.py, which contains self-contained functions (stateless_init_process_group and destroy_stateless_process_group) for managing stateless HCCL process groups on NPU devices. The changes in elastic_load.py simply adopt these new utility functions. My review focuses on the new implementation in netloader_pg.py, where I've found a couple of areas for improvement regarding code correctness and robustness.
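Because the review names a create/destroy pair, a caller may want teardown to run even when loading fails partway. The sketch below shows one way to pair such helpers with a context manager. The real signatures of `stateless_init_process_group` and `destroy_stateless_process_group` are not shown in this thread, so `init_fn`, `destroy_fn`, and the `fake_*` stand-ins are assumptions used purely for illustration.

```python
from contextlib import contextmanager


@contextmanager
def stateless_group(init_fn, destroy_fn, **init_kwargs):
    """Create a group with init_fn and always pass it to destroy_fn on exit."""
    group = init_fn(**init_kwargs)
    try:
        yield group
    finally:
        # Runs on normal exit and when the body raises.
        destroy_fn(group)


# Stand-ins so the sketch runs without NPU hardware (hypothetical helpers).
events = []


def fake_init(rank, world_size):
    events.append("init")
    return {"rank": rank, "world_size": world_size}


def fake_destroy(group):
    events.append("destroy")


try:
    with stateless_group(fake_init, fake_destroy, rank=0, world_size=2) as pg:
        events.append("load")
        raise RuntimeError("simulated transfer failure")
except RuntimeError:
    pass
```

After this runs, `events` records the init, the load attempt, and the destroy in that order, even though the body raised.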

vllm_ascend/model_loader/netloader/executor/netloader_pg.py

Signed-off-by: destinysky <[email protected]>

gemini-code-assist bot reviewed Jan 7, 2026


vllm_ascend/model_loader/netloader/executor/netloader_pg.py (Outdated)

vllm_ascend/model_loader/netloader/executor/netloader_pg.py (Outdated)

Format Fixing

25d7cab

Signed-off-by: destinysky <[email protected]>

destinysky closed this Jan 8, 2026

destinysky reopened this Jan 8, 2026
