Merged
10 changes: 10 additions & 0 deletions vllm_ascend/worker/model_runner_v1.py
@@ -17,6 +17,7 @@
# Adapted from vllm-project/vllm/vllm/worker/gpu_model_runner.py
#

import ctypes
import math
import sys
from collections import defaultdict
@@ -131,6 +132,10 @@

SEQ_LEN_WITH_MAX_PA_WORKSPACE = 6144

# TODO: remove this after python update to 3.12
NONE_REF_COUNT_THRESHOLDS = 500000
Review comment (Contributor) [high]

The value 500000 for NONE_REF_COUNT_THRESHOLDS is a magic number. Because it gates a critical workaround that prevents a crash, please add a comment explaining how this threshold was determined and under what conditions the workaround triggers. Without that context, future maintainers cannot judge when the value is safe to change or remove.
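For context on what the threshold gates, here is a minimal sketch. It assumes the TODO's mention of Python 3.12 refers to immortal objects (PEP 683), under which None's reference count no longer changes. The helper names `workaround_applies` and `below_threshold` are hypothetical and do not appear in the PR. The rationale for 500000 itself is not stated in the source, so the comment on the constant only records that fact.

```python
import sys

# Value taken from the PR; how it was chosen is not documented in the source.
NONE_REF_COUNT_THRESHOLDS = 500000


def workaround_applies(version_info=sys.version_info):
    """Hypothetical gate: from CPython 3.12 on, None is immortal (PEP 683),
    so a refcount reset should only be relevant on older interpreters."""
    return tuple(version_info[:2]) < (3, 12)


def below_threshold(refcount, threshold=NONE_REF_COUNT_THRESHOLDS):
    """Hypothetical helper: the same comparison the PR performs inline."""
    return refcount < threshold


if __name__ == "__main__":
    print("workaround relevant on this interpreter:", workaround_applies())
    print("None refcount below threshold:", below_threshold(sys.getrefcount(None)))
```

Gating on the interpreter version in this way would let the workaround switch itself off after the upgrade the TODO anticipates, rather than relying on someone remembering to delete it.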

U32_MAX = 0xFFFFFFFF


@dataclass
class GraphCaptureContext:
Expand Down Expand Up @@ -1654,6 +1659,11 @@ def sample_tokens(
kv_connector_output = self.kv_connector_output
self.kv_connector_output = None

# TODO: remove this after python update to 3.12
refcountNone = sys.getrefcount(None)
if refcountNone < NONE_REF_COUNT_THRESHOLDS:
    ctypes.c_long.from_address(id(None)).value = U32_MAX - 1
Review comment (Contributor) [critical]

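Directly manipulating the None object's reference count is extremely dangerous and non-portable. It relies on CPython implementation details: that id() returns the object's memory address, and that ob_refcnt sits at the start of the object header. Neither is guaranteed across Python implementations, versions, or build configurations.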

While this is a workaround for a deeper reference counting bug, it can be made slightly safer and more portable by using ctypes.c_ssize_t instead of ctypes.c_long. The ob_refcnt field of a Python object is of type Py_ssize_t. ctypes.c_long is platform-dependent (e.g., it's 32-bit on 64-bit Windows), whereas ctypes.c_ssize_t correctly corresponds to Py_ssize_t on all platforms.

This change doesn't fix the underlying issue but reduces the risk of platform-specific failures. The root cause of the excessive Py_DECREF on None should still be investigated as a high priority.

Suggested change
- ctypes.c_long.from_address(id(None)).value = U32_MAX - 1
+ ctypes.c_ssize_t.from_address(id(None)).value = U32_MAX - 1
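The width difference behind this suggestion can be checked directly. The sketch below is read-only: it prints the sizes of the two ctypes types and peeks at the header word of a throwaway object, never writing to memory and never touching None. The helper `header_refcount` is a hypothetical name. The sketch also assumes a conventional CPython build in which the word at `id(obj)` is the reference count, which does not hold on free-threaded builds.

```python
import ctypes
import sys

# Width in bytes of each candidate type on the current platform.
field_sizes = {
    "c_long": ctypes.sizeof(ctypes.c_long),
    "c_ssize_t": ctypes.sizeof(ctypes.c_ssize_t),
    "pointer": ctypes.sizeof(ctypes.c_void_p),
}
print(field_sizes)


def header_refcount(obj):
    """Read-only peek at the word stored at id(obj) (hypothetical helper).

    Assumes CPython, where id() is the object's address.
    """
    if sys.implementation.name != "cpython":
        raise RuntimeError("id() is only an object address on CPython")
    return ctypes.c_ssize_t.from_address(id(obj)).value


probe = object()
before = header_refcount(probe)
holder = [probe]  # take one extra reference to the probe object
after = header_refcount(probe)
print(before, after)
```

On a platform where `c_long` is narrower than `c_ssize_t`, the first printed line makes the mismatch visible. On a conventional CPython build the second value is expected to be one higher than the first, which is a quick sanity check that the field being read really is the reference count.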


if self.execute_model_state is None:
    # Nothing to do (PP non-final rank case), output isn't used.
    if not kv_connector_output: