Skip to content

Conversation

@1092626063
Copy link
Contributor

What this PR does / why we need it?

work around: reset the None reference count to prevent it from dropping to zero and causing a crash.

Does this PR introduce any user-facing change?

How was this patch tested?

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a workaround to prevent a crash caused by the reference count of Python's None object dropping to a dangerously low level. The change involves directly manipulating the None object's reference count in memory using ctypes if it falls below a certain threshold.

While this may be a necessary measure to address an underlying bug (which the comments suggest is fixed in Python 3.12), it is an extremely high-risk approach. My review focuses on the critical nature of this hack and recommends adding extensive documentation to explain the root cause and provide context for future maintainers.

Comment on lines 1661 to 1664
# TODO: remove this after python update to 3.12
refcountNone = sys.getrefcount(None)
if refcountNone < NONE_REF_COUNT_THRESHOLDS:
ctypes.c_long.from_address(id(None)).value = refcountNone + NONE_REF_COUNT_THRESHOLDS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This is a very dangerous workaround that directly manipulates the reference count of the None object. This relies on internal CPython implementation details (the memory layout of PyObject), which can break with different Python versions or builds, potentially leading to memory corruption or crashes.

While I understand this might be a necessary evil to prevent a crash due to a reference counting bug that is apparently fixed in Python 3.12, this code block needs more extensive documentation.

Please add a more detailed comment explaining:

  • The root cause of the None reference count dropping.
  • A link to the upstream bug report (in CPython or another library) that describes the issue and its fix in Python 3.12.
  • How the threshold 500000 was chosen.
  • Any other context that would help a future developer understand why this high-risk code is necessary and when it can be safely removed.

This context is crucial for maintainability and to prevent accidental breakage in the future.

…ng to zero and causing a crash.

Signed-off-by: 1092626063 <[email protected]>
@wangxiyuan wangxiyuan added ready read for review ready-for-test start test by label for PR labels Jan 30, 2026
@wangxiyuan
Copy link
Collaborator

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a workaround to prevent a crash caused by the reference count of the None object dropping to a low value. It does this by directly manipulating the reference count in memory using ctypes if it falls below a certain threshold.

While this may prevent an immediate crash, this approach is extremely dangerous as it relies on CPython's internal implementation details and can lead to instability. My review includes critical feedback on this workaround, suggesting a more portable implementation of the hack and emphasizing the need to find and fix the root cause of the reference counting issue. This workaround should be treated with extreme caution.

# TODO: remove this after python update to 3.12
refcountNone = sys.getrefcount(None)
if refcountNone < NONE_REF_COUNT_THRESHOLDS:
ctypes.c_long.from_address(id(None)).value = U32_MAX - 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Directly manipulating the None object's reference count is extremely dangerous and non-portable. It relies on CPython's internal memory layout, which can change between versions.

While this is a workaround for a deeper reference counting bug, it can be made slightly safer and more portable by using ctypes.c_ssize_t instead of ctypes.c_long. The ob_refcnt field of a Python object is of type Py_ssize_t. ctypes.c_long is platform-dependent (e.g., it's 32-bit on 64-bit Windows), whereas ctypes.c_ssize_t correctly corresponds to Py_ssize_t on all platforms.

This change doesn't fix the underlying issue but reduces the risk of platform-specific failures. The root cause of the excessive Py_DECREF on None should still be investigated as a high priority.

Suggested change
ctypes.c_long.from_address(id(None)).value = U32_MAX - 1
ctypes.c_ssize_t.from_address(id(None)).value = U32_MAX - 1

SEQ_LEN_WITH_MAX_PA_WORKSPACE = 6144

# TODO: remove this after python update to 3.12
NONE_REF_COUNT_THRESHOLDS = 500000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The value 500000 for NONE_REF_COUNT_THRESHOLDS appears to be a magic number. Given that this is part of a critical workaround to prevent a crash, it's important to document how this threshold was determined. Please add a comment explaining the rationale behind this specific value. This context is crucial for future maintenance and understanding the conditions under which this workaround is triggered.

@wangxiyuan wangxiyuan merged commit 1a390d3 into vllm-project:releases/v0.13.0 Jan 30, 2026
25 of 26 checks passed
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
vllm-project#6441)

### What this PR does / why we need it?
work around: reset the None reference count to prevent it from dropping
to zero and causing a crash.

Signed-off-by: 1092626063 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready read for review ready-for-test start test by label for PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants