-
Notifications
You must be signed in to change notification settings - Fork 789
work around: reset the None reference count to prevent it from droppi… #6441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
work around: reset the None reference count to prevent it from droppi… #6441
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a workaround to prevent a crash caused by the reference count of Python's None object dropping to a dangerously low level. The change involves directly manipulating the None object's reference count in memory using ctypes if it falls below a certain threshold.
While this may be a necessary measure to address an underlying bug (which the comments suggest is fixed in Python 3.12), it is an extremely high-risk approach. My review focuses on the critical nature of this hack and recommends adding extensive documentation to explain the root cause and provide context for future maintainers.
| # TODO: remove this after python update to 3.12 | ||
| refcountNone = sys.getrefcount(None) | ||
| if refcountNone < NONE_REF_COUNT_THRESHOLDS: | ||
| ctypes.c_long.from_address(id(None)).value = refcountNone + NONE_REF_COUNT_THRESHOLDS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very dangerous workaround that directly manipulates the reference count of the None object. This relies on internal CPython implementation details (the memory layout of PyObject), which can break with different Python versions or builds, potentially leading to memory corruption or crashes.
While I understand this might be a necessary evil to prevent a crash due to a reference counting bug that is apparently fixed in Python 3.12, this code block needs more extensive documentation.
Please add a more detailed comment explaining:
- The root cause of the
Nonereference count dropping. - A link to the upstream bug report (in CPython or another library) that describes the issue and its fix in Python 3.12.
- How the threshold
500000was chosen. - Any other context that would help a future developer understand why this high-risk code is necessary and when it can be safely removed.
This context is crucial for maintainability and to prevent accidental breakage in the future.
…ng to zero and causing a crash. Signed-off-by: 1092626063 <[email protected]>
feec38e to
1ce5044
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a workaround to prevent a crash caused by the reference count of the None object dropping to a low value. It does this by directly manipulating the reference count in memory using ctypes if it falls below a certain threshold.
While this may prevent an immediate crash, this approach is extremely dangerous as it relies on CPython's internal implementation details and can lead to instability. My review includes critical feedback on this workaround, suggesting a more portable implementation of the hack and emphasizing the need to find and fix the root cause of the reference counting issue. This workaround should be treated with extreme caution.
| # TODO: remove this after python update to 3.12 | ||
| refcountNone = sys.getrefcount(None) | ||
| if refcountNone < NONE_REF_COUNT_THRESHOLDS: | ||
| ctypes.c_long.from_address(id(None)).value = U32_MAX - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Directly manipulating the None object's reference count is extremely dangerous and non-portable. It relies on CPython's internal memory layout, which can change between versions.
While this is a workaround for a deeper reference counting bug, it can be made slightly safer and more portable by using ctypes.c_ssize_t instead of ctypes.c_long. The ob_refcnt field of a Python object is of type Py_ssize_t. ctypes.c_long is platform-dependent (e.g., it's 32-bit on 64-bit Windows), whereas ctypes.c_ssize_t correctly corresponds to Py_ssize_t on all platforms.
This change doesn't fix the underlying issue but reduces the risk of platform-specific failures. The root cause of the excessive Py_DECREF on None should still be investigated as a high priority.
| ctypes.c_long.from_address(id(None)).value = U32_MAX - 1 | |
| ctypes.c_ssize_t.from_address(id(None)).value = U32_MAX - 1 |
| SEQ_LEN_WITH_MAX_PA_WORKSPACE = 6144 | ||
|
|
||
| # TODO: remove this after python update to 3.12 | ||
| NONE_REF_COUNT_THRESHOLDS = 500000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value 500000 for NONE_REF_COUNT_THRESHOLDS appears to be a magic number. Given that this is part of a critical workaround to prevent a crash, it's important to document how this threshold was determined. Please add a comment explaining the rationale behind this specific value. This context is crucial for future maintenance and understanding the conditions under which this workaround is triggered.
1a390d3
into
vllm-project:releases/v0.13.0
vllm-project#6441) ### What this PR does / why we need it? work around: reset the None reference count to prevent it from dropping to zero and causing a crash. Signed-off-by: 1092626063 <[email protected]>
What this PR does / why we need it?
work around: reset the None reference count to prevent it from dropping to zero and causing a crash.
Does this PR introduce any user-facing change?
How was this patch tested?