-
Notifications
You must be signed in to change notification settings - Fork 789
work around: reset the None reference count to prevent it from droppi… #6441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -17,6 +17,7 @@ | |||||
| # Adapted from vllm-project/vllm/vllm/worker/gpu_model_runner.py | ||||||
| # | ||||||
|
|
||||||
| import ctypes | ||||||
| import math | ||||||
| import sys | ||||||
| from collections import defaultdict | ||||||
|
|
@@ -131,6 +132,10 @@ | |||||
|
|
||||||
| SEQ_LEN_WITH_MAX_PA_WORKSPACE = 6144 | ||||||
|
|
||||||
| # TODO: remove this after python update to 3.12 | ||||||
| NONE_REF_COUNT_THRESHOLDS = 500000 | ||||||
| U32_MAX = 0xFFFFFFFF | ||||||
|
|
||||||
|
|
||||||
| @dataclass | ||||||
| class GraphCaptureContext: | ||||||
|
|
@@ -1654,6 +1659,11 @@ def sample_tokens( | |||||
| kv_connector_output = self.kv_connector_output | ||||||
| self.kv_connector_output = None | ||||||
|
|
||||||
| # TODO: remove this after python update to 3.12 | ||||||
| refcountNone = sys.getrefcount(None) | ||||||
| if refcountNone < NONE_REF_COUNT_THRESHOLDS: | ||||||
| ctypes.c_long.from_address(id(None)).value = U32_MAX - 1 | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Directly manipulating the While this is a workaround for a deeper reference counting bug, it can be made slightly safer and more portable by using This change doesn't fix the underlying issue but reduces the risk of platform-specific failures. The root cause of the excessive
Suggested change
|
||||||
|
|
||||||
| if self.execute_model_state is None: | ||||||
| # Nothing to do (PP non-final rank case), output isn't used. | ||||||
| if not kv_connector_output: | ||||||
|
|
||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value
500000forNONE_REF_COUNT_THRESHOLDSappears to be a magic number. Given that this is part of a critical workaround to prevent a crash, it's important to document how this threshold was determined. Please add a comment explaining the rationale behind this specific value. This context is crucial for future maintenance and understanding the conditions under which this workaround is triggered.