-
-
Notifications
You must be signed in to change notification settings - Fork 9.4k
[V1][P/D]Bug fix: handle edge case where KVConnectorOutput is None #22473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request fixes a critical bug that occurred when aggregating KV connector outputs, where KVConnectorOutput
could be None
, leading to an AttributeError
. The fix correctly adds a None
check before accessing attributes of kv_connector_output
, making the aggregation logic more robust. The change is correct and effectively resolves the issue. I've added one comment suggesting the addition of a unit test to cover this new edge case and prevent future regressions.
kv_connector_output = output.kv_connector_output | ||
if kv_connector_output: | ||
update_finished_set(kv_connector_output.finished_sending, | ||
self._send_remaining_count, finished_sending) | ||
update_finished_set(kv_connector_output.finished_recving, | ||
self._recv_remaining_count, finished_recving) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change correctly fixes a critical bug where kv_connector_output
could be None
, causing a crash. However, to prevent future regressions, it's important to add a unit test that covers this specific edge case.
The existing tests in tests/v1/kv_connector/unit/test_output_aggreagator.py
do not seem to cover the scenario where kv_connector_output
is None
. Please consider adding a new test case that asserts the correct behavior when one or more of the ModelRunnerOutput
objects in the outputs
list has kv_connector_output=None
.
Thanks for the catch, @liuzijing2014! My original intent was that the KV connector path would always produce a Instead of allowing Also, note that a similar issue exists in |
@sdavidbd Make sense to me, let me create a new empty default then. |
2852f90
to
0d8adbe
Compare
Signed-off-by: Zijing Liu <[email protected]>
0d8adbe
to
d0e73fd
Compare
Maybe it would be simpler / less fragile to just add a check in the aggregator here:
if not output:
continue |
Good point, @njhill - adding a check in the aggregator would certainly be simpler and less fragile in the short term. Either way works — it’s just a trade-off between a quick fix and enforcing that invariant in the data model. |
What's the point keeping |
How about replacing |
Thanks, @njhill . I see where you’re coming from, but I’m a bit hesitant about adding those checks — my concern is they might end up masking logic bugs rather than helping us catch them early. My preference would be to keep the stronger invariant in both directions: on the KV-connector path we always yield a On the allocation concern — with the new KV-connector context manager added in #21980 we already create an empty The change in this PR targets the no‑forward path ( I could be missing something, but I think keeping the invariant will make it easier to spot real issues and avoid unnecessary |
slved by #22663 |
Purpose
From #21980, we refactor the KV connector path but we miss an edge case where KV connector loads kv cache async, and for some model iteration, there is no kv connector output at all.
vllm/vllm/v1/worker/kv_connector_model_runner_mixin.py
Lines 69 to 71 in 7e3a8dc
vllm/vllm/v1/outputs.py
Lines 120 to 127 in 7e3a8dc
Test
Before
After
P/D disagg could run properly
Eval
cc @sdavidbd @njhill @houseroad