[V1][P/D]Bug fix: handle edge case where KVConnectorOutput is None #22473


Closed

Conversation

Collaborator

@liuzijing2014 commented Aug 7, 2025

Purpose

In #21980 we refactored the KV connector path, but missed an edge case: when the KV connector loads the KV cache asynchronously, some model iterations produce no KV connector output at all.

    if (not kv_connector_output.finished_sending
            and not kv_connector_output.finished_recving):
        return EMPTY_MODEL_RUNNER_OUTPUT

vllm/v1/outputs.py, lines 120 to 127 (at 7e3a8dc):

    EMPTY_MODEL_RUNNER_OUTPUT = ModelRunnerOutput(req_ids=[],
                                                  req_id_to_index={},
                                                  sampled_token_ids=[],
                                                  spec_token_ids=None,
                                                  logprobs=None,
                                                  prompt_logprobs_dict={},
                                                  pooler_output=[],
                                                  num_nans_in_logits=None)
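
For reference, this constant leaves kv_connector_output unset (None), which is exactly what the aggregator later dereferences. A minimal, self-contained sketch of the failure mode, using simplified stand-in dataclasses rather than the real vLLM types:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class KVConnectorOutput:
        finished_sending: Optional[set] = None
        finished_recving: Optional[set] = None

    @dataclass
    class ModelRunnerOutput:
        kv_connector_output: Optional[KVConnectorOutput] = None

    def aggregate(outputs):
        # Mirrors the loop in kv_connector/utils.py that assumed a
        # non-None kv_connector_output on every worker output.
        for model_output in outputs:
            kvc = model_output.kv_connector_output
            kvc.finished_sending  # AttributeError when kvc is None

    aggregate([ModelRunnerOutput()])  # 'NoneType' object has no attribute 'finished_sending'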

Test

Before

(EngineCore_0 pid=2913251) Traceback (most recent call last):
(EngineCore_0 pid=2913251)   File "/usr/local/fbcode/platform010/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_0 pid=2913251)     self.run()
(EngineCore_0 pid=2913251)   File "/usr/local/fbcode/platform010/lib/python3.10/multiprocessing/process.py", line 108, in run
(EngineCore_0 pid=2913251)     self._target(*self._args, **self._kwargs)
(EngineCore_0 pid=2913251)   File "/data/users/zijingliu/fbsource/buck-out/v2/gen/fbcode/603862b2f935f1b2/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/vllm/v1/engine/core.py", line 687, in run_engine_core
(EngineCore_0 pid=2913251)     raise e
(EngineCore_0 pid=2913251)   File "/data/users/zijingliu/fbsource/buck-out/v2/gen/fbcode/603862b2f935f1b2/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/vllm/v1/engine/core.py", line 676, in run_engine_core
(EngineCore_0 pid=2913251)     engine_core.run_busy_loop()
(EngineCore_0 pid=2913251)   File "/data/users/zijingliu/fbsource/buck-out/v2/gen/fbcode/603862b2f935f1b2/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/vllm/v1/engine/core.py", line 703, in run_busy_loop
(EngineCore_0 pid=2913251)     self._process_engine_step()
(EngineCore_0 pid=2913251)   File "/data/users/zijingliu/fbsource/buck-out/v2/gen/fbcode/603862b2f935f1b2/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/vllm/v1/engine/core.py", line 728, in _process_engine_step
(EngineCore_0 pid=2913251)     outputs, model_executed = self.step_fn()
(EngineCore_0 pid=2913251)   File "/data/users/zijingliu/fbsource/buck-out/v2/gen/fbcode/603862b2f935f1b2/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/vllm/v1/engine/core.py", line 273, in step
(EngineCore_0 pid=2913251)     model_output = self.execute_model_with_error_logging(
(EngineCore_0 pid=2913251)   File "/data/users/zijingliu/fbsource/buck-out/v2/gen/fbcode/603862b2f935f1b2/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/vllm/v1/engine/core.py", line 259, in execute_model_with_error_logging
(EngineCore_0 pid=2913251)     raise err
(EngineCore_0 pid=2913251)   File "/data/users/zijingliu/fbsource/buck-out/v2/gen/fbcode/603862b2f935f1b2/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/vllm/v1/engine/core.py", line 250, in execute_model_with_error_logging
(EngineCore_0 pid=2913251)     return model_fn(scheduler_output)
(EngineCore_0 pid=2913251)   File "/data/users/zijingliu/fbsource/buck-out/v2/gen/fbcode/603862b2f935f1b2/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/vllm/v1/executor/multiproc_executor.py", line 192, in execute_model
(EngineCore_0 pid=2913251)     return self.kv_output_aggregator.aggregate(outputs, self.output_rank)
(EngineCore_0 pid=2913251)   File "/data/users/zijingliu/fbsource/buck-out/v2/gen/fbcode/603862b2f935f1b2/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/vllm/distributed/kv_transfer/kv_connector/utils.py", line 146, in aggregate
(EngineCore_0 pid=2913251)     update_finished_set(output.finished_sending,
(EngineCore_0 pid=2913251) AttributeError: 'NoneType' object has no attribute 'finished_sending'

After
P/D disaggregation now runs properly.

Eval

[2025-08-07 14:50:41,290] [rank 0] [INFO] Per prompt detailed info dumped to /tmp/eval_dump.gsm8k.8_shot.1_gen.20250807_145041.json
[2025-08-07 14:50:41,290] [rank 0] [INFO] Evaluation results on task gsm8k.8_shot.1_gen: em: 0.948000 | f1: 0.948000 | em_maj1@1: 0.948000 | f1_maj1@1: 0.948000
[2025-08-07 14:50:41,290] [rank 0] [INFO] Task gsm8k.8_shot.1_gen took 189.51 seconds

cc @sdavidbd @njhill @houseroad


github-actions bot commented Aug 7, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default; they only run the fastcheck CI, which executes a small, essential subset of tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@liuzijing2014 marked this pull request as ready for review August 7, 2025 21:55
Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request fixes a critical bug that occurred when aggregating KV connector outputs, where KVConnectorOutput could be None, leading to an AttributeError. The fix correctly adds a None check before accessing attributes of kv_connector_output, making the aggregation logic more robust. The change is correct and effectively resolves the issue. I've added one comment suggesting the addition of a unit test to cover this new edge case and prevent future regressions.

Comment on lines 145 to 150
    kv_connector_output = output.kv_connector_output
    if kv_connector_output:
        update_finished_set(kv_connector_output.finished_sending,
                            self._send_remaining_count, finished_sending)
        update_finished_set(kv_connector_output.finished_recving,
                            self._recv_remaining_count, finished_recving)

Severity: high

This change correctly fixes a critical bug where kv_connector_output could be None, causing a crash. However, to prevent future regressions, it's important to add a unit test that covers this specific edge case.

The existing tests in tests/v1/kv_connector/unit/test_output_aggreagator.py do not seem to cover the scenario where kv_connector_output is None. Please consider adding a new test case that asserts the correct behavior when one or more of the ModelRunnerOutput objects in the outputs list has kv_connector_output=None.
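
A rough sketch of such a regression test; the import paths, the KVOutputAggregator constructor signature, and the copy-based helper are assumptions for illustration, not the actual test file's API:

    import copy

    from vllm.distributed.kv_transfer.kv_connector.utils import KVOutputAggregator
    from vllm.v1.outputs import EMPTY_MODEL_RUNNER_OUTPUT, KVConnectorOutput

    def _output_with(kv_connector_output):
        # Clone the empty template and attach (or omit) a connector output.
        out = copy.copy(EMPTY_MODEL_RUNNER_OUTPUT)
        out.kv_connector_output = kv_connector_output
        return out

    def test_aggregate_with_missing_kv_connector_output():
        aggregator = KVOutputAggregator(world_size=2)  # assumed constructor
        outputs = [
            _output_with(KVConnectorOutput()),  # worker 0: empty but present
            _output_with(None),                 # worker 1: no connector output at all
        ]
        # Before the fix, aggregate() raised AttributeError on the None entry.
        aggregated = aggregator.aggregate(outputs, output_rank=0)
        assert aggregated is not None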

Contributor

sdavidbd commented Aug 7, 2025

Thanks for the catch, @liuzijing2014! My original intent was that the KV connector path would always produce a KVConnectorOutput, so this case wasn’t expected.

Instead of allowing None, would it make sense to define a default EMPTY_MODEL_RUNNER_WITH_KVC_OUTPUT for such scenarios?

Also, note that a similar issue exists in gpu_worker.py for PP:
https://github.com/vllm-project/vllm/blob/main/vllm/v1/worker/gpu_worker.py#L380
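
For concreteness, a minimal sketch of what such a constant could look like, mirroring the EMPTY_MODEL_RUNNER_OUTPUT definition quoted above (the kv_connector_output field and KVConnectorOutput's no-arg constructor are assumptions):

    EMPTY_MODEL_RUNNER_WITH_KVC_OUTPUT = ModelRunnerOutput(
        req_ids=[],
        req_id_to_index={},
        sampled_token_ids=[],
        spec_token_ids=None,
        logprobs=None,
        prompt_logprobs_dict={},
        pooler_output=[],
        num_nans_in_logits=None,
        kv_connector_output=KVConnectorOutput(),  # empty but non-None
    )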

@liuzijing2014
Collaborator Author

@sdavidbd Makes sense to me, let me create a new empty default then.

Signed-off-by: Zijing Liu <[email protected]>
Member

njhill commented Aug 8, 2025

Maybe it would be simpler / less fragile to just add a check in the aggregator here:

    output = output.kv_connector_output
    if not output:
        continue
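
In context, the check would sit at the top of the aggregation loop in kv_connector/utils.py; a sketch, with the surrounding loop paraphrased from the traceback and the review comment above rather than copied from the file:

    for model_output in outputs:
        output = model_output.kv_connector_output
        if not output:
            continue  # this worker produced no KV connector output this step
        update_finished_set(output.finished_sending,
                            self._send_remaining_count, finished_sending)
        update_finished_set(output.finished_recving,
                            self._recv_remaining_count, finished_recving)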

Contributor

sdavidbd commented Aug 9, 2025

> Maybe it would be simpler / less fragile to just add a check in the aggregator here:
>
>     output = output.kv_connector_output
>     if not output:
>         continue

Good point, @njhill - adding a check in the aggregator would certainly be simpler and less fragile in the short term.
My thinking was that having an EMPTY_MODEL_RUNNER_WITH_KVC_OUTPUT would keep the invariant that the KV connector path always returns a KVConnectorOutput, which might make the downstream logic cleaner and avoid scattered None checks.

Either way works — it’s just a trade-off between a quick fix and enforcing that invariant in the data model.


CaveNightingale commented Aug 11, 2025

What's the point keeping EMPTY_MODEL_RUNNER_OUTPUT?
It seems the only usages of this variable are {TPUModelRunner, GPUModelRunner}::execute_model and I think they can also be replaced with the new EMPTY_MODEL_RUNNER_WITH_KVC_OUTPUT.
TPUModelRunner::execute_model GPUModelRunner::execute_model

@liuzijing2014
Collaborator Author

How about replacing EMPTY_MODEL_RUNNER_OUTPUT with EMPTY_MODEL_RUNNER_WITH_KVC_OUTPUT and making this the new convention for the empty default?

Member

njhill commented Aug 11, 2025

We have another PR here with the simpler fix, among other changes.

@sdavidbd I understand the invariant idea but it also means we're often allocating an empty KVConnectorOutput unnecessarily. And I guess it doesn't harm to be defensive with the checks.

@sdavidbd
Contributor

Thanks, @njhill. I see where you're coming from, but I'm a bit hesitant about adding those checks; my concern is they might end up masking logic bugs rather than helping us catch them early.

My preference would be to keep the stronger invariant in both directions: on the KV-connector path we always yield a KVConnectorOutput, and off that path we never do (connector is not None if and only if kv_connector_output is not None).

On the allocation concern — with the new KV-connector context manager added in #21980 we already create an empty KVConnectorOutput, but I believe the cost is negligible in both CPU and memory terms since it’s just a small dataclass with Optional fields defaulting to None.

The change in this PR targets the no‑forward path (kv_connector_no_forward). There, we avoid creating a full ModelRunnerOutput and only copy EMPTY_MODEL_RUNNER_OUTPUT if the KVConnectorOutput is non‑empty. If it’s empty, we return EMPTY_MODEL_RUNNER_WITH_KVC_OUTPUT — a constant “empty” ModelRunnerOutput that still carries an empty KVConnectorOutput.

I could be missing something, but I think keeping the invariant will make it easier to spot real issues and avoid unnecessary None checks scattered through the code.
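
A sketch of that no-forward path as described (control flow inferred from this comment; the is_empty() helper and the exact call site are assumptions):

    import copy

    def handle_kv_connector_no_forward(kv_connector_output):
        if kv_connector_output.is_empty():  # assumed helper: nothing finished
            # Keep the invariant without allocating a new ModelRunnerOutput.
            return EMPTY_MODEL_RUNNER_WITH_KVC_OUTPUT
        # Something finished: copy the empty template and attach the output.
        output = copy.copy(EMPTY_MODEL_RUNNER_OUTPUT)
        output.kv_connector_output = kv_connector_output
        return output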

@robertgshaw2-redhat
Collaborator

Solved by #22663.
