[Core] Free KV cache GPU memory on engine shutdown #28953

markmc · 2025-11-18T17:28:23Z

Related to #24885

Addresses the enable_multiprocessing=False TODO in tests/v1/shutdown/test_delete.py::test_llm_delete

The trickiest part is the single-process case ("inproc" engine and "uniproc" executor), where we can't rely on shutting down child processes to release GPU memory. To address that, we:

Call engine_core.shutdown() from LLMEngine.__del__()
Free KV cache GPU memory in GPUWorker.shutdown()
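For illustration, the two steps above can be sketched with stand-in classes. This is a minimal sketch, not the vLLM implementation: `ToyWorker` and `ToyEngine` are hypothetical names, and plain `bytearray` buffers stand in for the KV cache tensors so the example runs without a GPU.

```python
import gc


class ToyWorker:
    """Hypothetical stand-in for a GPU worker holding KV cache buffers."""

    def __init__(self, num_layers: int = 2, bytes_per_layer: int = 1024):
        # bytearrays play the role of per-layer KV cache tensors.
        self.kv_caches = [bytearray(bytes_per_layer) for _ in range(num_layers)]

    def shutdown(self) -> None:
        # Drop every reference to the cache buffers so they can be reclaimed.
        self.kv_caches = []
        gc.collect()
        # Assumption: with real CUDA tensors one would typically also call
        # torch.cuda.empty_cache() here to return cached blocks to the driver.


class ToyEngine:
    """Hypothetical stand-in for an in-process engine owning one worker."""

    def __init__(self):
        self._closed = False
        self.worker = ToyWorker()

    def shutdown(self) -> None:
        # Idempotent: safe to call explicitly and again from __del__.
        if self._closed:
            return
        self._closed = True
        self.worker.shutdown()

    def __del__(self):
        # In the single-process case there is no child process to kill,
        # so cleanup has to be triggered when the engine object goes away.
        self.shutdown()
```

Deleting the last reference to the engine (for example `del engine`) then releases the buffers even though no process exits.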

Other changes include:

Avoid a pytest timeout when waiting for GPU cleanup: use the timeout parameter of wait_for_gpu_memory_to_clear() so the test fails with a clear "Memory of devices ... not free" error instead
Print memory usage at start of shutdown tests

Rather than failing with:

```
Failed: Timeout (>120.0s) from pytest-timeout.
```

fail with this instead:

```
ValueError: Memory of devices devices=[0] not free after dur_s=120.00 (threshold='2.0 GiB')
```

Signed-off-by: Mark McLoughlin <[email protected]>
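The idea of bounding the wait inside the helper, rather than letting pytest-timeout kill the test, can be sketched as follows. This is a hedged illustration only: `wait_until_memory_below` and its parameters are hypothetical and do not reproduce the signature of `wait_for_gpu_memory_to_clear()`. The memory reading is injected as a callable so the sketch runs without a GPU.

```python
import time
from typing import Callable


def wait_until_memory_below(
    read_used_gib: Callable[[], float],
    threshold_gib: float,
    timeout_s: float,
    poll_interval_s: float = 0.01,
) -> float:
    """Poll until used memory drops to the threshold; return seconds waited.

    Raises ValueError with a descriptive message once timeout_s has elapsed.
    """
    start = time.monotonic()
    while True:
        used = read_used_gib()
        dur_s = time.monotonic() - start
        if used <= threshold_gib:
            return dur_s
        if dur_s >= timeout_s:
            # A descriptive error is easier to debug than a generic
            # test-runner timeout.
            raise ValueError(
                f"Memory not free after {dur_s=:.2f} "
                f"(used={used:.2f} GiB, threshold={threshold_gib} GiB)"
            )
        time.sleep(poll_interval_s)
```

For the descriptive error to surface, the helper's timeout needs to be shorter than the test runner's own timeout.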

gemini-code-assist

Code Review

This pull request effectively addresses the GPU memory leak on engine shutdown, especially for the single-process case, by introducing an explicit cleanup path. The refactoring in tests/utils.py to extract check_gpu_memory_usage is a nice improvement for test clarity.

I have two main points of feedback regarding the robustness of the shutdown mechanism:

The reliance on __del__ for cleanup in LLMEngine can be unreliable in complex applications with reference cycles.
The shutdown method in GPUWorker is now less defensive, which could lead to errors during shutdown if a worker failed to initialize completely.

Details are in the line comments.
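Both review points can be illustrated with a small sketch. It is not the change made in this PR: `SketchWorker` and `SketchEngine` are hypothetical names, and the approach shown (a `getattr` guard in `shutdown()`, plus `weakref.finalize` and a context manager instead of relying only on `__del__`) is one common way to handle these concerns.

```python
import weakref


class SketchWorker:
    """Hypothetical worker whose shutdown tolerates a failed __init__."""

    def __init__(self):
        # A bytearray stands in for KV cache tensors.
        self.kv_caches = [bytearray(16)]

    def shutdown(self) -> None:
        # getattr guards against attributes that were never set because
        # initialization did not complete.
        if getattr(self, "kv_caches", None):
            self.kv_caches = []


class SketchEngine:
    """Hypothetical engine offering explicit, idempotent cleanup."""

    def __init__(self, worker: SketchWorker):
        self.worker = worker
        # The callback must not reference the engine itself, otherwise the
        # engine would be kept alive by its own finalizer.
        self._finalizer = weakref.finalize(self, worker.shutdown)

    def close(self) -> None:
        # Calling a finalizer runs the callback at most once.
        self._finalizer()

    def __enter__(self) -> "SketchEngine":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.close()
```

Using the engine in a `with` block makes cleanup deterministic regardless of reference cycles, while the finalizer remains as a fallback when the object is garbage collected or the interpreter exits.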

vllm/v1/engine/llm_engine.py

vllm/v1/worker/gpu_worker.py

markmc added 4 commits November 18, 2025 12:09

[Core] Refactor wait_for_gpu_memory_to_clear() test util

ae3ccfd

To allow using check_gpu_memory_usage() at the start of a test. Signed-off-by: Mark McLoughlin <[email protected]>

[Core] Print memory usage at start of shutdown tests

6555d8e

Signed-off-by: Mark McLoughlin <[email protected]>

[Core] Free KV cache GPU memory on engine shutdown

950ff69

Fixes the shutdown test in the single-process case.

Start of test:

```
gpu memory used/total (GiB): 0=0.86/80.00;
```

end of test:

```
gpu memory used/total (GiB): 0=1.41/80.00
```

Signed-off-by: Mark McLoughlin <[email protected]>

mergify bot added the v1 label Nov 18, 2025

gemini-code-assist bot reviewed Nov 18, 2025

vllm/v1/engine/llm_engine.py (resolved)

vllm/v1/worker/gpu_worker.py (resolved)

markmc requested a review from njhill November 18, 2025 19:36

markmc added the ready label ("ONLY add when PR is ready to merge/full CI is needed") Nov 19, 2025
