Skip to content

Conversation

@markmc
Copy link
Member

@markmc markmc commented Nov 18, 2025

Related to #24885

Addresses the enable_multiprocessing=False TODO in tests/v1/shutdown/test_delete.py::test_llm_delete

The trickiest part is the single-process case ("inproc" engine and "uniproc" executor") where we can't rely on shutting down child processes to release GPU memory. To address that, we:

  1. Call engine_core.shutdown() from LLMEngine.__del__()
  2. Free KV cache GPU memory in GPUWorker.shutdown()

Other changes include:

  1. Avoid pytest timeout when waiting for GPU cleanup - use the wait_for_gpu_memory_to_clear() timeout parameter to get a nice Memory of devices not free error
  2. Print memory usage at start of shutdown tests

To allow using check_gpu_memory_usage() at the start of a test.

Signed-off-by: Mark McLoughlin <[email protected]>
Rather than failing with:

```
Failed: Timeout (>120.0s) from pytest-timeout.
```

fail with this instead:

```
ValueError: Memory of devices devices=[0] not free after dur_s=120.00 (threshold='2.0 GiB')
```

Signed-off-by: Mark McLoughlin <[email protected]>
Fixes the shutdown test in the single-process case.

Start of test:

```
gpu memory used/total (GiB): 0=0.86/80.00;
```

end of test:

```
gpu memory used/total (GiB): 0=1.41/80.00
```

Signed-off-by: Mark McLoughlin <[email protected]>
@mergify mergify bot added the v1 label Nov 18, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively addresses the GPU memory leak on engine shutdown, especially for the single-process case, by introducing an explicit cleanup path. The refactoring in tests/utils.py to extract check_gpu_memory_usage is a nice improvement for test clarity.

I have two main points of feedback regarding the robustness of the shutdown mechanism:

  1. The reliance on __del__ for cleanup in LLMEngine can be unreliable in complex applications with reference cycles.
  2. The shutdown method in GPUWorker is now less defensive, which could lead to errors during shutdown if a worker failed to initialize completely.

Details are in the line comments.

@markmc markmc requested a review from njhill November 18, 2025 19:36
@markmc markmc added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant