[CI] Add end-to-end V1 min_tokens test coverage #22495
Conversation
Code Review

This pull request adds a comprehensive set of end-to-end tests for the min_tokens functionality in the V1 engine. The tests are well-structured and cover a wide range of scenarios, including known bugs, which are correctly marked as xfail. My review focuses on improving the robustness of the test assertions to prevent crashes on unexpected outputs and on enhancing code clarity by removing redundant parameters. I've identified a few critical issues where tests could crash due to IndexError, and a high-severity issue with an incorrect command in the documentation.
tests/v1/e2e/test_min_tokens.py
Outdated
def assert_min_tokens_satisfied(
    output: RequestOutput,
    test_case: MinTokensTestCase
) -> None:
    """Assert that min_tokens requirement is satisfied"""
    token_count = get_token_count(output)

    if test_case.expected_exact_len is not None:
        # Exact length requirement
        assert token_count == test_case.expected_exact_len, (
            f"Expected exactly {test_case.expected_exact_len} tokens, "
            f"got {token_count} tokens. "
            f"Stop reason: {output.outputs[0].stop_reason}"
        )
    else:
        # Minimum length requirement
        assert token_count >= test_case.expected_min_len, (
            f"Expected at least {test_case.expected_min_len} tokens, "
            f"got {token_count} tokens. "
            f"Stop reason: {output.outputs[0].stop_reason}"
        )
The assertion messages at lines 195 and 202 can raise an IndexError if output.outputs is empty, which can happen if the model generates no tokens. This would crash the test instead of providing a clear failure message. You should access stop_reason safely to prevent this.
def assert_min_tokens_satisfied(
    output: RequestOutput,
    test_case: MinTokensTestCase
) -> None:
    """Assert that min_tokens requirement is satisfied"""
    token_count = get_token_count(output)
    stop_reason = output.outputs[0].stop_reason if output.outputs else "no output"
    if test_case.expected_exact_len is not None:
        # Exact length requirement
        assert token_count == test_case.expected_exact_len, (
            f"Expected exactly {test_case.expected_exact_len} tokens, "
            f"got {token_count} tokens. "
            f"Stop reason: {stop_reason}"
        )
    else:
        # Minimum length requirement
        assert token_count >= test_case.expected_min_len, (
            f"Expected at least {test_case.expected_min_len} tokens, "
            f"got {token_count} tokens. "
            f"Stop reason: {stop_reason}"
        )
tests/v1/e2e/test_min_tokens.py
Outdated
assert token_count >= 15, (
    f"Bug confirmed: Generated only {token_count} tokens despite min_tokens=15. "
    f"Stop reason: {outputs[0].outputs[0].stop_reason}. "
    f"Generated text: {repr(generated_text)}"
)
The f-string for the assertion message at line 318 accesses outputs[0].outputs[0].stop_reason without checking if outputs[0].outputs is empty. If the model generates no tokens, this will cause an IndexError and crash the test. You should access stop_reason safely to ensure the test fails with a meaningful message.
assert token_count >= 15, (
    f"Bug confirmed: Generated only {token_count} tokens despite min_tokens=15. "
    f"Stop reason: {outputs[0].outputs[0].stop_reason if outputs[0].outputs else 'no output'}. "
    f"Generated text: {repr(generated_text)}"
)
tests/v1/e2e/test_min_tokens.py
Outdated
assert token_count == 25, (
    f"Expected exactly 25 tokens, got {token_count}. "
    f"Stop reason: {outputs[0].outputs[0].stop_reason}"
)
The f-string for the assertion message at line 396 accesses outputs[0].outputs[0].stop_reason without checking if outputs[0].outputs is empty. If the model generates no tokens, this will cause an IndexError and crash the test. You should access stop_reason safely to ensure the test fails with a meaningful message.
assert token_count == 25, (
    f"Expected exactly 25 tokens, got {token_count}. "
    f"Stop reason: {outputs[0].outputs[0].stop_reason if outputs[0].outputs else 'no output'}"
)
tests/v1/e2e/test_min_tokens.py
Outdated
should_pass: bool = True,
xfail_reason: Optional[str] = None
tests/v1/e2e/test_min_tokens.py
Outdated
Usage:
    cd vllm/
    VLLM_USE_V1=1 python -m pytest tests/v1/test_min_tokens.py -v
The usage instruction in the docstring has an incorrect file path. The file is located at tests/v1/e2e/test_min_tokens.py, but the command refers to tests/v1/test_min_tokens.py. This will prevent developers from running the tests locally using the provided command.
VLLM_USE_V1=1 python -m pytest tests/v1/e2e/test_min_tokens.py -v
Thanks very much @arjunbreddy22!

So it looks like you verified that min_tokens appears to be working in the non-stop-sequence case?

Please see my inline comment though; it would be good to sanity check that it really is suppressing EOS and not just generating 25 tokens anyhow.
tests/v1/e2e/test_min_tokens.py
Outdated
# Test configuration
TEST_MODEL = "facebook/opt-125m"  # Small model for fast CI execution
TEMPERATURE = 0.0  # Deterministic generation for consistent testing
Suggest renaming this to GREEDY.
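For illustration, a minimal sketch of what that rename could look like (the surrounding SamplingParams call is illustrative, not the PR's exact code):

from vllm import SamplingParams

# Temperature 0.0 means greedy (deterministic) decoding, hence the name.
GREEDY = 0.0

sampling_params = SamplingParams(min_tokens=25, max_tokens=25, temperature=GREEDY)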
tests/v1/e2e/test_min_tokens.py
Outdated
sampling_params = SamplingParams(
    min_tokens=25,
    max_tokens=25,  # Force exact length
    temperature=TEMPERATURE)

prompt = "The capital of France is"
outputs = llm_v1.generate([prompt], sampling_params)
@arjunbreddy22 could you make calls here without and with min_tokens, check in the "without" case that < 25 tokens are output with finish_reason of stop and stop_reason None. And also in the "with" case verify that none of the generated token ids == the eos token id.
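Not part of the PR itself, but a minimal sketch of what those two checks could look like (the llm_v1 fixture name, prompt, and token budget are assumptions carried over from the snippet above):

from vllm import SamplingParams

# Assumed: llm_v1 is the vllm.LLM instance used by these tests, run with greedy decoding.
prompt = "The capital of France is"
eos_token_id = llm_v1.get_tokenizer().eos_token_id

# "Without" case: expect a natural EOS before 25 tokens,
# with finish_reason == "stop" and stop_reason None.
no_min = llm_v1.generate([prompt], SamplingParams(max_tokens=25, temperature=0.0))
no_min_out = no_min[0].outputs[0]
assert len(no_min_out.token_ids) < 25
assert no_min_out.finish_reason == "stop" and no_min_out.stop_reason is None

# "With" case: min_tokens should suppress EOS, so the EOS id must not
# appear anywhere in the generated token ids.
with_min = llm_v1.generate([prompt], SamplingParams(min_tokens=25, max_tokens=25, temperature=0.0))
with_min_out = with_min[0].outputs[0]
assert len(with_min_out.token_ids) == 25
assert eos_token_id not in with_min_out.token_ids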
Thanks for the review! I've updated the test to run both with and without min_tokens, adding checks to ensure that in the "without" case fewer than 25 tokens are generated with finish_reason set to stop and no stop_reason, and in the "with" case no EOS token ID appears. I also renamed TEMPERATURE to GREEDY for clarity.
On your earlier question, by the way: yes, min_tokens is working in the non-stop-sequence case. The non-stop-sequence tests pass when I run them myself (basic_min_tokens_no_stop, min_tokens_zero, min_equals_max_no_stop, large_min_tokens, min_tokens_with_empty_stop_list). Also, for your third comment, I've verified that in the "without" case (which I added), < 32 tokens (I changed from 25 to 32) are output and the finish_reason was stop. And in the "with" case none of the generated token ids == the eos token id.
Thanks a lot @arjunbreddy22!

@vadimkantorov any idea what's going on here?

Hard to say :( We'll try the very new vllm with Trinity and I'll report back. But anyway, good to have tests for this.
Purpose
Add comprehensive end-to-end CI coverage for V1 min_tokens.

Resolves #21950 (verify and add CI coverage).

Includes bug-repro tests:
- min_tokens (known bug [Bug]: min_tokens is not respected when stop is triggered early #21987; being fixed in PR [V1] support min_tokens for detokener #22014).
- min_tokens (potential logits-processor bug; now appears fixed).

Scope: tests/v1/e2e/test_min_tokens.py.

Test Plan
- Small model (facebook/opt-125m) for fast CPU execution.
- temperature=0.0 and enforce_eager=True.

Run locally:

# From repo root
VLLM_USE_V1=1 python -m pytest tests/v1/e2e/test_min_tokens.py -v
Test Result

Latest run (CPU, ~26s).

Example summary:
- min_tokens xfail cases (expected until fully fixed).

Note: strict=False, so when upstream fixes land, they will XPASS without breaking CI.
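For reference, a minimal sketch of a non-strict xfail marker as described in the note above (the test name and reason text are illustrative, not the PR's exact code):

import pytest

# strict=False: the test is expected to fail today, but an unexpected pass (XPASS)
# is reported without failing the CI run once the upstream fix lands.
@pytest.mark.xfail(
    reason="min_tokens not respected when a stop string triggers early (#21987)",
    strict=False,
)
def test_min_tokens_with_early_stop():
    ...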
(Optional) Documentation Update

N/A