
[CI] Add end-to-end V1 min_tokens test coverage #22495


Open

arjunbreddy22 wants to merge 12 commits into main from fix/ci-min-tokens-test

Conversation


@arjunbreddy22 arjunbreddy22 commented Aug 8, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

Add comprehensive end-to-end CI coverage for V1 min_tokens.
Resolves #21950 (verify and add CI coverage).

Includes bug-repro tests for known min_tokens issues (marked xfail).

Scope:

  • Tests only; no functional code changes.
  • New file: tests/v1/e2e/test_min_tokens.py.

Test Plan

  • All tests target the V1 engine with a small model (facebook/opt-125m) for fast CPU execution.
  • Deterministic setup via temperature=0.0 and enforce_eager=True.
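
As a minimal sketch of that setup (illustrative only; the actual fixture names and arguments live in tests/v1/e2e/test_min_tokens.py):

# Sketch of the deterministic setup, assuming the vLLM offline API.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m", enforce_eager=True)  # small model, eager mode for fast CI

params = SamplingParams(
    min_tokens=15,    # generation must not stop before 15 tokens
    max_tokens=50,
    temperature=0.0,  # greedy decoding for deterministic output
)

outputs = llm.generate(["The capital of France is"], params)
assert len(outputs[0].outputs[0].token_ids) >= 15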

Run locally:

# From repo root
VLLM_USE_V1=1 python -m pytest tests/v1/e2e/test_min_tokens.py -v 

Test Result

Latest run (CPU, ~26s):

  • 7 passed
  • 3 xfailed (stop-strings wide and simple stop lists)
  • 3 xpassed (EOS-related and a guaranteed early-trigger case)

Example summary:

  • PASSED: baseline and edge cases.
  • XFAIL: stop-strings bypass min_tokens (expected until fully fixed).
  • XPASS: EOS-related tests and one stop-strings stress test indicate upstream fixes are effective in those paths.

Note:

  • XFAILs use decorator marks with strict=False, so when upstream fixes land, they will XPASS without breaking CI.
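
For reference, a non-strict xfail of the kind described above might look like this (a sketch; the test name and reason string are placeholders, not the exact ones in the file):

import pytest

# Non-strict xfail: expected to fail while the stop-strings bug is present,
# but an unexpected pass (XPASS) will not break CI.
@pytest.mark.xfail(reason="stop strings currently bypass min_tokens",
                   strict=False)
def test_min_tokens_with_stop_strings():
    ...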

(Optional) Documentation Update

N/A


github-actions bot commented Aug 8, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the v1 label Aug 8, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds a comprehensive set of end-to-end tests for the min_tokens functionality in the V1 engine. The tests are well-structured and cover a wide range of scenarios, including known bugs which are correctly marked as xfail. My review focuses on improving the robustness of the test assertions to prevent crashes on unexpected outputs and enhancing code clarity by removing redundant parameters. I've identified a few critical issues where tests could crash due to IndexError and a high-severity issue with an incorrect command in the documentation.

Comment on lines 183 to 201
def assert_min_tokens_satisfied(
    output: RequestOutput,
    test_case: MinTokensTestCase
) -> None:
    """Assert that min_tokens requirement is satisfied"""
    token_count = get_token_count(output)

    if test_case.expected_exact_len is not None:
        # Exact length requirement
        assert token_count == test_case.expected_exact_len, (
            f"Expected exactly {test_case.expected_exact_len} tokens, "
            f"got {token_count} tokens. "
            f"Stop reason: {output.outputs[0].stop_reason}"
        )
    else:
        # Minimum length requirement
        assert token_count >= test_case.expected_min_len, (
            f"Expected at least {test_case.expected_min_len} tokens, "
            f"got {token_count} tokens. "
            f"Stop reason: {output.outputs[0].stop_reason}"
        )

critical

The assertion messages at lines 195 and 202 can raise an IndexError if output.outputs is empty, which can happen if the model generates no tokens. This would crash the test instead of providing a clear failure message. You should safely access stop_reason to prevent this.

def assert_min_tokens_satisfied(
    output: RequestOutput, 
    test_case: MinTokensTestCase
) -> None:
    """Assert that min_tokens requirement is satisfied"""
    token_count = get_token_count(output)
    stop_reason = output.outputs[0].stop_reason if output.outputs else "no output"
    
    if test_case.expected_exact_len is not None:
        # Exact length requirement
        assert token_count == test_case.expected_exact_len, (
            f"Expected exactly {test_case.expected_exact_len} tokens, "
            f"got {token_count} tokens. "
            f"Stop reason: {stop_reason}"
        )
    else:
        # Minimum length requirement
        assert token_count >= test_case.expected_min_len, (
            f"Expected at least {test_case.expected_min_len} tokens, "
            f"got {token_count} tokens. "
            f"Stop reason: {stop_reason}"
        )

Comment on lines 316 to 318
assert token_count >= 15, (
    f"Bug confirmed: Generated only {token_count} tokens despite min_tokens=15. "
    f"Stop reason: {outputs[0].outputs[0].stop_reason}. "
    f"Generated text: {repr(generated_text)}"
)

critical

The f-string for the assertion message at line 318 accesses outputs[0].outputs[0].stop_reason without checking if outputs[0].outputs is empty. If the model generates no tokens, this will cause an IndexError and crash the test. You should access stop_reason safely to ensure the test fails with a meaningful message.

Suggested change
assert token_count >= 15, (
    f"Bug confirmed: Generated only {token_count} tokens despite min_tokens=15. "
    f"Stop reason: {outputs[0].outputs[0].stop_reason}. "
    f"Generated text: {repr(generated_text)}"
)
assert token_count >= 15, (
    f"Bug confirmed: Generated only {token_count} tokens despite min_tokens=15. "
    f"Stop reason: {outputs[0].outputs[0].stop_reason if outputs[0].outputs else 'no output'}. "
    f"Generated text: {repr(generated_text)}"
)

Comment on lines 394 to 395
assert token_count == 25, (
    f"Expected exactly 25 tokens, got {token_count}. "
    f"Stop reason: {outputs[0].outputs[0].stop_reason}"
)

critical

The f-string for the assertion message at line 396 accesses outputs[0].outputs[0].stop_reason without checking if outputs[0].outputs is empty. If the model generates no tokens, this will cause an IndexError and crash the test. You should access stop_reason safely to ensure the test fails with a meaningful message.

Suggested change
assert token_count == 25, (
    f"Expected exactly 25 tokens, got {token_count}. "
    f"Stop reason: {outputs[0].outputs[0].stop_reason}"
)
assert token_count == 25, (
    f"Expected exactly 25 tokens, got {token_count}. "
    f"Stop reason: {outputs[0].outputs[0].stop_reason if outputs[0].outputs else 'no output'}"
)

Comment on lines 52 to 53
should_pass: bool = True,
xfail_reason: Optional[str] = None

high

The should_pass and xfail_reason parameters are unused. The xfail logic is handled by pytest.param and pytest.mark.xfail decorators. These parameters are redundant and should be removed from the MinTokensTestCase class and all its instantiations to improve code clarity and maintainability.
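
In other words, the xfail information can live entirely on the parametrization; a hypothetical sketch (case names and reason strings here are illustrative, not the ones in the file):

import pytest

# Hypothetical: carry xfail on pytest.param marks, so the test-case dataclass
# needs no should_pass / xfail_reason fields.
CASES = [
    pytest.param("basic_min_tokens_no_stop", id="no-stop"),
    pytest.param("min_tokens_with_stop_strings", id="stop-strings",
                 marks=pytest.mark.xfail(
                     reason="stop strings bypass min_tokens", strict=False)),
]

@pytest.mark.parametrize("case_name", CASES)
def test_min_tokens(case_name):
    ...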


Usage:
cd vllm/
VLLM_USE_V1=1 python -m pytest tests/v1/test_min_tokens.py -v

high

The usage instruction in the docstring has an incorrect file path. The file is located at tests/v1/e2e/test_min_tokens.py, but the command refers to tests/v1/test_min_tokens.py. This will prevent developers from running the tests locally using the provided command.

Suggested change
VLLM_USE_V1=1 python -m pytest tests/v1/test_min_tokens.py -v
VLLM_USE_V1=1 python -m pytest tests/v1/e2e/test_min_tokens.py -v

@arjunbreddy22 arjunbreddy22 force-pushed the fix/ci-min-tokens-test branch from ec67577 to 8d0744f on August 8, 2025 at 08:05
@arjunbreddy22 arjunbreddy22 force-pushed the fix/ci-min-tokens-test branch from a4027e4 to 0f68ad9 on August 8, 2025 at 09:20
Signed-off-by: Arjun Reddy <[email protected]>
@arjunbreddy22 arjunbreddy22 force-pushed the fix/ci-min-tokens-test branch from 3b96499 to 92b7e65 on August 8, 2025 at 09:43
Member

@njhill njhill left a comment


Thanks very much @arjunbreddy22!

So it looks like you verified that min_tokens appears to be working in the non-stop-sequence case?

Please see my inline comment though; it would be good to sanity check that it really is suppressing EOS and not just generating 25 tokens anyway.


# Test configuration
TEST_MODEL = "facebook/opt-125m" # Small model for fast CI execution
TEMPERATURE = 0.0 # Deterministic generation for consistent testing

suggest to rename this to GREEDY

Comment on lines 383 to 389
sampling_params = SamplingParams(
    min_tokens=25,
    max_tokens=25,  # Force exact length
    temperature=TEMPERATURE)

prompt = "The capital of France is"
outputs = llm_v1.generate([prompt], sampling_params)

@arjunbreddy22 could you make calls here both without and with min_tokens? In the "without" case, check that fewer than 25 tokens are output with a finish_reason of "stop" and a stop_reason of None. In the "with" case, verify that none of the generated token ids equals the EOS token id.
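
A rough sketch of the requested check (illustrative; it reuses llm_v1, prompt, TEMPERATURE, and sampling_params from the snippet above, and the get_tokenizer()/eos_token_id lookup is an assumption, not the final test code):

# Without min_tokens: the model should stop early on EOS.
base_params = SamplingParams(max_tokens=25, temperature=TEMPERATURE)
out = llm_v1.generate([prompt], base_params)[0].outputs[0]
assert len(out.token_ids) < 25
assert out.finish_reason == "stop" and out.stop_reason is None

# With min_tokens=25: EOS must be suppressed, so it never appears in the output.
eos_id = llm_v1.get_tokenizer().eos_token_id
out = llm_v1.generate([prompt], sampling_params)[0].outputs[0]
assert eos_id not in out.token_ids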

@njhill
Member

njhill commented Aug 8, 2025

cc @vadimkantorov

@arjunbreddy22 arjunbreddy22 force-pushed the fix/ci-min-tokens-test branch from b9f31d8 to 260811a on August 9, 2025 at 07:29
Signed-off-by: Arjun Reddy <[email protected]>
@arjunbreddy22 arjunbreddy22 force-pushed the fix/ci-min-tokens-test branch from c780444 to 9e0fad3 on August 9, 2025 at 07:42
Signed-off-by: Arjun Reddy <[email protected]>
@arjunbreddy22
Author

Thanks for the review! I’ve updated the test to run both with and without min_tokens, adding checks to ensure that in the without case fewer than 25 tokens are generated with finish_reason set to stop and no stop_reason, and in the with case no EOS token ID appears. I also renamed TEMPERATURE to GREEDY for clarity.

… likelihood of EOS token in case 1

Signed-off-by: Arjun Reddy <[email protected]>
@arjunbreddy22 arjunbreddy22 force-pushed the fix/ci-min-tokens-test branch from 9b02200 to ce1cfbd on August 9, 2025 at 09:43
@arjunbreddy22 arjunbreddy22 force-pushed the fix/ci-min-tokens-test branch from 083236d to 2a4d474 on August 9, 2025 at 10:43
@arjunbreddy22
Author

On your earlier question, by the way: yes, min_tokens is working in the non-stop-sequence case. The non-stop-sequence tests pass when I run them myself (basic_min_tokens_no_stop, min_tokens_zero, min_equals_max_no_stop, large_min_tokens, min_tokens_with_empty_stop_list). Also, for your third comment, I've verified that in the "without" case (which I added), fewer than 32 tokens are output (I changed the limit from 25 to 32) and the finish_reason was "stop". And in the "with" case, none of the generated token ids equals the EOS token id.

@arjunbreddy22 arjunbreddy22 requested a review from njhill August 10, 2025 05:46
Member

@njhill njhill left a comment


Thanks a lot @arjunbreddy22!

@njhill
Member

njhill commented Aug 11, 2025

@vadimkantorov any idea what's going on here? min_tokens appears to be working ... could it somehow have been fixed since 0.9.2?

@njhill njhill added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) on Aug 11, 2025
@vadimkantorov

vadimkantorov commented Aug 11, 2025

Hard to say :( We'll try the very new vllm with Trinity and I'll report back. But anyway, good to have tests for this.

Labels
ready (ONLY add when PR is ready to merge/full CI is needed), v1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Verify that the min_tokens sampling parameter is working and covered by CI tests
3 participants