[CI] Add end-to-end V1 min_tokens test coverage #22495
Conversation
Code Review

This pull request adds a comprehensive set of end-to-end tests for the min_tokens functionality in the V1 engine. The tests are well-structured and cover a wide range of scenarios, including known bugs, which are correctly marked as xfail. My review focuses on improving the robustness of the test assertions to prevent crashes on unexpected outputs and on enhancing code clarity by removing redundant parameters. I've identified a few critical issues where tests could crash due to IndexError, and a high-severity issue with an incorrect command in the documentation.
tests/v1/e2e/test_min_tokens.py
Outdated
def assert_min_tokens_satisfied(
    output: RequestOutput,
    test_case: MinTokensTestCase
) -> None:
    """Assert that min_tokens requirement is satisfied"""
    token_count = get_token_count(output)

    if test_case.expected_exact_len is not None:
        # Exact length requirement
        assert token_count == test_case.expected_exact_len, (
            f"Expected exactly {test_case.expected_exact_len} tokens, "
            f"got {token_count} tokens. "
            f"Stop reason: {output.outputs[0].stop_reason}"
        )
    else:
        # Minimum length requirement
        assert token_count >= test_case.expected_min_len, (
            f"Expected at least {test_case.expected_min_len} tokens, "
            f"got {token_count} tokens. "
            f"Stop reason: {output.outputs[0].stop_reason}"
        )
The assertion messages at lines 195 and 202 can raise an IndexError if output.outputs is empty, which can happen if the model generates no tokens. This would crash the test instead of providing a clear failure message. You should access stop_reason safely to prevent this.
def assert_min_tokens_satisfied(
    output: RequestOutput,
    test_case: MinTokensTestCase
) -> None:
    """Assert that min_tokens requirement is satisfied"""
    token_count = get_token_count(output)
    stop_reason = output.outputs[0].stop_reason if output.outputs else "no output"
    if test_case.expected_exact_len is not None:
        # Exact length requirement
        assert token_count == test_case.expected_exact_len, (
            f"Expected exactly {test_case.expected_exact_len} tokens, "
            f"got {token_count} tokens. "
            f"Stop reason: {stop_reason}"
        )
    else:
        # Minimum length requirement
        assert token_count >= test_case.expected_min_len, (
            f"Expected at least {test_case.expected_min_len} tokens, "
            f"got {token_count} tokens. "
            f"Stop reason: {stop_reason}"
        )
tests/v1/e2e/test_min_tokens.py
Outdated
assert token_count >= 15, (
    f"Bug confirmed: Generated only {token_count} tokens despite min_tokens=15. "
    f"Stop reason: {outputs[0].outputs[0].stop_reason}. "
    f"Generated text: {repr(generated_text)}"
)
The f-string for the assertion message at line 318 accesses outputs[0].outputs[0].stop_reason without checking if outputs[0].outputs is empty. If the model generates no tokens, this will cause an IndexError and crash the test. You should access stop_reason safely to ensure the test fails with a meaningful message.
assert token_count >= 15, (
    f"Bug confirmed: Generated only {token_count} tokens despite min_tokens=15. "
    f"Stop reason: {outputs[0].outputs[0].stop_reason if outputs[0].outputs else 'no output'}. "
    f"Generated text: {repr(generated_text)}"
)
tests/v1/e2e/test_min_tokens.py
Outdated
assert token_count == 25, (
    f"Expected exactly 25 tokens, got {token_count}. "
    f"Stop reason: {outputs[0].outputs[0].stop_reason}"
)
The f-string for the assertion message at line 396 accesses outputs[0].outputs[0].stop_reason without checking if outputs[0].outputs is empty. If the model generates no tokens, this will cause an IndexError and crash the test. You should access stop_reason safely to ensure the test fails with a meaningful message.
assert token_count == 25, (
    f"Expected exactly 25 tokens, got {token_count}. "
    f"Stop reason: {outputs[0].outputs[0].stop_reason if outputs[0].outputs else 'no output'}"
)
tests/v1/e2e/test_min_tokens.py
Outdated
should_pass: bool = True,
xfail_reason: Optional[str] = None
tests/v1/e2e/test_min_tokens.py
Outdated
Usage:
    cd vllm/
    VLLM_USE_V1=1 python -m pytest tests/v1/test_min_tokens.py -v
The usage instruction in the docstring has an incorrect file path. The file is located at tests/v1/e2e/test_min_tokens.py, but the command refers to tests/v1/test_min_tokens.py. This will prevent developers from running the tests locally using the provided command.
VLLM_USE_V1=1 python -m pytest tests/v1/e2e/test_min_tokens.py -v
Thanks very much @arjunbreddy22!

So it looks like you verified that min_tokens appears to be working in the non-stop-sequence case?

Please see my inline comment though; it would be good to sanity check that it really is suppressing EOS and not just generating 25 tokens anyhow.
tests/v1/e2e/test_min_tokens.py
Outdated
# Test configuration
TEST_MODEL = "facebook/opt-125m"  # Small model for fast CI execution
TEMPERATURE = 0.0  # Deterministic generation for consistent testing
Suggest renaming this to GREEDY.
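For illustration, a minimal sketch of what that rename could look like (the surrounding SamplingParams call is illustrative, not the PR's exact code):

from vllm import SamplingParams

# Temperature 0.0 means greedy (deterministic) decoding, hence the name.
GREEDY = 0.0

sampling_params = SamplingParams(min_tokens=25, max_tokens=25, temperature=GREEDY)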
tests/v1/e2e/test_min_tokens.py
Outdated
sampling_params = SamplingParams(
    min_tokens=25,
    max_tokens=25,  # Force exact length
    temperature=TEMPERATURE)

prompt = "The capital of France is"
outputs = llm_v1.generate([prompt], sampling_params)
@arjunbreddy22 could you make calls here without and with min_tokens, check in the "without" case that < 25 tokens are output with finish_reason of stop and stop_reason None. And also in the "with" case verify that none of the generated token ids == the eos token id.
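Not part of the PR itself, but a minimal sketch of what those two checks could look like (the llm_v1 fixture name, prompt, and token budget are assumptions carried over from the snippet above):

from vllm import SamplingParams

# Assumed: llm_v1 is the vllm.LLM instance used by these tests, run with greedy decoding.
prompt = "The capital of France is"
eos_token_id = llm_v1.get_tokenizer().eos_token_id

# "Without" case: expect a natural EOS before 25 tokens,
# with finish_reason == "stop" and stop_reason None.
no_min = llm_v1.generate([prompt], SamplingParams(max_tokens=25, temperature=0.0))
no_min_out = no_min[0].outputs[0]
assert len(no_min_out.token_ids) < 25
assert no_min_out.finish_reason == "stop" and no_min_out.stop_reason is None

# "With" case: min_tokens should suppress EOS, so the EOS id must not
# appear anywhere in the generated token ids.
with_min = llm_v1.generate([prompt], SamplingParams(min_tokens=25, max_tokens=25, temperature=0.0))
with_min_out = with_min[0].outputs[0]
assert len(with_min_out.token_ids) == 25
assert eos_token_id not in with_min_out.token_ids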
Thanks for the review! I've updated the test to run both with and without min_tokens, adding checks to ensure that in the "without" case fewer than 25 tokens are generated with finish_reason set to stop and no stop_reason, and in the "with" case no EOS token ID appears. I also renamed TEMPERATURE to GREEDY for clarity.
On your earlier question, by the way: yes, min_tokens is working in the non-stop-sequence case. The non-stop-sequence tests pass when I run them myself (basic_min_tokens_no_stop, min_tokens_zero, min_equals_max_no_stop, large_min_tokens, min_tokens_with_empty_stop_list). Also, for your third comment, I've verified that in the "without" case (which I added), < 32 tokens (I changed from 25 to 32) are output and the finish_reason was stop. And in the "with" case none of the generated token ids == the eos token id.
Thanks a lot @arjunbreddy22!

@vadimkantorov any idea what's going on here?

Hard to say :( We'll try the very new vllm with Trinity and I'll report back. But anyway, good to have tests for this.
Purpose
Add comprehensive end-to-end CI coverage for V1 min_tokens.

Resolves #21950 (verify and add CI coverage).

Includes bug-repro tests:
- min_tokens (known bug [Bug]: min_tokens is not respected when stop is triggered early #21987; being fixed in PR [V1] support min_tokens for detokener #22014).
- min_tokens (potential logits-processor bug; now appears fixed).

Scope: tests/v1/e2e/test_min_tokens.py.

Test Plan
- Small model (facebook/opt-125m) for fast CPU execution.
- temperature=0.0 and enforce_eager=True.

Run locally:

# From repo root
VLLM_USE_V1=1 python -m pytest tests/v1/e2e/test_min_tokens.py -v
Test Result

Latest run (CPU, ~26s).

Example summary:
- min_tokens xfail cases (expected until fully fixed).

Note: strict=False, so when upstream fixes land, they will XPASS without breaking CI.
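For reference, a minimal sketch of a non-strict xfail marker as described in the note above (the test name and reason text are illustrative, not the PR's exact code):

import pytest

# strict=False: the test is expected to fail today, but an unexpected pass (XPASS)
# is reported without failing the CI run once the upstream fix lands.
@pytest.mark.xfail(
    reason="min_tokens not respected when a stop string triggers early (#21987)",
    strict=False,
)
def test_min_tokens_with_early_stop():
    ...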
(Optional) Documentation Update

N/A