vllm fix check on max vocab size #22471
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited set of checks runs at first. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can add 🚀
This pull request was exported from Phabricator. Differential Revision: D79840114
Code Review
The pull request aims to fix a vocabulary size check to account for differences between the tokenizer's vocabulary and the model's vocabulary. While the intent is correct, the implementation introduces an off-by-one error by comparing a token ID with the vocabulary size directly instead of with the maximum valid token ID (`vocab_size - 1`). This could lead to out-of-bounds errors. I've provided a suggestion to correct this.
vllm/v1/engine/processor.py (Outdated)

```diff
@@ -394,7 +394,7 @@ def _validate_model_input(
         else:
             tokenizer = self.tokenizer.get_lora_tokenizer(lora_request)
         max_input_id = max(prompt_ids, default=0)
-        if max_input_id > tokenizer.max_token_id:
+        if max_input_id > max(tokenizer.max_token_id, self.model_config.get_vocab_size()):
```
There's an off-by-one error in the vocabulary size check. `self.model_config.get_vocab_size()` returns the size of the vocabulary (e.g., 151936), so the maximum valid token ID is `vocab_size - 1` (e.g., 151935). The current check `max_input_id > self.model_config.get_vocab_size()` would incorrectly allow a `max_input_id` equal to `vocab_size`, which is out of bounds. The condition should compare against `self.model_config.get_vocab_size() - 1`.
Suggested change:

```diff
-        if max_input_id > max(tokenizer.max_token_id, self.model_config.get_vocab_size()):
+        if max_input_id > max(tokenizer.max_token_id, self.model_config.get_vocab_size() - 1):
```
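To make the boundary concrete, here is a minimal sketch using the number quoted above (a model vocab size of 151936); the variable names are illustrative, not vLLM's. Since `tokenizer.max_token_id` (151643) is smaller than the model vocab size, the `max(...)` reduces to the vocab size, so the two assertions mirror the original and suggested conditions.

```python
# Minimal sketch of the off-by-one; names are illustrative, numbers are the
# ones quoted in this review (model vocab size 151936).
vocab_size = 151936            # what self.model_config.get_vocab_size() would return
max_valid_id = vocab_size - 1  # 151935 is the largest in-bounds token ID

max_input_id = 151936          # an out-of-bounds ID equal to vocab_size

# Check as written in this revision: the out-of-bounds ID slips through,
# because 151936 > 151936 is False.
assert not (max_input_id > vocab_size)

# Check as suggested: the same ID is correctly flagged as invalid.
assert max_input_id > vocab_size - 1
print(f"id {max_input_id} passes the original check but is rejected by the fixed one")
```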
vllm/v1/engine/processor.py (Outdated)

```diff
@@ -394,7 +394,7 @@ def _validate_model_input(
         else:
             tokenizer = self.tokenizer.get_lora_tokenizer(lora_request)
         max_input_id = max(prompt_ids, default=0)
-        if max_input_id > tokenizer.max_token_id:
+        if max_input_id > max(tokenizer.max_token_id, self.model_config.get_vocab_size()):
```
Perhaps there's no need for the max? The model config's vocab size is the source of truth.
Suggested change:

```diff
-        if max_input_id > max(tokenizer.max_token_id, self.model_config.get_vocab_size()):
+        if max_input_id > self.model_config.get_vocab_size() - 1:
```
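As a rough illustration of that simplification, a standalone sketch that mirrors the suggested condition; the function name, its error message, and the example calls are hypothetical, not vLLM's actual `_validate_model_input`.

```python
# Hypothetical standalone version of the suggested check: the model config's
# vocab size is treated as the single source of truth for valid token IDs.
def validate_prompt_ids(prompt_ids: list[int], model_vocab_size: int) -> None:
    max_input_id = max(prompt_ids, default=0)
    if max_input_id > model_vocab_size - 1:
        raise ValueError(
            f"Token id {max_input_id} is out of vocabulary "
            f"(model vocab size is {model_vocab_size})."
        )

# An ID in the tokenizer/model gap discussed in this thread passes:
validate_prompt_ids([151860], model_vocab_size=151936)

# An ID at or beyond the model vocab size is rejected:
try:
    validate_prompt_ids([152860], model_vocab_size=151936)
except ValueError as err:
    print(err)
```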
Do we observe that we have such ids generated in Qwen?
Yes, we have seen such IDs generated by Qwen that are outside the tokenizer's vocabulary.
What does that mean then? Just curious how the model / engine handles it.
Some explanation here: #13175 (comment), and here is why the vocab size is 151936 (divisible by 128): QwenLM/Qwen3#147 (comment)
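For quick sanity checking, a small arithmetic sketch using only the numbers quoted in this thread (nothing is loaded from any tokenizer or model config):

```python
# Numbers as quoted in this thread; not recomputed from an actual checkpoint.
tokenizer_max_token_id = 151643    # largest ID the tokenizer actually maps
model_vocab_size = 151936          # "vocab_size": 151936 in the model config

# The model's vocab size is padded to a multiple of 128 (151936 = 1187 * 128).
assert model_vocab_size % 128 == 0

# IDs in this range are valid for the model's embedding table but have no
# surface form in the tokenizer; this is the gap the PR wants to allow.
gap = range(tokenizer_max_token_id + 1, model_vocab_size)
print(f"{len(gap)} IDs ({gap.start}..{gap.stop - 1}) are model-valid but tokenizer-unknown")
```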
Summary: `tokenizer.vocab_size` and the model's `vocab_size` can be different. For the Qwen model, the tokenizer's max token ID is 151643 while the model config has `"vocab_size": 151936`. If we send an ID between 151643 and 151936, it currently fails, even though in reality the tokenizer will just decode it to ''. It's probably still valid to send IDs in that range, because the model can legitimately produce such token IDs.

Test Plan: Send 151860 and it passes. Send 152860 and it complains about an invalid token.

Rollback Plan:

Reviewed By: tensormeta

Differential Revision: D79840114

Force-pushed from 5089ff3 to e713e9b.
This pull request was exported from Phabricator. Differential Revision: D79840114
Summary:
`tokenizer.vocab_size` and the model's `vocab_size` can be different. For the Qwen model, the tokenizer's max token ID is 151643 while the model config has `"vocab_size": 151936`. If we send an ID between 151643 and 151936, it currently fails, even though in reality the tokenizer will just decode it to ''. It's probably still valid to send IDs in that range, because the model can legitimately produce such token IDs.
Test Plan:
Send 151860 and it passes. Send 152860 and it complains about an invalid token. (A sketch of such a request follows after this summary.)
Rollback Plan:
Differential Revision: D79840114
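For reference, a hedged sketch of how the test plan above could be exercised end to end. The model name is an assumption (pick any Qwen checkpoint whose config reports `vocab_size: 151936`), and the exact error message comes from the engine's validator rather than being guaranteed here; `TokensPrompt` is vLLM's input type for passing pre-tokenized prompts.

```python
# Sketch of the test plan: token IDs are fed directly, bypassing the tokenizer.
from vllm import LLM, SamplingParams
from vllm.inputs import TokensPrompt

# Assumed checkpoint; any Qwen model whose config has vocab_size 151936 would do.
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(max_tokens=8)

# 151860 sits between the tokenizer's max token ID (151643) and the model's
# vocab size (151936): with this fix it should be accepted.
llm.generate(TokensPrompt(prompt_token_ids=[151860]), params)

# 152860 is beyond the model's vocab size and should still be rejected.
try:
    llm.generate(TokensPrompt(prompt_token_ids=[152860]), params)
except ValueError as err:
    print("rejected as expected:", err)
```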