
vllm fix check on max vocab size #22471


Open · wants to merge 1 commit into main

Conversation

xw285cornell
Contributor

Summary:
The tokenizer's vocab_size and the model's vocab_size can differ. For the Qwen model, the tokenizer's max token id is 151643 while the model config has `"vocab_size": 151936`. If we send an id between 151643 and 151936, validation fails, even though in reality the tokenizer just decodes such an id to the empty string ''.

It is probably still valid to send ids in that range, because the model can legitimately produce such token ids.
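
For illustration only (not part of this change), a minimal sketch showing the mismatch via Hugging Face transformers; the checkpoint name is an example and the printed values are the ones quoted above:

```python
# Illustrative sketch: compare the tokenizer's largest defined id with the
# model config's (padded) vocab_size. Checkpoint name is an assumption.
from transformers import AutoConfig, AutoTokenizer

model_name = "Qwen/Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
config = AutoConfig.from_pretrained(model_name)

print(max(tokenizer.get_vocab().values()))  # ~151643 (plus a few special tokens)
print(config.vocab_size)                    # 151936 per the model config
print(repr(tokenizer.decode([151860])))     # ids in the padded range decode to ''
```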

Test Plan:
Sending token id 151860 now passes; sending 152860 still fails with an invalid-token error.
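
A hypothetical way to reproduce this against a locally running OpenAI-compatible vLLM server (the URL and model name are assumptions, and the completions endpoint is assumed to accept a token-id prompt):

```python
# Hypothetical reproduction of the test plan; server URL and model name are
# placeholders, not taken from the PR.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "Qwen/Qwen2-7B-Instruct",
        "prompt": [151860],  # above the tokenizer's max id, below the model vocab_size
        "max_tokens": 1,
    },
)
print(resp.status_code)  # accepted with this fix; a prompt of [152860] is still rejected
```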

Rollback Plan:

Differential Revision: D79840114


github-actions bot commented Aug 7, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D79840114

@mergify mergify bot added the v1 label Aug 7, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request aims to fix a vocabulary size check to account for differences between the tokenizer's vocabulary and the model's vocabulary. While the intent is correct, the implementation introduces an off-by-one error by comparing a token ID with the vocabulary size directly, instead of the maximum valid token ID (vocab_size - 1). This could lead to out-of-bounds errors. I've provided a suggestion to correct this.

```diff
@@ -394,7 +394,7 @@ def _validate_model_input(
         else:
             tokenizer = self.tokenizer.get_lora_tokenizer(lora_request)
             max_input_id = max(prompt_ids, default=0)
-            if max_input_id > tokenizer.max_token_id:
+            if max_input_id > max(tokenizer.max_token_id, self.model_config.get_vocab_size()):
```
Contributor


critical

There's an off-by-one error in the vocabulary size check. self.model_config.get_vocab_size() returns the size of the vocabulary (e.g., 151936), so the maximum valid token ID is vocab_size - 1 (e.g., 151935).

The current check max_input_id > self.model_config.get_vocab_size() would incorrectly allow a max_input_id equal to vocab_size, which is out of bounds.

The condition should compare against self.model_config.get_vocab_size() - 1.

Suggested change
```diff
-            if max_input_id > max(tokenizer.max_token_id, self.model_config.get_vocab_size()):
+            if max_input_id > max(tokenizer.max_token_id, self.model_config.get_vocab_size() - 1):
```
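
For illustration, a standalone sketch of the corrected bound; the function and parameter names below are stand-ins for the engine code in the diff, not the actual vLLM API:

```python
# Standalone sketch: tokenizer_max_token_id and model_vocab_size stand in for
# tokenizer.max_token_id and self.model_config.get_vocab_size() in the diff.
def validate_prompt_ids(prompt_ids, tokenizer_max_token_id, model_vocab_size):
    max_input_id = max(prompt_ids, default=0)
    # Valid ids are 0 .. model_vocab_size - 1, so compare against vocab_size - 1.
    if max_input_id > max(tokenizer_max_token_id, model_vocab_size - 1):
        raise ValueError(f"Token id {max_input_id} is out of vocabulary")

# Qwen numbers from the PR description:
validate_prompt_ids([151860], 151643, 151936)   # accepted: within the padded vocab
validate_prompt_ids([151935], 151643, 151936)   # accepted: largest valid id
# validate_prompt_ids([151936], 151643, 151936) # would raise: equal to vocab_size
```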

```diff
@@ -394,7 +394,7 @@ def _validate_model_input(
         else:
             tokenizer = self.tokenizer.get_lora_tokenizer(lora_request)
             max_input_id = max(prompt_ids, default=0)
-            if max_input_id > tokenizer.max_token_id:
+            if max_input_id > max(tokenizer.max_token_id, self.model_config.get_vocab_size()):
```
Collaborator


Perhaps no need for `max`? The model config vocab size is the source of truth.

Suggested change
```diff
-            if max_input_id > max(tokenizer.max_token_id, self.model_config.get_vocab_size()):
+            if max_input_id > self.model_config.get_vocab_size() - 1:
```

Collaborator


Do we actually observe such ids being generated by Qwen?

Contributor Author


Yes, we have seen such ids generated by Qwen that are outside the tokenizer's vocabulary.

Collaborator


What does that mean then? Just curious how the model / engine handles it.

Collaborator


Some explanation here: #13175 (comment)
and here is why the vocab size is 151936 (a multiple of 128): QwenLM/Qwen3#147 (comment)
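
A quick sanity check of the numbers referenced in those links (values taken from the PR description; the padding-to-a-multiple-of-128 rationale is from the linked Qwen issue):

```python
tokenizer_max_id = 151643   # largest id the tokenizer defines, per the PR
model_vocab_size = 151936   # "vocab_size" in the Qwen model config

assert model_vocab_size % 128 == 0              # 151936 == 128 * 1187
print(model_vocab_size - 1 - tokenizer_max_id)  # 292 ids exist only as padded embedding rows
```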

Reviewed By: tensormeta

Differential Revision: D79840114
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D79840114
