
vllm fix check on max vocab size #22471


Open · wants to merge 1 commit into main

Conversation

xw285cornell
Contributor

Summary:
The tokenizer's vocab_size and the model's vocab_size can differ. For the Qwen model, the tokenizer's max token id is 151643 while the model config has `"vocab_size": 151936`. If we send an id between 151643 and 151936, validation fails, even though in reality the tokenizer just decodes such an id to the empty string ''.

It is probably still valid to send ids in that range, because the model can legitimately produce such token ids.
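
For illustration only (not part of this change), a minimal sketch showing the mismatch via Hugging Face transformers; the checkpoint name is an example and the printed values are the ones quoted above:

```python
# Illustrative sketch: compare the tokenizer's largest defined id with the
# model config's (padded) vocab_size. Checkpoint name is an assumption.
from transformers import AutoConfig, AutoTokenizer

model_name = "Qwen/Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
config = AutoConfig.from_pretrained(model_name)

print(max(tokenizer.get_vocab().values()))  # ~151643 (plus a few special tokens)
print(config.vocab_size)                    # 151936 per the model config
print(repr(tokenizer.decode([151860])))     # ids in the padded range decode to ''
```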

Test Plan:
Sending token id 151860 now passes; sending 152860 still fails with an invalid-token error.
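
A hypothetical way to reproduce this against a locally running OpenAI-compatible vLLM server (the URL and model name are assumptions, and the completions endpoint is assumed to accept a token-id prompt):

```python
# Hypothetical reproduction of the test plan; server URL and model name are
# placeholders, not taken from the PR.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "Qwen/Qwen2-7B-Instruct",
        "prompt": [151860],  # above the tokenizer's max id, below the model vocab_size
        "max_tokens": 1,
    },
)
print(resp.status_code)  # accepted with this fix; a prompt of [152860] is still rejected
```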

Rollback Plan:

Differential Revision: D79840114


github-actions bot commented Aug 7, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D79840114

@mergify mergify bot added the v1 label Aug 7, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request aims to fix a vocabulary size check to account for differences between the tokenizer's vocabulary and the model's vocabulary. While the intent is correct, the implementation introduces an off-by-one error by comparing a token ID with the vocabulary size directly, instead of the maximum valid token ID (vocab_size - 1). This could lead to out-of-bounds errors. I've provided a suggestion to correct this.

```diff
@@ -394,7 +394,7 @@ def _validate_model_input(
         else:
             tokenizer = self.tokenizer.get_lora_tokenizer(lora_request)
             max_input_id = max(prompt_ids, default=0)
-            if max_input_id > tokenizer.max_token_id:
+            if max_input_id > max(tokenizer.max_token_id, self.model_config.get_vocab_size()):
```
Contributor


critical

There's an off-by-one error in the vocabulary size check. self.model_config.get_vocab_size() returns the size of the vocabulary (e.g., 151936), so the maximum valid token ID is vocab_size - 1 (e.g., 151935).

The current check max_input_id > self.model_config.get_vocab_size() would incorrectly allow a max_input_id equal to vocab_size, which is out of bounds.

The condition should compare against self.model_config.get_vocab_size() - 1.

Suggested change
```diff
-            if max_input_id > max(tokenizer.max_token_id, self.model_config.get_vocab_size()):
+            if max_input_id > max(tokenizer.max_token_id, self.model_config.get_vocab_size() - 1):
```
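
For illustration, a standalone sketch of the corrected bound; the function and parameter names below are stand-ins for the engine code in the diff, not the actual vLLM API:

```python
# Standalone sketch: tokenizer_max_token_id and model_vocab_size stand in for
# tokenizer.max_token_id and self.model_config.get_vocab_size() in the diff.
def validate_prompt_ids(prompt_ids, tokenizer_max_token_id, model_vocab_size):
    max_input_id = max(prompt_ids, default=0)
    # Valid ids are 0 .. model_vocab_size - 1, so compare against vocab_size - 1.
    if max_input_id > max(tokenizer_max_token_id, model_vocab_size - 1):
        raise ValueError(f"Token id {max_input_id} is out of vocabulary")

# Qwen numbers from the PR description:
validate_prompt_ids([151860], 151643, 151936)   # accepted: within the padded vocab
validate_prompt_ids([151935], 151643, 151936)   # accepted: largest valid id
# validate_prompt_ids([151936], 151643, 151936) # would raise: equal to vocab_size
```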

```diff
@@ -394,7 +394,7 @@ def _validate_model_input(
         else:
             tokenizer = self.tokenizer.get_lora_tokenizer(lora_request)
             max_input_id = max(prompt_ids, default=0)
-            if max_input_id > tokenizer.max_token_id:
+            if max_input_id > max(tokenizer.max_token_id, self.model_config.get_vocab_size()):
```
Collaborator


Perhaps no need for `max`? The model config vocab size is the source of truth.

Suggested change
```diff
-            if max_input_id > max(tokenizer.max_token_id, self.model_config.get_vocab_size()):
+            if max_input_id > self.model_config.get_vocab_size() - 1:
```

Collaborator


Do we actually observe such ids being generated by Qwen?

Contributor Author


Yes, we have seen such ids generated by Qwen that are outside the tokenizer's vocabulary.

Collaborator


What does that mean then? Just curious how the model / engine handles it.

Collaborator


Some explanation here: #13175 (comment)
and here is why the vocab size is 151936 (a multiple of 128): QwenLM/Qwen3#147 (comment)
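
A quick sanity check of the numbers referenced in those links (values taken from the PR description; the padding-to-a-multiple-of-128 rationale is from the linked Qwen issue):

```python
tokenizer_max_id = 151643   # largest id the tokenizer defines, per the PR
model_vocab_size = 151936   # "vocab_size" in the Qwen model config

assert model_vocab_size % 128 == 0              # 151936 == 128 * 1187
print(model_vocab_size - 1 - tokenizer_max_id)  # 292 ids exist only as padded embedding rows
```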

Reviewed By: tensormeta

Differential Revision: D79840114
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D79840114
