[Eagle3] Add Qwen2 as verifier for Eagle3 speculation #98
rahul-tuli wants to merge 1 commit into main
- Implement `SupportsEagle3` interface for `Qwen2ForCausalLM`
- Add `set_aux_hidden_state_layers()` and `get_eagle3_aux_hidden_state_layers()` methods
- Qwen2 models now support Eagle3 speculative decoding

Changes:
- Import `SupportsEagle3` interface
- Update class declaration to inherit from `SupportsEagle3`
- Add Eagle3 auxiliary hidden state layer management methods
- Use standard layer selection pattern: `(2, num_layers // 2, num_layers - 3)`

Tested with: `./local/validate_eagle3_support.sh qwen2 Qwen2ForCausalLM qwen`
All validation checks passed ✅

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
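The two methods listed above can be illustrated without vLLM. This is a minimal, self-contained sketch under stated assumptions, not the code in `qwen2.py`: `Eagle3Verifier`, `ToyDecoderModel`, and `ToyQwen2ForCausalLM` are hypothetical stand-ins, and only the method names and the `(2, num_layers // 2, num_layers - 3)` selection come from this PR. It also assumes, as we understand vLLM's Llama-family models to do, that the decoder records the hidden state entering each selected layer during its forward pass.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Eagle3Verifier(Protocol):
    """Hypothetical stand-in for the SupportsEagle3 interface (method names from the PR)."""

    def set_aux_hidden_state_layers(self, layers: tuple[int, ...]) -> None: ...

    def get_eagle3_aux_hidden_state_layers(self) -> tuple[int, ...]: ...


class ToyDecoderModel:
    """Hypothetical decoder stack: each 'layer' just adds its index to a scalar state."""

    def __init__(self, num_layers: int) -> None:
        self.layers = list(range(num_layers))
        self.aux_hidden_state_layers: tuple[int, ...] = ()

    def forward(self, hidden: float) -> tuple[float, list[float]]:
        aux_hidden_states: list[float] = []
        for idx, layer in enumerate(self.layers):
            # Assumption: capture the state *entering* each selected layer.
            if idx in self.aux_hidden_state_layers:
                aux_hidden_states.append(hidden)
            hidden = hidden + layer  # stand-in for a transformer block
        return hidden, aux_hidden_states


class ToyQwen2ForCausalLM:
    """Hypothetical verifier wrapper exposing the two Eagle3 methods."""

    def __init__(self, num_layers: int) -> None:
        self.model = ToyDecoderModel(num_layers)

    def set_aux_hidden_state_layers(self, layers: tuple[int, ...]) -> None:
        # Tell the decoder which layers' hidden states to hand to the draft model.
        self.model.aux_hidden_state_layers = layers

    def get_eagle3_aux_hidden_state_layers(self) -> tuple[int, ...]:
        # Layer selection pattern from the PR: one early, one middle, one late layer.
        num_layers = len(self.model.layers)
        return (2, num_layers // 2, num_layers - 3)


verifier = ToyQwen2ForCausalLM(num_layers=28)
print(isinstance(verifier, Eagle3Verifier))  # True
layers = verifier.get_eagle3_aux_hidden_state_layers()
print(layers)  # (2, 14, 25)
verifier.set_aux_hidden_state_layers(layers)
_, aux = verifier.model.forward(0.0)
print(aux)  # [1.0, 91.0, 300.0]
```

For a 28-layer stack (the depth of Qwen2-7B), the pattern selects layers 2, 14, and 25, so the draft model receives a mix of low-, mid-, and high-level features. Because the getter derives indices from `len(self.model.layers)`, the same code adapts to Qwen2 checkpoints of other depths.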
Summary
This PR adds Qwen2 as a verifier for Eagle3 speculative decoding.
Changes
- Added `SupportsEagle3` interface to `Qwen2ForCausalLM`
Testing
Test Configuration:
- Verifier: `Qwen/Qwen2-7B-Instruct`
- Eagle3 speculator: `nm-testing/SpeculatorLlama3-1-8B-Eagle3-converted-0717-quantized`
Results:
Known Limitation
Qwen2 + vLLM v1 engine: currently fails during KV cache configuration (`NotImplementedError` in `kv_cache_utils.py:1118`). This is a separate vLLM v1 engine issue, unrelated to the Eagle3 implementation. The Eagle3 integration itself works correctly: the model loads, compiles, and its architecture is recognized.
For comparison, Qwen3 + Eagle3 works fully in the v1 engine, which indicates a Qwen2-specific v1 engine limitation rather than a problem with the Eagle3 interface.
Files Modified
vllm/model_executor/models/qwen2.py