Running an ultra-large model like DeepSeek R1 is a pain - and since there is no good smaller model that shares its tokenizer, traditional speculative decoding won't work.
In this blog post, Universal Assisted Generation (UAG) was introduced, which allows speculative decoding across models with different tokenizers. The implementation is intuitive: a two-way translation of tokens so that each model sees the input it expects and the two can work together. This has already been merged into transformers - PR
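Conceptually, the translation can be sketched roughly like this (a rough sketch only, not the actual transformers internals - the helper names are made up, and the real implementation handles re-encoding boundaries more carefully):

```python
# Conceptual sketch of UAG's two-way token translation (hypothetical helpers,
# not the actual transformers internals). The bridge between the two
# vocabularies is plain text: decode with one tokenizer, re-encode with the other.
def draft_to_target(draft_ids, draft_tok, target_tok):
    # The draft model proposed these tokens; express them in the target
    # vocabulary so the big model can verify them.
    text = draft_tok.decode(draft_ids, skip_special_tokens=True)
    return target_tok.encode(text, add_special_tokens=False)

def target_to_draft(target_ids, target_tok, draft_tok):
    # Accepted tokens go back to the draft vocabulary so the small model
    # can keep drafting from the verified prefix.
    text = target_tok.decode(target_ids, skip_special_tokens=True)
    return draft_tok.encode(text, add_special_tokens=False)
```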
It would be very valuable if we could use the 1.5B distilled R1 to do speculative decoding for the full 671B model - that should greatly improve performance.
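On the transformers side, the call would look something like the sketch below (the Hub IDs are my assumption, and loading the 671B target with a plain `from_pretrained` is obviously not practical on one box - this just illustrates the API shape from the UAG PR):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub IDs for illustration; the 671B target would need a
# multi-GPU / multi-node setup in practice.
target_ckpt = "deepseek-ai/DeepSeek-R1"
draft_ckpt = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

target_tok = AutoTokenizer.from_pretrained(target_ckpt)
draft_tok = AutoTokenizer.from_pretrained(draft_ckpt)
target_model = AutoModelForCausalLM.from_pretrained(target_ckpt)
draft_model = AutoModelForCausalLM.from_pretrained(draft_ckpt)

inputs = target_tok("Why is the sky blue?", return_tensors="pt")

# Passing an assistant model plus *both* tokenizers is what switches
# assisted generation onto the cross-tokenizer (UAG) path.
outputs = target_model.generate(
    **inputs,
    assistant_model=draft_model,
    tokenizer=target_tok,
    assistant_tokenizer=draft_tok,
    max_new_tokens=256,
)
print(target_tok.decode(outputs[0], skip_special_tokens=True))
```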
I did a search of this repo and it seems this has not been discussed yet - hence this post.