-
It may also be worth looking at the DeepSeek-VL2 models, which share the same vocabulary as DeepSeek-V3. If one of those were used as the draft model, perhaps it could be offloaded to the GPU?
-
"Based on our evaluation, the acceptance rate of the second token prediction ranges between 85% and 90% across various generation topics, demonstrating consistent reliability. This high acceptance rate enables DeepSeek-V3 to achieve a significantly improved decoding speed, delivering 1.8 times TPS (Tokens Per Second)."
(From the DeepSeek-V3 technical report)
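The quoted numbers are internally consistent: with one extra MTP draft token per decoding step and acceptance rate `a`, each step yields `1 + a` tokens on average, so an 85–90% acceptance rate implies roughly a 1.85–1.90x TPS improvement before overhead, in line with the reported 1.8x. A minimal sketch of that arithmetic (the function name and the assumption of geometric acceptance for multiple draft tokens are illustrative, not from the report):

```python
def expected_speedup(acceptance_rate: float, draft_tokens: int = 1) -> float:
    """Expected tokens produced per decoding step with speculative drafts.

    The base model always emits 1 token; draft token k is kept only if
    all k drafts so far were accepted, which we model (an assumption)
    as acceptance_rate ** k. Overhead of running the draft head is ignored.
    """
    return sum(acceptance_rate ** k for k in range(draft_tokens + 1))

for a in (0.85, 0.90):
    # With a single MTP draft token this is simply 1 + a.
    print(f"acceptance {a:.0%}: ~{expected_speedup(a):.2f}x TPS")
```

This ignores the cost of the MTP head itself and of verification, so the realized 1.8x being slightly below the idealized 1.85–1.90x is expected.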