deepseek r1 support? #2287
Hey @johnnynunez, we already support the Qwen/Llama models, which I believe are the base models for R1. I haven't tried it, but I believe all you have to do is update the download command to use the R1 version in this config: https://github.com/pytorch/torchtune/blob/main/recipes/configs/qwen2_5/32B_lora.yaml
If you give it a try, please let us know whether it worked. Note: these are the instructions to use the R1 model, not to finetune using the RL method shared in the paper.
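For anyone trying this, a rough sketch of what that swap might look like with torchtune's CLI. The repo id is the R1 distill on the Hub; the output paths are placeholders, and the `checkpoint_files` and tokenizer entries in the yaml will likely also need updating to match what the R1 repo actually ships:

```bash
# Sketch: pull the distilled R1 weights instead of the base Qwen2.5 checkpoint
tune download deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
  --output-dir /tmp/DeepSeek-R1-Distill-Qwen-32B

# Point the existing Qwen2.5 32B LoRA recipe at the new directory via CLI
# overrides (nproc_per_node and output_dir are illustrative)
tune run --nproc_per_node 8 lora_finetune_distributed \
  --config qwen2_5/32B_lora \
  checkpointer.checkpoint_dir=/tmp/DeepSeek-R1-Distill-Qwen-32B \
  output_dir=/tmp/r1_lora_out
```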
Thanks mate!
+1 on support for these, particularly the https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/tree/main model. I confirmed that this does NOT work with the existing Llama 3 implementation in torchtune. The tokenizer has substantial differences that I wasn't able to get working.
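To make the mismatch concrete, here is a quick check one could run (a sketch, not code from the thread): it compares the ids produced by the tokenizer shipped with the distill repo against torchtune's Llama 3 tiktoken tokenizer. The local `tokenizer.model` path is a placeholder for a stock Llama 3 tiktoken file:

```python
# Sketch of a tokenizer divergence check. Assumes `transformers` and
# `torchtune` are installed; the local path is a placeholder.
from transformers import AutoTokenizer
from torchtune.models.llama3 import llama3_tokenizer

hf_tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Llama-70B")
tt_tok = llama3_tokenizer("/tmp/llama3/original/tokenizer.model")

text = "<think>Some reasoning.</think> The answer is 42."
hf_ids = hf_tok.encode(text, add_special_tokens=False)
tt_ids = tt_tok.encode(text, add_bos=False, add_eos=False)

print("HF ids:       ", hf_ids)
print("torchtune ids:", tt_ids)
# Expect False if R1 defines its own special tokens, per the discussion below
print("identical:", hf_ids == tt_ids)
```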
@thomascleberg thanks for flagging! That sucks :/ We have been discussing updating our tokenizer functions and taking HF tokenizers as a dependency, so that it's faster to onboard a model, since we could just pass the config to it. Some discussion here: #2212. Currently we don't have bandwidth to onboard DeepSeek, but if you would like to submit a PR, we would be glad to review it. Again, sorry about that.
This would be very useful! Fiddling with the differences between the tiktoken and HF tokenizer formats has been a large part of the pain of working in torchtune.
If I get it working on my side, I will do so. Thanks!
@thomascleberg curious, what issues did you run into? The Llama 70B distill worked out of the box for me, FWIW.
^ Is the DeepSeek-R1-Distill-Llama tokenizer different from the usual Llama 3 tokenizer?
Sorry, I take this back: I incorrectly assumed that the distilled Llama variants used the Llama tokenizers, but it looks like R1 introduces its own special tokens. @thomascleberg is indeed correct that the tokenization you get from using the Llama 3 tokenizers is off. I hacked torchtune to use the HF AutoTokenizer and it seems to be working now. Would indeed be great to have native support for this in tt!
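For anyone wanting to reproduce that workaround, a minimal sketch of this kind of shim (my naming, not code from the thread): it wraps HF's AutoTokenizer behind an encode/decode surface similar to what torchtune's text tokenizers expose. The exact interface torchtune expects is an assumption here; a full integration would also need `tokenize_messages` for recipes that consume Message objects.

```python
# Hypothetical shim: expose an HF tokenizer through a torchtune-style
# encode/decode surface. Interface details are assumptions.
from transformers import AutoTokenizer


class HFTokenizerShim:
    def __init__(self, repo_id: str):
        self._tok = AutoTokenizer.from_pretrained(repo_id)
        self.bos_id = self._tok.bos_token_id
        self.eos_id = self._tok.eos_token_id
        self.vocab_size = len(self._tok)

    def encode(self, text: str, add_bos: bool = True, add_eos: bool = False) -> list[int]:
        # Let the shim control special tokens so behavior matches the
        # add_bos/add_eos convention used by torchtune's tokenizers
        ids = self._tok.encode(text, add_special_tokens=False)
        if add_bos and self.bos_id is not None:
            ids = [self.bos_id] + ids
        if add_eos and self.eos_id is not None:
            ids = ids + [self.eos_id]
        return ids

    def decode(self, ids: list[int], skip_special_tokens: bool = True) -> str:
        return self._tok.decode(ids, skip_special_tokens=skip_special_tokens)


# Usage (hypothetical): swap this in where the recipe would normally
# instantiate the Llama 3 tokenizer
tok = HFTokenizerShim("deepseek-ai/DeepSeek-R1-Distill-Llama-70B")
print(tok.encode("hello world", add_eos=True))
```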
@felipemello1 any plans to support the actual DeepSeek V3 / R1 model (the ~600B-parameter MoE, not the Llama/Qwen distillations)?
We have been working to support MoE, but we don't have anything concrete for DeepSeek V3/R1 specifically. @Palmik
https://github.com/deepseek-ai/DeepSeek-R1