deepseek r1 support? #2287

Open
johnnynunez opened this issue Jan 21, 2025 · 10 comments
Labels
enhancement New feature or request
triage review This issue should be discussed in weekly review

Comments

@johnnynunez

https://github.com/deepseek-ai/DeepSeek-R1

@felipemello1
Contributor

felipemello1 commented Jan 22, 2025

Hey @johnnynunez, we already support the Qwen/Llama models, which I believe are the base models for R1. I haven't tried it, but I believe all you have to do is update the download command to use the R1 version.

In other words, in this config: https://github.com/pytorch/torchtune/blob/main/recipes/configs/qwen2_5/32B_lora.yaml

  1. Change the download command to point at the R1 checkpoint (a sketch of the equivalent download from Python follows this list):
tune download <HF_model_deepseek_R1> --output-dir /tmp/Qwen2_5-32B-Instruct
  2. Optionally, rename your directories so they don't confuse you, e.g. replace Qwen2_5 with deepseek-r1.
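
To my understanding, tune download is a thin wrapper around the Hugging Face Hub, so the equivalent from Python is roughly the sketch below. The repo id is only an assumption about which R1 variant you want; swap in any other one:

from huggingface_hub import snapshot_download

# Assumed repo id: the distilled Qwen 32B variant of R1; substitute
# whichever deepseek-ai R1 checkpoint you actually want.
snapshot_download(
    repo_id="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    local_dir="/tmp/DeepSeek-R1-Distill-Qwen-32B",
)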

If you give it a try, please let us know if it did/didn't work.

Note: these are instructions for running the R1 model, not for fine-tuning with the RL method shared in the paper.

@johnnynunez
Author

Thanks mate!

@thomascleberg

+1 on support for these, particularly the https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/tree/main model.

I confirmed that this does NOT work with the existing Llama 3 implementation in torchtune. The tokenizer has substantial differences that I wasn't able to get working.
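
For anyone wanting to reproduce the comparison, a minimal check with the HF tokenizers. The base model here is my assumption (the 70B distill is reportedly built on Llama-3.3-70B-Instruct), and the meta-llama repo is gated:

from transformers import AutoTokenizer

# Assumed base model for the 70B distill; the exact base is my guess.
llama = AutoTokenizer.from_pretrained("meta-llama/Llama-3.3-70B-Instruct")
r1 = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Llama-70B")

# R1 ships its own special tokens, so the sets differ.
print(set(r1.all_special_tokens) - set(llama.all_special_tokens))

# The chat templates diverge too, so rendered prompts won't match either.
msgs = [{"role": "user", "content": "hi"}]
print(llama.apply_chat_template(msgs, tokenize=False))
print(r1.apply_chat_template(msgs, tokenize=False))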

@felipemello1
Contributor

@thomascleberg thanks for flagging! That sucks :/

We have been having discussions about updating our tokenizer functions and taking HF tokenizers as a dependency, so that it's faster to onboard a model, since we can just pass the config to it. Some discussion here: #2212

Currently we don't have bandwidth to onboard DeepSeek, but if you would like to submit a PR, we would be glad to review it. Again, sorry about that.

felipemello1 added the enhancement and triaged labels on Jan 22, 2025
@thomascleberg

> ...HF tokenizers as a dependency, so that it's faster to onboard a model, since we can just pass the config to it.

This would be very useful! Fiddling with the differences between the TikToken and Tokenizer formats has been a big part of the pain of working in torchtune.

> Currently we don't have bandwidth to onboard DeepSeek, but if you would like to submit a PR, we would be glad to review it. Again, sorry about that.

If I get it working on my side, I will do so. Thanks!

@EugenHotaj
Contributor

@thomascleberg curious what issues you ran into? The Llama 70B distill worked out of the box for me, fwiw.

@tginart

tginart commented Jan 23, 2025

^ Is the DeepSeek-R1-Distill-Llama tokenizer different from the usual Llama 3 tokenizer?
@EugenHotaj @thomascleberg

@EugenHotaj
Contributor

EugenHotaj commented Jan 23, 2025

Sorry, I take this back: I incorrectly assumed that the distilled Llama variants used the Llama tokenizers, but it looks like R1 introduces its own special tokens.

@thomascleberg is indeed correct that the tokenization you get from using the Llama 3 tokenizers is off. I hacked torchtune to use the HF AutoTokenizer and it seems to be working now. Would be great to have native support for this in torchtune!
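
The hack is roughly the adapter sketched below. The class and method names are illustrative, not torchtune's actual tokenizer interface, and you'd still need to wire it into whatever your recipe expects:

from transformers import AutoTokenizer

class HFTokenizerAdapter:
    # Illustrative adapter, not a torchtune API: delegate encoding and
    # decoding to the HF tokenizer that ships with the R1 checkpoint.
    def __init__(self, repo_id: str):
        self.tok = AutoTokenizer.from_pretrained(repo_id)
        self.bos_id = self.tok.bos_token_id
        self.eos_id = self.tok.eos_token_id

    def encode(self, text: str, add_bos: bool = True, add_eos: bool = False) -> list[int]:
        ids = self.tok.encode(text, add_special_tokens=False)
        if add_bos and self.bos_id is not None:
            ids = [self.bos_id] + ids
        if add_eos and self.eos_id is not None:
            ids = ids + [self.eos_id]
        return ids

    def decode(self, ids: list[int]) -> str:
        return self.tok.decode(ids, skip_special_tokens=True)

tok = HFTokenizerAdapter("deepseek-ai/DeepSeek-R1-Distill-Llama-70B")
print(tok.encode("hello"))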

felipemello1 added the triage review label and removed the triaged label on Jan 23, 2025
@Palmik

Palmik commented Jan 30, 2025

@felipemello1 any plans to support the actual DeepSeek V3 / R1 model (that's the 600B MoE, not the Llama/Qwen distillations)?

@felipemello1
Contributor

We have been working on MoE support, but we don't have anything concrete for DeepSeek V3/R1 specifically. @Palmik
