deepseek r1 support? #2287

Open
johnnynunez opened this issue Jan 21, 2025 · 10 comments
Labels
enhancement New feature or request
triage review This issue should be discussed in weekly review

Comments

@johnnynunez

https://github.com/deepseek-ai/DeepSeek-R1

@felipemello1
Contributor

felipemello1 commented Jan 22, 2025

Hey @johnnynunez, we already support the Qwen/Llama models, which I believe are the base models for R1. I haven't tried it, but I believe all you have to do is update the download command to use the R1 version.

In other words, in this config: https://github.com/pytorch/torchtune/blob/main/recipes/configs/qwen2_5/32B_lora.yaml

  1. Change the download command to point at the R1 checkpoint (a sketch of the equivalent download from Python follows this list):
tune download <HF_model_deepseek_R1> --output-dir /tmp/Qwen2_5-32B-Instruct
  2. Optionally, rename your directories so they don't confuse you, e.g. replace Qwen2_5 with deepseek-r1.
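
To my understanding, tune download is a thin wrapper around the Hugging Face Hub, so the equivalent from Python is roughly the sketch below. The repo id is only an assumption about which R1 variant you want; swap in any other one:

from huggingface_hub import snapshot_download

# Assumed repo id: the distilled Qwen 32B variant of R1; substitute
# whichever deepseek-ai R1 checkpoint you actually want.
snapshot_download(
    repo_id="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    local_dir="/tmp/DeepSeek-R1-Distill-Qwen-32B",
)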

If you give it a try, please let us know if it did/didn't work.

Note: these are instructions for running the R1 model, not for fine-tuning with the RL method shared in the paper.

@johnnynunez
Author

Thanks mate!

@thomascleberg

+1 on support for these, particularly the https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/tree/main model.

I confirmed that this does NOT work with the existing Llama 3 implementation in torchtune. The tokenizer has substantial differences that I wasn't able to get working.
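
For anyone wanting to reproduce the comparison, a minimal check with the HF tokenizers. The base model here is my assumption (the 70B distill is reportedly built on Llama-3.3-70B-Instruct), and the meta-llama repo is gated:

from transformers import AutoTokenizer

# Assumed base model for the 70B distill; the exact base is my guess.
llama = AutoTokenizer.from_pretrained("meta-llama/Llama-3.3-70B-Instruct")
r1 = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Llama-70B")

# R1 ships its own special tokens, so the sets differ.
print(set(r1.all_special_tokens) - set(llama.all_special_tokens))

# The chat templates diverge too, so rendered prompts won't match either.
msgs = [{"role": "user", "content": "hi"}]
print(llama.apply_chat_template(msgs, tokenize=False))
print(r1.apply_chat_template(msgs, tokenize=False))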

@felipemello1
Contributor

@thomascleberg thanks for flagging! That sucks :/

We have been having discussions about updating our tokenizer functions and taking HF tokenizers as a dependency, so that it's faster to onboard a model, since we can just pass the config to it. Some discussion here: #2212

Currently we don't have bandwidth to onboard DeepSeek, but if you would like to submit a PR, we would be glad to review it. Again, sorry about that.

felipemello1 added the enhancement and triaged labels on Jan 22, 2025
@thomascleberg

> ...HF tokenizers as a dependency, so that it's faster to onboard a model, since we can just pass the config to it.

This would be very useful! Fiddling with the differences between the TikToken and Tokenizer formats has been a big part of the pain of working in torchtune.

> Currently we don't have bandwidth to onboard DeepSeek, but if you would like to submit a PR, we would be glad to review it. Again, sorry about that.

If I get it working on my side, I will do so. Thanks!

@EugenHotaj
Contributor

@thomascleberg curious what issues you ran into? The Llama 70B distill worked out of the box for me, fwiw.

@tginart

tginart commented Jan 23, 2025

^ Is the DeepSeek-R1-Distill-Llama tokenizer different from the usual Llama 3 tokenizer?
@EugenHotaj @thomascleberg

@EugenHotaj
Contributor

EugenHotaj commented Jan 23, 2025

Sorry, I take this back: I incorrectly assumed that the distilled Llama variants used the Llama tokenizers, but it looks like R1 introduces its own special tokens.

@thomascleberg is indeed correct that the tokenization you get from using the Llama 3 tokenizers is off. I hacked torchtune to use the HF AutoTokenizer and it seems to be working now. Would be great to have native support for this in torchtune!
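
The hack is roughly the adapter sketched below. The class and method names are illustrative, not torchtune's actual tokenizer interface, and you'd still need to wire it into whatever your recipe expects:

from transformers import AutoTokenizer

class HFTokenizerAdapter:
    # Illustrative adapter, not a torchtune API: delegate encoding and
    # decoding to the HF tokenizer that ships with the R1 checkpoint.
    def __init__(self, repo_id: str):
        self.tok = AutoTokenizer.from_pretrained(repo_id)
        self.bos_id = self.tok.bos_token_id
        self.eos_id = self.tok.eos_token_id

    def encode(self, text: str, add_bos: bool = True, add_eos: bool = False) -> list[int]:
        ids = self.tok.encode(text, add_special_tokens=False)
        if add_bos and self.bos_id is not None:
            ids = [self.bos_id] + ids
        if add_eos and self.eos_id is not None:
            ids = ids + [self.eos_id]
        return ids

    def decode(self, ids: list[int]) -> str:
        return self.tok.decode(ids, skip_special_tokens=True)

tok = HFTokenizerAdapter("deepseek-ai/DeepSeek-R1-Distill-Llama-70B")
print(tok.encode("hello"))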

felipemello1 added the triage review label and removed the triaged label on Jan 23, 2025
@Palmik

Palmik commented Jan 30, 2025

@felipemello1 any plans to support the actual DeepSeek V3 / R1 model (that's the 600B MoE, not the Llama/Qwen distillations)?

@felipemello1
Contributor

We have been working on MoE support, but we don't have anything concrete for DeepSeek V3/R1 specifically. @Palmik
