
[DeepSeek] remove numpy, avoid tolist in gatherd_idxs #1019


Merged: 3 commits into pytorch:main on Mar 26, 2025

Conversation

garrett361 (Contributor)

Removes the numpy usage and tolist CUDA sync when computing gatherd_idxs.
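For context, the pattern at stake looks roughly like the sketch below. This is a minimal illustration, not the actual torchtitan code: the function names, the splits/order layout, and the total argument are all assumptions. The point is that .tolist() forces a device-to-host sync before the indices can be built, while cumsum/arange/repeat_interleave can build the same gather indices entirely on the GPU.

import torch

def gather_idxs_host(splits: torch.Tensor, order: torch.Tensor) -> torch.Tensor:
    # Host-side variant (illustrative): .tolist() blocks until the GPU catches up.
    counts = splits.tolist()                       # device -> host sync
    starts, acc = [], 0
    for c in counts:
        starts.append(acc)
        acc += c
    idx = []
    for o in order.tolist():                       # another sync
        idx.extend(range(starts[o], starts[o] + counts[o]))
    return torch.tensor(idx, device=splits.device)

def gather_idxs_device(splits: torch.Tensor, order: torch.Tensor, total: int) -> torch.Tensor:
    # Device-side variant: same permutation, no host round-trip.
    # `total` is the token count, already known on the host from the buffer's shape.
    in_starts = torch.cumsum(splits, 0) - splits       # exclusive prefix sum of chunk sizes
    sizes = splits[order]                              # chunk sizes in the output order
    out_starts = torch.cumsum(sizes, 0) - sizes
    # For each output slot: start of its source chunk + offset within that chunk.
    src_start = in_starts[order].repeat_interleave(sizes, output_size=total)
    within = torch.arange(total, device=splits.device) - out_starts.repeat_interleave(sizes, output_size=total)
    return src_start + within

Either way the result feeds a buffer[gatherd_idxs]-style gather; only the device-side version keeps the CPU from stalling on the kernel queue while it assembles the indices.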

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Mar 26, 2025
@garrett361 force-pushed the deepseek-indexing branch 2 times, most recently from a453229 to 1f1b16d, on March 26, 2025 at 19:17
@kwen2501 (Contributor)

Thanks much for the PR! It would indeed look much nicer!

Do you mind testing your PR with torchrun --standalone --nproc-per-node 4 generate.py?

I am seeing this result:

Output: You are a helpful assistant.
User: What is 2+2?

Assistant: ' ' ' ' ' ' ' ' ' ': ': ': ': ': ': ': ': ': ': ': ': ': ': ': ': ': ': ': ': ': ': ': ': ': ': ': ': ': ': ': ': ': ': ': ': ': ': ': ': ':

@kwen2501 self-requested a review on March 26, 2025 at 19:21
@garrett361 (Contributor, Author)

Hmm, weird; yes, I'm seeing the same. I hadn't tested generate.py, but I made some small unit tests and also checked the losses and grad norm in run.py for a randomly initialized model.

Let me look into it and I'll ping you when I figure it out.

@garrett361 (Contributor, Author)

Ah, it was just the self.ep_rank * self.experts_per_rank offset I added to the gatherd_idx. The conventions here are slightly different from what I'm using elsewhere, and that misunderstanding affected my unit tests, too. Removed the offset and it's fixed now (a short sketch of the local-vs-global numbering follows below):

⬢ [podman] ❯ torchrun --standalone --nproc-per-node 4 torchtitan/experiments/deepseek_v3/generate.py
W0326 19:53:05.600000 1950302 torch/distributed/run.py:792]
W0326 19:53:05.600000 1950302 torch/distributed/run.py:792] *****************************************
W0326 19:53:05.600000 1950302 torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0326 19:53:05.600000 1950302 torch/distributed/run.py:792] *****************************************
Creating model stage 0 of 2
Creating model stage 0 of 2
Creating model stage 1 of 2
Creating model stage 1 of 2
EP rank [0]: Created Symmetric Memory for MoE
Running inference with deepseek-ai/DeepSeek-V2-Lite-Chat on (2, 2) mesh
EP rank [1]: Created Symmetric Memory for MoE
EP rank [1]: Created Symmetric Memory for MoE
EP rank [0]: Created Symmetric Memory for MoE

Output: You are a helpful assistant.
User: What is 2+2?

Assistant: 2 + 2 equals 4.

Closing inference mesh...

Apologies for that; I should have tested more carefully.
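A small illustration of the convention mix-up described above; the values are made up and the names simply mirror the comment, not the actual module code.

# Each EP rank owns a contiguous block of experts.
experts_per_rank = 4   # assumed value, for illustration only
ep_rank = 1

local_expert = 2                                            # index within this rank
global_expert = ep_rank * experts_per_rank + local_expert   # 6, index across all ranks

# If the surrounding code already expects local expert numbering, adding the
# ep_rank * experts_per_rank offset shifts every gather index by a full rank's
# worth of experts, which is what produced the scrambled generate.py output above.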

@kwen2501 (Contributor) left a review comment:


Looks great! Thank you for the nice code!

@EugenHotaj (Contributor)

@garrett361 Nice add! Do you notice any speed boost from removing the CPU sync?

@kwen2501 merged commit bab83a4 into pytorch:main on Mar 26, 2025 (5 of 6 checks passed)
@garrett361 (Contributor, Author)

Thanks @EugenHotaj! Good question, but I didn't time it, and the {run,generate}.py scripts don't currently report any timing metrics, so I can't say.
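For anyone who does want to measure it, a rough sketch with CUDA events follows; compute_idxs here is a stand-in for whichever callable builds the indices, not an existing function in the repo.

import torch

def time_cuda_ms(fn, *args, iters=50):
    # Average time per call in milliseconds, measured with CUDA events.
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    fn(*args)                      # warm-up
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

# Usage sketch: time_cuda_ms(compute_idxs, splits, order)

Note that the cost of the host sync is mostly the stall while the CPU waits for queued kernels, so an isolated micro-benchmark can understate the end-to-end effect; timing a full generate.py or run.py step would be more representative.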
