
[a2av] Add autograd support for token combine op #1511


Merged: 2 commits merged into main from combine_autograd on Aug 12, 2025

Conversation

kwen2501 (Contributor) commented on Aug 1, 2025

Added a TokenCombiner class which combines tokens from different experts, with backward (autograd) support.

Usage:

combiner = TokenCombiner(group_name, align, max_inp_len, max_out_len, inp.shape[1:], world_size, ne, dtype)
# inp, out, in_splits_offsets, out_splits_offsets must be symmetric tensors
output = combiner(inp, out, in_splits_offsets, out_splits_offsets)

Supports:

torch.compile(combiner)
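
For a bit more context, a rough end-to-end sketch of how the class could be driven is below. The sizes, the splits/offsets layout, the expert/world-size choices, and the requires_grad_ call are illustrative assumptions, not taken from the PR; it assumes a torchrun launch with NVSHMEM-backed symmetric memory available, and TokenCombiner imported from the module added in this PR (path omitted here):

import torch
import torch.distributed as dist
import torch.distributed._symmetric_memory as symm_mem
# TokenCombiner is imported from the module added in this PR (path omitted here).

# Illustrative sizes only; real values come from the MoE configuration.
ne = 8                      # experts per rank (assumption)
align = 8                   # token alignment expected by the a2av kernels (assumption)
max_inp_len = 4096          # upper bound on tokens entering the combine
max_out_len = 4096          # upper bound on tokens after the combine
dim = 1024                  # hidden size per token
dtype = torch.bfloat16

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
world_size = dist.get_world_size()
group_name = dist.group.WORLD.group_name

# All four tensors passed to the combiner must be symmetric tensors.
inp = symm_mem.empty(max_inp_len, dim, dtype=dtype, device="cuda").requires_grad_()
out = symm_mem.empty(max_out_len, dim, dtype=dtype, device="cuda")
# Splits + offsets per (rank, expert) pair; the exact layout is an assumption here.
in_splits_offsets = symm_mem.empty(2, world_size * ne, dtype=torch.int64, device="cuda")
out_splits_offsets = symm_mem.empty(2, world_size * ne, dtype=torch.int64, device="cuda")

combiner = TokenCombiner(group_name, align, max_inp_len, max_out_len,
                         inp.shape[1:], world_size, ne, dtype)
combiner = torch.compile(combiner)   # torch.compile(combiner) is supported
output = combiner(inp, out, in_splits_offsets, out_splits_offsets)
output.sum().backward()              # gradients flow back through the combine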

The meta-cla bot added the "CLA Signed" label (managed by the Meta Open Source bot) on Aug 1, 2025.
kwen2501 requested a review from tianyu-l on August 1, 2025 at 00:08.
Comment on lines 167 to 168
self.grad_out_buf = symm_mem.empty(out_len, *token_shape, dtype=dtype)
self.grad_in_buf = symm_mem.empty(in_len, *token_shape, dtype=dtype)

Contributor review comment: similar request, please allow device as an input to the constructor.
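
A small sketch of what that suggestion could look like (hypothetical helper name and trimmed argument list; assumes symm_mem.empty accepts a device keyword):

import torch
import torch.distributed._symmetric_memory as symm_mem

def _alloc_grad_bufs(in_len, out_len, token_shape, dtype, device=None):
    # Hypothetical helper: forward an explicit `device` to the symmetric-memory
    # allocations instead of implicitly relying on the current CUDA device.
    if device is None:
        device = torch.device("cuda", torch.cuda.current_device())
    grad_out_buf = symm_mem.empty(out_len, *token_shape, dtype=dtype, device=device)
    grad_in_buf = symm_mem.empty(in_len, *token_shape, dtype=dtype, device=device)
    return grad_in_buf, grad_out_buf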

kwen2501 merged commit cf4de26 into main on Aug 12, 2025
4 checks passed
tianyu-l deleted the combine_autograd branch on August 12, 2025 at 20:42
vwxyzjn commented on Aug 12, 2025

Hi @kwen2501, are these kernels for all-to-all communication? Is there any way to test them out with DeepSeek V3?

kwen2501 (Contributor, Author) commented on Aug 12, 2025

@vwxyzjn Yes, the core of this PR is an op named torch.ops.symm_mem.all_to_all_vdev_2d_offset, which accepts token splits and offsets on the GPU. The PR just wraps that op with autograd support.

Similarly, for token dispatch, we have an op named all_to_all_vdev_2d. There is more documentation about these two ops here:
https://github.com/pytorch/pytorch/blob/main/torch/csrc/distributed/c10d/symm_mem/nvshmem_extension.cu#L557
https://github.com/pytorch/pytorch/blob/main/torch/csrc/distributed/c10d/symm_mem/nvshmem_extension.cu#L704
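
To illustrate the wrapping idea only (a minimal sketch, not the PR's actual code: the real TokenCombiner manages symmetric-memory buffers and split/offset bookkeeping, and the real op signatures differ), an autograd.Function whose forward runs a combine and whose backward runs the corresponding dispatch could look like this, with combine_op and dispatch_op as hypothetical stand-ins for the two ops above:

import torch

class _TokenCombineSketch(torch.autograd.Function):
    @staticmethod
    def forward(ctx, inp, combine_op, dispatch_op, fwd_args, bwd_args):
        # Combine: route each expert's output tokens back to the ranks that
        # originally owned them, writing into pre-allocated output buffers.
        out = combine_op(inp, *fwd_args)
        # Remember how to reverse the shuffle for the backward pass.
        ctx.dispatch_op = dispatch_op
        ctx.bwd_args = bwd_args
        return out

    @staticmethod
    def backward(ctx, grad_out):
        # The backward of a combine is a dispatch: gradients arriving at the
        # combined output are scattered back into the experts' input layout.
        grad_in = ctx.dispatch_op(grad_out, *ctx.bwd_args)
        # Only `inp` receives a gradient; the other forward inputs do not.
        return grad_in, None, None, None, None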
