Fix semi-sync training with 1 GPU per FT replica #1505

bentherien · 2025-07-31T20:13:01Z

When semi-sync training is used with a single GPU per fault-tolerant (FT) replica, ParallelDims creates no device mesh, which causes an error when the loss is all-reduced.

The slicing error occurs on this line:

Line 502 in cf30b29

dist_utils.dist_mean(loss, parallel_dims.world_mesh["dp_cp"], ft_pg),
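This call appears to average the loss over the `dp_cp` mesh and then across FT replicas through `ft_pg`. The sketch below assumes `dist_mean` works in two stages like that (an assumption, not confirmed by this PR) and models the result as a mean of per-replica means. `two_level_mean` is a hypothetical helper written only for illustration.

```python
from statistics import fmean


def two_level_mean(losses_by_replica: list[list[float]]) -> float:
    """Hypothetical model of a two-stage loss average.

    Outer list: one entry per fault-tolerant (FT) replica.
    Inner list: the local loss on each rank of that replica's "dp_cp" mesh.
    """
    # Stage 1 (assumed): average within each replica over its dp_cp ranks,
    # the role played by parallel_dims.world_mesh["dp_cp"].
    per_replica = [fmean(ranks) for ranks in losses_by_replica]
    # Stage 2 (assumed): average across FT replicas, the role played by ft_pg.
    return fmean(per_replica)


# Two replicas with one GPU each: the dp_cp stage averages a single value.
print(two_level_mean([[1.0], [3.0]]))  # 2.0
```

With one GPU per replica each inner list has length 1, so the `dp_cp` stage does nothing numerically. The real code still has to look up `world_mesh["dp_cp"]`, and that lookup is what fails.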

This PR prevents this from happening by creating a default mesh when no parallelism is used.
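Here is a toy model of the failure and the fix. It rests on a few assumptions: the device mesh is a plain `{dimension name: size}` dict, only dimensions with degree > 1 are kept, and the fix is assumed to expose a trivial size-1 `dp_cp` dimension. `build_toy_mesh` and `dp_cp_size` are hypothetical helpers, not torchtitan or PyTorch APIs. The toy raises `KeyError` where the real code fails while slicing the mesh.

```python
import math


def build_toy_mesh(dp_shard: int, cp: int, tp: int, default_when_unused: bool) -> dict[str, int]:
    """Toy model of a device mesh as {dimension name: size}.

    Hypothetical helper for illustration only; not torchtitan's ParallelDims.
    """
    # Only parallelism dimensions with degree > 1 get a mesh dimension.
    mesh = {
        name: degree
        for name, degree in (("dp_shard", dp_shard), ("cp", cp), ("tp", tp))
        if degree > 1
    }
    # Flattened "dp_cp" dimension used by the loss all-reduce.
    used = [mesh[name] for name in ("dp_shard", "cp") if name in mesh]
    if used:
        mesh["dp_cp"] = math.prod(used)
    elif default_when_unused:
        # Assumed shape of the fix: with no parallelism, still expose a
        # size-1 "dp_cp" dimension so the lookup below succeeds.
        mesh["dp_cp"] = 1
    return mesh


def dp_cp_size(mesh: dict[str, int]) -> int:
    # Mirrors parallel_dims.world_mesh["dp_cp"]: fails if the dim is missing.
    return mesh["dp_cp"]


# One GPU per FT replica: every parallelism degree is 1.
try:
    dp_cp_size(build_toy_mesh(1, 1, 1, default_when_unused=False))
except KeyError:
    print("no 'dp_cp' dimension")  # the failure described above
print(dp_cp_size(build_toy_mesh(1, 1, 1, default_when_unused=True)))  # 1
```

One appeal of a size-1 default is that the loss-reduction path stays the same for every parallelism configuration. It avoids a special case for the no-parallelism setup.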


bentherien requested review from tianyu-l, fegin, wwwjn and wconstab as code owners July 31, 2025 20:13

meta-cla bot added the CLA Signed label Jul 31, 2025

create mesh even with no parallelism to avoid indexing into an empty tensor

5e370aa

bentherien force-pushed the torchft_mesh_fix branch from 5247387 to 5e370aa August 14, 2025 20:56
