Commit 0cb1a33

fix Multi node CUDA error: invalid device ordinal #3775 (#3779)
1 parent dfdc219 commit 0cb1a33

src/accelerate/state.py

Lines changed: 1 addition & 1 deletion
@@ -400,7 +400,7 @@ def wait_for_everyone(self):
             DistributedType.DEEPSPEED,
             DistributedType.FSDP,
         ):
-            torch.distributed.barrier(device_ids=[self.process_index])
+            torch.distributed.barrier(device_ids=[self.local_process_index])
         elif self.distributed_type == DistributedType.XLA:
             xm.rendezvous("accelerate.utils.wait_for_everyone")
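
The one-line change matters on multi-node runs because `device_ids` names CUDA devices on the local machine, while `process_index` is the global rank across all nodes. The following is a minimal sketch of that arithmetic, not code from accelerate. It assumes every node exposes the same number of GPUs and that ranks are assigned node by node. The helper names and the `gpus_per_node` parameter are hypothetical, and in a real launch the local index normally comes from the launcher rather than from a modulo.

```python
# Illustrative only: these helpers are hypothetical and not part of accelerate.
# Assumption: homogeneous nodes, ranks assigned contiguously node by node.


def local_index_from_global(process_index: int, gpus_per_node: int) -> int:
    """Map a global rank to the device index on its own node."""
    if gpus_per_node <= 0:
        raise ValueError("gpus_per_node must be positive")
    return process_index % gpus_per_node


def is_valid_device_ordinal(device_id: int, gpus_per_node: int) -> bool:
    """A CUDA ordinal is valid only if it is below the local device count."""
    return 0 <= device_id < gpus_per_node


if __name__ == "__main__":
    gpus_per_node = 8  # assumed value for the example
    for process_index in (0, 7, 8, 15):
        local = local_index_from_global(process_index, gpus_per_node)
        print(
            f"global rank {process_index}: "
            f"global index valid={is_valid_device_ordinal(process_index, gpus_per_node)}, "
            f"local index {local} valid={is_valid_device_ordinal(local, gpus_per_node)}"
        )
```

Under these assumptions, any process on the second node or later has a global rank at or above the local device count, so passing that rank as a device id would be out of range. The local index always stays in range.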
