Commit e792980

fduwjj authored and pytorchmergebot committed
[c10d][ez] Add comments to CudaEventCache class (pytorch#134172)
Pull Request resolved: pytorch#134172
Approved by: https://github.com/d4l3k, https://github.com/kwen2501
1 parent b319fa3 commit e792980

1 file changed: +4 -0 lines changed

torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp

Lines changed: 4 additions & 0 deletions
@@ -763,6 +763,10 @@ void ProcessGroupNCCL::WorkNCCL::abort() {
 
 ProcessGroupNCCL::CUDAEventCache::CUDAEventCache() {}
 
+// CUDA event is used to record the start/end of one Work.
+// Instead of let the CUDA event gets destroyed, we now reuse it after the Work
+// has been erased from workMetaList_.
+// This is to avoid the potential deadlock caused by CudaEventDestroy.
 std::shared_ptr<at::cuda::CUDAEvent> ProcessGroupNCCL::CUDAEventCache::create(
     bool timing) {
   auto deleter = [this, timing](at::cuda::CUDAEvent* event) {
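
The diff is cut off at the deleter, so the rest of `create` is not visible here. As a rough illustration of the idea the new comment describes (reusing an event rather than destroying it), the following is a minimal standalone sketch, not the actual `CUDAEventCache` code. `FakeEvent`, `EventPool`, and `cached` are hypothetical names, and `FakeEvent` stands in for `at::cuda::CUDAEvent` so the example builds without CUDA. The sketch assumes the pool outlives every event it hands out.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for at::cuda::CUDAEvent (assumption, not the real type).
struct FakeEvent {
  explicit FakeEvent(bool timing) : timing(timing) {}
  bool timing;
};

// Hypothetical pool illustrating "reuse instead of destroy".
// Assumption: the pool outlives all shared_ptrs returned by create().
class EventPool {
 public:
  std::shared_ptr<FakeEvent> create(bool timing) {
    // Custom deleter: when the last owner releases the event, park it in the
    // pool for reuse instead of destroying it.
    auto deleter = [this, timing](FakeEvent* event) {
      std::lock_guard<std::mutex> lock(mutex_);
      free_[timing ? 1 : 0].push_back(event);
    };

    FakeEvent* event = nullptr;
    {
      // Prefer a previously released event with the same timing setting.
      std::lock_guard<std::mutex> lock(mutex_);
      auto& bucket = free_[timing ? 1 : 0];
      if (!bucket.empty()) {
        event = bucket.back();
        bucket.pop_back();
      }
    }
    if (event == nullptr) {
      event = new FakeEvent(timing);  // Nothing cached yet: allocate a new one.
    }
    return std::shared_ptr<FakeEvent>(event, std::move(deleter));
  }

  // Number of parked events for the given timing setting (for inspection).
  std::size_t cached(bool timing) {
    std::lock_guard<std::mutex> lock(mutex_);
    return free_[timing ? 1 : 0].size();
  }

  ~EventPool() {
    // The sketch frees parked events on teardown to avoid leaks.
    for (auto& bucket : free_) {
      for (FakeEvent* event : bucket) {
        delete event;
      }
    }
  }

 private:
  std::mutex mutex_;
  std::vector<FakeEvent*> free_[2];  // Index 0: no timing, index 1: timing.
};
```

In this sketch, releasing the last `shared_ptr` to an event moves it into the pool, and the next `create` call with the same `timing` value hands the same object back. No destroy call runs on the release path, which is where the commit comment locates the potential deadlock.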
