Commit 24078c2

Merge pull request #1299 from rafbiels/rafbiels/fix-cuda-maxreg-check
[CUDA] Fix MaxRegsPerBlock check in setKernelParams
2 parents 227a5ed + 89a66af commit 24078c2

1 file changed: +4 -3 lines changed

source/adapters/cuda/enqueue.cpp

Lines changed: 4 additions & 3 deletions
@@ -245,13 +245,14 @@ setKernelParams(const ur_context_handle_t Context,
     return UR_RESULT_SUCCESS;
   };
 
-  size_t KernelLocalWorkGroupSize = 0;
+  size_t KernelLocalWorkGroupSize = 1;
   for (size_t Dim = 0; Dim < WorkDim; Dim++) {
     auto Err = IsValid(Dim);
     if (Err != UR_RESULT_SUCCESS)
       return Err;
-    // If no error then sum the total local work size per dim.
-    KernelLocalWorkGroupSize += LocalWorkSize[Dim];
+    // If no error then compute the total local work size as a product of
+    // all dims.
+    KernelLocalWorkGroupSize *= LocalWorkSize[Dim];
   }
 
   if (hasExceededMaxRegistersPerBlock(Device, Kernel,
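
The effect of the change can be seen in a small standalone sketch. It is not the adapter's code: `totalLocalWorkSize`, `summedLocalWorkSize`, and `exceedsRegisterBudget` are hypothetical helpers written for illustration, and the real `hasExceededMaxRegistersPerBlock` may take different arguments and apply a different rule. The sketch only assumes that a register check multiplies a per-thread register count by the number of threads in the block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of threads in a work-group: the product of the local size in
// every dimension. This mirrors the computation after the fix.
size_t totalLocalWorkSize(const size_t *LocalWorkSize, uint32_t WorkDim) {
  assert(WorkDim >= 1 && WorkDim <= 3);
  size_t Total = 1; // multiplicative identity, like the new initial value
  for (uint32_t Dim = 0; Dim < WorkDim; Dim++)
    Total *= LocalWorkSize[Dim];
  return Total;
}

// The pre-fix computation (a sum over dimensions), kept for comparison only.
size_t summedLocalWorkSize(const size_t *LocalWorkSize, uint32_t WorkDim) {
  assert(WorkDim >= 1 && WorkDim <= 3);
  size_t Total = 0;
  for (uint32_t Dim = 0; Dim < WorkDim; Dim++)
    Total += LocalWorkSize[Dim];
  return Total;
}

// Hypothetical stand-in for a register-budget check. It is NOT the
// adapter's hasExceededMaxRegistersPerBlock; it assumes the budget is
// exceeded when registers-per-thread times threads-per-block is too large.
bool exceedsRegisterBudget(size_t RegsPerThread, size_t ThreadsPerBlock,
                           size_t MaxRegsPerBlock) {
  return RegsPerThread * ThreadsPerBlock > MaxRegsPerBlock;
}
```

For a local size of {16, 16, 4}, the product is 1024 threads while the sum is only 36. With illustrative values of 128 registers per thread and a 65536-register budget per block, the product-based count exceeds the budget (131072 registers), whereas the sum-based count (4608 registers) would let an oversized launch pass the check.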
