-
Notifications
You must be signed in to change notification settings - Fork 769
[SYCL] Extract args directly from kernel if we can #18387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: sycl
Are you sure you want to change the base?
Changes from all commits
172d504
d046b0a
a92ff30
c2813f0
bebc2cf
3caf6b0
47472d4
015a5a2
a58c502
b07c7d1
54b4621
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2385,7 +2385,10 @@ static ur_result_t SetKernelParamsAndLaunch( | |
const std::function<void *(Requirement *Req)> &getMemAllocationFunc, | ||
bool IsCooperative, bool KernelUsesClusterLaunch, | ||
uint32_t WorkGroupMemorySize, const RTDeviceBinaryImage *BinImage, | ||
KernelNameStrRefT KernelName) { | ||
KernelNameStrRefT KernelName, void *KernelFuncPtr = nullptr, | ||
int KernelNumArgs = 0, | ||
detail::kernel_param_desc_t (*KernelParamDescGetter)(int) = nullptr, | ||
bool KernelHasSpecialCaptures = true) { | ||
assert(Queue && "Kernel submissions should have an associated queue"); | ||
const AdapterPtr &Adapter = Queue->getAdapter(); | ||
|
||
|
@@ -2397,13 +2400,38 @@ static ur_result_t SetKernelParamsAndLaunch( | |
: Empty); | ||
} | ||
|
||
auto setFunc = [&Adapter, Kernel, &DeviceImageImpl, &getMemAllocationFunc, | ||
&Queue](detail::ArgDesc &Arg, size_t NextTrueIndex) { | ||
SetArgBasedOnType(Adapter, Kernel, DeviceImageImpl, getMemAllocationFunc, | ||
Queue->getContextImplPtr(), Arg, NextTrueIndex); | ||
}; | ||
|
||
applyFuncOnFilteredArgs(EliminatedArgMask, Args, setFunc); | ||
if (KernelFuncPtr && !KernelHasSpecialCaptures) { | ||
auto setFunc = [&Adapter, Kernel, | ||
KernelFuncPtr](const detail::kernel_param_desc_t &ParamDesc, | ||
size_t NextTrueIndex) { | ||
const void *ArgPtr = (const char *)KernelFuncPtr + ParamDesc.offset; | ||
switch (ParamDesc.kind) { | ||
case kernel_param_kind_t::kind_std_layout: { | ||
int Size = ParamDesc.info; | ||
Adapter->call<UrApiKind::urKernelSetArgValue>(Kernel, NextTrueIndex, | ||
Size, nullptr, ArgPtr); | ||
break; | ||
} | ||
case kernel_param_kind_t::kind_pointer: { | ||
const void *Ptr = *static_cast<const void *const *>(ArgPtr); | ||
Adapter->call<UrApiKind::urKernelSetArgPointer>(Kernel, NextTrueIndex, | ||
nullptr, Ptr); | ||
break; | ||
} | ||
default: | ||
throw std::runtime_error("Direct kernel argument copy failed."); | ||
} | ||
}; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could it make sense to add a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not with the current design. This is what I was alluding to when I said it would be a good idea to try and unify the The main problem is that we have two different ways to represent what an argument is. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, I see what you mean. Yeah, I am not sure it's worth trying to repack the arguments. |
||
applyFuncOnFilteredArgs(EliminatedArgMask, KernelNumArgs, | ||
KernelParamDescGetter, setFunc); | ||
} else { | ||
auto setFunc = [&Adapter, Kernel, &DeviceImageImpl, &getMemAllocationFunc, | ||
&Queue](detail::ArgDesc &Arg, size_t NextTrueIndex) { | ||
SetArgBasedOnType(Adapter, Kernel, DeviceImageImpl, getMemAllocationFunc, | ||
Queue->getContextImplPtr(), Arg, NextTrueIndex); | ||
}; | ||
applyFuncOnFilteredArgs(EliminatedArgMask, Args, setFunc); | ||
} | ||
|
||
std::optional<int> ImplicitLocalArg = | ||
ProgramManager::getInstance().kernelImplicitLocalArgPos(KernelName); | ||
|
@@ -2655,7 +2683,9 @@ void enqueueImpKernel( | |
const std::function<void *(Requirement *Req)> &getMemAllocationFunc, | ||
ur_kernel_cache_config_t KernelCacheConfig, const bool KernelIsCooperative, | ||
const bool KernelUsesClusterLaunch, const size_t WorkGroupMemorySize, | ||
const RTDeviceBinaryImage *BinImage) { | ||
const RTDeviceBinaryImage *BinImage, void *KernelFuncPtr, int KernelNumArgs, | ||
detail::kernel_param_desc_t (*KernelParamDescGetter)(int), | ||
bool KernelHasSpecialCaptures) { | ||
assert(Queue && "Kernel submissions should have an associated queue"); | ||
// Run OpenCL kernel | ||
auto &ContextImpl = Queue->getContextImplPtr(); | ||
|
@@ -2739,7 +2769,8 @@ void enqueueImpKernel( | |
Queue, Args, DeviceImageImpl, Kernel, NDRDesc, EventsWaitList, | ||
OutEventImpl, EliminatedArgMask, getMemAllocationFunc, | ||
KernelIsCooperative, KernelUsesClusterLaunch, WorkGroupMemorySize, | ||
BinImage, KernelName); | ||
BinImage, KernelName, KernelFuncPtr, KernelNumArgs, | ||
KernelParamDescGetter, KernelHasSpecialCaptures); | ||
|
||
const AdapterPtr &Adapter = Queue->getAdapter(); | ||
if (!SyclKernelImpl && !MSyclKernel) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
SetArgBasedOnType
doesIs there a reason we don't do that here? Is the else-case falling under "special captures"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm honest, I don't really understand how the old
MArgs
code works. You can see that each argument inMArgs
is represented by a pointer to an argument, and this else branch only triggers when that pointer is null.On the fast path, I'm extracting a standard layout argument directly from the function object. Since it's a standard layout object and not a pointer to one, it can never be null, so I removed the branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this is some special case for some local memory accessors. I wonder if these changes can handle that correctly. Let's hope testing is good enough!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be some overloading of the meaning of the kernel parameter kind. The vector contains the arguments after decomposition, which means that there may indeed be some special fields that need handling.
The array I'm working with contains a description of the original arguments, and so anything that's captured as a "standard layout" class really needs to be one -- a local accessor is identified in the array as an accessor, so it will be counted as a special capture.