-
Notifications
You must be signed in to change notification settings - Fork 766
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Optimize handler::StoreLambda implementation #17669
[SYCL] Optimize handler::StoreLambda implementation #17669
Conversation
Current implemenation creates descriptions of all kernal params, then process them in turn. It's prossible to process each param right away.
@@ -375,16 +375,6 @@ template <int Dims> bool range_size_fits_in_size_t(const range<Dims> &r) { | |||
return true; | |||
} | |||
|
|||
template <typename KernelNameType> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure can we drop it unconditionally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can, but not sure either. IMO, let's drop and fix if something breaks somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I can find, you removed the only use of this function, so I agree with @aelovikov-intel - drop it! ⭐
sycl/include/sycl/handler.hpp
Outdated
reserveArgs(NumParams); | ||
|
||
for (size_t I = 0, IndexShift = 0; I < NumParams; ++I) { | ||
extractArgsAndReqsFromLambda( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extractArgs*
for a single argument doesn't look like a right name for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the advice, I moved the loop over args inside of extractArgsAndReqsFromLambda()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I'd want @steffenlarsen to approve the usage of function pointer too, just in case.
char *LambdaPtr, detail::kernel_param_desc_t (*ParamDescGetter)(int), | ||
size_t NumKernelParams, bool IsESIMD) { | ||
size_t IndexShift = 0; | ||
impl->MArgs.reserve(MaxNumAdditionalArgs * NumKernelParams); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from this line, why do we have to move this to the sources?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, there is no strict necessity, just desire to move functionality out of public headers and out of templates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing the templates I don't mind, but the more symbols we have in the library, the less freedom we have due to the ABI it imposes. I could maybe also see some benefit in having this in the header w.r.t. inlining.
That is not to say I am against moving more stuff to the library, but when we do I prefer we move as much as we can to minimize the ABI surface.
Note that this is not a strong objection to moving this, but mainly for us to consider the pros/cons of moving small segments from the header to the library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This patch isn't based on the latest origin/sycl
. Top of trunk has this (note constexpr
):
llvm/sycl/include/sycl/handler.hpp
Lines 769 to 774 in 0a406c9
if constexpr (KernelHasName) { | |
// TODO support ESIMD in no-integration-header case too. | |
clearArgs(); | |
extractArgsAndReqsFromLambda(MHostKernel->getPtr(), | |
detail::getKernelParamDescs<KernelName>(), | |
detail::isKernelESIMD<KernelName>()); |
the only other thing we can fold into this library api is the clearArgs
call (which might be a good thing to do, but doesn't really affect our freedom regarding ABI changes).
@steffenlarsen , I agree with you in general, but don't see how that is applicable in this particular case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point here is, it seems like all that is actually forcing us to move this to the library is an optimization for pre-emptive reservation. It means the function we now have in the ABI does a handful of different things. Keeping it in the header and adding another new function in the library that does as little as possible still means we have another symbol, but I would argue that the chance of its signature changing or part of the ABI causing problems decreases with that design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I right that current plan is to have extractArgsAndReqsFromLambda()
logic internal and to prevent export of handler::processArg()
at ABI breaking moment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, yes. @steffenlarsen , can you confirm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that plan sounds good to me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@steffenlarsen , what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! I like that solution, then we can move it when we promote preview during the next ABI break. ⭐
@aelovikov-intel is this PR ready to be merged? |
I think the plan was to split it in two pieces in case bisect/revert would be necessary. However, I'm fine merging as-is. |
@aelovikov-intel , it's probably some misunderstanding. I think SYCL Do not lock unconditionally while access queue_iml::MInOrderExternalEvent #17575 should be split, and I definitely put the independent part to [SYCL] Do not lock unconditionally while access queue_iml::MMissedCleanupRequests #17883, |
My brain was melting from working on conflicts resolution... You're definitely correct :) I don't mind the other PR going as one piece though ;) |
Current implementation creates descriptions of all kernel params, then process them one by one. It's possible to process each param right away.