Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Optimize handler::StoreLambda implementation #17669

Conversation

Alexandr-Konovalov
Copy link
Contributor

@Alexandr-Konovalov Alexandr-Konovalov commented Mar 26, 2025

Current implementation creates descriptions of all kernel params, then process them one by one. It's possible to process each param right away.

Current implemenation creates descriptions of all kernal params, then
process them in turn. It's prossible to process each param right away.
@@ -375,16 +375,6 @@ template <int Dims> bool range_size_fits_in_size_t(const range<Dims> &r) {
return true;
}

template <typename KernelNameType>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure can we drop it unconditionally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can, but not sure either. IMO, let's drop and fix if something breaks somewhere.

+ @steffenlarsen , @AlexeySachkov , @sergey-semenov

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can find, you removed the only use of this function, so I agree with @aelovikov-intel - drop it! ⭐

reserveArgs(NumParams);

for (size_t I = 0, IndexShift = 0; I < NumParams; ++I) {
extractArgsAndReqsFromLambda(
Copy link
Contributor

@aelovikov-intel aelovikov-intel Mar 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extractArgs* for a single argument doesn't look like a right name for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the advice, I moved the loop over args inside of extractArgsAndReqsFromLambda().

Copy link
Contributor

@aelovikov-intel aelovikov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I'd want @steffenlarsen to approve the usage of function pointer too, just in case.

char *LambdaPtr, detail::kernel_param_desc_t (*ParamDescGetter)(int),
size_t NumKernelParams, bool IsESIMD) {
size_t IndexShift = 0;
impl->MArgs.reserve(MaxNumAdditionalArgs * NumKernelParams);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from this line, why do we have to move this to the sources?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, there is no strict necessity, just desire to move functionality out of public headers and out of templates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the templates I don't mind, but the more symbols we have in the library, the less freedom we have due to the ABI it imposes. I could maybe also see some benefit in having this in the header w.r.t. inlining.
That is not to say I am against moving more stuff to the library, but when we do I prefer we move as much as we can to minimize the ABI surface.
Note that this is not a strong objection to moving this, but mainly for us to consider the pros/cons of moving small segments from the header to the library.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This patch isn't based on the latest origin/sycl. Top of trunk has this (note constexpr):

if constexpr (KernelHasName) {
// TODO support ESIMD in no-integration-header case too.
clearArgs();
extractArgsAndReqsFromLambda(MHostKernel->getPtr(),
detail::getKernelParamDescs<KernelName>(),
detail::isKernelESIMD<KernelName>());

the only other thing we can fold into this library api is the clearArgs call (which might be a good thing to do, but doesn't really affect our freedom regarding ABI changes).

@steffenlarsen , I agree with you in general, but don't see how that is applicable in this particular case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point here is, it seems like all that is actually forcing us to move this to the library is an optimization for pre-emptive reservation. It means the function we now have in the ABI does a handful of different things. Keeping it in the header and adding another new function in the library that does as little as possible still means we have another symbol, but I would argue that the chance of its signature changing or part of the ABI causing problems decreases with that design.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I right that current plan is to have extractArgsAndReqsFromLambda() logic internal and to prevent export of handler::processArg() at ABI breaking moment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, yes. @steffenlarsen , can you confirm?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that plan sounds good to me!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@steffenlarsen , what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! I like that solution, then we can move it when we promote preview during the next ABI break. ⭐

@vinser52
Copy link
Contributor

vinser52 commented Apr 7, 2025

@aelovikov-intel is this PR ready to be merged?

@aelovikov-intel
Copy link
Contributor

I think the plan was to split it in two pieces in case bisect/revert would be necessary. However, I'm fine merging as-is.

@aelovikov-intel aelovikov-intel merged commit 313c4f0 into intel:sycl Apr 7, 2025
24 checks passed
@Alexandr-Konovalov
Copy link
Contributor Author

I think the plan was to split it in two pieces in case bisect/revert would be necessary. However, I'm fine merging as-is.

@aelovikov-intel , it's probably some misunderstanding. I think SYCL Do not lock unconditionally while access queue_iml::MInOrderExternalEvent #17575 should be split, and I definitely put the independent part to [SYCL] Do not lock unconditionally while access queue_iml::MMissedCleanupRequests #17883,

@aelovikov-intel
Copy link
Contributor

I think the plan was to split it in two pieces in case bisect/revert would be necessary. However, I'm fine merging as-is.

@aelovikov-intel , it's probably some misunderstanding. I think SYCL Do not lock unconditionally while access queue_iml::MInOrderExternalEvent #17575 should be split, and I definitely put the independent part to [SYCL] Do not lock unconditionally while access queue_iml::MMissedCleanupRequests #17883,

My brain was melting from working on conflicts resolution... You're definitely correct :)

I don't mind the other PR going as one piece though ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants