Skip to content

[SYCL] Optimize handler::StoreLambda implementation #17669

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

30 changes: 16 additions & 14 deletions sycl/include/sycl/handler.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -384,16 +384,6 @@ template <int Dims> bool range_size_fits_in_size_t(const range<Dims> &r) {
return true;
}

template <typename KernelNameType>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure can we drop it unconditionally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can, but not sure either. IMO, let's drop and fix if something breaks somewhere.

+ @steffenlarsen , @AlexeySachkov , @sergey-semenov

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can find, you removed the only use of this function, so I agree with @aelovikov-intel - drop it! ⭐

std::vector<kernel_param_desc_t> getKernelParamDescs() {
std::vector<kernel_param_desc_t> Result;
int NumParams = getKernelNumParams<KernelNameType>();
Result.reserve(NumParams);
for (int I = 0; I < NumParams; ++I) {
Result.push_back(getKernelParamDesc<KernelNameType>(I));
}
return Result;
}
} // namespace detail

/// Command group handler class.
Expand Down Expand Up @@ -485,20 +475,30 @@ class __SYCL_EXPORT handler {
"a single kernel or explicit memory operation.");
}

/// Extracts and prepares kernel arguments from the lambda using information
/// from the built-ins or integration header.
#ifndef __INTEL_PREVIEW_BREAKING_CHANGES
// TODO: Those functions are not used anymore, remove it in the next
// ABI-breaking window.
void extractArgsAndReqsFromLambda(
char *LambdaPtr,
const std::vector<detail::kernel_param_desc_t> &ParamDescs, bool IsESIMD);
// TODO Unused, remove during ABI breaking window
void
extractArgsAndReqsFromLambda(char *LambdaPtr, size_t KernelArgsNum,
const detail::kernel_param_desc_t *KernelArgs,
bool IsESIMD);
#endif
/// Extracts and prepares kernel arguments from the lambda using information
/// from the built-ins or integration header.
void extractArgsAndReqsFromLambda(
char *LambdaPtr, detail::kernel_param_desc_t (*ParamDescGetter)(int),
size_t NumKernelParams, bool IsESIMD);

/// Extracts and prepares kernel arguments set via set_arg(s).
void extractArgsAndReqs();

#if defined(__INTEL_PREVIEW_BREAKING_CHANGES)
// TODO: processArg need not to be public
__SYCL_DLL_LOCAL
#endif
void processArg(void *Ptr, const detail::kernel_param_kind_t &Kind,
const int Size, const size_t Index, size_t &IndexShift,
bool IsKernelCreatedFromSource, bool IsESIMD);
Expand Down Expand Up @@ -770,9 +770,11 @@ class __SYCL_EXPORT handler {
// header, so don't perform things that require it.
if constexpr (KernelHasName) {
// TODO support ESIMD in no-integration-header case too.

clearArgs();
extractArgsAndReqsFromLambda(MHostKernel->getPtr(),
detail::getKernelParamDescs<KernelName>(),
&(detail::getKernelParamDesc<KernelName>),
detail::getKernelNumParams<KernelName>(),
detail::isKernelESIMD<KernelName>());
MKernelName = detail::getKernelName<KernelName>();
} else {
Expand Down
39 changes: 38 additions & 1 deletion sycl/source/handler.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1112,6 +1112,43 @@ void handler::extractArgsAndReqs() {
}
}

void handler::extractArgsAndReqsFromLambda(
char *LambdaPtr, detail::kernel_param_desc_t (*ParamDescGetter)(int),
size_t NumKernelParams, bool IsESIMD) {
size_t IndexShift = 0;
impl->MArgs.reserve(MaxNumAdditionalArgs * NumKernelParams);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from this line, why do we have to move this to the sources?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, there is no strict necessity, just desire to move functionality out of public headers and out of templates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the templates I don't mind, but the more symbols we have in the library, the less freedom we have due to the ABI it imposes. I could maybe also see some benefit in having this in the header w.r.t. inlining.
That is not to say I am against moving more stuff to the library, but when we do I prefer we move as much as we can to minimize the ABI surface.
Note that this is not a strong objection to moving this, but mainly for us to consider the pros/cons of moving small segments from the header to the library.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This patch isn't based on the latest origin/sycl. Top of trunk has this (note constexpr):

if constexpr (KernelHasName) {
// TODO support ESIMD in no-integration-header case too.
clearArgs();
extractArgsAndReqsFromLambda(MHostKernel->getPtr(),
detail::getKernelParamDescs<KernelName>(),
detail::isKernelESIMD<KernelName>());

the only other thing we can fold into this library api is the clearArgs call (which might be a good thing to do, but doesn't really affect our freedom regarding ABI changes).

@steffenlarsen , I agree with you in general, but don't see how that is applicable in this particular case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point here is, it seems like all that is actually forcing us to move this to the library is an optimization for pre-emptive reservation. It means the function we now have in the ABI does a handful of different things. Keeping it in the header and adding another new function in the library that does as little as possible still means we have another symbol, but I would argue that the chance of its signature changing or part of the ABI causing problems decreases with that design.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I right that current plan is to have extractArgsAndReqsFromLambda() logic internal and to prevent export of handler::processArg() at ABI breaking moment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, yes. @steffenlarsen , can you confirm?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that plan sounds good to me!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@steffenlarsen , what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! I like that solution, then we can move it when we promote preview during the next ABI break. ⭐


for (size_t I = 0; I < NumKernelParams; ++I) {
detail::kernel_param_desc_t ParamDesc = ParamDescGetter(I);
void *Ptr = LambdaPtr + ParamDesc.offset;
const detail::kernel_param_kind_t &Kind = ParamDesc.kind;
const int &Size = ParamDesc.info;
if (Kind == detail::kernel_param_kind_t::kind_accessor) {
// For args kind of accessor Size is information about accessor.
// The first 11 bits of Size encodes the accessor target.
const access::target AccTarget =
static_cast<access::target>(Size & AccessTargetMask);
if ((AccTarget == access::target::device ||
AccTarget == access::target::constant_buffer) ||
(AccTarget == access::target::image ||
AccTarget == access::target::image_array)) {
detail::AccessorBaseHost *AccBase =
static_cast<detail::AccessorBaseHost *>(Ptr);
Ptr = detail::getSyclObjImpl(*AccBase).get();
} else if (AccTarget == access::target::local) {
detail::LocalAccessorBaseHost *LocalAccBase =
static_cast<detail::LocalAccessorBaseHost *>(Ptr);
Ptr = detail::getSyclObjImpl(*LocalAccBase).get();
}
}
processArg(Ptr, Kind, Size, I, IndexShift,
/*IsKernelCreatedFromSource=*/false, IsESIMD);
}
}

#ifndef __INTEL_PREVIEW_BREAKING_CHANGES
// TODO: Those functions are not used anymore, remove it in the next
// ABI-breaking window.
void handler::extractArgsAndReqsFromLambda(
char *LambdaPtr, const std::vector<detail::kernel_param_desc_t> &ParamDescs,
bool IsESIMD) {
Expand Down Expand Up @@ -1146,14 +1183,14 @@ void handler::extractArgsAndReqsFromLambda(
}
}

// TODO Unused, remove during ABI breaking window
void handler::extractArgsAndReqsFromLambda(
char *LambdaPtr, size_t KernelArgsNum,
const detail::kernel_param_desc_t *KernelArgs, bool IsESIMD) {
std::vector<detail::kernel_param_desc_t> ParamDescs(
KernelArgs, KernelArgs + KernelArgsNum);
extractArgsAndReqsFromLambda(LambdaPtr, ParamDescs, IsESIMD);
}
#endif // __INTEL_PREVIEW_BREAKING_CHANGES

// Calling methods of kernel_impl requires knowledge of class layout.
// As this is impossible in header, there's a function that calls necessary
Expand Down
1 change: 1 addition & 0 deletions sycl/test/abi/sycl_symbols_linux.dump
Original file line number Diff line number Diff line change
Expand Up @@ -3567,6 +3567,7 @@ _ZN4sycl3_V17handler27addLifetimeSharedPtrStorageESt10shared_ptrIKvE
_ZN4sycl3_V17handler27computeFallbackKernelBoundsEmm
_ZN4sycl3_V17handler28extractArgsAndReqsFromLambdaEPcRKSt6vectorINS0_6detail19kernel_param_desc_tESaIS5_EEb
_ZN4sycl3_V17handler28extractArgsAndReqsFromLambdaEPcmPKNS0_6detail19kernel_param_desc_tEb
_ZN4sycl3_V17handler28extractArgsAndReqsFromLambdaEPcPFNS0_6detail19kernel_param_desc_tEiEmb
_ZN4sycl3_V17handler28memcpyToHostOnlyDeviceGlobalEPKvS3_mbmm
_ZN4sycl3_V17handler28setArgsToAssociatedAccessorsEv
_ZN4sycl3_V17handler28setStateExplicitKernelBundleEv
Expand Down
1 change: 1 addition & 0 deletions sycl/test/abi/sycl_symbols_windows.dump
Original file line number Diff line number Diff line change
Expand Up @@ -3987,6 +3987,7 @@
?ext_oneapi_wait_external_semaphore@queue@_V1@sycl@@QEAA?AVevent@23@Uexternal_semaphore@experimental@oneapi@ext@23@_KV423@AEBUcode_location@detail@23@@Z
?extractArgsAndReqs@handler@_V1@sycl@@AEAAXXZ
?extractArgsAndReqsFromLambda@handler@_V1@sycl@@AEAAXPEADAEBV?$vector@Ukernel_param_desc_t@detail@_V1@sycl@@V?$allocator@Ukernel_param_desc_t@detail@_V1@sycl@@@std@@@std@@_N@Z
?extractArgsAndReqsFromLambda@handler@_V1@sycl@@AEAAXPEADP6A?AUkernel_param_desc_t@detail@23@H@Z_K_N@Z
?extractArgsAndReqsFromLambda@handler@_V1@sycl@@AEAAXPEAD_KPEBUkernel_param_desc_t@detail@23@_N@Z
?fill_impl@handler@_V1@sycl@@AEAAXPEAXPEBX_K2@Z
?finalize@handler@_V1@sycl@@AEAA?AVevent@23@XZ
Expand Down