-
-
Notifications
You must be signed in to change notification settings - Fork 8.8k
Sycl. Improve L1 cache locality for histogram building. #11555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
For educational purposes, may I ask how you profile L1 cache hits with sycl? |
VTune can help with it. |
plugin/sycl/common/hist_util.cc
Outdated
|
|
||
| int eu_l1_size = 0; | ||
| int eu_registers_size = 0; | ||
| if (true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This functionality might help here:
https://github.khronos.org/SYCL_Reference/iface/device.html#get-info
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible to get info about L2 size, but L1 size should be hardcoded :(
plugin/sycl/tree/hist_updater.cc
Outdated
| hist_buffer_.Init(qu_, nbins); | ||
| size_t buffer_size = kBufferSize; | ||
| hist_buffer_.Reset(kBufferSize); | ||
| size_t buffer_size = 4 * qu_->get_device().get_info<::sycl::info::device::max_compute_units>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it worth removing kBufferSize constant from the header?
It looks like it is not used now.
Also it would be great to describe the meaning of the multiplier 4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have completely reweighted this logic with the new dispatcher. Now one estimate the required buffer size, that reduce memory consumption (critical for customer-class devices).
|
hi @trivialfis, PR is ready. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments, overall looks good. I can merge it if you have the needed approvals from other reviewers. @Vika-F
plugin/sycl/tree/hist_dispatcher.h
Outdated
| */ | ||
| float th_block_per_eu = 1 + base_block_penalty - atomic_penalty / atomic_efficency; | ||
|
|
||
| /* The model will failed mostly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If penalties are close to each other, we can't tell if comparison gives us a valid result, since approximate model has some errors in penalty estimation.
I changed the comment to make this idea more clear.
Vika-F
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code is well documented and I am mostly Ok with the changes.
I have only 1 comment: is it possible to make GetHistBuildParameters function a constructor of HistBuildParameters struct?
Because now HistBuildParameters handling is a bit C-style which might lead to unexpected uninitialized objects usage.
I separated a class with device properties (like l2 size) into separate class, and moved |
@trivialfis I've approved. The code looks ready to merge. |
The PR introduces two improvements: