@razdoburdin
Contributor

@razdoburdin razdoburdin commented Jul 11, 2025

The PR introduces two improvements:

  1. When the histogram fits in L1 cache, histogram accumulation is done in thread-private memory. This avoids costly random writes to global memory.
  2. The predictive model responsible for dispatching between buffer-based and atomic-based histogram building was improved.
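
The first point can be illustrated with a minimal sketch in plain C++ (not SYCL, and not the PR's kernel). Each emulated work-item fills a small private histogram and then merges it into the global one in a single sequential pass. `GradPair`, `BuildHistPrivate`, and the strided row assignment are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical gradient pair; the real type in the PR may differ.
struct GradPair {
  float grad = 0.0f;
  float hess = 0.0f;
};

// Builds a histogram by letting each emulated work-item fill a private copy
// and merging the copies afterwards. Work-items are emulated by a plain loop.
std::vector<GradPair> BuildHistPrivate(const std::vector<std::uint32_t>& bin_idx,
                                       const std::vector<GradPair>& gpair,
                                       std::size_t n_bins,
                                       std::size_t n_workers) {
  assert(bin_idx.size() == gpair.size());
  assert(n_bins > 0 && n_workers > 0);
  std::vector<GradPair> hist(n_bins);
  for (std::size_t w = 0; w < n_workers; ++w) {
    // Private copy: random writes land here, not in the shared histogram.
    std::vector<GradPair> local(n_bins);
    for (std::size_t i = w; i < bin_idx.size(); i += n_workers) {
      assert(bin_idx[i] < n_bins);
      local[bin_idx[i]].grad += gpair[i].grad;
      local[bin_idx[i]].hess += gpair[i].hess;
    }
    // Merge: one sequential pass over the shared histogram per work-item.
    for (std::size_t b = 0; b < n_bins; ++b) {
      hist[b].grad += local[b].grad;
      hist[b].hess += local[b].hess;
    }
  }
  return hist;
}
```

The private copy only pays off while it stays cache-resident, which is why the PR restricts this path to histograms that fit in L1.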

@razdoburdin razdoburdin marked this pull request as draft July 11, 2025 14:23
@trivialfis
Member

For educational purposes, may I ask how you profile L1 cache hits with sycl?

@razdoburdin
Contributor Author

> For educational purposes, may I ask how you profile L1 cache hits with sycl?

VTune can help with it.


int eu_l1_size = 0;
int eu_registers_size = 0;
if (true) {
Contributor Author

It is possible to query the L2 size, but the L1 size has to be hardcoded :(
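
A rough sketch of what this implies, in plain C++. SYCL 2020 does expose `info::device::global_mem_cache_size`, but whether the PR relies on that query is an assumption. `DeviceCacheInfo`, `MakeCacheInfo`, `HistFitsL1`, and the 64 KiB placeholder are hypothetical and not taken from the PR or from any vendor documentation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical container for cache sizes. The L2 value would come from a
// device query; the L1 value is a hardcoded placeholder, not a vendor figure.
struct DeviceCacheInfo {
  std::size_t l2_bytes;
  std::size_t l1_bytes_per_eu;
};

constexpr std::size_t kAssumedL1BytesPerEu = 64 * 1024;  // placeholder value

inline DeviceCacheInfo MakeCacheInfo(std::size_t queried_l2_bytes) {
  return DeviceCacheInfo{queried_l2_bytes, kAssumedL1BytesPerEu};
}

// True when a histogram of n_bins entries, entry_bytes each, fits in L1.
inline bool HistFitsL1(const DeviceCacheInfo& info, std::size_t n_bins,
                       std::size_t entry_bytes) {
  assert(entry_bytes > 0);
  return n_bins <= info.l1_bytes_per_eu / entry_bytes;
}
```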

hist_buffer_.Init(qu_, nbins);
size_t buffer_size = kBufferSize;
hist_buffer_.Reset(kBufferSize);
size_t buffer_size = 4 * qu_->get_device().get_info<::sycl::info::device::max_compute_units>();
Is it worth removing the kBufferSize constant from the header?
It looks like it is no longer used.

It would also be great to document the meaning of the multiplier 4.

Contributor Author

I have completely reworked this logic with the new dispatcher. The required buffer size is now estimated, which reduces memory consumption (critical for consumer-class devices).
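
One way such an estimate could look, as a sketch only: the number of private histograms is capped both by the amount of work and by how many blocks the device can run at once. `EstimateNumBuffers`, `BufferBytes`, and their parameters are hypothetical; the PR's dispatcher may use a different rule.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical estimate: never allocate more private histograms than there
// are work blocks, and never more than the device can run concurrently.
inline std::size_t EstimateNumBuffers(std::size_t n_rows,
                                      std::size_t rows_per_block,
                                      std::size_t max_concurrent_blocks) {
  assert(rows_per_block > 0 && max_concurrent_blocks > 0);
  // Ceiling division: number of blocks needed to cover all rows.
  std::size_t n_blocks = (n_rows + rows_per_block - 1) / rows_per_block;
  return std::max<std::size_t>(1, std::min(n_blocks, max_concurrent_blocks));
}

// Total memory needed for the private histograms.
inline std::size_t BufferBytes(std::size_t n_buffers, std::size_t n_bins,
                               std::size_t entry_bytes) {
  return n_buffers * n_bins * entry_bytes;
}
```

Compared with a fixed constant, a small node with few rows only pays for the buffers it can actually use.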

@razdoburdin razdoburdin marked this pull request as ready for review July 17, 2025 14:57
@razdoburdin
Contributor Author

Hi @trivialfis, the PR is ready.

Member

@trivialfis trivialfis left a comment

Left some comments, overall looks good. I can merge it if you have the needed approvals from other reviewers. @Vika-F

*/
float th_block_per_eu = 1 + base_block_penalty - atomic_penalty / atomic_efficency;

/* The model will failed mostly
Member

What do you mean by fail?

Contributor Author

If the penalties are close to each other, we can't tell whether the comparison gives a valid result, since the approximate model has some error in its penalty estimates.
I changed the comment to make this idea clearer.
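
The point about close penalties can be sketched as a comparison with a tolerance band. This is an illustration only, not the PR's model: `ChooseMethod`, `Decision`, and `rel_tolerance` are hypothetical names, and the relative-difference rule is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

enum class HistMethod { kBuffer, kAtomic };

struct Decision {
  HistMethod method;
  bool reliable;  // false when the estimates are too close to trust
};

// Picks the method with the lower estimated penalty and flags the result as
// unreliable when the two estimates differ by less than the model's error.
inline Decision ChooseMethod(float buffer_penalty, float atomic_penalty,
                             float rel_tolerance) {
  assert(buffer_penalty >= 0.0f && atomic_penalty >= 0.0f);
  assert(rel_tolerance >= 0.0f);
  HistMethod method = buffer_penalty <= atomic_penalty ? HistMethod::kBuffer
                                                       : HistMethod::kAtomic;
  float scale = std::max(buffer_penalty, atomic_penalty);
  bool reliable =
      std::fabs(buffer_penalty - atomic_penalty) > rel_tolerance * scale;
  return {method, reliable};
}
```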

@Vika-F Vika-F left a comment

The code is well documented and I am mostly OK with the changes.
I have only one comment: is it possible to make the GetHistBuildParameters function a constructor of the HistBuildParameters struct?
Right now the handling of HistBuildParameters is a bit C-style, which might lead to accidental use of uninitialized objects.

@razdoburdin
Contributor Author

razdoburdin commented Jul 18, 2025

> The code is well documented and I am mostly OK with the changes. I have only one comment: is it possible to make the GetHistBuildParameters function a constructor of the HistBuildParameters struct? Right now the handling of HistBuildParameters is a bit C-style, which might lead to accidental use of uninitialized objects.

I moved the device properties (such as L2 size) into a separate class, and turned GetHistBuildParameters into the constructor of HistBuildParameters.
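
A minimal sketch of the resulting shape, assuming nothing about the actual members. Only the name `HistBuildParameters` comes from the discussion. `DeviceProperties`, all accessors, and the dispatch rule in the constructor are placeholders for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical device-properties class; member names are illustrative.
class DeviceProperties {
 public:
  DeviceProperties(std::size_t l2_bytes, std::size_t max_compute_units)
      : l2_bytes_(l2_bytes), max_compute_units_(max_compute_units) {
    assert(max_compute_units > 0);
  }
  std::size_t L2Bytes() const { return l2_bytes_; }
  std::size_t MaxComputeUnits() const { return max_compute_units_; }

 private:
  std::size_t l2_bytes_;
  std::size_t max_compute_units_;
};

// Every member is set in the constructor, so a partially initialized
// object cannot exist. The dispatch rule below is a placeholder.
class HistBuildParameters {
 public:
  HistBuildParameters(const DeviceProperties& dev, std::size_t n_bins,
                      std::size_t entry_bytes)
      : use_atomics_(n_bins * entry_bytes * dev.MaxComputeUnits() >
                     dev.L2Bytes()),
        n_buffers_(use_atomics_ ? 0 : dev.MaxComputeUnits()) {}
  bool UseAtomics() const { return use_atomics_; }
  std::size_t NumBuffers() const { return n_buffers_; }

 private:
  bool use_atomics_;       // declared first: n_buffers_ depends on it
  std::size_t n_buffers_;
};
```

With this layout there is no default-constructed state to forget about, which addresses the concern about C-style handling.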

@Vika-F
Vika-F commented Jul 18, 2025

> Left some comments, overall looks good. I can merge it if you have the needed approvals from other reviewers. @Vika-F

@trivialfis I've approved. The code looks ready to merge.

@trivialfis trivialfis merged commit b11d452 into dmlc:master Jul 18, 2025
65 of 67 checks passed