Add prefetch to core #136

Open
HDembinski opened this issue Dec 23, 2022 · 2 comments

Comments

@HDembinski
Contributor

HDembinski commented Dec 23, 2022

We are currently implementing a new algorithm in Boost.Histogram that is expected to benefit from cache-line prefetching. Prefetch instructions are exposed through compiler-specific builtins, so we need a platform-independent wrapper for them. I discovered that Boost.Context already has such an implementation in its detail headers.

https://www.boost.org/doc/libs/1_81_0/boost/context/detail/prefetch.hpp

I propose moving this code to Core so that all libraries can benefit from prefetching. Boost.Context and Boost.Fiber currently use this code, and it is now being added to Boost.Histogram as well.

@Lastique
Member

The code in Boost.Context is specific to x86 (and not entirely correct, BTW: prefetch_range should align prefetches to cache line boundaries, otherwise it might not prefetch the last cache line). I think that on recent CPUs explicit prefetching is mostly useless, as it provides no tangible benefit over implicit hardware prefetching. Did you actually measure the effect of explicit prefetching with your code?

Also, I'm not sure if explicit prefetching is a thing in other architectures besides x86.
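To illustrate the alignment point, here is a minimal sketch rather than the Boost.Context code. It assumes a 64-byte cache line, and the names `cache_line_size`, `lines_covered`, and `prefetch_range_aligned` are hypothetical. A loop that starts at an unaligned address and advances by one cache line per step stops as soon as it passes the end of the range, so a short range that straddles a boundary would get only one prefetch. Rounding the start down to a line boundary makes every overlapped line get touched.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines (common on x86, but not universal).
constexpr std::size_t cache_line_size = 64;

// Number of cache lines overlapped by the byte range [addr, addr + len).
inline std::size_t lines_covered(std::uintptr_t addr, std::size_t len) noexcept {
    if (len == 0) return 0;
    const std::uintptr_t mask = ~static_cast<std::uintptr_t>(cache_line_size - 1);
    const std::uintptr_t first = addr & mask;             // line holding the first byte
    const std::uintptr_t last = (addr + len - 1) & mask;  // line holding the last byte
    return static_cast<std::size_t>((last - first) / cache_line_size) + 1;
}

// Issues one prefetch per overlapped cache line, starting at the aligned
// address of the first line. Returns the number of lines visited.
inline std::size_t prefetch_range_aligned(const void* p, std::size_t len) noexcept {
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    const std::uintptr_t first = addr & ~static_cast<std::uintptr_t>(cache_line_size - 1);
    const std::size_t n = lines_covered(addr, len);
    for (std::size_t i = 0; i < n; ++i) {
        const void* line = reinterpret_cast<const void*>(first + i * cache_line_size);
#if defined(__GNUC__) || defined(__clang__)
        __builtin_prefetch(line);
#else
        (void)line;  // no intrinsic assumed here: the hint is skipped
#endif
    }
    return n;
}
```

For example, an 8-byte range starting at offset 60 of an aligned buffer overlaps two lines, while stepping from offset 60 in 64-byte strides would reach offset 124, past the end of the range, after a single prefetch.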

@HDembinski
Contributor Author

HDembinski commented Dec 24, 2022

I did measure for my current project, and I wrote above that it does not accelerate my access pattern on my computer. However, if I had an older CPU with a less smart hardware prefetcher, it might.

The point is to give the CPU a hint about which data will be needed next when the access pattern is more complex than forward steps with a fixed stride.
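As a rough illustration of such a pattern (not the Boost.Histogram algorithm), consider summing values through an index array. The addresses are data-dependent, so a stride-based hardware prefetcher has little to work with, but the code knows the upcoming index and can hint it ahead of time. The function name `sum_indirect` and the look-ahead `distance` are hypothetical, and a useful distance would have to be found by measurement.

```cpp
#include <cstddef>
#include <vector>

// Sums values[indices[i]] for all i, hinting the element that will be
// needed `distance` iterations later. Every index must be < values.size().
inline double sum_indirect(const std::vector<double>& values,
                           const std::vector<std::size_t>& indices,
                           std::size_t distance = 8) {
    double sum = 0.0;
    const std::size_t n = indices.size();
    for (std::size_t i = 0; i < n; ++i) {
#if defined(__GNUC__) || defined(__clang__)
        // Only a hint: it does not change the result, only (possibly) timing.
        if (i + distance < n)
            __builtin_prefetch(&values[indices[i + distance]]);
#else
        (void)distance;  // no intrinsic assumed here: the hint is skipped
#endif
        sum += values[indices[i]];
    }
    return sum;
}
```

The result is identical with or without the hint, which is why the benefit can only be judged by benchmarking on the target hardware.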

Since this is an optimization, it is fine if the implementation evaluates to nothing on platforms that do not provide an intrinsic.
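A minimal sketch of what such a wrapper could look like, assuming GCC/Clang's `__builtin_prefetch` and the `_mm_prefetch` intrinsic for MSVC on x86. The name `prefetch_hint` is hypothetical and is neither the Boost.Context function nor an existing Boost.Core API.

```cpp
// Hypothetical portable prefetch wrapper (illustrative name, not a Boost API).
#if defined(_MSC_VER) && !defined(__clang__) && (defined(_M_IX86) || defined(_M_X64))
#include <xmmintrin.h>  // _mm_prefetch, _MM_HINT_T0
#endif

inline void prefetch_hint(const void* addr) noexcept {
#if defined(__GNUC__) || defined(__clang__)
    // Read access (0), high temporal locality (3).
    __builtin_prefetch(addr, 0, 3);
#elif defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    _mm_prefetch(static_cast<const char*>(addr), _MM_HINT_T0);
#else
    (void)addr;  // no known intrinsic: the call compiles to nothing
#endif
}
```

Because the fallback branch is empty, callers can use the function unconditionally and simply get no hint on toolchains the sketch does not recognize.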

Also, to clarify, I only want the basic prefetch command in core, not prefetch_range.
