Add possibility to reduce pool-block size upon micro-kernel behavior #478
- Currently, the MC (or NC) dimension of a pool block is extended by max(MR, NR) to cover possibly invalid speculative prefetches issued by micro-kernels, typically for their convenience in the last KC iteration.
- This commit adds a macro, BLIS_UKERNELS_NO_SPECULATIVE_PREFETCH, so that any architecture whose micro-kernels prefetch carefully enough to avoid invalid loads can drop the extension and save some memory.
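The sizing rule described above can be sketched as follows. This is only an illustration of the idea, not the actual BLIS allocator code: the helper names `padded_block_dim` and `pool_block_bytes` are hypothetical, and only the macro name `BLIS_UKERNELS_NO_SPECULATIVE_PREFETCH` comes from this PR.

```c
#include <stddef.h>

/* Hypothetical helper (not part of BLIS): larger of two sizes. */
static size_t max_sz(size_t a, size_t b) { return a > b ? a : b; }

/* Rows (for A) or columns (for B) reserved in one pool block.
   mc_or_nc is the cache blocksize; mr and nr are the register blocksizes. */
size_t padded_block_dim(size_t mc_or_nc, size_t mr, size_t nr)
{
#ifdef BLIS_UKERNELS_NO_SPECULATIVE_PREFETCH
    /* Micro-kernels promise not to touch memory past the block: no padding. */
    (void)mr;
    (void)nr;
    return mc_or_nc;
#else
    /* Default: reserve max(MR, NR) extra to absorb speculative accesses. */
    return mc_or_nc + max_sz(mr, nr);
#endif
}

/* Bytes for one pool block of the given element size (assumed layout:
   padded dimension times KC times element size). */
size_t pool_block_bytes(size_t mc_or_nc, size_t kc,
                        size_t mr, size_t nr, size_t elem_size)
{
    return padded_block_dim(mc_or_nc, mr, nr) * kc * elem_size;
}
```

With the macro undefined, the padding costs max(MR, NR) * KC elements per block; defining it recovers exactly that amount.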
@hominhquan I agree with the idea, but I think it might be simpler, clearer, and more flexible to instead create a macro that encodes how much to extend MC/NC when allocating, e.g. Aside: "prefetch" isn't the right word because the prefetch is harmless, it's the speculative load that is the problem.
@devinamatthews thanks for your comment, and okay to a yet-to-be-named
@devinamatthews Are you asking if I have coined terminology for this additional micropanel of space reserved in the packing blocks for A and B? Or are you asking if I've already parameterized this within the build system / cpp macros?
Either, but especially the latter.
I guess I don't have a name for it yet. (I never expected there would be any need or desire to isolate it for disabling.)
In principle, I sympathize with wanting to make this a cpp macro parameter that can be set in a header file in the subconfiguration directory. The problem I see with this is that the extra micropanel's size depends on the register blocksizes, which are set on a per-datatype basis. So you couldn't nicely capture the absolute size of padding for all datatypes with one number; you'd need four constants--one for each datatype. One could imagine, however, a single constant--probably a This would allow @hominhquan to zero out the constant, but allow others to increase it. The only limitations are that (1) they'd have to express the constant in units of micropanels, and (2) the constant would not vary across datatypes (although the number of bytes would naturally vary given that the register blocksizes vary). As an aside, I see some code that I could probably clean up and simplify. BLIS has not used right-side
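A minimal sketch of the single-constant idea from the comment above, assuming a hypothetical macro name `EXTRA_MICROPANELS` and a hypothetical struct `dt_blksz` (neither exists in BLIS as far as this thread shows). The constant is expressed in micropanels and is the same for every datatype, while the resulting byte count differs because the register blocksizes and element sizes differ.

```c
#include <stddef.h>

/* Hypothetical name: padding in units of micropanels, shared by all
   datatypes. 0 disables the padding; values above 1 enlarge it. */
#ifndef EXTRA_MICROPANELS
#define EXTRA_MICROPANELS 1
#endif

/* Hypothetical per-datatype blocksize record (illustrative only). */
typedef struct {
    size_t mr;        /* register blocksize in the m dimension */
    size_t nr;        /* register blocksize in the n dimension */
    size_t kc;        /* cache blocksize in the k dimension */
    size_t elem_size; /* bytes per element of this datatype */
} dt_blksz;

/* Padding bytes for one datatype: the micropanel count is fixed, but the
   byte size of a micropanel depends on that datatype's blocksizes. */
size_t padding_bytes(const dt_blksz *b)
{
    size_t max_mnr = b->mr > b->nr ? b->mr : b->nr;
    return (size_t)EXTRA_MICROPANELS * max_mnr * b->kc * b->elem_size;
}
```

Under this scheme, setting the constant to zero at build time removes the padding for all four datatypes at once, with no per-datatype macros to maintain.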
Four macros seem reasonable, as ukernels for different datatypes may have different behavior.
I'm really trying to move BLIS away from situations where we have to explicitly handle different datatypes in the code because it adds yet another place that requires tending-to and maintenance when adding support for new datatypes.