Implement a draft for faster reordering #131
Relevant to #78.
When benchmarking the transport equation I realised that reorders, including accumulations (sum_intox), take nearly 50% of the runtime, while a bandwidth-based model estimates 24%. This is because the current implementation achieves only around 10-20% of the available bandwidth, whereas a verbose implementation that describes the mapping rule explicitly for each operation can reach up to 60%.
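To illustrate the difference, here is a minimal C sketch (not the actual solver code; array sizes, names, and the x-to-z mapping are all hypothetical) contrasting a generic reorder driven by a runtime permutation table with a verbose variant that spells out one specific mapping as a plain loop nest the compiler can optimise:

```c
#include <assert.h>
#include <string.h>

/* Illustrative sketch only. Sizes are small so the example runs quickly;
 * in practice the fields would be much larger (e.g. 1024^3). */
enum { NX = 8, NY = 8, NZ = 8 };

/* Generic reorder: dst is laid out with dimension perm[0] fastest,
 * perm[1] next, perm[2] slowest. Flexible and neat, but the indirect
 * indexing is opaque to the compiler and defeats vectorisation. */
static void reorder_generic(double *dst, const double *src, const int perm[3])
{
    const int n[3] = { NX, NY, NZ };
    for (int i = 0; i < NX; i++)
        for (int j = 0; j < NY; j++)
            for (int k = 0; k < NZ; k++) {
                const int idx[3] = { i, j, k };
                const int d = (idx[perm[2]] * n[perm[1]] + idx[perm[1]])
                              * n[perm[0]] + idx[perm[0]];
                dst[d] = src[(k * NY + j) * NX + i];
            }
}

/* Verbose reorder: the x-major -> z-major mapping written out explicitly,
 * with the innermost loop streaming through contiguous memory in src. */
static void reorder_x_to_z(double *dst, const double *src)
{
    for (int k = 0; k < NZ; k++)
        for (int j = 0; j < NY; j++)
            for (int i = 0; i < NX; i++)
                dst[(i * NY + j) * NZ + k] = src[(k * NY + j) * NX + i];
}
```

Both produce identical output for the x-to-z case (perm = {2, 1, 0}); the verbose variant just exposes the contiguous inner dimension, which is where the bandwidth gain comes from.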
Ultimately, fixing #40, #66, and #78 improved the time-to-solution for a single 1024^3 transeq call on a single node of ARCHER2 from 8.28 seconds to 4.52 seconds. Without #40 and #66 implemented, the whole transeq took 8.28 seconds, of which the reorders (including accumulations) took 3.68 seconds, accounting for 44% of the total runtime. Fixing this was key to sustaining ~66% of the theoretical peak bandwidth on ARCHER2, which ultimately brought the timings in line with our performance expectations from the bandwidth model.
I think this verbose implementation, describing the mapping explicitly for each reorder direction, improves the runtime of the reorders by around 4x, which is quite important for our target speedup on CPUs.
Happy to investigate a better approach closer in spirit to the current implementation, which is very neat, but I think we need performance close to what this suggested implementation achieves.
Any thoughts?