
Commit a0127e6: move
1 parent 1fe8955

2 files changed (+4, -2 lines)

compute/accelerator/README.md (-2 lines)
@@ -337,8 +337,6 @@ Also it's important to understand that knowing the Maximum Achievable Matmul TFL
 
 And to conclude this section I'd like to repeat again that **the intention here is not to point fingers at which accelerator is more efficient than another, but to give a sense of what's what and how to navigate those theoretical specs and to help you understand when you need to continue optimizing your system and when to stop. So start with these notes and numbers as a starting point, then measure your own use case and use that latter measurement to gain the best outcome.**
 
-Good related reads:
-- Horace's [Strangely, Matrix Multiplications on GPUs Run Faster When Given "Predictable" Data](https://www.thonking.ai/p/strangely-matrix-multiplications?utm_source=substack&publication_id=1781836&post_id=142508107) shows how benchmarking can be over-reporting if one uses a not normally distributed data and how power impacts performance.
 
 
 #### Not all accelerators are created equal

compute/accelerator/benchmarks/README.md (+4 lines)
@@ -110,3 +110,7 @@ There are a few excellent detailed write ups on how to perform CUDA benchmarks:
 2. [How to Benchmark Code on CUDA Devices?](https://salykova.github.io/sgemm-gpu#2-how-to-benchmark-code-on-cuda-devices) - this one is different from (1) in that it suggests to set both GPU and Memory clocks, whereas (1) only locks the GPU clock.
 
 You can see these instructions applied in [mamf-finder.py](./mamf-finder.py) (other than clock locking)
+
+Here are some excellent related reads:
+
+- Horace's [Strangely, Matrix Multiplications on GPUs Run Faster When Given "Predictable" Data](https://www.thonking.ai/p/strangely-matrix-multiplications?utm_source=substack&publication_id=1781836&post_id=142508107) shows how benchmarking can be over-reporting if one uses a not normally distributed data and how power impacts performance.
