move

stas00 · stas00 · commit a0127e6c5367 · 2025-01-23T16:53:51.000-08:00
diff --git a/compute/accelerator/README.md b/compute/accelerator/README.md
@@ -337,8 +337,6 @@ Also it's important to understand that knowing the Maximum Achievable Matmul TFL
 
 And to conclude this section I'd like to repeat again that **the intention here is not to point fingers at which accelerator is more efficient than another, but to give a sense of what's what and how to navigate those theoretical specs and to help you understand when you need to continue optimizing your system and when to stop. So start with these notes and numbers as a starting point, then measure your own use case and use that latter measurement to gain the best outcome.**
 
-Good related reads:
-- Horace's [Strangely, Matrix Multiplications on GPUs Run Faster When Given "Predictable" Data](https://www.thonking.ai/p/strangely-matrix-multiplications?utm_source=substack&publication_id=1781836&post_id=142508107) shows how benchmarking can be over-reporting if one uses a not normally distributed data and how power impacts performance.
 
 
 #### Not all accelerators are created equal
diff --git a/compute/accelerator/benchmarks/README.md b/compute/accelerator/benchmarks/README.md
@@ -110,3 +110,7 @@ There are a few excellent detailed write ups on how to perform CUDA benchmarks:
 2. [How to Benchmark Code on CUDA Devices?](https://salykova.github.io/sgemm-gpu#2-how-to-benchmark-code-on-cuda-devices) - this one is different from (1) in that it suggests to set both GPU and Memory clocks, whereas (1) only locks the GPU clock.
 
 You can see these instructions applied in [mamf-finder.py](./mamf-finder.py) (other than clock locking)
+
+Here are some excellent related reads:
+
+- Horace's [Strangely, Matrix Multiplications on GPUs Run Faster When Given "Predictable" Data](https://www.thonking.ai/p/strangely-matrix-multiplications?utm_source=substack&publication_id=1781836&post_id=142508107) shows how benchmarks can over-report performance when one uses data that is not normally distributed, and how power impacts performance.