# tiled-matmul

Here is 1 public repository matching this topic...

Harkiran11 / cuda-matmul-engine

High-performance CUDA matrix multiplication kernels: shared-memory tiling, register blocking, and Roofline Model analysis, benchmarked against cuBLAS.

c-plus-plus machine-learning deep-learning hpc parallel-computing cuda nvidia matrix-multiplication high-performance-computing cuda-kernels gpu-computing gpu-optimization roofline-model memory-coalescing tiled-matmul

Updated Apr 30, 2026
Cuda
