Accuracy question on cublaslt gemm benchmark #382
Asked by TerrenceZhangX in Q&A (unanswered, 0 replies)
Hello Accel-sim Developers,
I have a question regarding the accuracy of the cublaslt gemm benchmark on the GV100 config. The accuracy appears to be worse than the cutlass gemm benchmark result shown in the paper. I checked the tuner, and it mainly tunes based on microbenchmarks.
I'm wondering if you're aware of this kind of behaviour (same workload, different implementation -> varying relative error) and whether you have any guidance on tuning for better accuracy. Thanks!
The details are below:
I tested a GEMM with shape 128x4096x4096.
With the cutlass benchmark, the profiled time is 203us vs. 163us simulated, a <20% relative error.
With the cublaslt benchmark, however, it is 148us profiled vs. 85us simulated, so the relative error grows to ~40%.
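For reference, I'm assuming relative error here is measured against the profiled time, which gives:

$$\text{relative error} = \frac{|t_{\text{profiled}} - t_{\text{simulated}}|}{t_{\text{profiled}}}, \qquad \frac{|203-163|}{203} \approx 19.7\%, \qquad \frac{|148-85|}{148} \approx 42.6\%$$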
This is the cublaslt benchmark I'm using: https://github.com/microsoft/superbenchmark/tree/main/superbench/benchmarks/micro_benchmarks/cublaslt_gemm, invoked as ./cublaslt_gemm -b 1 -m 128 -n 4096 -k 4096 -t fp16, which returns:
M N K B ElapsedTime(us) AchievedTFLOPS
128 4096 4096 1 148.479355 28.922829
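For context, here is a minimal sketch of the FP16 path I believe the benchmark exercises, reduced to a single cublasLtMatmul call at the same shape. This is only an illustration of the cuBLASLt API under my assumptions (FP16 compute type, column-major layouts, heuristic algorithm selection, no workspace); it is not the superbench implementation itself, which adds batching, warmup, and timed repetition.

```cpp
// Build (assumed): nvcc lt_gemm_sketch.cu -lcublasLt -o lt_gemm_sketch
// Error checking omitted for brevity.
#include <cublasLt.h>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

int main() {
    // Same shape as the benchmark run: M=128, N=4096, K=4096, batch 1.
    const int M = 128, N = 4096, K = 4096;

    cublasLtHandle_t lt;
    cublasLtCreate(&lt);

    // Device buffers; left uninitialized, which is fine for a timing/trace sketch.
    __half *A, *B, *D;
    cudaMalloc(reinterpret_cast<void **>(&A), sizeof(__half) * M * K);
    cudaMalloc(reinterpret_cast<void **>(&B), sizeof(__half) * K * N);
    cudaMalloc(reinterpret_cast<void **>(&D), sizeof(__half) * M * N);

    // FP16 inputs/outputs with FP16 accumulation (my assumption for the -t fp16 path).
    cublasLtMatmulDesc_t op;
    cublasLtMatmulDescCreate(&op, CUBLAS_COMPUTE_16F, CUDA_R_16F);

    // Column-major layouts (cuBLASLt default): A is MxK, B is KxN, D is MxN.
    cublasLtMatrixLayout_t aDesc, bDesc, dDesc;
    cublasLtMatrixLayoutCreate(&aDesc, CUDA_R_16F, M, K, M);
    cublasLtMatrixLayoutCreate(&bDesc, CUDA_R_16F, K, N, K);
    cublasLtMatrixLayoutCreate(&dDesc, CUDA_R_16F, M, N, M);

    __half alpha = __float2half(1.0f), beta = __float2half(0.0f);

    // D = alpha * A * B + beta * D; algo = nullptr lets cuBLASLt pick via heuristics.
    cublasLtMatmul(lt, op, &alpha, A, aDesc, B, bDesc, &beta,
                   D, dDesc, D, dDesc, nullptr, nullptr, 0, 0);
    cudaDeviceSynchronize();

    cublasLtMatrixLayoutDestroy(aDesc);
    cublasLtMatrixLayoutDestroy(bDesc);
    cublasLtMatrixLayoutDestroy(dDesc);
    cublasLtMatmulDescDestroy(op);
    cudaFree(A); cudaFree(B); cudaFree(D);
    cublasLtDestroy(lt);
    return 0;
}
```

Tracing something this minimal and comparing it against the full superbench run might help isolate whether the gap comes from the GEMM kernel itself or from the benchmark harness around it.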