Accuracy question on cublaslt gemm benchmark #382
Asked by TerrenceZhangX in Q&A (unanswered, 0 replies)
Hello Accel-sim Developers,
I have a question regarding the accuracy of the cublaslt gemm benchmark on the GV100 config. The accuracy appears to be worse than the cutlass gemm benchmark result shown in the paper. I checked the tuner, and it mainly tunes based on microbenchmarks.
I'm wondering if you're aware of this kind of behaviour (same workload, different implementation -> varying relative error) and whether you have any guidance on tuning for better accuracy. Thanks!
The details are below:
I tested a GEMM with shape 128x4096x4096.
With the cutlass benchmark, the profiled time is 203us vs. 163us simulated, a <20% relative error.
With the cublaslt benchmark, however, it is 148us profiled vs. 85us simulated, so the relative error grows to ~40%.
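For reference, I'm assuming relative error here is measured against the profiled time, which gives:

$$\text{relative error} = \frac{|t_{\text{profiled}} - t_{\text{simulated}}|}{t_{\text{profiled}}}, \qquad \frac{|203-163|}{203} \approx 19.7\%, \qquad \frac{|148-85|}{148} \approx 42.6\%$$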
This is the cublaslt benchmark I'm using: https://github.com/microsoft/superbenchmark/tree/main/superbench/benchmarks/micro_benchmarks/cublaslt_gemm, invoked as ./cublaslt_gemm -b 1 -m 128 -n 4096 -k 4096 -t fp16, which returns:
M N K B ElapsedTime(us) AchievedTFLOPS
128 4096 4096 1 148.479355 28.922829
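For context, here is a minimal sketch of the FP16 path I believe the benchmark exercises, reduced to a single cublasLtMatmul call at the same shape. This is only an illustration of the cuBLASLt API under my assumptions (FP16 compute type, column-major layouts, heuristic algorithm selection, no workspace); it is not the superbench implementation itself, which adds batching, warmup, and timed repetition.

```cpp
// Build (assumed): nvcc lt_gemm_sketch.cu -lcublasLt -o lt_gemm_sketch
// Error checking omitted for brevity.
#include <cublasLt.h>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

int main() {
    // Same shape as the benchmark run: M=128, N=4096, K=4096, batch 1.
    const int M = 128, N = 4096, K = 4096;

    cublasLtHandle_t lt;
    cublasLtCreate(&lt);

    // Device buffers; left uninitialized, which is fine for a timing/trace sketch.
    __half *A, *B, *D;
    cudaMalloc(reinterpret_cast<void **>(&A), sizeof(__half) * M * K);
    cudaMalloc(reinterpret_cast<void **>(&B), sizeof(__half) * K * N);
    cudaMalloc(reinterpret_cast<void **>(&D), sizeof(__half) * M * N);

    // FP16 inputs/outputs with FP16 accumulation (my assumption for the -t fp16 path).
    cublasLtMatmulDesc_t op;
    cublasLtMatmulDescCreate(&op, CUBLAS_COMPUTE_16F, CUDA_R_16F);

    // Column-major layouts (cuBLASLt default): A is MxK, B is KxN, D is MxN.
    cublasLtMatrixLayout_t aDesc, bDesc, dDesc;
    cublasLtMatrixLayoutCreate(&aDesc, CUDA_R_16F, M, K, M);
    cublasLtMatrixLayoutCreate(&bDesc, CUDA_R_16F, K, N, K);
    cublasLtMatrixLayoutCreate(&dDesc, CUDA_R_16F, M, N, M);

    __half alpha = __float2half(1.0f), beta = __float2half(0.0f);

    // D = alpha * A * B + beta * D; algo = nullptr lets cuBLASLt pick via heuristics.
    cublasLtMatmul(lt, op, &alpha, A, aDesc, B, bDesc, &beta,
                   D, dDesc, D, dDesc, nullptr, nullptr, 0, 0);
    cudaDeviceSynchronize();

    cublasLtMatrixLayoutDestroy(aDesc);
    cublasLtMatrixLayoutDestroy(bDesc);
    cublasLtMatrixLayoutDestroy(dDesc);
    cublasLtMatmulDescDestroy(op);
    cudaFree(A); cudaFree(B); cudaFree(D);
    cublasLtDestroy(lt);
    return 0;
}
```

Tracing something this minimal and comparing it against the full superbench run might help isolate whether the gap comes from the GEMM kernel itself or from the benchmark harness around it.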