Hello,
I have a question regarding Gemmini performance improvement.
I conducted tests using gemmini-rocc-tests/imagenet/resnet50.c via Verilator simulation, and I used the elapsed cycle count output by the code as the performance metric.
First, I attempted to improve performance by scaling the baseline Gemmini configuration from a 16x16 mesh to a 64x64 mesh. I also increased the scratchpad (SPAD) and accumulator (ACC) memory sizes to 1M and 256K, respectively. Since the number of processing elements in the systolic array increased 16-fold (from 16x16 = 256 to 64x64 = 4096), I expected a proportional (16x) performance improvement.
However, the simulation results showed that the cycle count dropped to only about 1/3 of the baseline 16x16 configuration's cycle count, which is roughly a 3x performance improvement rather than 16x.
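To make the gap concrete, here is the arithmetic behind my expectation as a small C sketch. The helper names (`pe_count`, `ideal_speedup`, `scaling_efficiency`) are hypothetical and not part of Gemmini or gemmini-rocc-tests, and the "ideal" figure rests on my assumption that performance scales linearly with the number of processing elements, which may well be the flaw in my reasoning.

```c
#include <assert.h>
#include <stdint.h>

/* Number of processing elements (PEs) in a square dim x dim mesh. */
uint64_t pe_count(uint32_t dim) {
    assert(dim > 0);
    return (uint64_t)dim * dim;
}

/* Ideal speedup when going from base_dim to scaled_dim.
 * ASSUMPTION: performance scales linearly with PE count. */
double ideal_speedup(uint32_t base_dim, uint32_t scaled_dim) {
    return (double)pe_count(scaled_dim) / (double)pe_count(base_dim);
}

/* Fraction of the ideal speedup that was actually observed,
 * where observed_speedup = baseline cycles / scaled cycles. */
double scaling_efficiency(double observed_speedup,
                          uint32_t base_dim, uint32_t scaled_dim) {
    return observed_speedup / ideal_speedup(base_dim, scaled_dim);
}
```

With my numbers, `ideal_speedup(16, 64)` is 16.0 and `scaling_efficiency(3.0, 16, 64)` is 0.1875, so the larger mesh delivers under 19% of the improvement I naively expected.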
Given the Gemmini architecture, is this 3x improvement the expected level of performance gain one gets from just scaling the systolic array? Or are there additional modifications or factors that I am missing?
Any insights or advice on this would be greatly appreciated.
Thank you.