Gemmini Performance Scaling: 16x16 vs 64x64 #397

@jhkim-voc

Hello,
I have a question regarding Gemmini performance improvement.

I conducted tests using gemmini-rocc-tests/imagenet/resnet50.c via Verilator simulation, and I used the elapsed cycle count output by the code as the performance metric.

First, I tried to improve performance by scaling the baseline Gemmini configuration from a 16x16 mesh to a 64x64 mesh. I also increased the scratchpad (SPAD) and accumulator (ACC) memory sizes to 1M and 256K, respectively. Since the number of processing elements in the systolic array increased 16-fold (from 16x16 = 256 to 64x64 = 4096), I expected a roughly proportional (16x) performance improvement.

However, the simulation results showed that the cycle count dropped to only about 1/3 of the baseline 16x16 cycle count, i.e. roughly a 3x speedup rather than 16x.
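To put the gap between 16x and 3x into numbers, here is a rough back-of-the-envelope sketch using Amdahl's law. It assumes (and this is only an assumption, not something measured) that the baseline cycle count splits into a fraction `p` that speeds up in proportion to the number of processing elements and a remainder that does not speed up at all. The function names are hypothetical helpers, not part of Gemmini or gemmini-rocc-tests.

```python
def amdahl_speedup(p: float, k: float) -> float:
    """Amdahl's law: overall speedup when a fraction p of the work is sped up by k."""
    return 1.0 / ((1.0 - p) + p / k)


def scalable_fraction(observed_speedup: float, k: float) -> float:
    """Invert Amdahl's law: solve S = 1 / ((1 - p) + p / k) for p."""
    if observed_speedup <= 0 or k <= 1:
        raise ValueError("need observed_speedup > 0 and k > 1")
    return (1.0 - 1.0 / observed_speedup) / (1.0 - 1.0 / k)


# Resource scaling from a 16x16 mesh to a 64x64 mesh (ratio of PE counts).
k = (64 * 64) / (16 * 16)

# Observed speedup of roughly 3x (cycle count reduced to about 1/3).
observed = 3.0

p = scalable_fraction(observed, k)
print(f"PE scaling factor k          : {k:.0f}x")
print(f"implied scalable fraction p  : {p:.3f}")
print(f"implied non-scaling fraction : {1.0 - p:.3f}")
print(f"check, speedup from p and k  : {amdahl_speedup(p, k):.2f}x")
```

Under this simple model, a 3x speedup from a 16x larger array would mean that about 71% of the baseline cycles scale with the array and about 29% do not. The model says nothing about where the non-scaling cycles come from, so it only quantifies the gap; it does not explain it.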

Given the Gemmini architecture, is this 3x improvement the expected level of performance gain one gets from just scaling the systolic array? Or are there additional modifications or factors that I am missing?

Any insights or advice on this would be greatly appreciated.

Thank you.
