
Performance of kokkos serial backend vs. plain serial code #297

@markdewing

With one thread, the kokkos serial backend (kokkos --serial) runs slower than the standalone serial code (serial).

According to the profiles, the function kernel_connect (in plugin-PixelTriplets) takes significantly more time in the kokkos version. The instructions-retired counts show that the kokkos version is clearly executing more operations.

The loops do not perform their iterations in the same order.

<outer loop from kokkos>
  <kernel_connect function body>
     for (int idx = firstCellIndex, nt = (nCells()); idx < nt; idx += leagueSize * blockDim) {
     ...
       for (int j = first; j < numberOfPossibleNeighbors; j += stride) {

The serial version has no outer loop. However, printing the values of idx and j shows that both versions access the same values, only in a different order.

Based on instructions retired, the kokkos version appears to do more work (by a factor of 2x or more) in routines such as areAlignedRZ. However, printing out how many times these routines are called shows the same number of calls in both versions.