What follows is a callgrind- and massif-aided performance report of a drawn-digit recognition MNIST-8 prediction. The setup follows the pre-existing tests for auto-generated libraries that use the new codegen features recently introduced in the Z-Ant codebase.
All tests were run on an Arch Linux machine with a Ryzen 5 3600 and 16GB of DDR4-3000 RAM. No other CPU-intensive processes were running for the duration of the tests, and each profiling task was run ten times using a shell script (a sketch of such a script follows).
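The exact script is not reproduced here; a minimal sketch of how the ten runs could be driven, assuming the generated binary is called `mnist_predict` (a hypothetical name), looks like this:

```sh
#!/bin/sh
# Run the callgrind profiling task ten times.
# Each run writes its own callgrind.out.<pid> file in the working directory.
BIN=./mnist_predict   # hypothetical binary name, substitute the real one
for i in $(seq 1 10); do
    valgrind --tool=callgrind --dump-instr=yes "$BIN"
done
```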
All callgrind runs were done by running the command valgrind --tool=callgrind --dump-instr=yes *binary name*.
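The report does not state how the dumps were inspected; callgrind_annotate, which ships with Valgrind, is the usual way to turn a dump into a flat per-function cost listing:

```sh
# Print a sorted per-function instruction-count listing for one run.
# The PID suffix of the dump file differs from run to run.
callgrind_annotate callgrind.out.12345 | less
```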
Throughout the callgrind runs, one function stood out as a massive optimization target: the ONNX convolution-with-bias function. Just two calls to it accounted for ~35% of the application's total runtime. This is a reasonable and expected result: the convolution function is the main actor in computing a prediction, so it makes sense that most of the performance demand arises from its calls.
With a whopping ~27K calls, Tensor(f32).flatten_index made up ~18% of all runtime.
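One way these shares can be surfaced is with callgrind_annotate's inclusive mode, which attributes a callee's cost to its callers; the grep patterns below are assumptions about how the symbols appear in the listing:

```sh
# Inclusive costs make the convolution's ~35% share visible directly;
# the patterns are guesses at the relevant symbol names.
callgrind_annotate --inclusive=yes callgrind.out.12345 \
    | grep -E 'conv|flatten_index'
```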
All ten callgrind runs yielded the same results, give or take one or two memset calls that are probably related to under-the-hood Linux workings; no notable variations were seen in any of the Zig-related functions.
All massif runs were done by running the command valgrind --tool=massif --detail-freq=2 *binary name*.
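How the heap graphs were rendered is not stated; ms_print, also shipped with Valgrind, is the standard way to turn a massif output file into an ASCII heap graph plus a per-snapshot allocation tree:

```sh
# Render the heap-usage graph and snapshot table for one run.
# The PID suffix of the output file differs from run to run.
ms_print massif.out.12345 | less
```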
All ten massif runs yielded exactly the same graph: heap consumption peaks occur precisely during the two calls to the ONNX convolution function mentioned above, resulting in a peak consumption of 233.4 KB.
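The numeric peak can also be pulled straight out of the raw massif files without rendering the graph; a small sketch (the file glob is an assumption about where the runs left their output):

```sh
# Every massif snapshot records its heap size as a mem_heap_B=<bytes> line;
# the largest value across all snapshots and runs is the peak, in bytes.
grep -h 'mem_heap_B' massif.out.* | cut -d= -f2 | sort -n | tail -1
```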
An additional ten runs were done without any external tools running, timed with the standard Unix command time *binary name*; they yielded an average runtime of 6.2ms, of which on average 0.9ms was spent in system calls and 5.3ms in the program itself.
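A minimal sketch of how such a timed batch could be driven (the binary name is again hypothetical; time's user and sys figures map onto the program and system portions quoted above):

```sh
#!/bin/sh
# Time ten unprofiled runs: 'user' approximates time spent in the program
# itself, 'sys' the time spent in system calls.
BIN=./mnist_predict   # hypothetical binary name, substitute the real one
for i in $(seq 1 10); do
    time "$BIN"
done
```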