Commit d5ceb0e

update benchmarks
1 parent 87fbaaf commit d5ceb0e

4 files changed: +3038 -8 lines changed

README.md

+6 -5
@@ -5,11 +5,12 @@ can only be scalars. BatchedBLAS.jl extends support for batched arrays by
 `ger`, `syr`, and `spr` that work with arrays of AbstractFloats and Integers,
 and scaling coefficients which can be scalars or Vectors.
 
-In addition to the type flexibility, there is a performance benefit for
-symmetric and packed symmetric matrices, where execution times for `syr`
-and `spr` are faster than the equivalent batched `gemm`. Benchmarks on an
-A100 follow. The dashed lines are for the transposed version of `gemv` and
-the upper-triangle versions of all other functions. Lower numbers are better.
+In addition to the type flexibility, there is a performance benefit for rank-1
+updates as execution times for `ger`, `syr`, and `spr` are faster than the
+equivalent batched `gemm` for the range of parameters tested. `dot` is also
+faster for small matrices. Benchmarks on an H100 follow. The dashed lines are
+for the transposed version of `gemv` and the upper-triangle versions of all
+other functions. Lower numbers are better.
 
 ![benchmarks](/bench/bench.svg)
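The README text above compares rank-1 updates with "the equivalent batched `gemm`". As a rough illustration of what that equivalence means, the sketch below writes the same batched update two ways on the CPU using only the `LinearAlgebra` standard library. The helper names `reference_batched_ger!` and `reference_batched_gemm!` are hypothetical and are not part of BatchedBLAS.jl or NNlib; the column-per-batch layout of `x` and `y` is likewise an assumption made for the example.

```julia
using LinearAlgebra

# Hypothetical helper (not BatchedBLAS.jl API): reference batched rank-1 update,
# A[:, :, k] += alpha[k] * x[:, k] * y[:, k]' for every batch index k.
function reference_batched_ger!(alpha::AbstractVector, x::AbstractMatrix,
                                y::AbstractMatrix, A::AbstractArray{<:Any,3})
    for k in axes(A, 3)
        # `ger`-style outer-product update applied in place to one slice.
        @views A[:, :, k] .+= alpha[k] .* x[:, k] .* y[:, k]'
    end
    return A
end

# Hypothetical helper: the same update phrased as a matrix product ("gemm" form),
# treating x[:, k] as an m×1 matrix and y[:, k]' as a 1×n matrix.
function reference_batched_gemm!(alpha::AbstractVector, x::AbstractMatrix,
                                 y::AbstractMatrix, A::AbstractArray{<:Any,3})
    m, n = size(A, 1), size(A, 2)
    for k in axes(A, 3)
        X = reshape(x[:, k], m, 1)
        Yt = reshape(y[:, k], 1, n)
        # Five-argument mul!(C, A, B, α, β) computes C = α*A*B + β*C in place.
        mul!(view(A, :, :, k), X, Yt, alpha[k], true)
    end
    return A
end

m, n, batch = 3, 4, 5
x = rand(m, batch); y = rand(n, batch); alpha = rand(batch)
A1 = zeros(m, n, batch); A2 = zeros(m, n, batch)
reference_batched_ger!(alpha, x, y, A1)
reference_batched_gemm!(alpha, x, y, A2)
```

Both helpers should produce the same array up to floating-point rounding. The `gemm` form pays for a general matrix product with an inner dimension of 1, which is one plausible reason a dedicated rank-1 kernel can be faster; the sketch says nothing about actual GPU timings.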

bench/Project.toml

+4 -1
@@ -8,5 +8,8 @@ JLD2 = "033835bb-8acc-5ee8-8aae-3f567f8a3819"
 KernelAbstractions = "63c18a36-062a-441e-b654-da1e3ab1ce7c"
 LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
 NNlib = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
-NNlibCUDA = "a00861dc-f156-4864-bf3c-e6376f28a68d"
 SymmetricFormats = "a91e544d-b3d6-4431-ae28-0549b1291c16"
+
+[compat]
+NNlib = "0.9"
+julia = "1.9"

bench/bench.svg

mode 100755 → 100644
+3,027 -1

bench/runbench.jl

+1 -1
@@ -1,5 +1,5 @@
 using LinearAlgebra, BatchedBLAS, NNlib, SymmetricFormats, BenchmarkTools, DataFrames, Gadfly, JLD2
-using KernelAbstractions, CUDA, NNlibCUDA
+using KernelAbstractions, CUDA
 
 macro belapsed_median(args...)
     esc(:(time(median(@benchmark $(args...))) / 1e9))
