[TEST PR] ignore #2645

Closed

EuphoricThinking
Contributor

No description provided.

@EuphoricThinking EuphoricThinking requested a review from a team as a code owner January 30, 2025 14:55
@github-actions github-actions bot added the ci/cd Continuous integration/delivery label Jan 30, 2025
Contributor

Compute Benchmarks level_zero run (with params: --iterations-stddev 2 --iterations 2):
https://github.com/oneapi-src/unified-runtime/actions/runs/13055486959

Contributor

Compute Benchmarks level_zero run (--iterations-stddev 2 --iterations 2):
https://github.com/oneapi-src/unified-runtime/actions/runs/13055486959
Job status: success. Test status: success.

Summary

Total 148 benchmarks in mean.
Geomean 89.971%.
Improved 24 Regressed 38 (threshold 2.00%)

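(the highlighted result is better; in this plain-text copy the better value in each row appears to be the one printed with six decimal places)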
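
The summary figures can be approximated from the per-benchmark rows. The sketch below is an assumption about how the report is computed, not the benchmark scripts' actual code. It takes relative performance as baseline / this PR for lower-is-better metrics (μs, ms, ns, s) and this PR / baseline for throughput metrics, aggregates with a geometric mean, and counts a benchmark as improved or regressed when its change exceeds the 2.00% threshold. The names `Result`, `relative_perf`, `geomean`, and `summarize` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Result:
    name: str
    this_pr: float
    baseline: float
    lower_is_better: bool = True  # assumption: time-like metric unless stated


def relative_perf(r: Result) -> float:
    """Ratio above 1.0 means this PR did better than the baseline."""
    if r.lower_is_better:
        return r.baseline / r.this_pr
    return r.this_pr / r.baseline


def geomean(values: List[float]) -> float:
    """Geometric mean computed through logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def summarize(results: List[Result], threshold: float = 0.02) -> Dict[str, float]:
    """Geomean plus counts of changes beyond the threshold (assumed rule)."""
    ratios = [relative_perf(r) for r in results]
    return {
        "geomean": geomean(ratios),
        "improved": sum(1 for x in ratios if x - 1.0 > threshold),
        "regressed": sum(1 for x in ratios if 1.0 - x > threshold),
    }


# Rows copied from the VectorAddition group of this report (values in ms).
vector_addition = [
    Result("VectorAddition_int32", this_pr=1.463, baseline=1.477),
    Result("VectorAddition_fp32", this_pr=1.470, baseline=1.480),
    Result("VectorAddition_int64", this_pr=3.122, baseline=3.088),
]
summary = summarize(vector_addition)
print(f"{summary['geomean'] * 100:.3f}%")
```

For the three VectorAddition rows this lands very close to the group figure reported for that group (100.179%), which suggests the group values are geometric means of the per-row ratios. For throughput rows such as GB/s or token/s, pass `lower_is_better=False`.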

Performance change in benchmark groups

Relative perf in group api (12): 96.139%
Benchmark This PR baseline Relative perf Change -
api_overhead_benchmark_ur SubmitKernel in order 16.547000 μs 16.785 μs 101.44% 1.44% .
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 2.123000 μs 2.149 μs 101.22% 1.22% .
api_overhead_benchmark_ur SubmitKernel in order with measure completion 21.259000 μs 21.495 μs 101.11% 1.11% .
api_overhead_benchmark_ur SubmitKernel out of order 15.749000 μs 15.866 μs 100.74% 0.74% .
api_overhead_benchmark_ur SubmitKernel out of order CPU count 104663.000000 instr 104663.000 instr 100.00% 0.00% .
api_overhead_benchmark_ur SubmitKernel in order CPU count 110006.000000 instr 110006.000 instr 100.00% 0.00% .
api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count 123190.000 instr 123166.000000 instr 99.98% -0.02% .
api_overhead_benchmark_sycl SubmitKernel out of order 23.608 μs 23.506000 μs 99.57% -0.43% .
api_overhead_benchmark_sycl SubmitKernel in order 24.521 μs 24.407000 μs 99.54% -0.46% .
api_overhead_benchmark_l0 SubmitKernel in order 11.632 μs 11.395000 μs 97.96% -2.04% .
api_overhead_benchmark_l0 SubmitKernel out of order 11.680 μs 11.369000 μs 97.34% -2.66% .
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 2.652 μs 1.673000 μs 63.08% -36.92% .
Relative perf in group memory (4): 114.033%
Benchmark This PR baseline Relative perf Change -
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 134.861000 μs 219.832 μs 163.01% 63.01% .
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 3.242000 GB/s 3.070 GB/s 105.60% 5.60% .
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 254.750 μs 252.914000 μs 99.28% -0.72% .
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 5.963 μs 5.900000 μs 98.94% -1.06% .
Relative perf in group miscellaneous (1): 99.795%
Benchmark This PR baseline Relative perf Change -
miscellaneous_benchmark_sycl VectorSum 859.782 bw GB/s 858.023000 bw GB/s 99.80% -0.20% .
Relative perf in group multithread (10): 100.557%
Benchmark This PR baseline Relative perf Change -
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events 40929.325000 μs 42602.254 μs 104.09% 4.09% .
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1 7475.640000 μs 7766.797 μs 103.89% 3.89% .
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1 26036.929000 μs 27030.035 μs 103.81% 3.81% .
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1 8794.927000 μs 8883.578 μs 101.01% 1.01% .
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1 6907.362 μs 6896.127000 μs 99.84% -0.16% .
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1 1203.921 μs 1199.669000 μs 99.65% -0.35% .
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1 47195.783 μs 46811.855000 μs 99.19% -0.81% .
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1 17377.891 μs 17165.065000 μs 98.78% -1.22% .
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events 114673.544 μs 112408.658000 μs 98.02% -1.98% .
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1 2098.660 μs 2047.766000 μs 97.57% -2.43% .
Relative perf in group graph (10): 108.656%
Benchmark This PR baseline Relative perf Change -
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:10 4039.886000 μs 5631.730 μs 139.40% 39.40% .
graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:0, numKernels:10 4052.131000 μs 5621.320 μs 138.73% 38.73% .
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:100 46811.230000 μs 56454.921 μs 120.60% 20.60% .
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:10 62.355000 μs 62.493 μs 100.22% 0.22% .
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10 71750.799 μs 71746.038000 μs 99.99% -0.01% .
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100 353400.657 μs 353349.563000 μs 99.99% -0.01% .
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100 353387.985 μs 353086.695000 μs 99.91% -0.09% .
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:10 72724.829 μs 72583.103000 μs 99.81% -0.19% .
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:100 682.492 μs 677.203000 μs 99.23% -0.77% .
graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:1, numKernels:10 55.700 μs 55.253000 μs 99.20% -0.80% .
Relative perf in group Velocity-Bench (9): 99.981%
Benchmark This PR baseline Relative perf Change -
Velocity-Bench Bitcracker 35.461400 s 38.359 s 108.17% 8.17% .
Velocity-Bench Hashtable 380.134197 M keys/sec 363.340 M keys/sec 104.62% 4.62% .
Velocity-Bench CudaSift 201.277000 ms 203.947 ms 101.33% 1.33% .
Velocity-Bench Sobel Filter 596.660000 ms 603.076 ms 101.08% 1.08% .
Velocity-Bench QuickSilver 117.170000 MMS/CTT 116.460 MMS/CTT 100.61% 0.61% .
Velocity-Bench dl-cifar 23.601100 s 23.630 s 100.12% 0.12% .
Velocity-Bench dl-mnist 2.730 s 2.710000 s 99.27% -0.73% .
Velocity-Bench Easywave 229.000 ms 227.000000 ms 99.13% -0.87% .
Velocity-Bench svm 0.156 s 0.135900 s 86.89% -13.11% .
Relative perf in group Runtime (8): 98.026%
Benchmark This PR baseline Relative perf Change -
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor 274.365000 ms 276.461 ms 100.76% 0.76% .
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor 276.091 ms 275.173000 ms 99.67% -0.33% .
Runtime_IndependentDAGTaskThroughput_SingleTask 263.435 ms 259.444000 ms 98.49% -1.51% .
Runtime_DAGTaskThroughput_HierarchicalParallelFor 1743.424 ms 1710.439000 ms 98.11% -1.89% .
Runtime_DAGTaskThroughput_SingleTask 1685.683 ms 1648.643000 ms 97.80% -2.20% .
Runtime_DAGTaskThroughput_NDRangeParallelFor 1712.514 ms 1673.462000 ms 97.72% -2.28% .
Runtime_DAGTaskThroughput_BasicParallelFor 1766.802 ms 1704.436000 ms 96.47% -3.53% .
Runtime_IndependentDAGTaskThroughput_BasicParallelFor 287.813 ms 274.274000 ms 95.30% -4.70% .
Relative perf in group MicroBench (14): 95.682%
Benchmark This PR baseline Relative perf Change -
MicroBench_HostDeviceBandwidth_2D_H2D_Strided 4.738000 ms 4.940 ms 104.26% 4.26% .
MicroBench_HostDeviceBandwidth_3D_H2D_Strided 4.789000 ms 4.909 ms 102.51% 2.51% .
MicroBench_LocalMem_int32_4096 29.855000 ms 29.862 ms 100.02% 0.02% .
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous 4.587 ms 4.585000 ms 99.96% -0.04% .
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous 617.867 ms 617.442000 ms 99.93% -0.07% .
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous 617.882 ms 617.437000 ms 99.93% -0.07% .
MicroBench_HostDeviceBandwidth_3D_D2H_Strided 617.277 ms 616.784000 ms 99.92% -0.08% .
MicroBench_HostDeviceBandwidth_2D_D2H_Strided 617.554 ms 616.834000 ms 99.88% -0.12% .
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous 4.464 ms 4.456000 ms 99.82% -0.18% .
MicroBench_LocalMem_fp32_4096 30.015 ms 29.902000 ms 99.62% -0.38% .
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous 4.400 ms 4.376000 ms 99.45% -0.55% .
MicroBench_HostDeviceBandwidth_1D_H2D_Strided 4.547 ms 4.276000 ms 94.04% -5.96% .
MicroBench_HostDeviceBandwidth_1D_D2H_Strided 5.133 ms 4.716000 ms 91.88% -8.12% .
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous 7.641 ms 4.526000 ms 59.23% -40.77% .
Relative perf in group Pattern (10): 101.767%
Benchmark This PR baseline Relative perf Change -
Pattern_Reduction_Hierarchical_int32 13.403000 ms 16.339 ms 121.91% 21.91% .
Pattern_Reduction_NDRange_int32 16.264000 ms 16.339 ms 100.46% 0.46% .
Pattern_SegmentedReduction_Hierarchical_fp32 11.593 ms 11.587000 ms 99.95% -0.05% .
Pattern_SegmentedReduction_Hierarchical_int64 11.795 ms 11.782000 ms 99.89% -0.11% .
Pattern_SegmentedReduction_Hierarchical_int16 11.819 ms 11.796000 ms 99.81% -0.19% .
Pattern_SegmentedReduction_Hierarchical_int32 11.613 ms 11.588000 ms 99.78% -0.22% .
Pattern_SegmentedReduction_NDRange_int64 2.344 ms 2.337000 ms 99.70% -0.30% .
Pattern_SegmentedReduction_NDRange_int32 2.172 ms 2.165000 ms 99.68% -0.32% .
Pattern_SegmentedReduction_NDRange_fp32 2.176 ms 2.168000 ms 99.63% -0.37% .
Pattern_SegmentedReduction_NDRange_int16 2.292 ms 2.265000 ms 98.82% -1.18% .
Relative perf in group ScalarProduct (6): 99.760%
Benchmark This PR baseline Relative perf Change -
ScalarProduct_Hierarchical_int32 10.532000 ms 10.541 ms 100.09% 0.09% .
ScalarProduct_Hierarchical_fp32 10.166000 ms 10.167 ms 100.01% 0.01% .
ScalarProduct_NDRange_int32 3.770 ms 3.765000 ms 99.87% -0.13% .
ScalarProduct_NDRange_fp32 3.758 ms 3.749000 ms 99.76% -0.24% .
ScalarProduct_Hierarchical_int64 11.535 ms 11.490000 ms 99.61% -0.39% .
ScalarProduct_NDRange_int64 5.467 ms 5.425000 ms 99.23% -0.77% .
Relative perf in group USM (7): 90.095%
Benchmark This PR baseline Relative perf Change -
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch 1.201000 ms 1.258 ms 104.75% 4.75% .
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch 1.043000 ms 1.087 ms 104.22% 4.22% .
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch 1.865000 ms 1.893 ms 101.50% 1.50% .
USM_Allocation_latency_fp32_host 37.971 ms 37.623000 ms 99.08% -0.92% .
USM_Allocation_latency_fp32_device 0.067 ms 0.065000 ms 97.01% -2.99% .
USM_Allocation_latency_fp32_shared 0.069 ms 0.062000 ms 89.86% -10.14% .
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch 3.450 ms 1.737000 ms 50.35% -49.65% .
Relative perf in group VectorAddition (3): 100.179%
Benchmark This PR baseline Relative perf Change -
VectorAddition_int32 1.463000 ms 1.477 ms 100.96% 0.96% .
VectorAddition_fp32 1.470000 ms 1.480 ms 100.68% 0.68% .
VectorAddition_int64 3.122 ms 3.088000 ms 98.91% -1.09% .
Relative perf in group Polybench (3): 99.051%
Benchmark This PR baseline Relative perf Change -
Polybench_3mm 1.486 ms 1.477000 ms 99.39% -0.61% .
Polybench_Atax 6.467 ms 6.402000 ms 98.99% -1.01% .
Polybench_2mm 1.052 ms 1.039000 ms 98.76% -1.24% .
Relative perf in group Kmeans (1): 99.654%
Benchmark This PR baseline Relative perf Change -
Kmeans_fp32 14.160 ms 14.111000 ms 99.65% -0.35% .
Relative perf in group LinearRegressionCoeff (1): 102.215%
Benchmark This PR baseline Relative perf Change -
LinearRegressionCoeff_fp32 862.805000 ms 881.915 ms 102.21% 2.21% .
Relative perf in group MolecularDynamics (1): 53.571%
Benchmark This PR baseline Relative perf Change -
MolecularDynamics 0.056 ms 0.030000 ms 53.57% -46.43% .
Relative perf in group llama.cpp (6): 100.104%
Benchmark This PR baseline Relative perf Change -
llama.cpp Text Generation Batched 512 63.033973 token/s 62.789 token/s 100.39% 0.39% .
llama.cpp Text Generation Batched 256 63.013882 token/s 62.777 token/s 100.38% 0.38% .
llama.cpp Text Generation Batched 128 62.969580 token/s 62.791 token/s 100.28% 0.28% .
llama.cpp Prompt Processing Batched 128 832.145879 token/s 830.097 token/s 100.25% 0.25% .
llama.cpp Prompt Processing Batched 256 878.241 token/s 878.291089 token/s 99.99% -0.01% .
llama.cpp Prompt Processing Batched 512 432.819 token/s 435.723514 token/s 99.33% -0.67% .
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (7): 176.175%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/0/4096/iterations:200000/threads:4 umfProxy 136.944000 ns 2688.530 ns 1963.23% 1863.23% ++++++++++
alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider 2078.040000 ns 2113.560 ns 101.71% 1.71% .
alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool<os_provider> 3205.220 ns 3097.620000 ns 96.64% -3.36% .
alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool<os_provider> 306.302 ns 287.722000 ns 93.93% -6.07% .
alloc/size:10000/0/4096/iterations:200000/threads:4 glibc 2631.780 ns 2464.050000 ns 93.63% -6.37% .
alloc/size:10000/0/4096/iterations:200000/threads:4 disjoint_pool<os_provider> 4701.140000 ns -
alloc/size:10000/0/4096/iterations:200000/threads:4 jemalloc_pool<os_provider> 3663.010000 ns -
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (7): 143.213%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/0/4096/iterations:200000/threads:1 umfProxy 104.170000 ns 705.635 ns 677.39% 577.39% +++
alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool<os_provider> 211.610 ns 208.759000 ns 98.65% -1.35% .
alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider 196.961 ns 191.313000 ns 97.13% -2.87% .
alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool<os_provider> 282.053 ns 272.237000 ns 96.52% -3.48% .
alloc/size:10000/0/4096/iterations:200000/threads:1 glibc 726.325 ns 698.410000 ns 96.16% -3.84% .
alloc/size:10000/0/4096/iterations:200000/threads:1 disjoint_pool<os_provider> 507.879000 ns -
alloc/size:10000/0/4096/iterations:200000/threads:1 jemalloc_pool<os_provider> 118.136000 ns -
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (7): 155.774%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/100000/4096/iterations:200000/threads:4 umfProxy 120.540000 ns 1226.080 ns 1017.16% 917.16% +++++
alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider 1852.650000 ns 2038.360 ns 110.02% 10.02% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool<os_provider> 3476.710 ns 3338.690000 ns 96.03% -3.97% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool<os_provider> 277.559 ns 261.553000 ns 94.23% -5.77% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc 1407.250 ns 1274.570000 ns 90.57% -9.43% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 disjoint_pool<os_provider> 4636.620000 ns -
alloc/size:10000/100000/4096/iterations:200000/threads:4 jemalloc_pool<os_provider> 3705.340000 ns -
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (7): 120.900%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/100000/4096/iterations:200000/threads:1 umfProxy 241.115000 ns 707.467 ns 293.41% 193.41% +
alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc 720.411 ns 706.907000 ns 98.13% -1.87% .
alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider 193.486 ns 189.545000 ns 97.96% -2.04% .
alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool<os_provider> 204.454 ns 196.551000 ns 96.13% -3.87% .
alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool<os_provider> 326.361 ns 310.903000 ns 95.26% -4.74% .
alloc/size:10000/100000/4096/iterations:200000/threads:1 disjoint_pool<os_provider> 529.648000 ns -
alloc/size:10000/100000/4096/iterations:200000/threads:1 jemalloc_pool<os_provider> 118.489000 ns -
Relative perf in group alloc/min (8): 81.507%
Benchmark This PR baseline Relative perf Change -
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 umfProxy 546.234000 ns 832.725 ns 152.45% 52.45% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool<os_provider> 962.529 ns 958.800000 ns 99.61% -0.39% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc 177.724 ns 174.753000 ns 98.33% -1.67% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc 812.237 ns 797.092000 ns 98.14% -1.86% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool<os_provider> 1032.840 ns 965.779000 ns 93.51% -6.49% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 umfProxy 827.745 ns 177.130000 ns 21.40% -78.60% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 jemalloc_pool<os_provider> 4285.610000 ns -
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 jemalloc_pool<os_provider> 350.082000 ns -
Relative perf in group multiple (22): 26.627%
Benchmark This PR baseline Relative perf Change -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool<os_provider> 14944.400000 ns 16418.600 ns 109.86% 9.86% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc 32440.600000 ns 33153.600 ns 102.20% 2.20% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc 4229.770000 ns 4283.690 ns 101.27% 1.27% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool<os_provider> 75433.700000 ns 75451.700 ns 100.02% 0.02% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc 139261.000 ns 138360.000000 ns 99.35% -0.65% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool<os_provider> 25830.100 ns 25525.500000 ns 98.82% -1.18% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc 31342.800 ns 30910.300000 ns 98.62% -1.38% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider 1196070.000 ns 1174970.000000 ns 98.24% -1.76% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider 150500.000 ns 146423.000000 ns 97.29% -2.71% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool<os_provider> 1202440.000 ns 1162100.000000 ns 96.65% -3.35% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool<os_provider> 169941.000 ns 162279.000000 ns 95.49% -4.51% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool<os_provider> 48117.700 ns 41438.000000 ns 86.12% -13.88% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 umfProxy 10832400.000 ns 140162.000000 ns 1.29% -98.71% -
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 umfProxy 2732150.000 ns 30121.800000 ns 1.10% -98.90% -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 umfProxy 7935540.000 ns 27477.700000 ns 0.35% -99.65% -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 umfProxy 2723490.000 ns 4208.520000 ns 0.15% -99.85% -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 disjoint_pool<os_provider> 1763760.000000 ns -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 disjoint_pool<os_provider> 218003.000000 ns -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 jemalloc_pool<os_provider> 524139.000000 ns -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 jemalloc_pool<os_provider> 24479.700000 ns -
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 jemalloc_pool<os_provider> 632496.000000 ns -
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 jemalloc_pool<os_provider> 60170.100000 ns -

Details

Benchmark details - environment, command...
api_overhead_benchmark_l0 SubmitKernel out of order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel out of order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel in order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros --multiplier=1

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

miscellaneous_benchmark_sycl VectorSum

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=8 --NumOpsPerThread=100 --iterations=10 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=8 --NumOpsPerThread=400 --iterations=1000 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=16 --NumOpsPerThread=10 --iterations=10000 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=8 --NumOpsPerThread=100 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=8 --NumOpsPerThread=400 --iterations=1000 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=16 --NumOpsPerThread=10 --iterations=10000 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=1 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=4 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=10 --withGraphs=0

graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=10 --withGraphs=1

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=100 --withGraphs=0

graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=100 --withGraphs=1

graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:1, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=0 --numKernels=10

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=1 --numKernels=10

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:100

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=1 --numKernels=100

graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:0, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=0 --numKernels=10

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=1 --numKernels=10

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:100

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=1 --numKernels=100

api_overhead_benchmark_ur SubmitKernel out of order CPU count

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order CPU count

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Velocity-Bench Hashtable

Environment Variables:

Command:

/home/pmdk/bench_workdir/hashtable/hashtable_sycl --no-verify

Velocity-Bench Bitcracker

Environment Variables:

Command:

/home/pmdk/bench_workdir/bitcracker/bitcracker -f /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Velocity-Bench CudaSift

Environment Variables:

Command:

/home/pmdk/bench_workdir/cudaSift/cudaSift

Velocity-Bench Easywave

Environment Variables:

Command:

/home/pmdk/bench_workdir/easywave/easyWave_sycl -grid /home/pmdk/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/pmdk/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Velocity-Bench QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/pmdk/bench_workdir/QuickSilver/qs -i /home/pmdk/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Velocity-Bench Sobel Filter

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/pmdk/bench_workdir/sobel_filter/sobel_filter -i /home/pmdk/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Velocity-Bench dl-cifar

Environment Variables:

Command:

/home/pmdk/bench_workdir/dl-cifar/dl-cifar_sycl

Velocity-Bench dl-mnist

Environment Variables:

NEOReadDebugKeys=1
DisableScratchPages=0

Command:

/home/pmdk/bench_workdir/dl-mnist/dl-mnist-sycl -conv_algo ONEDNN_AUTO

Velocity-Bench svm

Environment Variables:

Command:

/home/pmdk/bench_workdir/svm/svm_sycl /home/pmdk/bench_workdir/velocity-bench-repo/svm/SYCL/a9a /home/pmdk/bench_workdir/velocity-bench-repo/svm/SYCL/a.m

Runtime_IndependentDAGTaskThroughput_SingleTask

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Runtime_IndependentDAGTaskThroughput_BasicParallelFor

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Runtime_DAGTaskThroughput_SingleTask

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Runtime_DAGTaskThroughput_BasicParallelFor

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Runtime_DAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Runtime_DAGTaskThroughput_NDRangeParallelFor

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_1D_H2D_Strided

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_2D_H2D_Strided

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_3D_H2D_Strided

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_1D_D2H_Strided

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_2D_D2H_Strided

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_3D_D2H_Strided

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_LocalMem_int32_4096

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/LocalMem_multi.csv --size=10240000

MicroBench_LocalMem_fp32_4096

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/LocalMem_multi.csv --size=10240000

Pattern_Reduction_NDRange_int32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/Pattern_Reduction_multi.csv --size=10240000

Pattern_Reduction_Hierarchical_int32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/Pattern_Reduction_multi.csv --size=10240000

ScalarProduct_NDRange_int32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

ScalarProduct_NDRange_int64

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

ScalarProduct_NDRange_fp32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

ScalarProduct_Hierarchical_int32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

ScalarProduct_Hierarchical_int64

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

ScalarProduct_Hierarchical_fp32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

Pattern_SegmentedReduction_NDRange_int16

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_NDRange_int32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_NDRange_int64

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_NDRange_fp32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_Hierarchical_int16

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_Hierarchical_int32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_Hierarchical_int64

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_Hierarchical_fp32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

USM_Allocation_latency_fp32_device

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

USM_Allocation_latency_fp32_host

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

USM_Allocation_latency_fp32_shared

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

VectorAddition_int32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/VectorAddition_multi.csv --size=102400000

VectorAddition_int64

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/VectorAddition_multi.csv --size=102400000

VectorAddition_fp32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/VectorAddition_multi.csv --size=102400000

Polybench_2mm

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/2mm --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/2mm.csv --size=512

Polybench_3mm

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/3mm --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/3mm.csv --size=512

Polybench_Atax

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/atax --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/Atax.csv --size=8192

Kmeans_fp32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/kmeans --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/Kmeans.csv --size=700000000

LinearRegressionCoeff_fp32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/lin_reg_coeff --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/LinearRegressionCoeff.csv --size=1638400000

MolecularDynamics

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/mol_dyn --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/MolecularDynamics.csv --size=8196

llama.cpp Prompt Processing Batched 128

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

llama.cpp Text Generation Batched 128

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

llama.cpp Prompt Processing Batched 256

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

llama.cpp Text Generation Batched 256

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

llama.cpp Prompt Processing Batched 512

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

llama.cpp Text Generation Batched 512

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

alloc/size:10000/0/4096/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/0/4096/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc
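
Each entry in the details above pairs a set of environment variables with a command line. A minimal sketch of reproducing such a run from Python follows. It assumes the binary and library exist at the listed paths on the machine in question, and `run_with_env` is a hypothetical helper written for illustration, not part of the benchmark scripts.

```python
import os
import subprocess
import sys


def run_with_env(command, extra_env):
    """Run `command` with `extra_env` layered over the current environment.

    Returns the captured standard output as text and raises
    subprocess.CalledProcessError if the command exits with a non-zero status.
    """
    env = {**os.environ, **extra_env}
    completed = subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )
    return completed.stdout


# Harmless demonstration: the child process echoes a variable we injected.
demo_output = run_with_env(
    [sys.executable, "-c", "import os; print(os.environ['DEMO_VAR'])"],
    {"DEMO_VAR": "hello"},
)
print(demo_output.strip())
```

For the umfProxy entries, `extra_env` would carry the `LD_PRELOAD` value shown under "Environment Variables", and `command` would be the `umf-benchmark` path followed by `--benchmark_format=csv` and `--benchmark_filter=glibc`, each as a separate list element.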



Compute Benchmarks level_zero run (--iterations-stddev 2 --iterations 2):
https://github.com/oneapi-src/unified-runtime/actions/runs/13055887853
Job status: success. Test status: success.

Summary

Total 148 benchmarks in mean.
Geomean 90.065%.
Improved 26 Regressed 37 (threshold 2.00%)

(The better result in each row is the value printed with six decimal places.)
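
The relative perf figures in this report can be reproduced from the two measured values. The sketch below is a reading of the numbers, not the benchmark scripts' own code: it assumes that for lower-is-better units (such as μs) the ratio is baseline divided by this PR, that for higher-is-better units (such as GB/s) it is the inverse, and that a benchmark counts as improved or regressed when its change exceeds the threshold. Whether the boundary is inclusive is not stated in the report, and the names `relative_perf` and `classify` are hypothetical.

```python
def relative_perf(pr_value, baseline_value, lower_is_better=True):
    """Return relative performance in percent; above 100 means this PR is better."""
    if lower_is_better:
        ratio = baseline_value / pr_value
    else:
        ratio = pr_value / baseline_value
    return 100.0 * ratio


def classify(relative_percent, threshold=2.0):
    """Label a result by comparing its change (in percent) against the threshold.

    Assumption: the comparison is strict, so a change of exactly the
    threshold value is treated as unchanged.
    """
    change = relative_percent - 100.0
    if change > threshold:
        return "improved"
    if change < -threshold:
        return "regressed"
    return "unchanged"


# Values taken from the api and memory groups of this report.
in_order_completion = relative_perf(21.073, 21.495)             # time in μs
stream_triad = relative_perf(3.184, 3.070, lower_is_better=False)  # GB/s
l0_out_of_order = relative_perf(11.851, 11.369)                 # time in μs

print(f"{in_order_completion:.2f}%")                          # 102.00%
print(f"{stream_triad:.2f}%")                                 # 103.71%
print(f"{l0_out_of_order:.2f}%", classify(l0_out_of_order))   # 95.93% regressed
```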

Performance change in benchmark groups

Relative perf in group api (12): 99.961%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| api_overhead_benchmark_ur SubmitKernel in order with measure completion | 21.073000 μs | 21.495 μs | 102.00% | 2.00% | . |
| api_overhead_benchmark_sycl SubmitKernel out of order | 23.192000 μs | 23.506 μs | 101.35% | 1.35% | . |
| api_overhead_benchmark_ur SubmitKernel in order | 16.646000 μs | 16.785 μs | 100.84% | 0.84% | . |
| api_overhead_benchmark_ur SubmitKernel out of order | 15.766000 μs | 15.866 μs | 100.63% | 0.63% | . |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 | 1.666000 μs | 1.673 μs | 100.42% | 0.42% | . |
| api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count | 122876.000000 instr | 123166.000 instr | 100.24% | 0.24% | . |
| api_overhead_benchmark_ur SubmitKernel out of order CPU count | 104663.000000 instr | 104663.000 instr | 100.00% | 0.00% | . |
| api_overhead_benchmark_ur SubmitKernel in order CPU count | 110006.000000 instr | 110006.000 instr | 100.00% | 0.00% | . |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 | 2.158 μs | 2.149000 μs | 99.58% | -0.42% | . |
| api_overhead_benchmark_sycl SubmitKernel in order | 24.558 μs | 24.407000 μs | 99.39% | -0.61% | . |
| api_overhead_benchmark_l0 SubmitKernel in order | 11.478 μs | 11.395000 μs | 99.28% | -0.72% | . |
| api_overhead_benchmark_l0 SubmitKernel out of order | 11.851 μs | 11.369000 μs | 95.93% | -4.07% | . |

Relative perf in group memory (4): 114.451%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 | 135.850000 μs | 219.832 μs | 161.82% | 61.82% | . |
| memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 | 3.184000 GB/s | 3.070 GB/s | 103.71% | 3.71% | . |
| memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 | 5.776000 μs | 5.900 μs | 102.15% | 2.15% | . |
| memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 | 252.684000 μs | 252.914 μs | 100.09% | 0.09% | . |

Relative perf in group miscellaneous (1): 99.727%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| miscellaneous_benchmark_sycl VectorSum | 860.370 bw GB/s | 858.023000 bw GB/s | 99.73% | -0.27% | . |

Relative perf in group multithread (10): 100.556%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1 | 25956.135000 μs | 27030.035 μs | 104.14% | 4.14% | . |
| multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events | 40999.622000 μs | 42602.254 μs | 103.91% | 3.91% | . |
| multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1 | 7489.297000 μs | 7766.797 μs | 103.71% | 3.71% | . |
| multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events | 111874.103000 μs | 112408.658 μs | 100.48% | 0.48% | . |
| multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1 | 8905.592 μs | 8883.578000 μs | 99.75% | -0.25% | . |
| multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1 | 6927.748 μs | 6896.127000 μs | 99.54% | -0.46% | . |
| multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1 | 1211.929 μs | 1199.669000 μs | 98.99% | -1.01% | . |
| multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1 | 47402.251 μs | 46811.855000 μs | 98.75% | -1.25% | . |
| multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1 | 2081.600 μs | 2047.766000 μs | 98.37% | -1.63% | . |
| multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1 | 17485.423 μs | 17165.065000 μs | 98.17% | -1.83% | . |

Relative perf in group graph (10): 107.589%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:0, numKernels:10 | 4070.135000 μs | 5621.320 μs | 138.11% | 38.11% | . |
| graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:10 | 4080.166000 μs | 5631.730 μs | 138.03% | 38.03% | . |
| graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:100 | 49630.523000 μs | 56454.921 μs | 113.75% | 13.75% | . |
| graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10 | 71762.825 μs | 71746.038000 μs | 99.98% | -0.02% | . |
| graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100 | 353323.951 μs | 353086.695000 μs | 99.93% | -0.07% | . |
| graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100 | 353679.319 μs | 353349.563000 μs | 99.91% | -0.09% | . |
| graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:10 | 72663.276 μs | 72583.103000 μs | 99.89% | -0.11% | . |
| graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:100 | 681.692 μs | 677.203000 μs | 99.34% | -0.66% | . |
| graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:1, numKernels:10 | 55.834 μs | 55.253000 μs | 98.96% | -1.04% | . |
| graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:10 | 63.914 μs | 62.493000 μs | 97.78% | -2.22% | . |

Relative perf in group Velocity-Bench (9): 99.016%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Velocity-Bench Bitcracker | 35.470800 s | 38.359 s | 108.14% | 8.14% | . |
| Velocity-Bench QuickSilver | 118.570000 MMS/CTT | 116.460 MMS/CTT | 101.81% | 1.81% | . |
| Velocity-Bench CudaSift | 201.729000 ms | 203.947 ms | 101.10% | 1.10% | . |
| Velocity-Bench Easywave | 227.000000 ms | 227.000 ms | 100.00% | 0.00% | . |
| Velocity-Bench dl-cifar | 23.680 s | 23.630300 s | 99.79% | -0.21% | . |
| Velocity-Bench Sobel Filter | 606.368 ms | 603.076000 ms | 99.46% | -0.54% | . |
| Velocity-Bench Hashtable | 361.069 M keys/sec | 363.339623 M keys/sec | 99.37% | -0.63% | . |
| Velocity-Bench dl-mnist | 2.740 s | 2.710000 s | 98.91% | -1.09% | . |
| Velocity-Bench svm | 0.161 s | 0.135900 s | 84.25% | -15.75% | . |

Relative perf in group Runtime (8): 95.078%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Runtime_DAGTaskThroughput_HierarchicalParallelFor | 1750.403 ms | 1710.439000 ms | 97.72% | -2.28% | . |
| Runtime_DAGTaskThroughput_NDRangeParallelFor | 1725.472 ms | 1673.462000 ms | 96.99% | -3.01% | . |
| Runtime_DAGTaskThroughput_SingleTask | 1705.268 ms | 1648.643000 ms | 96.68% | -3.32% | . |
| Runtime_IndependentDAGTaskThroughput_BasicParallelFor | 284.474 ms | 274.274000 ms | 96.41% | -3.59% | . |
| Runtime_IndependentDAGTaskThroughput_SingleTask | 271.448 ms | 259.444000 ms | 95.58% | -4.42% | . |
| Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor | 290.329 ms | 276.461000 ms | 95.22% | -4.78% | . |
| Runtime_DAGTaskThroughput_BasicParallelFor | 1791.302 ms | 1704.436000 ms | 95.15% | -4.85% | . |
| Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor | 315.245 ms | 275.173000 ms | 87.29% | -12.71% | . |

Relative perf in group MicroBench (14): 92.622%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| MicroBench_LocalMem_fp32_4096 | 29.825000 ms | 29.902 ms | 100.26% | 0.26% | . |
| MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous | 618.172 ms | 617.442000 ms | 99.88% | -0.12% | . |
| MicroBench_HostDeviceBandwidth_2D_D2H_Strided | 617.586 ms | 616.834000 ms | 99.88% | -0.12% | . |
| MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous | 618.214 ms | 617.437000 ms | 99.87% | -0.13% | . |
| MicroBench_HostDeviceBandwidth_3D_D2H_Strided | 617.585 ms | 616.784000 ms | 99.87% | -0.13% | . |
| MicroBench_LocalMem_int32_4096 | 29.931 ms | 29.862000 ms | 99.77% | -0.23% | . |
| MicroBench_HostDeviceBandwidth_3D_H2D_Strided | 5.092 ms | 4.909000 ms | 96.41% | -3.59% | . |
| MicroBench_HostDeviceBandwidth_2D_H2D_Strided | 5.229 ms | 4.940000 ms | 94.47% | -5.53% | . |
| MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous | 4.934 ms | 4.585000 ms | 92.93% | -7.07% | . |
| MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous | 4.841 ms | 4.456000 ms | 92.05% | -7.95% | . |
| MicroBench_HostDeviceBandwidth_1D_D2H_Strided | 5.155 ms | 4.716000 ms | 91.48% | -8.52% | . |
| MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous | 4.821 ms | 4.376000 ms | 90.77% | -9.23% | . |
| MicroBench_HostDeviceBandwidth_1D_H2D_Strided | 4.807 ms | 4.276000 ms | 88.95% | -11.05% | . |
| MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous | 7.580 ms | 4.526000 ms | 59.71% | -40.29% | . |

Relative perf in group Pattern (10): 99.939%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Pattern_Reduction_Hierarchical_int32 | 16.116000 ms | 16.339 ms | 101.38% | 1.38% | . |
| Pattern_Reduction_NDRange_int32 | 16.281000 ms | 16.339 ms | 100.36% | 0.36% | . |
| Pattern_SegmentedReduction_Hierarchical_int64 | 11.784 ms | 11.782000 ms | 99.98% | -0.02% | . |
| Pattern_SegmentedReduction_Hierarchical_fp32 | 11.602 ms | 11.587000 ms | 99.87% | -0.13% | . |
| Pattern_SegmentedReduction_NDRange_int64 | 2.341 ms | 2.337000 ms | 99.83% | -0.17% | . |
| Pattern_SegmentedReduction_Hierarchical_int32 | 11.613 ms | 11.588000 ms | 99.78% | -0.22% | . |
| Pattern_SegmentedReduction_NDRange_fp32 | 2.173 ms | 2.168000 ms | 99.77% | -0.23% | . |
| Pattern_SegmentedReduction_NDRange_int32 | 2.170 ms | 2.165000 ms | 99.77% | -0.23% | . |
| Pattern_SegmentedReduction_Hierarchical_int16 | 11.825 ms | 11.796000 ms | 99.75% | -0.25% | . |
| Pattern_SegmentedReduction_NDRange_int16 | 2.290 ms | 2.265000 ms | 98.91% | -1.09% | . |

Relative perf in group ScalarProduct (6): 99.670%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| ScalarProduct_NDRange_int32 | 3.760000 ms | 3.765 ms | 100.13% | 0.13% | . |
| ScalarProduct_Hierarchical_fp32 | 10.157000 ms | 10.167 ms | 100.10% | 0.10% | . |
| ScalarProduct_Hierarchical_int32 | 10.542 ms | 10.541000 ms | 99.99% | -0.01% | . |
| ScalarProduct_Hierarchical_int64 | 11.517 ms | 11.490000 ms | 99.77% | -0.23% | . |
| ScalarProduct_NDRange_fp32 | 3.766 ms | 3.749000 ms | 99.55% | -0.45% | . |
| ScalarProduct_NDRange_int64 | 5.508 ms | 5.425000 ms | 98.49% | -1.51% | . |

Relative perf in group USM (7): 87.700%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch | 1.231000 ms | 1.258 ms | 102.19% | 2.19% | . |
| USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch | 1.077000 ms | 1.087 ms | 100.93% | 0.93% | . |
| USM_Allocation_latency_fp32_host | 37.865 ms | 37.623000 ms | 99.36% | -0.64% | . |
| USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch | 1.917 ms | 1.893000 ms | 98.75% | -1.25% | . |
| USM_Allocation_latency_fp32_device | 0.071 ms | 0.065000 ms | 91.55% | -8.45% | . |
| USM_Allocation_latency_fp32_shared | 0.074 ms | 0.062000 ms | 83.78% | -16.22% | . |
| USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch | 3.379 ms | 1.737000 ms | 51.41% | -48.59% | . |

Relative perf in group VectorAddition (3): 100.605%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| VectorAddition_int32 | 1.460000 ms | 1.477 ms | 101.16% | 1.16% | . |
| VectorAddition_int64 | 3.070000 ms | 3.088 ms | 100.59% | 0.59% | . |
| VectorAddition_fp32 | 1.479000 ms | 1.480 ms | 100.07% | 0.07% | . |

Relative perf in group Polybench (3): 98.868%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Polybench_3mm | 1.491 ms | 1.477000 ms | 99.06% | -0.94% | . |
| Polybench_Atax | 6.475 ms | 6.402000 ms | 98.87% | -1.13% | . |
| Polybench_2mm | 1.053 ms | 1.039000 ms | 98.67% | -1.33% | . |

Relative perf in group Kmeans (1): 99.993%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Kmeans_fp32 | 14.112 ms | 14.111000 ms | 99.99% | -0.01% | . |

Relative perf in group LinearRegressionCoeff (1): 98.403%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| LinearRegressionCoeff_fp32 | 896.225 ms | 881.915000 ms | 98.40% | -1.60% | . |

Relative perf in group MolecularDynamics (1): 50.000%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| MolecularDynamics | 0.060 ms | 0.030000 ms | 50.00% | -50.00% | . |

Relative perf in group llama.cpp (6): 99.564%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| llama.cpp Text Generation Batched 256 | 63.031606 token/s | 62.777 token/s | 100.41% | 0.41% | . |
| llama.cpp Text Generation Batched 512 | 63.024801 token/s | 62.789 token/s | 100.38% | 0.38% | . |
| llama.cpp Text Generation Batched 128 | 62.993262 token/s | 62.791 token/s | 100.32% | 0.32% | . |
| llama.cpp Prompt Processing Batched 512 | 432.189 token/s | 435.723514 token/s | 99.19% | -0.81% | . |
| llama.cpp Prompt Processing Batched 256 | 867.135 token/s | 878.291089 token/s | 98.73% | -1.27% | . |
| llama.cpp Prompt Processing Batched 128 | 816.653 token/s | 830.097430 token/s | 98.38% | -1.62% | . |

Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (7): 187.944%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/0/4096/iterations:200000/threads:4 umfProxy 126.330000 ns 2688.530 ns 2128.18% 2028.18% ++++++++++
alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider 1970.000000 ns 2113.560 ns 107.29% 7.29% .
alloc/size:10000/0/4096/iterations:200000/threads:4 glibc 2394.250000 ns 2464.050 ns 102.92% 2.92% .
alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool<os_provider> 3043.580000 ns 3097.620 ns 101.78% 1.78% .
alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool<os_provider> 293.438 ns 287.722000 ns 98.05% -1.95% .
alloc/size:10000/0/4096/iterations:200000/threads:4 disjoint_pool<os_provider> 4632.600000 ns -
alloc/size:10000/0/4096/iterations:200000/threads:4 jemalloc_pool<os_provider> 3747.240000 ns -
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (7): 144.481%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/0/4096/iterations:200000/threads:1 umfProxy 102.333000 ns 705.635 ns 689.55% 589.55% +++
alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool<os_provider> 276.470 ns 272.237000 ns 98.47% -1.53% .
alloc/size:10000/0/4096/iterations:200000/threads:1 glibc 711.427 ns 698.410000 ns 98.17% -1.83% .
alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool<os_provider> 212.825 ns 208.759000 ns 98.09% -1.91% .
alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider 198.680 ns 191.313000 ns 96.29% -3.71% .
alloc/size:10000/0/4096/iterations:200000/threads:1 disjoint_pool<os_provider> 491.472000 ns -
alloc/size:10000/0/4096/iterations:200000/threads:1 jemalloc_pool<os_provider> 120.857000 ns -
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (7): 166.841%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/100000/4096/iterations:200000/threads:4 umfProxy 123.843000 ns 1226.080 ns 990.03% 890.03% ++++
alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider 1662.350000 ns 2038.360 ns 122.62% 22.62% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool<os_provider> 3152.380000 ns 3338.690 ns 105.91% 5.91% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc 1240.040000 ns 1274.570 ns 102.78% 2.78% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool<os_provider> 267.369 ns 261.553000 ns 97.82% -2.18% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 disjoint_pool<os_provider> 4538.790000 ns -
alloc/size:10000/100000/4096/iterations:200000/threads:4 jemalloc_pool<os_provider> 3568.870000 ns -
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (7): 126.217%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/100000/4096/iterations:200000/threads:1 umfProxy 197.885000 ns 707.467 ns 357.51% 257.51% +
alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool<os_provider> 303.412000 ns 310.903 ns 102.47% 2.47% .
alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider 195.379 ns 189.545000 ns 97.01% -2.99% .
alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc 739.729 ns 706.907000 ns 95.56% -4.44% .
alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool<os_provider> 208.399 ns 196.551000 ns 94.31% -5.69% .
alloc/size:10000/100000/4096/iterations:200000/threads:1 disjoint_pool<os_provider> 503.244000 ns -
alloc/size:10000/100000/4096/iterations:200000/threads:1 jemalloc_pool<os_provider> 123.388000 ns -
Relative perf in group alloc/min (8): 81.758%
Benchmark This PR baseline Relative perf Change -
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 umfProxy 637.153000 ns 832.725 ns 130.69% 30.69% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool<os_provider> 954.465000 ns 958.800 ns 100.45% 0.45% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc 176.066 ns 174.753000 ns 99.25% -0.75% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool<os_provider> 999.902 ns 965.779000 ns 96.59% -3.41% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc 892.006 ns 797.092000 ns 89.36% -10.64% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 umfProxy 667.010 ns 177.130000 ns 26.56% -73.44% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 jemalloc_pool<os_provider> 4173.800000 ns -
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 jemalloc_pool<os_provider> 395.250000 ns -
Relative perf in group multiple (22): 26.824%
Benchmark This PR baseline Relative perf Change -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool<os_provider> 15177.600000 ns 16418.600 ns 108.18% 8.18% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider 138901.000000 ns 146423.000 ns 105.42% 5.42% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider 1151840.000000 ns 1174970.000 ns 102.01% 2.01% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool<os_provider> 162815.000 ns 162279.000000 ns 99.67% -0.33% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool<os_provider> 75741.900 ns 75451.700000 ns 99.62% -0.38% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc 4310.020 ns 4283.690000 ns 99.39% -0.61% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool<os_provider> 1171730.000 ns 1162100.000000 ns 99.18% -0.82% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc 33449.000 ns 33153.600000 ns 99.12% -0.88% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc 31315.700 ns 30910.300000 ns 98.71% -1.29% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool<os_provider> 25911.300 ns 25525.500000 ns 98.51% -1.49% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc 142900.000 ns 138360.000000 ns 96.82% -3.18% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool<os_provider> 46556.300 ns 41438.000000 ns 89.01% -10.99% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 umfProxy 10794200.000 ns 140162.000000 ns 1.30% -98.70% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 umfProxy 2716370.000 ns 30121.800000 ns 1.11% -98.89% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 umfProxy 8007430.000 ns 27477.700000 ns 0.34% -99.66% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 umfProxy 2737030.000 ns 4208.520000 ns 0.15% -99.85% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 disjoint_pool<os_provider> 1739660.000000 ns -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 disjoint_pool<os_provider> 206854.000000 ns -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 jemalloc_pool<os_provider> 516154.000000 ns -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 jemalloc_pool<os_provider> 24325.300000 ns -
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 jemalloc_pool<os_provider> 648229.000000 ns -
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 jemalloc_pool<os_provider> 62128.900000 ns -
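
The "Relative perf in group" figures above appear consistent with a geometric mean of the per-benchmark relative perf values in each group. A minimal sketch that checks this for the api group, using the rounded percentages printed in the report; the function name `geomean_percent` is hypothetical, and the benchmark scripts may well work from unrounded ratios instead.

```python
import math


def geomean_percent(percentages):
    """Geometric mean of relative-performance values given in percent."""
    # Averaging logarithms avoids overflow and is equivalent to the
    # n-th root of the product of the ratios.
    logs = [math.log(p / 100.0) for p in percentages]
    return 100.0 * math.exp(sum(logs) / len(logs))


# Relative perf column of the api group (12 benchmarks), as printed.
API_GROUP = [
    102.00, 101.35, 100.84, 100.63, 100.42, 100.24,
    100.00, 100.00, 99.58, 99.39, 99.28, 95.93,
]

print(f"{geomean_percent(API_GROUP):.3f}%")  # 99.961%, matching the group line
```

A geometric mean suits ratios because a 2x gain and a 2x loss cancel out, which an arithmetic mean of percentages would not reflect.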

Details

Benchmark details - environment, command...
api_overhead_benchmark_l0 SubmitKernel out of order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel out of order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel in order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros --multiplier=1

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

miscellaneous_benchmark_sycl VectorSum

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=8 --NumOpsPerThread=100 --iterations=10 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=8 --NumOpsPerThread=400 --iterations=1000 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=16 --NumOpsPerThread=10 --iterations=10000 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=8 --NumOpsPerThread=100 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=8 --NumOpsPerThread=400 --iterations=1000 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=16 --NumOpsPerThread=10 --iterations=10000 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=1 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=4 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=10 --withGraphs=0

graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=10 --withGraphs=1

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=100 --withGraphs=0

graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=100 --withGraphs=1

graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:1, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=0 --numKernels=10

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=1 --numKernels=10

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:100

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=1 --numKernels=100

graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:0, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=0 --numKernels=10

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=1 --numKernels=10

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:100

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=1 --numKernels=100

api_overhead_benchmark_ur SubmitKernel out of order CPU count

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order CPU count

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Velocity-Bench Hashtable

Environment Variables:

Command:

/home/pmdk/bench_workdir/hashtable/hashtable_sycl --no-verify

Velocity-Bench Bitcracker

Environment Variables:

Command:

/home/pmdk/bench_workdir/bitcracker/bitcracker -f /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Velocity-Bench CudaSift

Environment Variables:

Command:

/home/pmdk/bench_workdir/cudaSift/cudaSift

Velocity-Bench Easywave

Environment Variables:

Command:

/home/pmdk/bench_workdir/easywave/easyWave_sycl -grid /home/pmdk/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/pmdk/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Velocity-Bench QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/pmdk/bench_workdir/QuickSilver/qs -i /home/pmdk/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Velocity-Bench Sobel Filter

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/pmdk/bench_workdir/sobel_filter/sobel_filter -i /home/pmdk/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Velocity-Bench dl-cifar

Environment Variables:

Command:

/home/pmdk/bench_workdir/dl-cifar/dl-cifar_sycl

Velocity-Bench dl-mnist

Environment Variables:

NEOReadDebugKeys=1
DisableScratchPages=0

Command:

/home/pmdk/bench_workdir/dl-mnist/dl-mnist-sycl -conv_algo ONEDNN_AUTO

Velocity-Bench svm

Environment Variables:

Command:

/home/pmdk/bench_workdir/svm/svm_sycl /home/pmdk/bench_workdir/velocity-bench-repo/svm/SYCL/a9a /home/pmdk/bench_workdir/velocity-bench-repo/svm/SYCL/a.m

Runtime_IndependentDAGTaskThroughput_SingleTask

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Runtime_IndependentDAGTaskThroughput_BasicParallelFor

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Runtime_DAGTaskThroughput_SingleTask

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Runtime_DAGTaskThroughput_BasicParallelFor

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Runtime_DAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Runtime_DAGTaskThroughput_NDRangeParallelFor

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_1D_H2D_Strided

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_2D_H2D_Strided

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_3D_H2D_Strided

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_1D_D2H_Strided

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_2D_D2H_Strided

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_3D_D2H_Strided

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_LocalMem_int32_4096

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/LocalMem_multi.csv --size=10240000

MicroBench_LocalMem_fp32_4096

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/LocalMem_multi.csv --size=10240000

Pattern_Reduction_NDRange_int32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/Pattern_Reduction_multi.csv --size=10240000

Pattern_Reduction_Hierarchical_int32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/Pattern_Reduction_multi.csv --size=10240000

ScalarProduct_NDRange_int32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

ScalarProduct_NDRange_int64

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

ScalarProduct_NDRange_fp32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

ScalarProduct_Hierarchical_int32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

ScalarProduct_Hierarchical_int64

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

ScalarProduct_Hierarchical_fp32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

Pattern_SegmentedReduction_NDRange_int16

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_NDRange_int32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_NDRange_int64

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_NDRange_fp32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_Hierarchical_int16

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_Hierarchical_int32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_Hierarchical_int64

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_Hierarchical_fp32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

USM_Allocation_latency_fp32_device

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

USM_Allocation_latency_fp32_host

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

USM_Allocation_latency_fp32_shared

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

VectorAddition_int32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/VectorAddition_multi.csv --size=102400000

VectorAddition_int64

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/VectorAddition_multi.csv --size=102400000

VectorAddition_fp32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/VectorAddition_multi.csv --size=102400000

Polybench_2mm

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/2mm --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/2mm.csv --size=512

Polybench_3mm

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/3mm --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/3mm.csv --size=512

Polybench_Atax

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/atax --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/Atax.csv --size=8192

Kmeans_fp32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/kmeans --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/Kmeans.csv --size=700000000

LinearRegressionCoeff_fp32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/lin_reg_coeff --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/LinearRegressionCoeff.csv --size=1638400000

MolecularDynamics

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/mol_dyn --warmup-run --num-runs=2 --output=/home/pmdk/bench_workdir/MolecularDynamics.csv --size=8196

llama.cpp Prompt Processing Batched 128

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

llama.cpp Text Generation Batched 128

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

llama.cpp Prompt Processing Batched 256

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

llama.cpp Text Generation Batched 256

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

llama.cpp Prompt Processing Batched 512

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

llama.cpp Text Generation Batched 512

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

alloc/size:10000/0/4096/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc
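
To reproduce one of these entries locally, the listed environment variables have to be applied to the process before the command is launched. A minimal sketch follows, assuming the `KEY=value` lines and the command line are copied verbatim from the entry above. The helper name `build_invocation` is hypothetical and is not part of the benchmark scripts. The umfProxy entries preload `libumf_proxy.so` and filter for the glibc benchmarks, presumably so that the glibc allocation calls are served by the proxy library.

```python
import os
import shlex


def build_invocation(env_lines, command_line, base_env=None):
    """Return (env, argv) for one 'Environment Variables' / 'Command' entry.

    Hypothetical helper for illustration only.
    """
    # Start from the current environment unless the caller supplies one.
    env = dict(os.environ if base_env is None else base_env)
    for line in env_lines:
        if not line.strip():
            continue  # entries with no variables have an empty list
        key, _, value = line.partition("=")
        env[key] = value
    # shlex.split keeps each flag (and its =value part) as one argument.
    return env, shlex.split(command_line)


env, argv = build_invocation(
    ["LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/"
     "unified-runtime/umf_build/lib/libumf_proxy.so"],
    "/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/"
    "umf_build/benchmark/umf-benchmark --benchmark_format=csv "
    "--benchmark_filter=glibc",
)
print(argv)
```

Passing the result to `subprocess.run(argv, env=env, check=True)` would launch the benchmark. The paths are specific to the CI runner and will need adjusting on another machine.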

alloc/size:10000/0/4096/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

Compute Benchmarks level_zero run (with params: --iterations-stddev 2 --iterations 2):
https://github.com/oneapi-src/unified-runtime/actions/runs/13057141939


Compute Benchmarks level_zero run (--iterations-stddev 2 --iterations 2):
https://github.com/oneapi-src/unified-runtime/actions/runs/13057141939
Job status: success. Test status: success.

Summary

Total 42 benchmarks in mean.
Geomean 70.728%.
Improved 7 Regressed 23 (threshold 2.00%)
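
For readers checking the counts, here is a minimal sketch of how a 2.00% threshold could sort results into improved, regressed, and unchanged. The function name `classify` and the strict comparisons are assumptions for illustration, not the CI script's actual code; the sample values are the relative-perf figures reported for the `alloc/size:10000/0/4096/iterations:200000/threads:4` group in this comment.

```python
from collections import Counter


def classify(relative_perf_pct, threshold_pct=2.0):
    """Label one benchmark by how far its relative perf is from 100%.

    Assumption: a result counts only when it moves by more than the
    threshold in either direction (strict comparison).
    """
    change = relative_perf_pct - 100.0
    if change > threshold_pct:
        return "improved"
    if change < -threshold_pct:
        return "regressed"
    return "unchanged"


# Relative perf (%) as reported for the threads:4 alloc/size:10000/0/4096 group.
rows = {
    "umfProxy": 2290.27,
    "os_provider": 98.87,
    "scalable_pool<os_provider>": 93.51,
    "glibc": 91.77,
    "proxy_pool<os_provider>": 89.59,
}

labels = {name: classify(value) for name, value in rows.items()}
counts = Counter(labels.values())
```

Rows that have a result on only one side (for example `disjoint_pool<os_provider>`) have no relative perf, so under this sketch they would not be counted at all.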

(result is better)

Performance change in benchmark groups

Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (7): 177.072%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/0/4096/iterations:200000/threads:4 umfProxy 117.389000 ns 2688.530 ns 2290.27% 2190.27% ++++++++++
alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider 2137.750 ns 2113.560000 ns 98.87% -1.13% .
alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool<os_provider> 307.692 ns 287.722000 ns 93.51% -6.49% .
alloc/size:10000/0/4096/iterations:200000/threads:4 glibc 2685.060 ns 2464.050000 ns 91.77% -8.23% .
alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool<os_provider> 3457.650 ns 3097.620000 ns 89.59% -10.41% .
alloc/size:10000/0/4096/iterations:200000/threads:4 disjoint_pool<os_provider> 4937.160000 ns -
alloc/size:10000/0/4096/iterations:200000/threads:4 jemalloc_pool<os_provider> 3506.160000 ns -
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (7): 143.876%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/0/4096/iterations:200000/threads:1 umfProxy 104.384000 ns 705.635 ns 676.00% 576.00% +++
alloc/size:10000/0/4096/iterations:200000/threads:1 glibc 701.882 ns 698.410000 ns 99.51% -0.49% .
alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool<os_provider> 278.621 ns 272.237000 ns 97.71% -2.29% .
alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool<os_provider> 213.676 ns 208.759000 ns 97.70% -2.30% .
alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider 199.261 ns 191.313000 ns 96.01% -3.99% .
alloc/size:10000/0/4096/iterations:200000/threads:1 disjoint_pool<os_provider> 491.052000 ns -
alloc/size:10000/0/4096/iterations:200000/threads:1 jemalloc_pool<os_provider> 120.083000 ns -
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (7): 148.526%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/100000/4096/iterations:200000/threads:4 umfProxy 133.241000 ns 1226.080 ns 920.20% 820.20% ++++
alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool<os_provider> 3420.390 ns 3338.690000 ns 97.61% -2.39% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc 1315.010 ns 1274.570000 ns 96.92% -3.08% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider 2199.040 ns 2038.360000 ns 92.69% -7.31% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool<os_provider> 292.019 ns 261.553000 ns 89.57% -10.43% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 disjoint_pool<os_provider> 4770.270000 ns -
alloc/size:10000/100000/4096/iterations:200000/threads:4 jemalloc_pool<os_provider> 3227.210000 ns -
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (7): 126.308%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/100000/4096/iterations:200000/threads:1 umfProxy 196.181000 ns 707.467 ns 360.62% 260.62% +
alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool<os_provider> 311.369 ns 310.903000 ns 99.85% -0.15% .
alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc 726.483 ns 706.907000 ns 97.31% -2.69% .
alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider 196.378 ns 189.545000 ns 96.52% -3.48% .
alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool<os_provider> 206.764 ns 196.551000 ns 95.06% -4.94% .
alloc/size:10000/100000/4096/iterations:200000/threads:1 disjoint_pool<os_provider> 506.238000 ns -
alloc/size:10000/100000/4096/iterations:200000/threads:1 jemalloc_pool<os_provider> 119.273000 ns -
Relative perf in group alloc/min (8): 81.100%
Benchmark This PR baseline Relative perf Change -
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 umfProxy 625.028000 ns 832.725 ns 133.23% 33.23% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc 170.467000 ns 174.753 ns 102.51% 2.51% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool<os_provider> 958.816 ns 958.800000 ns 100.00% -0.00% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc 810.623 ns 797.092000 ns 98.33% -1.67% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool<os_provider> 1019.430 ns 965.779000 ns 94.74% -5.26% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 umfProxy 792.064 ns 177.130000 ns 22.36% -77.64% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 jemalloc_pool<os_provider> 4415.900000 ns -
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 jemalloc_pool<os_provider> 346.002000 ns -
Relative perf in group multiple (22): 26.729%
Benchmark This PR baseline Relative perf Change -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool<os_provider> 15224.200000 ns 16418.600 ns 107.85% 7.85% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider 145167.000000 ns 146423.000 ns 100.87% 0.87% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool<os_provider> 161032.000000 ns 162279.000 ns 100.77% 0.77% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool<os_provider> 75164.700000 ns 75451.700 ns 100.38% 0.38% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc 33117.300000 ns 33153.600 ns 100.11% 0.11% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc 4333.700 ns 4283.690000 ns 98.85% -1.15% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool<os_provider> 25961.700 ns 25525.500000 ns 98.32% -1.68% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc 31494.200 ns 30910.300000 ns 98.15% -1.85% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool<os_provider> 1186360.000 ns 1162100.000000 ns 97.96% -2.04% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc 142303.000 ns 138360.000000 ns 97.23% -2.77% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider 1209150.000 ns 1174970.000000 ns 97.17% -2.83% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool<os_provider> 47311.300 ns 41438.000000 ns 87.59% -12.41% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 umfProxy 10725700.000 ns 140162.000000 ns 1.31% -98.69% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 umfProxy 2669490.000 ns 30121.800000 ns 1.13% -98.87% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 umfProxy 8126240.000 ns 27477.700000 ns 0.34% -99.66% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 umfProxy 2632540.000 ns 4208.520000 ns 0.16% -99.84% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 disjoint_pool<os_provider> 1749830.000000 ns -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 disjoint_pool<os_provider> 209833.000000 ns -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 jemalloc_pool<os_provider> 496101.000000 ns -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 jemalloc_pool<os_provider> 24181.900000 ns -
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 jemalloc_pool<os_provider> 621195.000000 ns -
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 jemalloc_pool<os_provider> 60798.000000 ns -
Relative perf in group api (12): cannot calculate
Benchmark This PR baseline Relative perf Change -
api_overhead_benchmark_l0 SubmitKernel out of order - 11.369000 μs
api_overhead_benchmark_l0 SubmitKernel in order - 11.395000 μs
api_overhead_benchmark_sycl SubmitKernel out of order - 23.506000 μs
api_overhead_benchmark_sycl SubmitKernel in order - 24.407000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 - 2.149000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 - 1.673000 μs
api_overhead_benchmark_ur SubmitKernel out of order CPU count - 104663.000000 instr
api_overhead_benchmark_ur SubmitKernel out of order - 15.866000 μs
api_overhead_benchmark_ur SubmitKernel in order CPU count - 110006.000000 instr
api_overhead_benchmark_ur SubmitKernel in order - 16.785000 μs
api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count - 123166.000000 instr
api_overhead_benchmark_ur SubmitKernel in order with measure completion - 21.495000 μs
Relative perf in group memory (4): cannot calculate
Benchmark This PR baseline Relative perf Change -
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 - 252.914000 μs
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 - 219.832000 μs
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 - 5.900000 μs
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 - 3.070000 GB/s
Relative perf in group miscellaneous (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
miscellaneous_benchmark_sycl VectorSum - 858.023000 bw GB/s
Relative perf in group multithread (10): cannot calculate
Benchmark This PR baseline Relative perf Change -
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1 - 6896.127000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1 - 17165.065000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1 - 46811.855000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1 - 2047.766000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1 - 7766.797000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1 - 8883.578000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1 - 27030.035000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1 - 1199.669000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events - 42602.254000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events - 112408.658000 μs
Relative perf in group graph (10): cannot calculate
Benchmark This PR baseline Relative perf Change -
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10 - 71746.038000 μs
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:10 - 72583.103000 μs
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100 - 353349.563000 μs
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100 - 353086.695000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:1, numKernels:10 - 55.253000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:10 - 62.493000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:100 - 677.203000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:0, numKernels:10 - 5621.320000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:10 - 5631.730000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:100 - 56454.921000 μs
Relative perf in group Velocity-Bench (9): cannot calculate
Benchmark This PR baseline Relative perf Change -
Velocity-Bench Hashtable - 363.339623 M keys/sec
Velocity-Bench Bitcracker - 38.359400 s
Velocity-Bench CudaSift - 203.947000 ms
Velocity-Bench Easywave - 227.000000 ms
Velocity-Bench QuickSilver - 116.460000 MMS/CTT
Velocity-Bench Sobel Filter - 603.076000 ms
Velocity-Bench dl-cifar - 23.630300 s
Velocity-Bench dl-mnist - 2.710000 s
Velocity-Bench svm - 0.135900 s
Relative perf in group Runtime (8): cannot calculate
Benchmark This PR baseline Relative perf Change -
Runtime_IndependentDAGTaskThroughput_SingleTask - 259.444000 ms
Runtime_IndependentDAGTaskThroughput_BasicParallelFor - 274.274000 ms
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor - 275.173000 ms
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor - 276.461000 ms
Runtime_DAGTaskThroughput_SingleTask - 1648.643000 ms
Runtime_DAGTaskThroughput_BasicParallelFor - 1704.436000 ms
Runtime_DAGTaskThroughput_HierarchicalParallelFor - 1710.439000 ms
Runtime_DAGTaskThroughput_NDRangeParallelFor - 1673.462000 ms
Relative perf in group MicroBench (14): cannot calculate
Benchmark This PR baseline Relative perf Change -
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous - 4.526000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous - 4.585000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous - 4.376000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous - 4.456000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous - 617.437000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous - 617.442000 ms
MicroBench_HostDeviceBandwidth_1D_H2D_Strided - 4.276000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Strided - 4.940000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Strided - 4.909000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Strided - 4.716000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Strided - 616.834000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Strided - 616.784000 ms
MicroBench_LocalMem_int32_4096 - 29.862000 ms
MicroBench_LocalMem_fp32_4096 - 29.902000 ms
Relative perf in group Pattern (10): cannot calculate
Benchmark This PR baseline Relative perf Change -
Pattern_Reduction_NDRange_int32 - 16.339000 ms
Pattern_Reduction_Hierarchical_int32 - 16.339000 ms
Pattern_SegmentedReduction_NDRange_int16 - 2.265000 ms
Pattern_SegmentedReduction_NDRange_int32 - 2.165000 ms
Pattern_SegmentedReduction_NDRange_int64 - 2.337000 ms
Pattern_SegmentedReduction_NDRange_fp32 - 2.168000 ms
Pattern_SegmentedReduction_Hierarchical_int16 - 11.796000 ms
Pattern_SegmentedReduction_Hierarchical_int32 - 11.588000 ms
Pattern_SegmentedReduction_Hierarchical_int64 - 11.782000 ms
Pattern_SegmentedReduction_Hierarchical_fp32 - 11.587000 ms
Relative perf in group ScalarProduct (6): cannot calculate
Benchmark This PR baseline Relative perf Change -
ScalarProduct_NDRange_int32 - 3.765000 ms
ScalarProduct_NDRange_int64 - 5.425000 ms
ScalarProduct_NDRange_fp32 - 3.749000 ms
ScalarProduct_Hierarchical_int32 - 10.541000 ms
ScalarProduct_Hierarchical_int64 - 11.490000 ms
ScalarProduct_Hierarchical_fp32 - 10.167000 ms
Relative perf in group USM (7): cannot calculate
Benchmark This PR baseline Relative perf Change -
USM_Allocation_latency_fp32_device - 0.065000 ms
USM_Allocation_latency_fp32_host - 37.623000 ms
USM_Allocation_latency_fp32_shared - 0.062000 ms
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch - 1.737000 ms
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch - 1.087000 ms
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch - 1.893000 ms
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch - 1.258000 ms
Relative perf in group VectorAddition (3): cannot calculate
Benchmark This PR baseline Relative perf Change -
VectorAddition_int32 - 1.477000 ms
VectorAddition_int64 - 3.088000 ms
VectorAddition_fp32 - 1.480000 ms
Relative perf in group Polybench (3): cannot calculate
Benchmark This PR baseline Relative perf Change -
Polybench_2mm - 1.039000 ms
Polybench_3mm - 1.477000 ms
Polybench_Atax - 6.402000 ms
Relative perf in group Kmeans (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
Kmeans_fp32 - 14.111000 ms
Relative perf in group LinearRegressionCoeff (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
LinearRegressionCoeff_fp32 - 881.915000 ms
Relative perf in group MolecularDynamics (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
MolecularDynamics - 0.030000 ms
Relative perf in group llama.cpp (6): cannot calculate
Benchmark This PR baseline Relative perf Change -
llama.cpp Prompt Processing Batched 128 - 830.097430 token/s
llama.cpp Text Generation Batched 128 - 62.790938 token/s
llama.cpp Prompt Processing Batched 256 - 878.291089 token/s
llama.cpp Text Generation Batched 256 - 62.777001 token/s
llama.cpp Prompt Processing Batched 512 - 435.723514 token/s
llama.cpp Text Generation Batched 512 - 62.788791 token/s
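
The per-group percentages above look consistent with a geometric mean over only the rows that have both a "This PR" and a "baseline" value. A minimal sketch under that assumption, using the five comparable rows of the `alloc/size:10000/0/4096/iterations:200000/threads:4` group; the helper names are hypothetical, and the ratio direction (baseline divided by this PR) is assumed to apply to time-based results where lower is better:

```python
import math


def relative_perf(this_pr, baseline):
    """Ratio for a lower-is-better metric: above 1.0 means this PR is faster."""
    return baseline / this_pr


def geomean(values):
    """Geometric mean, computed via logs to avoid overflow on long lists."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# (this PR, baseline) in ns, copied from the group's table.
pairs = [
    (117.389, 2688.530),   # umfProxy
    (2137.750, 2113.560),  # os_provider
    (307.692, 287.722),    # scalable_pool<os_provider>
    (2685.060, 2464.050),  # glibc
    (3457.650, 3097.620),  # proxy_pool<os_provider>
]

ratios = [relative_perf(this_pr, baseline) for this_pr, baseline in pairs]
group_geomean_pct = 100.0 * geomean(ratios)
```

Groups where every row lacks a "This PR" value have no ratios to average, which matches the "cannot calculate" entries.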

Details

Benchmark details - environment, command...
alloc/size:10000/0/4096/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/0/4096/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc


Compute Benchmarks level_zero run (with params: --iterations-stddev 2 --iterations 2):
https://github.com/oneapi-src/unified-runtime/actions/runs/13057580018


Compute Benchmarks level_zero run (--iterations-stddev 2 --iterations 2):
https://github.com/oneapi-src/unified-runtime/actions/runs/13057580018
Job status: success. Test status: success.

Summary

Total 42 benchmarks in mean.
Geomean 97.691%.
Improved 5 Regressed 22 (threshold 2.00%)

(result is better)
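
The rows below appear consistent with relative perf being computed as baseline divided by "This PR" for lower-is-better units (ns, μs, ms), with a row counted as improved or regressed only when the change exceeds the 2.00% threshold. The following is a minimal sketch under that assumption; the function names are hypothetical and are not taken from the benchmark scripts, and higher-is-better units such as GB/s would need the ratio inverted.

```python
THRESHOLD_PCT = 2.0  # from "(threshold 2.00%)" in the summary


def relative_perf(this_pr: float, baseline: float) -> float:
    """Relative perf in percent, assuming a lower-is-better metric."""
    return baseline / this_pr * 100.0


def classify(this_pr: float, baseline: float,
             threshold_pct: float = THRESHOLD_PCT) -> str:
    """Label a row as improved, regressed, or unchanged (hypothetical helper)."""
    change = relative_perf(this_pr, baseline) - 100.0
    if change > threshold_pct:
        return "improved"
    if change < -threshold_pct:
        return "regressed"
    return "unchanged"


# Three rows taken from the tables in this comment: (This PR, baseline) in ns.
print(classify(1921.22, 2038.36))  # os_provider, size:10000/100000, threads:4
print(classify(2799.92, 2464.05))  # glibc, size:10000/0, threads:4
print(classify(2648.41, 2688.53))  # umfProxy, size:10000/0, threads:4
```

Applied to every comparable row in this comment, the same rule gives the 5 improved and 22 regressed entries reported in the summary.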

Performance change in benchmark groups

Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (6): 94.616%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/0/4096/iterations:200000/threads:4 umfProxy 2648.410000 ns 2688.530 ns 101.51% 1.51% .
alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool<os_provider> 3064.420000 ns 3097.620 ns 101.08% 1.08% .
alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool<os_provider> 305.545 ns 287.722000 ns 94.17% -5.83% -----
alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider 2370.280 ns 2113.560000 ns 89.17% -10.83% ---------
alloc/size:10000/0/4096/iterations:200000/threads:4 glibc 2799.920 ns 2464.050000 ns 88.00% -12.00% ----------
alloc/size:10000/0/4096/iterations:200000/threads:4 jemalloc 117.189000 ns -
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (6): 96.714%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/0/4096/iterations:200000/threads:1 umfProxy 721.182 ns 705.635000 ns 97.84% -2.16% --
alloc/size:10000/0/4096/iterations:200000/threads:1 glibc 717.457 ns 698.410000 ns 97.35% -2.65% --
alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool<os_provider> 282.656 ns 272.237000 ns 96.31% -3.69% ---
alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool<os_provider> 216.960 ns 208.759000 ns 96.22% -3.78% ---
alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider 199.570 ns 191.313000 ns 95.86% -4.14% ---
alloc/size:10000/0/4096/iterations:200000/threads:1 jemalloc 86.443300 ns -
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (6): 100.102%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider 1921.220000 ns 2038.360 ns 106.10% 6.10% +++++
alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc 1231.370000 ns 1274.570 ns 103.51% 3.51% +++
alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool<os_provider> 3290.330000 ns 3338.690 ns 101.47% 1.47% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 umfProxy 1238.010 ns 1226.080000 ns 99.04% -0.96% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool<os_provider> 287.178 ns 261.553000 ns 91.08% -8.92% -------
alloc/size:10000/100000/4096/iterations:200000/threads:4 jemalloc 109.364000 ns -
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (6): 97.198%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool<os_provider> 194.876000 ns 196.551 ns 100.86% 0.86% .
alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool<os_provider> 313.688 ns 310.903000 ns 99.11% -0.89% .
alloc/size:10000/100000/4096/iterations:200000/threads:1 umfProxy 736.269 ns 707.467000 ns 96.09% -3.91% ---
alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc 739.125 ns 706.907000 ns 95.64% -4.36% ----
alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider 200.720 ns 189.545000 ns 94.43% -5.57% -----
alloc/size:10000/100000/4096/iterations:200000/threads:1 jemalloc 84.055000 ns -
Relative perf in group alloc/min (8): 97.779%
Benchmark This PR baseline Relative perf Change -
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool<os_provider> 970.888 ns 965.779000 ns 99.47% -0.53% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc 177.158 ns 174.753000 ns 98.64% -1.36% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool<os_provider> 981.498 ns 958.800000 ns 97.69% -2.31% --
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 umfProxy 854.006 ns 832.725000 ns 97.51% -2.49% --
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc 821.160 ns 797.092000 ns 97.07% -2.93% --
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 umfProxy 183.892 ns 177.130000 ns 96.32% -3.68% ---
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 jemalloc 435.088000 ns -
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 jemalloc 280.303000 ns -
Relative perf in group multiple (20): 98.353%
Benchmark This PR baseline Relative perf Change -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool<os_provider> 15097.900000 ns 16418.600 ns 108.75% 8.75% +++++++
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider 141189.000000 ns 146423.000 ns 103.71% 3.71% +++
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool<os_provider> 158959.000000 ns 162279.000 ns 102.09% 2.09% ++
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc 32597.200000 ns 33153.600 ns 101.71% 1.71% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc 4227.420000 ns 4283.690 ns 101.33% 1.33% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 umfProxy 4199.780000 ns 4208.520 ns 100.21% 0.21% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc 139441.000 ns 138360.000000 ns 99.22% -0.78% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 umfProxy 142050.000 ns 140162.000000 ns 98.67% -1.33% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool<os_provider> 76545.000 ns 75451.700000 ns 98.57% -1.43% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool<os_provider> 42248.200 ns 41438.000000 ns 98.08% -1.92% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc 31636.100 ns 30910.300000 ns 97.71% -2.29% --
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 umfProxy 28264.900 ns 27477.700000 ns 97.21% -2.79% --
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 umfProxy 31783.800 ns 30121.800000 ns 94.77% -5.23% ----
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool<os_provider> 27169.100 ns 25525.500000 ns 93.95% -6.05% -----
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider 1309000.000 ns 1174970.000000 ns 89.76% -10.24% ---------
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool<os_provider> 1295170.000 ns 1162100.000000 ns 89.73% -10.27% ---------
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 jemalloc 30644.600000 ns -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 jemalloc 24425.600000 ns -
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 jemalloc 47559.500000 ns -
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 jemalloc 26449.600000 ns -
Relative perf in group api (12): cannot calculate
Benchmark This PR baseline Relative perf Change -
api_overhead_benchmark_l0 SubmitKernel out of order - 11.369000 μs
api_overhead_benchmark_l0 SubmitKernel in order - 11.395000 μs
api_overhead_benchmark_sycl SubmitKernel out of order - 23.506000 μs
api_overhead_benchmark_sycl SubmitKernel in order - 24.407000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 - 2.149000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 - 1.673000 μs
api_overhead_benchmark_ur SubmitKernel out of order CPU count - 104663.000000 instr
api_overhead_benchmark_ur SubmitKernel out of order - 15.866000 μs
api_overhead_benchmark_ur SubmitKernel in order CPU count - 110006.000000 instr
api_overhead_benchmark_ur SubmitKernel in order - 16.785000 μs
api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count - 123166.000000 instr
api_overhead_benchmark_ur SubmitKernel in order with measure completion - 21.495000 μs
Relative perf in group memory (4): cannot calculate
Benchmark This PR baseline Relative perf Change -
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 - 252.914000 μs
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 - 219.832000 μs
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 - 5.900000 μs
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 - 3.070000 GB/s
Relative perf in group miscellaneous (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
miscellaneous_benchmark_sycl VectorSum - 858.023000 bw GB/s
Relative perf in group multithread (10): cannot calculate
Benchmark This PR baseline Relative perf Change -
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1 - 6896.127000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1 - 17165.065000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1 - 46811.855000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1 - 2047.766000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1 - 7766.797000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1 - 8883.578000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1 - 27030.035000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1 - 1199.669000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events - 42602.254000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events - 112408.658000 μs
Relative perf in group graph (10): cannot calculate
Benchmark This PR baseline Relative perf Change -
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10 - 71746.038000 μs
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:10 - 72583.103000 μs
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100 - 353349.563000 μs
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100 - 353086.695000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:1, numKernels:10 - 55.253000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:10 - 62.493000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:100 - 677.203000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:0, numKernels:10 - 5621.320000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:10 - 5631.730000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:100 - 56454.921000 μs
Relative perf in group Velocity-Bench (9): cannot calculate
Benchmark This PR baseline Relative perf Change -
Velocity-Bench Hashtable - 363.339623 M keys/sec
Velocity-Bench Bitcracker - 38.359400 s
Velocity-Bench CudaSift - 203.947000 ms
Velocity-Bench Easywave - 227.000000 ms
Velocity-Bench QuickSilver - 116.460000 MMS/CTT
Velocity-Bench Sobel Filter - 603.076000 ms
Velocity-Bench dl-cifar - 23.630300 s
Velocity-Bench dl-mnist - 2.710000 s
Velocity-Bench svm - 0.135900 s
Relative perf in group Runtime (8): cannot calculate
Benchmark This PR baseline Relative perf Change -
Runtime_IndependentDAGTaskThroughput_SingleTask - 259.444000 ms
Runtime_IndependentDAGTaskThroughput_BasicParallelFor - 274.274000 ms
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor - 275.173000 ms
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor - 276.461000 ms
Runtime_DAGTaskThroughput_SingleTask - 1648.643000 ms
Runtime_DAGTaskThroughput_BasicParallelFor - 1704.436000 ms
Runtime_DAGTaskThroughput_HierarchicalParallelFor - 1710.439000 ms
Runtime_DAGTaskThroughput_NDRangeParallelFor - 1673.462000 ms
Relative perf in group MicroBench (14): cannot calculate
Benchmark This PR baseline Relative perf Change -
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous - 4.526000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous - 4.585000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous - 4.376000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous - 4.456000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous - 617.437000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous - 617.442000 ms
MicroBench_HostDeviceBandwidth_1D_H2D_Strided - 4.276000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Strided - 4.940000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Strided - 4.909000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Strided - 4.716000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Strided - 616.834000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Strided - 616.784000 ms
MicroBench_LocalMem_int32_4096 - 29.862000 ms
MicroBench_LocalMem_fp32_4096 - 29.902000 ms
Relative perf in group Pattern (10): cannot calculate
Benchmark This PR baseline Relative perf Change -
Pattern_Reduction_NDRange_int32 - 16.339000 ms
Pattern_Reduction_Hierarchical_int32 - 16.339000 ms
Pattern_SegmentedReduction_NDRange_int16 - 2.265000 ms
Pattern_SegmentedReduction_NDRange_int32 - 2.165000 ms
Pattern_SegmentedReduction_NDRange_int64 - 2.337000 ms
Pattern_SegmentedReduction_NDRange_fp32 - 2.168000 ms
Pattern_SegmentedReduction_Hierarchical_int16 - 11.796000 ms
Pattern_SegmentedReduction_Hierarchical_int32 - 11.588000 ms
Pattern_SegmentedReduction_Hierarchical_int64 - 11.782000 ms
Pattern_SegmentedReduction_Hierarchical_fp32 - 11.587000 ms
Relative perf in group ScalarProduct (6): cannot calculate
Benchmark This PR baseline Relative perf Change -
ScalarProduct_NDRange_int32 - 3.765000 ms
ScalarProduct_NDRange_int64 - 5.425000 ms
ScalarProduct_NDRange_fp32 - 3.749000 ms
ScalarProduct_Hierarchical_int32 - 10.541000 ms
ScalarProduct_Hierarchical_int64 - 11.490000 ms
ScalarProduct_Hierarchical_fp32 - 10.167000 ms
Relative perf in group USM (7): cannot calculate
Benchmark This PR baseline Relative perf Change -
USM_Allocation_latency_fp32_device - 0.065000 ms
USM_Allocation_latency_fp32_host - 37.623000 ms
USM_Allocation_latency_fp32_shared - 0.062000 ms
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch - 1.737000 ms
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch - 1.087000 ms
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch - 1.893000 ms
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch - 1.258000 ms
Relative perf in group VectorAddition (3): cannot calculate
Benchmark This PR baseline Relative perf Change -
VectorAddition_int32 - 1.477000 ms
VectorAddition_int64 - 3.088000 ms
VectorAddition_fp32 - 1.480000 ms
Relative perf in group Polybench (3): cannot calculate
Benchmark This PR baseline Relative perf Change -
Polybench_2mm - 1.039000 ms
Polybench_3mm - 1.477000 ms
Polybench_Atax - 6.402000 ms
Relative perf in group Kmeans (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
Kmeans_fp32 - 14.111000 ms
Relative perf in group LinearRegressionCoeff (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
LinearRegressionCoeff_fp32 - 881.915000 ms
Relative perf in group MolecularDynamics (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
MolecularDynamics - 0.030000 ms
Relative perf in group llama.cpp (6): cannot calculate
Benchmark This PR baseline Relative perf Change -
llama.cpp Prompt Processing Batched 128 - 830.097430 token/s
llama.cpp Text Generation Batched 128 - 62.790938 token/s
llama.cpp Prompt Processing Batched 256 - 878.291089 token/s
llama.cpp Text Generation Batched 256 - 62.777001 token/s
llama.cpp Prompt Processing Batched 512 - 435.723514 token/s
llama.cpp Text Generation Batched 512 - 62.788791 token/s
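
The per-group figures above look like geometric means over the rows that have both a "This PR" value and a baseline; rows with a "-" in either column seem to be skipped, and a group with no comparable rows is reported as "cannot calculate". The sketch below checks that reading against the first group (alloc/size:10000/0/4096/iterations:200000/threads:4). It is an illustration under those assumptions, not the actual reporting script, and the helper names are hypothetical.

```python
import math


def geomean(values):
    """Geometric mean of a non-empty list of positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def group_relative_perf(rows):
    """Geomean of baseline / this_pr in percent (lower-is-better metrics).

    Rows missing either value are ignored; returns None when nothing
    can be compared (shown as "cannot calculate" in the comment).
    """
    ratios = [base / pr for _, pr, base in rows
              if pr is not None and base is not None]
    return geomean(ratios) * 100.0 if ratios else None


# (benchmark, This PR [ns], baseline [ns]) copied from the first group above.
rows = [
    ("umfProxy", 2648.41, 2688.53),
    ("proxy_pool<os_provider>", 3064.42, 3097.62),
    ("scalable_pool<os_provider>", 305.545, 287.722),
    ("os_provider", 2370.28, 2113.56),
    ("glibc", 2799.92, 2464.05),
    ("jemalloc", 117.189, None),  # no baseline, so excluded from the mean
]

# The comment reports 94.616% for this group.
print(f"{group_relative_perf(rows):.3f}%")
```

A geometric mean is the usual choice for averaging ratios because a 2x speed-up and a 2x slow-down cancel out, which an arithmetic mean would not give.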

Details

Benchmark details - environment, command...
alloc/size:10000/0/4096/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/0/4096/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/0/4096/iterations:200000/threads:4 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/0/4096/iterations:200000/threads:1 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:4 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:1 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc
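
Each entry above pairs an optional `LD_PRELOAD` value with the same `umf-benchmark` command line, so the umfProxy and jemalloc variants differ from the glibc run only in the preloaded allocator library. A rough sketch of how such an invocation could be assembled locally follows. The helper name and the relative paths are placeholders chosen for illustration; only the `--benchmark_format` and `--benchmark_filter` flags and the `LD_PRELOAD` variable come from the listings above.

```python
import os


def build_invocation(binary, preload=None, bench_filter=None, base_env=None):
    """Return (argv, env) for one benchmark run (hypothetical helper)."""
    argv = [binary, "--benchmark_format=csv"]
    if bench_filter:
        argv.append(f"--benchmark_filter={bench_filter}")
    # Copy the environment so the caller's mapping is never modified.
    env = dict(os.environ if base_env is None else base_env)
    if preload:
        env["LD_PRELOAD"] = preload
    return argv, env


# Placeholder paths; substitute the locations of a local UMF build.
argv, env = build_invocation(
    "./umf_build/benchmark/umf-benchmark",
    preload="./umf_build/lib/libumf_proxy.so",
    bench_filter="glibc",
)
print(" ".join(argv))
print("LD_PRELOAD=" + env["LD_PRELOAD"])
```

The resulting pair can be passed to `subprocess.run(argv, env=env, capture_output=True, text=True)` to capture the CSV output; setting the variable only in the child environment keeps the preloaded allocator out of the calling Python process.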


Compute Benchmarks level_zero run (with params: --iterations-stddev 2 --iterations 2):
https://github.com/oneapi-src/unified-runtime/actions/runs/13057758246


Compute Benchmarks level_zero run (--iterations-stddev 2 --iterations 2):
https://github.com/oneapi-src/unified-runtime/actions/runs/13057758246
Job status: success. Test status: success.

Summary

Total 42 benchmarks in mean.
Geomean 70.702%.
Improved 7 Regressed 21 (threshold 2.00%)

(result is better)

Performance change in benchmark groups

Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (8): 175.093%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/0/4096/iterations:200000/threads:4 umfProxy 125.078000 ns 2688.530 ns 2149.48% 2049.48% ++++++++++
alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool<os_provider> 287.486000 ns 287.722 ns 100.08% 0.08% .
alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool<os_provider> 3139.720 ns 3097.620000 ns 98.66% -1.34% .
alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider 2341.950 ns 2113.560000 ns 90.25% -9.75% .
alloc/size:10000/0/4096/iterations:200000/threads:4 glibc 2867.930 ns 2464.050000 ns 85.92% -14.08% .
alloc/size:10000/0/4096/iterations:200000/threads:4 disjoint_pool<os_provider> 4886.400000 ns -
alloc/size:10000/0/4096/iterations:200000/threads:4 jemalloc_pool<os_provider> 3675.650000 ns -
alloc/size:10000/0/4096/iterations:200000/threads:4 jemalloc 115.317000 ns -
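
The group figure above appears to be the geometric mean of the per-benchmark relative perf values, taken only over rows that have both a "This PR" and a "baseline" result. A minimal sketch under that assumption (the helper names are illustrative and not part of the benchmark scripts):

```python
import math


def relative_perf(this_pr: float, baseline: float) -> float:
    """Ratio for lower-is-better timings: above 1.0 means this PR is faster."""
    return baseline / this_pr


def geomean(values: list[float]) -> float:
    """Geometric mean computed via logarithms to avoid overflow."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# (this PR, baseline) pairs in ns for the five rows with both values.
pairs = [
    (125.078, 2688.530),   # umfProxy
    (287.486, 287.722),    # scalable_pool<os_provider>
    (3139.720, 3097.620),  # proxy_pool<os_provider>
    (2341.950, 2113.560),  # os_provider
    (2867.930, 2464.050),  # glibc
]

group_perf = geomean([relative_perf(pr, base) for pr, base in pairs]) * 100.0
print(f"{group_perf:.3f}%")
```

The result is close to the 175.093% reported for this group. Rows with only one value (disjoint_pool, jemalloc_pool, jemalloc) are left out because no ratio can be formed for them.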
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (8): 144.153%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/0/4096/iterations:200000/threads:1 umfProxy 104.878000 ns 705.635 ns 672.82% 572.82% +++
alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool<os_provider> 273.410 ns 272.237000 ns 99.57% -0.43% .
alloc/size:10000/0/4096/iterations:200000/threads:1 glibc 709.090 ns 698.410000 ns 98.49% -1.51% .
alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider 195.642 ns 191.313000 ns 97.79% -2.21% .
alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool<os_provider> 216.395 ns 208.759000 ns 96.47% -3.53% .
alloc/size:10000/0/4096/iterations:200000/threads:1 disjoint_pool<os_provider> 504.066000 ns -
alloc/size:10000/0/4096/iterations:200000/threads:1 jemalloc_pool<os_provider> 119.657000 ns -
alloc/size:10000/0/4096/iterations:200000/threads:1 jemalloc 83.827600 ns -
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (8): 158.822%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/100000/4096/iterations:200000/threads:4 umfProxy 120.480000 ns 1226.080 ns 1017.66% 917.66% ++++
alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider 1881.740000 ns 2038.360 ns 108.32% 8.32% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc 1276.630 ns 1274.570000 ns 99.84% -0.16% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool<os_provider> 3419.580 ns 3338.690000 ns 97.63% -2.37% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool<os_provider> 278.118 ns 261.553000 ns 94.04% -5.96% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 disjoint_pool<os_provider> 4683.850000 ns -
alloc/size:10000/100000/4096/iterations:200000/threads:4 jemalloc_pool<os_provider> 3476.140000 ns -
alloc/size:10000/100000/4096/iterations:200000/threads:4 jemalloc 105.754000 ns -
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (8): 126.359%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/100000/4096/iterations:200000/threads:1 umfProxy 202.191000 ns 707.467 ns 349.90% 249.90% +
alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool<os_provider> 308.898000 ns 310.903 ns 100.65% 0.65% .
alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc 720.780 ns 706.907000 ns 98.08% -1.92% .
alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool<os_provider> 203.250 ns 196.551000 ns 96.70% -3.30% .
alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider 196.533 ns 189.545000 ns 96.44% -3.56% .
alloc/size:10000/100000/4096/iterations:200000/threads:1 disjoint_pool<os_provider> 497.786000 ns -
alloc/size:10000/100000/4096/iterations:200000/threads:1 jemalloc_pool<os_provider> 124.563000 ns -
alloc/size:10000/100000/4096/iterations:200000/threads:1 jemalloc 85.477400 ns -
Relative perf in group alloc/min (10): 80.930%
Benchmark This PR baseline Relative perf Change -
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 umfProxy 558.978000 ns 832.725 ns 148.97% 48.97% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc 177.448 ns 174.753000 ns 98.48% -1.52% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool<os_provider> 986.078 ns 958.800000 ns 97.23% -2.77% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool<os_provider> 1029.760 ns 965.779000 ns 93.79% -6.21% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc 874.017 ns 797.092000 ns 91.20% -8.80% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 umfProxy 769.181 ns 177.130000 ns 23.03% -76.97% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 jemalloc_pool<os_provider> 4336.230000 ns -
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 jemalloc_pool<os_provider> 356.478000 ns -
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 jemalloc 369.412000 ns -
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 jemalloc 259.044000 ns -
Relative perf in group multiple (26): 26.243%
Benchmark This PR baseline Relative perf Change -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool<os_provider> 15530.100000 ns 16418.600 ns 105.72% 5.72% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider 145697.000000 ns 146423.000 ns 100.50% 0.50% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc 4283.890 ns 4283.690000 ns 100.00% -0.00% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool<os_provider> 75817.200 ns 75451.700000 ns 99.52% -0.48% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool<os_provider> 25743.700 ns 25525.500000 ns 99.15% -0.85% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc 33503.600 ns 33153.600000 ns 98.96% -1.04% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc 140397.000 ns 138360.000000 ns 98.55% -1.45% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool<os_provider> 169540.000 ns 162279.000000 ns 95.72% -4.28% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc 33766.800 ns 30910.300000 ns 91.54% -8.46% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool<os_provider> 1301690.000 ns 1162100.000000 ns 89.28% -10.72% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider 1345340.000 ns 1174970.000000 ns 87.34% -12.66% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool<os_provider> 47572.100 ns 41438.000000 ns 87.11% -12.89% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 umfProxy 10698100.000 ns 140162.000000 ns 1.31% -98.69% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 umfProxy 2622980.000 ns 30121.800000 ns 1.15% -98.85% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 umfProxy 8020940.000 ns 27477.700000 ns 0.34% -99.66% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 umfProxy 2607090.000 ns 4208.520000 ns 0.16% -99.84% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 disjoint_pool<os_provider> 1780830.000000 ns -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 disjoint_pool<os_provider> 216329.000000 ns -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 jemalloc_pool<os_provider> 517181.000000 ns -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 jemalloc_pool<os_provider> 24685.200000 ns -
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 jemalloc_pool<os_provider> 637670.000000 ns -
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 jemalloc_pool<os_provider> 60627.000000 ns -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 jemalloc 30705.300000 ns -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 jemalloc 24048.400000 ns -
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 jemalloc 48735.200000 ns -
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 jemalloc 26814.000000 ns -
Relative perf in group api (12): cannot calculate
Benchmark This PR baseline Relative perf Change -
api_overhead_benchmark_l0 SubmitKernel out of order - 11.369000 μs
api_overhead_benchmark_l0 SubmitKernel in order - 11.395000 μs
api_overhead_benchmark_sycl SubmitKernel out of order - 23.506000 μs
api_overhead_benchmark_sycl SubmitKernel in order - 24.407000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 - 2.149000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 - 1.673000 μs
api_overhead_benchmark_ur SubmitKernel out of order CPU count - 104663.000000 instr
api_overhead_benchmark_ur SubmitKernel out of order - 15.866000 μs
api_overhead_benchmark_ur SubmitKernel in order CPU count - 110006.000000 instr
api_overhead_benchmark_ur SubmitKernel in order - 16.785000 μs
api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count - 123166.000000 instr
api_overhead_benchmark_ur SubmitKernel in order with measure completion - 21.495000 μs
Relative perf in group memory (4): cannot calculate
Benchmark This PR baseline Relative perf Change -
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 - 252.914000 μs
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 - 219.832000 μs
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 - 5.900000 μs
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 - 3.070000 GB/s
Relative perf in group miscellaneous (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
miscellaneous_benchmark_sycl VectorSum - 858.023000 bw GB/s
Relative perf in group multithread (10): cannot calculate
Benchmark This PR baseline Relative perf Change -
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1 - 6896.127000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1 - 17165.065000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1 - 46811.855000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1 - 2047.766000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1 - 7766.797000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1 - 8883.578000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1 - 27030.035000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1 - 1199.669000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events - 42602.254000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events - 112408.658000 μs
Relative perf in group graph (10): cannot calculate
Benchmark This PR baseline Relative perf Change -
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10 - 71746.038000 μs
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:10 - 72583.103000 μs
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100 - 353349.563000 μs
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100 - 353086.695000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:1, numKernels:10 - 55.253000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:10 - 62.493000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:100 - 677.203000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:0, numKernels:10 - 5621.320000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:10 - 5631.730000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:100 - 56454.921000 μs
Relative perf in group Velocity-Bench (9): cannot calculate
Benchmark This PR baseline Relative perf Change -
Velocity-Bench Hashtable - 363.339623 M keys/sec
Velocity-Bench Bitcracker - 38.359400 s
Velocity-Bench CudaSift - 203.947000 ms
Velocity-Bench Easywave - 227.000000 ms
Velocity-Bench QuickSilver - 116.460000 MMS/CTT
Velocity-Bench Sobel Filter - 603.076000 ms
Velocity-Bench dl-cifar - 23.630300 s
Velocity-Bench dl-mnist - 2.710000 s
Velocity-Bench svm - 0.135900 s
Relative perf in group Runtime (8): cannot calculate
Benchmark This PR baseline Relative perf Change -
Runtime_IndependentDAGTaskThroughput_SingleTask - 259.444000 ms
Runtime_IndependentDAGTaskThroughput_BasicParallelFor - 274.274000 ms
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor - 275.173000 ms
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor - 276.461000 ms
Runtime_DAGTaskThroughput_SingleTask - 1648.643000 ms
Runtime_DAGTaskThroughput_BasicParallelFor - 1704.436000 ms
Runtime_DAGTaskThroughput_HierarchicalParallelFor - 1710.439000 ms
Runtime_DAGTaskThroughput_NDRangeParallelFor - 1673.462000 ms
Relative perf in group MicroBench (14): cannot calculate
Benchmark This PR baseline Relative perf Change -
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous - 4.526000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous - 4.585000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous - 4.376000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous - 4.456000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous - 617.437000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous - 617.442000 ms
MicroBench_HostDeviceBandwidth_1D_H2D_Strided - 4.276000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Strided - 4.940000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Strided - 4.909000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Strided - 4.716000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Strided - 616.834000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Strided - 616.784000 ms
MicroBench_LocalMem_int32_4096 - 29.862000 ms
MicroBench_LocalMem_fp32_4096 - 29.902000 ms
Relative perf in group Pattern (10): cannot calculate
Benchmark This PR baseline Relative perf Change -
Pattern_Reduction_NDRange_int32 - 16.339000 ms
Pattern_Reduction_Hierarchical_int32 - 16.339000 ms
Pattern_SegmentedReduction_NDRange_int16 - 2.265000 ms
Pattern_SegmentedReduction_NDRange_int32 - 2.165000 ms
Pattern_SegmentedReduction_NDRange_int64 - 2.337000 ms
Pattern_SegmentedReduction_NDRange_fp32 - 2.168000 ms
Pattern_SegmentedReduction_Hierarchical_int16 - 11.796000 ms
Pattern_SegmentedReduction_Hierarchical_int32 - 11.588000 ms
Pattern_SegmentedReduction_Hierarchical_int64 - 11.782000 ms
Pattern_SegmentedReduction_Hierarchical_fp32 - 11.587000 ms
Relative perf in group ScalarProduct (6): cannot calculate
Benchmark This PR baseline Relative perf Change -
ScalarProduct_NDRange_int32 - 3.765000 ms
ScalarProduct_NDRange_int64 - 5.425000 ms
ScalarProduct_NDRange_fp32 - 3.749000 ms
ScalarProduct_Hierarchical_int32 - 10.541000 ms
ScalarProduct_Hierarchical_int64 - 11.490000 ms
ScalarProduct_Hierarchical_fp32 - 10.167000 ms
Relative perf in group USM (7): cannot calculate
Benchmark This PR baseline Relative perf Change -
USM_Allocation_latency_fp32_device - 0.065000 ms
USM_Allocation_latency_fp32_host - 37.623000 ms
USM_Allocation_latency_fp32_shared - 0.062000 ms
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch - 1.737000 ms
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch - 1.087000 ms
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch - 1.893000 ms
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch - 1.258000 ms
Relative perf in group VectorAddition (3): cannot calculate
Benchmark This PR baseline Relative perf Change -
VectorAddition_int32 - 1.477000 ms
VectorAddition_int64 - 3.088000 ms
VectorAddition_fp32 - 1.480000 ms
Relative perf in group Polybench (3): cannot calculate
Benchmark This PR baseline Relative perf Change -
Polybench_2mm - 1.039000 ms
Polybench_3mm - 1.477000 ms
Polybench_Atax - 6.402000 ms
Relative perf in group Kmeans (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
Kmeans_fp32 - 14.111000 ms
Relative perf in group LinearRegressionCoeff (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
LinearRegressionCoeff_fp32 - 881.915000 ms
Relative perf in group MolecularDynamics (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
MolecularDynamics - 0.030000 ms
Relative perf in group llama.cpp (6): cannot calculate
Benchmark This PR baseline Relative perf Change -
llama.cpp Prompt Processing Batched 128 - 830.097430 token/s
llama.cpp Text Generation Batched 128 - 62.790938 token/s
llama.cpp Prompt Processing Batched 256 - 878.291089 token/s
llama.cpp Text Generation Batched 256 - 62.777001 token/s
llama.cpp Prompt Processing Batched 512 - 435.723514 token/s
llama.cpp Text Generation Batched 512 - 62.788791 token/s

Details

Benchmark details - environment, command...
alloc/size:10000/0/4096/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/0/4096/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/0/4096/iterations:200000/threads:4 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/0/4096/iterations:200000/threads:1 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:4 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:1 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

Contributor

Compute Benchmarks level_zero run (with params: --iterations-stddev 2 --iterations 2):
https://github.com/oneapi-src/unified-runtime/actions/runs/13072873192

Contributor

Compute Benchmarks level_zero run (--iterations-stddev 2 --iterations 2):
https://github.com/oneapi-src/unified-runtime/actions/runs/13072873192
Job status: success. Test status: success.

Summary

Total 42 benchmarks in mean.
Geomean 70.284%.
Improved 9 Regressed 22 (threshold 2.00%)

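(The better result in each row was underlined in the original comment; in this text it is the value printed with six decimal places.)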
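
The "Improved" and "Regressed" counts in the summary appear to come from comparing each benchmark's change against the 2.00% threshold. The sketch below shows one way to do that classification. The function name `classify` and the strict comparisons are assumptions, not the bot's actual code; the sample values are the five relative-perf figures reported for the first `alloc/size:10000/0/4096` group with four threads.

```python
from collections import Counter


def classify(relative_perf_pct, threshold_pct=2.0):
    """Label one benchmark from its relative perf (100% means no change)."""
    change = relative_perf_pct - 100.0
    if change > threshold_pct:
        return "improved"
    if change < -threshold_pct:
        return "regressed"
    return "unchanged"


# Relative perf values (in percent) copied from the first benchmark group.
sample = [2057.87, 102.55, 93.61, 92.42, 91.86]
counts = Counter(classify(value) for value in sample)
print(counts["improved"], counts["regressed"], counts["unchanged"])
```

Under these assumptions a change of -2.01% counts as a regression while -1.98% does not, which matches how the rows near the threshold are tallied in the summary.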

Performance change in benchmark groups

Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (8): 175.757%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| alloc/size:10000/0/4096/iterations:200000/threads:4 umfProxy | 132.930000 ns | 2735.530 ns | 2057.87% | 1957.87% | ++++++++++ |
| alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool<os_provider> | 299.151000 ns | 306.767 ns | 102.55% | 2.55% | . |
| alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider | 2342.330 ns | 2192.650000 ns | 93.61% | -6.39% | . |
| alloc/size:10000/0/4096/iterations:200000/threads:4 glibc | 2834.960 ns | 2620.060000 ns | 92.42% | -7.58% | . |
| alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool<os_provider> | 3455.850 ns | 3174.620000 ns | 91.86% | -8.14% | . |
| alloc/size:10000/0/4096/iterations:200000/threads:4 disjoint_pool<os_provider> | 4683.370000 ns | - | | | |
| alloc/size:10000/0/4096/iterations:200000/threads:4 jemalloc_pool<os_provider> | 3318.150000 ns | - | | | |
| alloc/size:10000/0/4096/iterations:200000/threads:4 jemalloc | 116.434000 ns | - | | | |

Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (8): 146.171%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| alloc/size:10000/0/4096/iterations:200000/threads:1 umfProxy | 102.201000 ns | 711.693 ns | 696.37% | 596.37% | +++ |
| alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider | 190.621000 ns | 195.988 ns | 102.82% | 2.82% | . |
| alloc/size:10000/0/4096/iterations:200000/threads:1 glibc | 709.139000 ns | 710.790 ns | 100.23% | 0.23% | . |
| alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool<os_provider> | 216.513 ns | 213.992000 ns | 98.84% | -1.16% | . |
| alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool<os_provider> | 288.393 ns | 271.315000 ns | 94.08% | -5.92% | . |
| alloc/size:10000/0/4096/iterations:200000/threads:1 disjoint_pool<os_provider> | 494.155000 ns | - | | | |
| alloc/size:10000/0/4096/iterations:200000/threads:1 jemalloc_pool<os_provider> | 120.109000 ns | - | | | |
| alloc/size:10000/0/4096/iterations:200000/threads:1 jemalloc | 83.974500 ns | - | | | |

Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (8): 153.395%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| alloc/size:10000/100000/4096/iterations:200000/threads:4 umfProxy | 127.946000 ns | 1230.060 ns | 961.39% | 861.39% | ++++ |
| alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool<os_provider> | 3392.580 ns | 3386.980000 ns | 99.83% | -0.17% | . |
| alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc | 1290.880 ns | 1267.280000 ns | 98.17% | -1.83% | . |
| alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool<os_provider> | 266.420 ns | 253.226000 ns | 95.05% | -4.95% | . |
| alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider | 2042.030 ns | 1936.480000 ns | 94.83% | -5.17% | . |
| alloc/size:10000/100000/4096/iterations:200000/threads:4 disjoint_pool<os_provider> | 4713.150000 ns | - | | | |
| alloc/size:10000/100000/4096/iterations:200000/threads:4 jemalloc_pool<os_provider> | 3310.840000 ns | - | | | |
| alloc/size:10000/100000/4096/iterations:200000/threads:4 jemalloc | 107.088000 ns | - | | | |

Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (8): 111.552%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| alloc/size:10000/100000/4096/iterations:200000/threads:1 umfProxy | 203.756000 ns | 730.895 ns | 358.71% | 258.71% | + |
| alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider | 192.770000 ns | 192.935 ns | 100.09% | 0.09% | . |
| alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool<os_provider> | 322.983 ns | 299.838000 ns | 92.83% | -7.17% | . |
| alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool<os_provider> | 269.559 ns | 206.336000 ns | 76.55% | -23.45% | . |
| alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc | 1075.200 ns | 727.999000 ns | 67.71% | -32.29% | . |
| alloc/size:10000/100000/4096/iterations:200000/threads:1 disjoint_pool<os_provider> | 512.818000 ns | - | | | |
| alloc/size:10000/100000/4096/iterations:200000/threads:1 jemalloc_pool<os_provider> | 119.699000 ns | - | | | |
| alloc/size:10000/100000/4096/iterations:200000/threads:1 jemalloc | 83.848400 ns | - | | | |

Relative perf in group alloc/min (10): 84.067%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 umfProxy | 584.576000 ns | 834.560 ns | 142.76% | 42.76% | . |
| alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool<os_provider> | 1051.670000 ns | 1128.250 ns | 107.28% | 7.28% | . |
| alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool<os_provider> | 963.425000 ns | 968.189 ns | 100.49% | 0.49% | . |
| alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc | 177.373 ns | 177.227000 ns | 99.92% | -0.08% | . |
| alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc | 882.652 ns | 809.442000 ns | 91.71% | -8.29% | . |
| alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 umfProxy | 728.316 ns | 182.287000 ns | 25.03% | -74.97% | . |
| alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 jemalloc_pool<os_provider> | 4316.560000 ns | - | | | |
| alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 jemalloc_pool<os_provider> | 408.532000 ns | - | | | |
| alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 jemalloc | 454.859000 ns | - | | | |
| alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 jemalloc | 260.657000 ns | - | | | |

Relative perf in group multiple (26): 26.626%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider | 141791.000000 ns | 144859.000 ns | 102.16% | 2.16% | . |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool<os_provider> | 15048.000000 ns | 15279.900 ns | 101.54% | 1.54% | . |
| multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool<os_provider> | 25436.400 ns | 25041.800000 ns | 98.45% | -1.55% | . |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool<os_provider> | 1201200.000 ns | 1181150.000000 ns | 98.33% | -1.67% | . |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool<os_provider> | 163895.000 ns | 160647.000000 ns | 98.02% | -1.98% | . |
| multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool<os_provider> | 77240.700 ns | 75687.100000 ns | 97.99% | -2.01% | . |
| multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc | 30853.700 ns | 30222.700000 ns | 97.95% | -2.05% | . |
| multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc | 142499.000 ns | 139089.000000 ns | 97.61% | -2.39% | . |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc | 4309.030 ns | 4200.920000 ns | 97.49% | -2.51% | . |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider | 1199190.000 ns | 1162710.000000 ns | 96.96% | -3.04% | . |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc | 33278.800 ns | 31133.200000 ns | 93.55% | -6.45% | . |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool<os_provider> | 47485.600 ns | 41527.800000 ns | 87.45% | -12.55% | . |
| multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 umfProxy | 10815000.000 ns | 138580.000000 ns | 1.28% | -98.72% | - |
| multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 umfProxy | 2624880.000 ns | 31018.400000 ns | 1.18% | -98.82% | - |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 umfProxy | 7634970.000 ns | 27865.300000 ns | 0.36% | -99.64% | - |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 umfProxy | 2619500.000 ns | 4241.250000 ns | 0.16% | -99.84% | - |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 disjoint_pool<os_provider> | 1757030.000000 ns | - | | | |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 disjoint_pool<os_provider> | 216294.000000 ns | - | | | |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 jemalloc_pool<os_provider> | 522324.000000 ns | - | | | |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 jemalloc_pool<os_provider> | 24844.800000 ns | - | | | |
| multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 jemalloc_pool<os_provider> | 636413.000000 ns | - | | | |
| multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 jemalloc_pool<os_provider> | 59566.400000 ns | - | | | |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 jemalloc | 29733.900000 ns | - | | | |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 jemalloc | 25184.600000 ns | - | | | |
| multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 jemalloc | 49547.600000 ns | - | | | |
| multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 jemalloc | 26554.300000 ns | - | | | |

Relative perf in group api (12): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| api_overhead_benchmark_l0 SubmitKernel out of order | - | 11.868000 μs | | | |
| api_overhead_benchmark_l0 SubmitKernel in order | - | 11.418000 μs | | | |
| api_overhead_benchmark_sycl SubmitKernel out of order | - | 22.969000 μs | | | |
| api_overhead_benchmark_sycl SubmitKernel in order | - | 24.133000 μs | | | |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 | - | 2.113000 μs | | | |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 | - | 1.679000 μs | | | |
| api_overhead_benchmark_ur SubmitKernel out of order CPU count | - | 104663.000000 instr | | | |
| api_overhead_benchmark_ur SubmitKernel out of order | - | 15.750000 μs | | | |
| api_overhead_benchmark_ur SubmitKernel in order CPU count | - | 110006.000000 instr | | | |
| api_overhead_benchmark_ur SubmitKernel in order | - | 16.241000 μs | | | |
| api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count | - | 122876.000000 instr | | | |
| api_overhead_benchmark_ur SubmitKernel in order with measure completion | - | 21.005000 μs | | | |

Relative perf in group memory (4): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 | - | 251.872000 μs | | | |
| memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 | - | 132.472000 μs | | | |
| memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 | - | 5.573000 μs | | | |
| memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 | - | 3.158000 GB/s | | | |

Relative perf in group miscellaneous (1): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| miscellaneous_benchmark_sycl VectorSum | - | 860.664000 bw GB/s | | | |

Relative perf in group multithread (10): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1 | - | 6939.950000 μs | | | |
| multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1 | - | 17154.077000 μs | | | |
| multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1 | - | 46935.372000 μs | | | |
| multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1 | - | 2093.086000 μs | | | |
| multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1 | - | 7472.404000 μs | | | |
| multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1 | - | 8689.121000 μs | | | |
| multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1 | - | 25587.435000 μs | | | |
| multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1 | - | 1201.865000 μs | | | |
| multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events | - | 40846.653000 μs | | | |
| multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events | - | 112790.682000 μs | | | |

Relative perf in group graph (10): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10 | - | 71747.470000 μs | | | |
| graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:10 | - | 72642.878000 μs | | | |
| graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100 | - | 353339.946000 μs | | | |
| graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100 | - | 353502.721000 μs | | | |
| graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:1, numKernels:10 | - | 54.566000 μs | | | |
| graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:10 | - | 62.367000 μs | | | |
| graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:100 | - | 674.284000 μs | | | |
| graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:0, numKernels:10 | - | 5721.966000 μs | | | |
| graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:10 | - | 5688.177000 μs | | | |
| graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:100 | - | 57817.523000 μs | | | |

Relative perf in group Velocity-Bench (9): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Velocity-Bench Hashtable | - | 358.375158 M keys/sec | | | |
| Velocity-Bench Bitcracker | - | 35.965200 s | | | |
| Velocity-Bench CudaSift | - | 201.701000 ms | | | |
| Velocity-Bench Easywave | - | 226.000000 ms | | | |
| Velocity-Bench QuickSilver | - | 117.580000 MMS/CTT | | | |
| Velocity-Bench Sobel Filter | - | 611.944000 ms | | | |
| Velocity-Bench dl-cifar | - | 23.442800 s | | | |
| Velocity-Bench dl-mnist | - | 2.720000 s | | | |
| Velocity-Bench svm | - | 0.134300 s | | | |

Relative perf in group Runtime (8): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Runtime_IndependentDAGTaskThroughput_SingleTask | - | 268.614000 ms | | | |
| Runtime_IndependentDAGTaskThroughput_BasicParallelFor | - | 277.626000 ms | | | |
| Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor | - | 277.078000 ms | | | |
| Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor | - | 277.264000 ms | | | |
| Runtime_DAGTaskThroughput_SingleTask | - | 1688.724000 ms | | | |
| Runtime_DAGTaskThroughput_BasicParallelFor | - | 1764.745000 ms | | | |
| Runtime_DAGTaskThroughput_HierarchicalParallelFor | - | 1737.282000 ms | | | |
| Runtime_DAGTaskThroughput_NDRangeParallelFor | - | 1705.559000 ms | | | |

Relative perf in group MicroBench (14): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous | - | 5.241000 ms | | | |
| MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous | - | 4.991000 ms | | | |
| MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous | - | 4.763000 ms | | | |
| MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous | - | 4.863000 ms | | | |
| MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous | - | 618.230000 ms | | | |
| MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous | - | 618.282000 ms | | | |
| MicroBench_HostDeviceBandwidth_1D_H2D_Strided | - | 4.928000 ms | | | |
| MicroBench_HostDeviceBandwidth_2D_H2D_Strided | - | 5.197000 ms | | | |
| MicroBench_HostDeviceBandwidth_3D_H2D_Strided | - | 5.079000 ms | | | |
| MicroBench_HostDeviceBandwidth_1D_D2H_Strided | - | 5.207000 ms | | | |
| MicroBench_HostDeviceBandwidth_2D_D2H_Strided | - | 617.816000 ms | | | |
| MicroBench_HostDeviceBandwidth_3D_D2H_Strided | - | 617.727000 ms | | | |
| MicroBench_LocalMem_int32_4096 | - | 29.924000 ms | | | |
| MicroBench_LocalMem_fp32_4096 | - | 29.864000 ms | | | |

Relative perf in group Pattern (10): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Pattern_Reduction_NDRange_int32 | - | 16.761000 ms | | | |
| Pattern_Reduction_Hierarchical_int32 | - | 16.736000 ms | | | |
| Pattern_SegmentedReduction_NDRange_int16 | - | 2.264000 ms | | | |
| Pattern_SegmentedReduction_NDRange_int32 | - | 2.166000 ms | | | |
| Pattern_SegmentedReduction_NDRange_int64 | - | 2.337000 ms | | | |
| Pattern_SegmentedReduction_NDRange_fp32 | - | 2.165000 ms | | | |
| Pattern_SegmentedReduction_Hierarchical_int16 | - | 11.801000 ms | | | |
| Pattern_SegmentedReduction_Hierarchical_int32 | - | 11.589000 ms | | | |
| Pattern_SegmentedReduction_Hierarchical_int64 | - | 11.771000 ms | | | |
| Pattern_SegmentedReduction_Hierarchical_fp32 | - | 11.590000 ms | | | |

Relative perf in group ScalarProduct (6): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| ScalarProduct_NDRange_int32 | - | 3.744000 ms | | | |
| ScalarProduct_NDRange_int64 | - | 5.440000 ms | | | |
| ScalarProduct_NDRange_fp32 | - | 3.760000 ms | | | |
| ScalarProduct_Hierarchical_int32 | - | 10.507000 ms | | | |
| ScalarProduct_Hierarchical_int64 | - | 11.485000 ms | | | |
| ScalarProduct_Hierarchical_fp32 | - | 10.152000 ms | | | |

Relative perf in group USM (7): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| USM_Allocation_latency_fp32_device | - | 0.066000 ms | | | |
| USM_Allocation_latency_fp32_host | - | 37.402000 ms | | | |
| USM_Allocation_latency_fp32_shared | - | 0.065000 ms | | | |
| USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch | - | 1.681000 ms | | | |
| USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch | - | 1.056000 ms | | | |
| USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch | - | 1.838000 ms | | | |
| USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch | - | 1.205000 ms | | | |

Relative perf in group VectorAddition (3): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| VectorAddition_int32 | - | 1.492000 ms | | | |
| VectorAddition_int64 | - | 3.061000 ms | | | |
| VectorAddition_fp32 | - | 1.434000 ms | | | |

Relative perf in group Polybench (3): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Polybench_2mm | - | 1.039000 ms | | | |
| Polybench_3mm | - | 1.482000 ms | | | |
| Polybench_Atax | - | 6.416000 ms | | | |

Relative perf in group Kmeans (1): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Kmeans_fp32 | - | 14.144000 ms | | | |

Relative perf in group LinearRegressionCoeff (1): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| LinearRegressionCoeff_fp32 | - | 899.874000 ms | | | |

Relative perf in group MolecularDynamics (1): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| MolecularDynamics | - | 0.029000 ms | | | |

Relative perf in group llama.cpp (6): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| llama.cpp Prompt Processing Batched 128 | - | 824.202968 token/s | | | |
| llama.cpp Text Generation Batched 128 | - | 62.990615 token/s | | | |
| llama.cpp Prompt Processing Batched 256 | - | 870.375426 token/s | | | |
| llama.cpp Text Generation Batched 256 | - | 62.990517 token/s | | | |
| llama.cpp Prompt Processing Batched 512 | - | 429.991968 token/s | | | |
| llama.cpp Text Generation Batched 512 | - | 62.959741 token/s | | | |

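
The "Relative perf" and per-group figures above are consistent with a simple calculation: for time-based results (lower is better), relative perf is baseline divided by this PR, and the group figure is the geometric mean over the rows that have both values. The sketch below reproduces the first group under that reading. The function names and the `lower_is_better` switch are assumptions rather than the bot's actual code; the timings are copied from the `alloc/size:10000/0/4096/iterations:200000/threads:4` rows.

```python
import math


def relative_perf(this_pr, baseline, lower_is_better=True):
    """Relative performance in percent; above 100 means this PR is better."""
    ratio = baseline / this_pr if lower_is_better else this_pr / baseline
    return ratio * 100.0


def geomean(values):
    """Geometric mean of positive numbers, computed through logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# (this PR, baseline) timings in ns; rows missing one side are left out,
# which is presumably why some groups report "cannot calculate".
rows = {
    "umfProxy": (132.930, 2735.530),
    "scalable_pool<os_provider>": (299.151, 306.767),
    "os_provider": (2342.330, 2192.650),
    "glibc": (2834.960, 2620.060),
    "proxy_pool<os_provider>": (3455.850, 3174.620),
}
rel = {name: relative_perf(pr, base) for name, (pr, base) in rows.items()}
group = geomean(rel.values())

for name, value in rel.items():
    print(f"{name}: {value:.2f}% (change {value - 100.0:+.2f}%)")
print(f"group: {group:.3f}%")
```

A geometric mean is a reasonable choice for ratios like these, although a single extreme row (such as the `umfProxy` entries) still moves the group figure considerably.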

Details

Benchmark details - environment, command...
alloc/size:10000/0/4096/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc
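
The umfProxy entries differ from the plain glibc ones only in the `LD_PRELOAD` variable and the extra `--benchmark_filter=glibc` flag, both taken from the details above. A minimal sketch of how such an invocation could be assembled follows. The helper name `build_invocation` and the relative paths are placeholders for illustration, not part of the CI scripts.

```python
import os
from typing import Dict, List, Optional, Tuple


def build_invocation(
    binary: str,
    preload: Optional[str] = None,
    benchmark_filter: Optional[str] = None,
    base_env: Optional[Dict[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for one benchmark run; hypothetical helper."""
    # Every run listed in the details requests CSV output.
    argv = [binary, "--benchmark_format=csv"]
    if benchmark_filter:
        # The preloaded runs re-execute only the glibc benchmarks.
        argv.append(f"--benchmark_filter={benchmark_filter}")
    # Copy the environment so the caller's mapping is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    if preload:
        # The dynamic loader loads this library before libc's allocator.
        env["LD_PRELOAD"] = preload
    return argv, env


# Placeholder paths; the CI runner uses absolute paths under its workspace.
argv, env = build_invocation(
    "./umf_build/benchmark/umf-benchmark",
    preload="./umf_build/lib/libumf_proxy.so",
    benchmark_filter="glibc",
    base_env={},
)
```

The pair can then be passed to `subprocess.run(argv, env=env, capture_output=True, text=True)` to collect the CSV output.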

alloc/size:10000/0/4096/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/0/4096/iterations:200000/threads:4 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/0/4096/iterations:200000/threads:1 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:4 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:1 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

@@ -176,6 +176,16 @@ jobs:
-B${{github.workspace}}/umf_build
-DUMF_BUILD_BENCHMARKS=ON
-DUMF_TESTS_FAIL_ON_SKIP=ON
Contributor

FYI, you don't need UMF_TESTS_FAIL_ON_SKIP=ON if you disabled tests 😉

Contributor

Compute Benchmarks level_zero run (with params: --iterations-stddev 2 --iterations 2):
https://github.com/oneapi-src/unified-runtime/actions/runs/13074127788

Contributor

Compute Benchmarks level_zero run (--iterations-stddev 2 --iterations 2):
https://github.com/oneapi-src/unified-runtime/actions/runs/13074127788
Job status: success. Test status: success.

Summary

Total 42 benchmarks in mean.
Geomean 98.966%.
Improved 7 Regressed 13 (threshold 2.00%)

(result is better)
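
The summary numbers can be sanity-checked by hand. The sketch below assumes that relative performance is `baseline / this PR` for time-based results (lower is better) and that a group's figure is the geometric mean over the rows that have both values. Both assumptions are inferred from the rows of this comment, not taken from the benchmark scripts. The timings are copied from the first group of this run (alloc/size:10000/0/4096, threads:4).

```python
from math import exp, log

# (this PR, baseline) timings in ns for the rows that have both values.
RESULTS = {
    "os_provider": (1974.95, 2192.65),
    "proxy_pool<os_provider>": (2980.01, 3174.62),
    "glibc": (2612.18, 2620.06),
    "umfProxy": (2801.30, 2735.53),
    "scalable_pool<os_provider>": (315.463, 306.767),
}

THRESHOLD = 0.02  # the 2.00% threshold quoted in the summary


def relative_perf(pr: float, baseline: float) -> float:
    """Ratio above 1 means this PR is faster (assumes lower is better)."""
    return baseline / pr


def classify(ratio: float, threshold: float = THRESHOLD) -> str:
    """Label a ratio as improved, regressed, or unchanged."""
    change = ratio - 1.0
    if change > threshold:
        return "improved"
    if change < -threshold:
        return "regressed"
    return "unchanged"


def geomean(ratios) -> float:
    """Geometric mean computed through logarithms."""
    ratios = list(ratios)
    return exp(sum(log(r) for r in ratios) / len(ratios))


ratios = {name: relative_perf(pr, base) for name, (pr, base) in RESULTS.items()}
labels = {name: classify(r) for name, r in ratios.items()}
group_geomean = geomean(ratios.values())

print(f"group geomean: {group_geomean * 100:.3f}%")
for name, ratio in ratios.items():
    print(f"{name}: {ratio * 100:.2f}% ({labels[name]})")
```

Under these assumptions the group value agrees with the 102.411% reported for that group, and the jemalloc and tbbProxy rows are left out because they have no baseline.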

Performance change in benchmark groups

Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (7): 102.411%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider | 1974.950000 ns | 2192.650 ns | 111.02% | 11.02% | ++++++ |
| alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool<os_provider> | 2980.010000 ns | 3174.620 ns | 106.53% | 6.53% | ++++ |
| alloc/size:10000/0/4096/iterations:200000/threads:4 glibc | 2612.180000 ns | 2620.060 ns | 100.30% | 0.30% | . |
| alloc/size:10000/0/4096/iterations:200000/threads:4 umfProxy | 2801.300 ns | 2735.530000 ns | 97.65% | -2.35% | - |
| alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool<os_provider> | 315.463 ns | 306.767000 ns | 97.24% | -2.76% | -- |
| alloc/size:10000/0/4096/iterations:200000/threads:4 jemalloc | 116.435000 ns | - | | | |
| alloc/size:10000/0/4096/iterations:200000/threads:4 tbbProxy | 285.909000 ns | - | | | |

Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (7): 100.033%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider | 194.213000 ns | 195.988 ns | 100.91% | 0.91% | . |
| alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool<os_provider> | 270.684000 ns | 271.315 ns | 100.23% | 0.23% | . |
| alloc/size:10000/0/4096/iterations:200000/threads:1 umfProxy | 710.416000 ns | 711.693 ns | 100.18% | 0.18% | . |
| alloc/size:10000/0/4096/iterations:200000/threads:1 glibc | 712.439 ns | 710.790000 ns | 99.77% | -0.23% | . |
| alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool<os_provider> | 215.977 ns | 213.992000 ns | 99.08% | -0.92% | . |
| alloc/size:10000/0/4096/iterations:200000/threads:1 jemalloc | 83.198600 ns | - | | | |
| alloc/size:10000/0/4096/iterations:200000/threads:1 tbbProxy | 200.493000 ns | - | | | |

Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (7): 101.053%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider | 1879.980000 ns | 1936.480 ns | 103.01% | 3.01% | ++ |
| alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool<os_provider> | 3296.140000 ns | 3386.980 ns | 102.76% | 2.76% | ++ |
| alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc | 1253.630000 ns | 1267.280 ns | 101.09% | 1.09% | . |
| alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool<os_provider> | 250.554000 ns | 253.226 ns | 101.07% | 1.07% | . |
| alloc/size:10000/100000/4096/iterations:200000/threads:4 umfProxy | 1262.270 ns | 1230.060000 ns | 97.45% | -2.55% | - |
| alloc/size:10000/100000/4096/iterations:200000/threads:4 jemalloc | 107.085000 ns | - | | | |
| alloc/size:10000/100000/4096/iterations:200000/threads:4 tbbProxy | 283.559000 ns | - | | | |

Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (7): 96.234%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider | 189.414000 ns | 192.935 ns | 101.86% | 1.86% | . |
| alloc/size:10000/100000/4096/iterations:200000/threads:1 umfProxy | 728.650000 ns | 730.895 ns | 100.31% | 0.31% | . |
| alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc | 730.075 ns | 727.999000 ns | 99.72% | -0.28% | . |
| alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool<os_provider> | 305.863 ns | 299.838000 ns | 98.03% | -1.97% | . |
| alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool<os_provider> | 249.688 ns | 206.336000 ns | 82.64% | -17.36% | ---------- |
| alloc/size:10000/100000/4096/iterations:200000/threads:1 jemalloc | 83.063600 ns | - | | | |
| alloc/size:10000/100000/4096/iterations:200000/threads:1 tbbProxy | 237.189000 ns | - | | | |

Relative perf in group alloc/min (10): 97.801%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 umfProxy | 175.385000 ns | 182.287 ns | 103.94% | 3.94% | ++ |
| alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc | 177.380 ns | 177.227000 ns | 99.91% | -0.09% | . |
| alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool<os_provider> | 995.080 ns | 968.189000 ns | 97.30% | -2.70% | -- |
| alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool<os_provider> | 1169.240 ns | 1128.250000 ns | 96.49% | -3.51% | -- |
| alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 umfProxy | 874.349 ns | 834.560000 ns | 95.45% | -4.55% | --- |
| alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc | 860.804 ns | 809.442000 ns | 94.03% | -5.97% | --- |
| alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 jemalloc | 424.589000 ns | - | | | |
| alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 jemalloc | 269.304000 ns | - | | | |
| alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 tbbProxy | 1002.640000 ns | - | | | |
| alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 tbbProxy | 553.923000 ns | - | | | |

Relative perf in group multiple (24): 98.237%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 umfProxy | 27077.900000 ns | 27865.300 ns | 102.91% | 2.91% | ++ |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool<os_provider> | 14959.100000 ns | 15279.900 ns | 102.14% | 2.14% | + |
| multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 umfProxy | 30837.900000 ns | 31018.400 ns | 100.59% | 0.59% | . |
| multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool<os_provider> | 75342.600000 ns | 75687.100 ns | 100.46% | 0.46% | . |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 umfProxy | 4224.170000 ns | 4241.250 ns | 100.40% | 0.40% | . |
| multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 umfProxy | 138852.000 ns | 138580.000000 ns | 99.80% | -0.20% | . |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool<os_provider> | 1185160.000 ns | 1181150.000000 ns | 99.66% | -0.34% | . |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider | 146076.000 ns | 144859.000000 ns | 99.17% | -0.83% | . |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool<os_provider> | 42043.500 ns | 41527.800000 ns | 98.77% | -1.23% | . |
| multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc | 30600.800 ns | 30222.700000 ns | 98.76% | -1.24% | . |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc | 4285.380 ns | 4200.920000 ns | 98.03% | -1.97% | . |
| multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc | 142705.000 ns | 139089.000000 ns | 97.47% | -2.53% | - |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider | 1201590.000 ns | 1162710.000000 ns | 96.76% | -3.24% | -- |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool<os_provider> | 167342.000 ns | 160647.000000 ns | 96.00% | -4.00% | -- |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc | 33983.500 ns | 31133.200000 ns | 91.61% | -8.39% | ----- |
| multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool<os_provider> | 27771.100 ns | 25041.800000 ns | 90.17% | -9.83% | ------ |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 jemalloc | 31597.600000 ns | - | | | |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 jemalloc | 24584.800000 ns | - | | | |
| multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 jemalloc | 49996.400000 ns | - | | | |
| multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 jemalloc | 26547.700000 ns | - | | | |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 tbbProxy | 41531.500000 ns | - | | | |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 tbbProxy | 7802.280000 ns | - | | | |
| multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 tbbProxy | 71042.700000 ns | - | | | |
| multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 tbbProxy | 21488.000000 ns | - | | | |

Relative perf in group api (12): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| api_overhead_benchmark_l0 SubmitKernel out of order | - | 11.868000 μs | | | |
| api_overhead_benchmark_l0 SubmitKernel in order | - | 11.418000 μs | | | |
| api_overhead_benchmark_sycl SubmitKernel out of order | - | 22.969000 μs | | | |
| api_overhead_benchmark_sycl SubmitKernel in order | - | 24.133000 μs | | | |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 | - | 2.113000 μs | | | |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 | - | 1.679000 μs | | | |
| api_overhead_benchmark_ur SubmitKernel out of order CPU count | - | 104663.000000 instr | | | |
| api_overhead_benchmark_ur SubmitKernel out of order | - | 15.750000 μs | | | |
| api_overhead_benchmark_ur SubmitKernel in order CPU count | - | 110006.000000 instr | | | |
| api_overhead_benchmark_ur SubmitKernel in order | - | 16.241000 μs | | | |
| api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count | - | 122876.000000 instr | | | |
| api_overhead_benchmark_ur SubmitKernel in order with measure completion | - | 21.005000 μs | | | |

Relative perf in group memory (4): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 | - | 251.872000 μs | | | |
| memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 | - | 132.472000 μs | | | |
| memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 | - | 5.573000 μs | | | |
| memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 | - | 3.158000 GB/s | | | |

Relative perf in group miscellaneous (1): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| miscellaneous_benchmark_sycl VectorSum | - | 860.664000 bw GB/s | | | |

Relative perf in group multithread (10): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1 | - | 6939.950000 μs | | | |
| multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1 | - | 17154.077000 μs | | | |
| multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1 | - | 46935.372000 μs | | | |
| multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1 | - | 2093.086000 μs | | | |
| multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1 | - | 7472.404000 μs | | | |
| multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1 | - | 8689.121000 μs | | | |
| multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1 | - | 25587.435000 μs | | | |
| multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1 | - | 1201.865000 μs | | | |
| multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events | - | 40846.653000 μs | | | |
| multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events | - | 112790.682000 μs | | | |

Relative perf in group graph (10): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10 | - | 71747.470000 μs | | | |
| graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:10 | - | 72642.878000 μs | | | |
| graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100 | - | 353339.946000 μs | | | |
| graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100 | - | 353502.721000 μs | | | |
| graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:1, numKernels:10 | - | 54.566000 μs | | | |
| graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:10 | - | 62.367000 μs | | | |
| graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:100 | - | 674.284000 μs | | | |
| graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:0, numKernels:10 | - | 5721.966000 μs | | | |
| graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:10 | - | 5688.177000 μs | | | |
| graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:100 | - | 57817.523000 μs | | | |

Relative perf in group Velocity-Bench (9): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Velocity-Bench Hashtable | - | 358.375158 M keys/sec | | | |
| Velocity-Bench Bitcracker | - | 35.965200 s | | | |
| Velocity-Bench CudaSift | - | 201.701000 ms | | | |
| Velocity-Bench Easywave | - | 226.000000 ms | | | |
| Velocity-Bench QuickSilver | - | 117.580000 MMS/CTT | | | |
| Velocity-Bench Sobel Filter | - | 611.944000 ms | | | |
| Velocity-Bench dl-cifar | - | 23.442800 s | | | |
| Velocity-Bench dl-mnist | - | 2.720000 s | | | |
| Velocity-Bench svm | - | 0.134300 s | | | |

Relative perf in group Runtime (8): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Runtime_IndependentDAGTaskThroughput_SingleTask | - | 268.614000 ms | | | |
| Runtime_IndependentDAGTaskThroughput_BasicParallelFor | - | 277.626000 ms | | | |
| Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor | - | 277.078000 ms | | | |
| Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor | - | 277.264000 ms | | | |
| Runtime_DAGTaskThroughput_SingleTask | - | 1688.724000 ms | | | |
| Runtime_DAGTaskThroughput_BasicParallelFor | - | 1764.745000 ms | | | |
| Runtime_DAGTaskThroughput_HierarchicalParallelFor | - | 1737.282000 ms | | | |
| Runtime_DAGTaskThroughput_NDRangeParallelFor | - | 1705.559000 ms | | | |

Relative perf in group MicroBench (14): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous | - | 5.241000 ms | | | |
| MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous | - | 4.991000 ms | | | |
| MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous | - | 4.763000 ms | | | |
| MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous | - | 4.863000 ms | | | |
| MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous | - | 618.230000 ms | | | |
| MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous | - | 618.282000 ms | | | |
| MicroBench_HostDeviceBandwidth_1D_H2D_Strided | - | 4.928000 ms | | | |
| MicroBench_HostDeviceBandwidth_2D_H2D_Strided | - | 5.197000 ms | | | |
| MicroBench_HostDeviceBandwidth_3D_H2D_Strided | - | 5.079000 ms | | | |
| MicroBench_HostDeviceBandwidth_1D_D2H_Strided | - | 5.207000 ms | | | |
| MicroBench_HostDeviceBandwidth_2D_D2H_Strided | - | 617.816000 ms | | | |
| MicroBench_HostDeviceBandwidth_3D_D2H_Strided | - | 617.727000 ms | | | |
| MicroBench_LocalMem_int32_4096 | - | 29.924000 ms | | | |
| MicroBench_LocalMem_fp32_4096 | - | 29.864000 ms | | | |

Relative perf in group Pattern (10): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Pattern_Reduction_NDRange_int32 | - | 16.761000 ms | | | |
| Pattern_Reduction_Hierarchical_int32 | - | 16.736000 ms | | | |
| Pattern_SegmentedReduction_NDRange_int16 | - | 2.264000 ms | | | |
| Pattern_SegmentedReduction_NDRange_int32 | - | 2.166000 ms | | | |
| Pattern_SegmentedReduction_NDRange_int64 | - | 2.337000 ms | | | |
| Pattern_SegmentedReduction_NDRange_fp32 | - | 2.165000 ms | | | |
| Pattern_SegmentedReduction_Hierarchical_int16 | - | 11.801000 ms | | | |
| Pattern_SegmentedReduction_Hierarchical_int32 | - | 11.589000 ms | | | |
| Pattern_SegmentedReduction_Hierarchical_int64 | - | 11.771000 ms | | | |
| Pattern_SegmentedReduction_Hierarchical_fp32 | - | 11.590000 ms | | | |

Relative perf in group ScalarProduct (6): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| ScalarProduct_NDRange_int32 | - | 3.744000 ms | | | |
| ScalarProduct_NDRange_int64 | - | 5.440000 ms | | | |
| ScalarProduct_NDRange_fp32 | - | 3.760000 ms | | | |
| ScalarProduct_Hierarchical_int32 | - | 10.507000 ms | | | |
| ScalarProduct_Hierarchical_int64 | - | 11.485000 ms | | | |
| ScalarProduct_Hierarchical_fp32 | - | 10.152000 ms | | | |

Relative perf in group USM (7): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| USM_Allocation_latency_fp32_device | - | 0.066000 ms | | | |
| USM_Allocation_latency_fp32_host | - | 37.402000 ms | | | |
| USM_Allocation_latency_fp32_shared | - | 0.065000 ms | | | |
| USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch | - | 1.681000 ms | | | |
| USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch | - | 1.056000 ms | | | |
| USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch | - | 1.838000 ms | | | |
| USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch | - | 1.205000 ms | | | |

Relative perf in group VectorAddition (3): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| VectorAddition_int32 | - | 1.492000 ms | | | |
| VectorAddition_int64 | - | 3.061000 ms | | | |
| VectorAddition_fp32 | - | 1.434000 ms | | | |

Relative perf in group Polybench (3): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Polybench_2mm | - | 1.039000 ms | | | |
| Polybench_3mm | - | 1.482000 ms | | | |
| Polybench_Atax | - | 6.416000 ms | | | |

Relative perf in group Kmeans (1): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Kmeans_fp32 | - | 14.144000 ms | | | |

Relative perf in group LinearRegressionCoeff (1): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| LinearRegressionCoeff_fp32 | - | 899.874000 ms | | | |

Relative perf in group MolecularDynamics (1): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| MolecularDynamics | - | 0.029000 ms | | | |

Relative perf in group llama.cpp (6): cannot calculate

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| llama.cpp Prompt Processing Batched 128 | - | 824.202968 token/s | | | |
| llama.cpp Text Generation Batched 128 | - | 62.990615 token/s | | | |
| llama.cpp Prompt Processing Batched 256 | - | 870.375426 token/s | | | |
| llama.cpp Text Generation Batched 256 | - | 62.990517 token/s | | | |
| llama.cpp Prompt Processing Batched 512 | - | 429.991968 token/s | | | |
| llama.cpp Text Generation Batched 512 | - | 62.959741 token/s | | | |

Details

Benchmark details - environment, command...
alloc/size:10000/0/4096/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool<os_provider>

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool<os_provider>

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool<os_provider>

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool<os_provider>

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool<os_provider>

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool<os_provider>

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool<os_provider>

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool<os_provider>

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool<os_provider>

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool<os_provider>

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool<os_provider>

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool<os_provider>

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool<os_provider>

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool<os_provider>

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/0/4096/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/0/4096/iterations:200000/threads:4 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/0/4096/iterations:200000/threads:1 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:4 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:1 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/0/4096/iterations:200000/threads:4 tbbProxy

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/0/4096/iterations:200000/threads:1 tbbProxy

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:4 tbbProxy

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:1 tbbProxy

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 tbbProxy

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 tbbProxy

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 tbbProxy

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 tbbProxy

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 tbbProxy

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 tbbProxy

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

Compute Benchmarks level_zero run (with params: --iterations-stddev 2 --iterations 2):
https://github.com/oneapi-src/unified-runtime/actions/runs/13074686372

Compute Benchmarks level_zero run (--iterations-stddev 2 --iterations 2):
https://github.com/oneapi-src/unified-runtime/actions/runs/13074686372
Job status: cancelled. Test status: skipped.

Compute Benchmarks level_zero run (with params: --iterations-stddev 2 --iterations 2):
https://github.com/oneapi-src/unified-runtime/actions/runs/13074730535

Compute Benchmarks level_zero run (--iterations-stddev 2 --iterations 2):
https://github.com/oneapi-src/unified-runtime/actions/runs/13074730535
Job status: success. Test status: success.

Summary

Total 42 benchmarks in mean.
Geomean 70.872%.
Improved 7 Regressed 22 (threshold 2.00%)

(In each row that has both values, the better result is the one printed with six decimal places.)
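The summary figures can be cross-checked by hand. The sketch below is not the benchmark script itself. It assumes, based only on the numbers in this report, that relative perf for a time-based (lower-is-better) row is baseline divided by this PR's value, that a group's figure is the geometric mean over rows with both values, and that a row counts as improved or regressed when its change exceeds the 2.00% threshold in either direction. The names `ROWS`, `relative_perf`, `geomean`, and `summarize` are hypothetical; the data is the threads:4 `alloc/size:10000/0/4096` group.

```python
import math

# (benchmark label, this-PR time in ns, baseline time in ns), copied from the
# alloc/size:10000/0/4096/iterations:200000/threads:4 group of this report.
# Rows without a baseline value are left out, since no ratio exists for them.
ROWS = [
    ("umfProxy", 129.151, 2735.530),
    ("proxy_pool<os_provider>", 3114.580, 3174.620),
    ("scalable_pool<os_provider>", 302.353, 306.767),
    ("glibc", 2770.360, 2620.060),
    ("os_provider", 2370.970, 2192.650),
]

THRESHOLD = 2.0  # percent, as stated in the summary line


def relative_perf(pr_time, baseline_time):
    """Relative performance in percent, assuming lower time is better."""
    return baseline_time / pr_time * 100.0


def geomean(values):
    """Geometric mean, computed in log space to avoid overflow."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def summarize(rows, threshold=THRESHOLD):
    """Return (group relative perf in %, improved count, regressed count)."""
    perfs = [relative_perf(pr, base) for _, pr, base in rows]
    improved = sum(1 for p in perfs if p - 100.0 > threshold)
    regressed = sum(1 for p in perfs if p - 100.0 < -threshold)
    return geomean(perfs), improved, regressed


if __name__ == "__main__":
    group_perf, improved, regressed = summarize(ROWS)
    print(f"group relative perf: {group_perf:.3f}%")
    print(f"improved {improved}, regressed {regressed}")
```

Under these assumptions the group figure should land close to the 180.497% the report lists for this group. Throughput rows (GB/s, token/s) would need the ratio inverted, which this sketch does not cover.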

Performance change in benchmark groups

Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (9): 180.497%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/0/4096/iterations:200000/threads:4 umfProxy 129.151000 ns 2735.530 ns 2118.09% 2018.09% ++++++++++
alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool<os_provider> 3114.580000 ns 3174.620 ns 101.93% 1.93% .
alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool<os_provider> 302.353000 ns 306.767 ns 101.46% 1.46% .
alloc/size:10000/0/4096/iterations:200000/threads:4 glibc 2770.360 ns 2620.060000 ns 94.57% -5.43% .
alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider 2370.970 ns 2192.650000 ns 92.48% -7.52% .
alloc/size:10000/0/4096/iterations:200000/threads:4 disjoint_pool<os_provider> 4937.450000 ns -
alloc/size:10000/0/4096/iterations:200000/threads:4 jemalloc_pool<os_provider> 3693.380000 ns -
alloc/size:10000/0/4096/iterations:200000/threads:4 jemalloc 119.817000 ns -
alloc/size:10000/0/4096/iterations:200000/threads:4 tbbProxy 292.652000 ns -
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (9): 143.572%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/0/4096/iterations:200000/threads:1 umfProxy 111.827000 ns 711.693 ns 636.42% 536.42% +++
alloc/size:10000/0/4096/iterations:200000/threads:1 glibc 703.831000 ns 710.790 ns 100.99% 0.99% .
alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool<os_provider> 215.418 ns 213.992000 ns 99.34% -0.66% .
alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool<os_provider> 274.851 ns 271.315000 ns 98.71% -1.29% .
alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider 202.487 ns 195.988000 ns 96.79% -3.21% .
alloc/size:10000/0/4096/iterations:200000/threads:1 disjoint_pool<os_provider> 509.297000 ns -
alloc/size:10000/0/4096/iterations:200000/threads:1 jemalloc_pool<os_provider> 119.301000 ns -
alloc/size:10000/0/4096/iterations:200000/threads:1 jemalloc 85.880000 ns -
alloc/size:10000/0/4096/iterations:200000/threads:1 tbbProxy 193.541000 ns -
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (9): 157.782%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/100000/4096/iterations:200000/threads:4 umfProxy 114.166000 ns 1230.060 ns 1077.43% 977.43% +++++
alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc 1213.010000 ns 1267.280 ns 104.47% 4.47% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool<os_provider> 3347.390000 ns 3386.980 ns 101.18% 1.18% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool<os_provider> 268.784 ns 253.226000 ns 94.21% -5.79% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider 2124.880 ns 1936.480000 ns 91.13% -8.87% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 disjoint_pool<os_provider> 4721.800000 ns -
alloc/size:10000/100000/4096/iterations:200000/threads:4 jemalloc_pool<os_provider> 3626.490000 ns -
alloc/size:10000/100000/4096/iterations:200000/threads:4 jemalloc 107.922000 ns -
alloc/size:10000/100000/4096/iterations:200000/threads:4 tbbProxy 301.013000 ns -
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (9): 125.679%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/100000/4096/iterations:200000/threads:1 umfProxy 203.178000 ns 730.895 ns 359.73% 259.73% +
alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc 724.707000 ns 727.999 ns 100.45% 0.45% .
alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool<os_provider> 207.595 ns 206.336000 ns 99.39% -0.61% .
alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider 195.211 ns 192.935000 ns 98.83% -1.17% .
alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool<os_provider> 339.452 ns 299.838000 ns 88.33% -11.67% .
alloc/size:10000/100000/4096/iterations:200000/threads:1 disjoint_pool<os_provider> 498.022000 ns -
alloc/size:10000/100000/4096/iterations:200000/threads:1 jemalloc_pool<os_provider> 119.925000 ns -
alloc/size:10000/100000/4096/iterations:200000/threads:1 jemalloc 85.368600 ns -
alloc/size:10000/100000/4096/iterations:200000/threads:1 tbbProxy 235.938000 ns -
Relative perf in group alloc/min (12): 80.852%
Benchmark This PR baseline Relative perf Change -
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 umfProxy 673.706000 ns 834.560 ns 123.88% 23.88% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool<os_provider> 1071.980000 ns 1128.250 ns 105.25% 5.25% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool<os_provider> 1000.410 ns 968.189000 ns 96.78% -3.22% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc 186.226 ns 177.227000 ns 95.17% -4.83% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc 870.143 ns 809.442000 ns 93.02% -6.98% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 umfProxy 728.926 ns 182.287000 ns 25.01% -74.99% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 jemalloc_pool<os_provider> 4429.480000 ns -
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 jemalloc_pool<os_provider> 359.652000 ns -
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 jemalloc 436.643000 ns -
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 jemalloc 265.077000 ns -
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 tbbProxy 855.811000 ns -
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 tbbProxy 585.770000 ns -
Relative perf in group multiple (30): 26.301%
Benchmark This PR baseline Relative perf Change -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider 142890.000000 ns 144859.000 ns 101.38% 1.38% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool<os_provider> 1183620.000 ns 1181150.000000 ns 99.79% -0.21% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider 1171440.000 ns 1162710.000000 ns 99.25% -0.75% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc 141479.000 ns 139089.000000 ns 98.31% -1.69% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool<os_provider> 164147.000 ns 160647.000000 ns 97.87% -2.13% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool<os_provider> 15632.200 ns 15279.900000 ns 97.75% -2.25% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc 31080.900 ns 30222.700000 ns 97.24% -2.76% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc 4351.410 ns 4200.920000 ns 96.54% -3.46% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool<os_provider> 78417.600 ns 75687.100000 ns 96.52% -3.48% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc 32525.400 ns 31133.200000 ns 95.72% -4.28% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool<os_provider> 27680.600 ns 25041.800000 ns 90.47% -9.53% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool<os_provider> 46880.000 ns 41527.800000 ns 88.58% -11.42% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 umfProxy 10839200.000 ns 138580.000000 ns 1.28% -98.72% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 umfProxy 2738240.000 ns 31018.400000 ns 1.13% -98.87% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 umfProxy 8036080.000 ns 27865.300000 ns 0.35% -99.65% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 umfProxy 2667030.000 ns 4241.250000 ns 0.16% -99.84% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 disjoint_pool<os_provider> 1738330.000000 ns -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 disjoint_pool<os_provider> 214090.000000 ns -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 jemalloc_pool<os_provider> 493122.000000 ns -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 jemalloc_pool<os_provider> 24593.300000 ns -
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 jemalloc_pool<os_provider> 618549.000000 ns -
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 jemalloc_pool<os_provider> 61579.600000 ns -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 jemalloc 30269.100000 ns -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 jemalloc 24191.300000 ns -
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 jemalloc 52034.000000 ns -
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 jemalloc 26243.700000 ns -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 tbbProxy 42244.900000 ns -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 tbbProxy 7734.190000 ns -
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 tbbProxy 71309.600000 ns -
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 tbbProxy 21424.600000 ns -
Relative perf in group api (12): cannot calculate
Benchmark This PR baseline Relative perf Change -
api_overhead_benchmark_l0 SubmitKernel out of order - 11.868000 μs
api_overhead_benchmark_l0 SubmitKernel in order - 11.418000 μs
api_overhead_benchmark_sycl SubmitKernel out of order - 22.969000 μs
api_overhead_benchmark_sycl SubmitKernel in order - 24.133000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 - 2.113000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 - 1.679000 μs
api_overhead_benchmark_ur SubmitKernel out of order CPU count - 104663.000000 instr
api_overhead_benchmark_ur SubmitKernel out of order - 15.750000 μs
api_overhead_benchmark_ur SubmitKernel in order CPU count - 110006.000000 instr
api_overhead_benchmark_ur SubmitKernel in order - 16.241000 μs
api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count - 122876.000000 instr
api_overhead_benchmark_ur SubmitKernel in order with measure completion - 21.005000 μs
Relative perf in group memory (4): cannot calculate
Benchmark This PR baseline Relative perf Change -
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 - 251.872000 μs
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 - 132.472000 μs
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 - 5.573000 μs
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 - 3.158000 GB/s
Relative perf in group miscellaneous (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
miscellaneous_benchmark_sycl VectorSum - 860.664000 bw GB/s
Relative perf in group multithread (10): cannot calculate
Benchmark This PR baseline Relative perf Change -
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1 - 6939.950000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1 - 17154.077000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1 - 46935.372000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1 - 2093.086000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1 - 7472.404000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1 - 8689.121000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1 - 25587.435000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1 - 1201.865000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events - 40846.653000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events - 112790.682000 μs
Relative perf in group graph (10): cannot calculate
Benchmark This PR baseline Relative perf Change -
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10 - 71747.470000 μs
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:10 - 72642.878000 μs
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100 - 353339.946000 μs
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100 - 353502.721000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:1, numKernels:10 - 54.566000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:10 - 62.367000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:100 - 674.284000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:0, numKernels:10 - 5721.966000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:10 - 5688.177000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:100 - 57817.523000 μs
Relative perf in group Velocity-Bench (9): cannot calculate
Benchmark This PR baseline Relative perf Change -
Velocity-Bench Hashtable - 358.375158 M keys/sec
Velocity-Bench Bitcracker - 35.965200 s
Velocity-Bench CudaSift - 201.701000 ms
Velocity-Bench Easywave - 226.000000 ms
Velocity-Bench QuickSilver - 117.580000 MMS/CTT
Velocity-Bench Sobel Filter - 611.944000 ms
Velocity-Bench dl-cifar - 23.442800 s
Velocity-Bench dl-mnist - 2.720000 s
Velocity-Bench svm - 0.134300 s
Relative perf in group Runtime (8): cannot calculate
Benchmark This PR baseline Relative perf Change -
Runtime_IndependentDAGTaskThroughput_SingleTask - 268.614000 ms
Runtime_IndependentDAGTaskThroughput_BasicParallelFor - 277.626000 ms
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor - 277.078000 ms
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor - 277.264000 ms
Runtime_DAGTaskThroughput_SingleTask - 1688.724000 ms
Runtime_DAGTaskThroughput_BasicParallelFor - 1764.745000 ms
Runtime_DAGTaskThroughput_HierarchicalParallelFor - 1737.282000 ms
Runtime_DAGTaskThroughput_NDRangeParallelFor - 1705.559000 ms
Relative perf in group MicroBench (14): cannot calculate
Benchmark This PR baseline Relative perf Change -
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous - 5.241000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous - 4.991000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous - 4.763000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous - 4.863000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous - 618.230000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous - 618.282000 ms
MicroBench_HostDeviceBandwidth_1D_H2D_Strided - 4.928000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Strided - 5.197000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Strided - 5.079000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Strided - 5.207000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Strided - 617.816000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Strided - 617.727000 ms
MicroBench_LocalMem_int32_4096 - 29.924000 ms
MicroBench_LocalMem_fp32_4096 - 29.864000 ms
Relative perf in group Pattern (10): cannot calculate
Benchmark This PR baseline Relative perf Change -
Pattern_Reduction_NDRange_int32 - 16.761000 ms
Pattern_Reduction_Hierarchical_int32 - 16.736000 ms
Pattern_SegmentedReduction_NDRange_int16 - 2.264000 ms
Pattern_SegmentedReduction_NDRange_int32 - 2.166000 ms
Pattern_SegmentedReduction_NDRange_int64 - 2.337000 ms
Pattern_SegmentedReduction_NDRange_fp32 - 2.165000 ms
Pattern_SegmentedReduction_Hierarchical_int16 - 11.801000 ms
Pattern_SegmentedReduction_Hierarchical_int32 - 11.589000 ms
Pattern_SegmentedReduction_Hierarchical_int64 - 11.771000 ms
Pattern_SegmentedReduction_Hierarchical_fp32 - 11.590000 ms
Relative perf in group ScalarProduct (6): cannot calculate
Benchmark This PR baseline Relative perf Change -
ScalarProduct_NDRange_int32 - 3.744000 ms
ScalarProduct_NDRange_int64 - 5.440000 ms
ScalarProduct_NDRange_fp32 - 3.760000 ms
ScalarProduct_Hierarchical_int32 - 10.507000 ms
ScalarProduct_Hierarchical_int64 - 11.485000 ms
ScalarProduct_Hierarchical_fp32 - 10.152000 ms
Relative perf in group USM (7): cannot calculate
Benchmark This PR baseline Relative perf Change -
USM_Allocation_latency_fp32_device - 0.066000 ms
USM_Allocation_latency_fp32_host - 37.402000 ms
USM_Allocation_latency_fp32_shared - 0.065000 ms
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch - 1.681000 ms
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch - 1.056000 ms
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch - 1.838000 ms
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch - 1.205000 ms
Relative perf in group VectorAddition (3): cannot calculate
Benchmark This PR baseline Relative perf Change -
VectorAddition_int32 - 1.492000 ms
VectorAddition_int64 - 3.061000 ms
VectorAddition_fp32 - 1.434000 ms
Relative perf in group Polybench (3): cannot calculate
Benchmark This PR baseline Relative perf Change -
Polybench_2mm - 1.039000 ms
Polybench_3mm - 1.482000 ms
Polybench_Atax - 6.416000 ms
Relative perf in group Kmeans (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
Kmeans_fp32 - 14.144000 ms
Relative perf in group LinearRegressionCoeff (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
LinearRegressionCoeff_fp32 - 899.874000 ms
Relative perf in group MolecularDynamics (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
MolecularDynamics - 0.029000 ms
Relative perf in group llama.cpp (6): cannot calculate
Benchmark This PR baseline Relative perf Change -
llama.cpp Prompt Processing Batched 128 - 824.202968 token/s
llama.cpp Text Generation Batched 128 - 62.990615 token/s
llama.cpp Prompt Processing Batched 256 - 870.375426 token/s
llama.cpp Text Generation Batched 256 - 62.990517 token/s
llama.cpp Prompt Processing Batched 512 - 429.991968 token/s
llama.cpp Text Generation Batched 512 - 62.959741 token/s

Details

Benchmark details - environment, command...
alloc/size:10000/0/4096/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 disjoint_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 jemalloc_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/0/4096/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/0/4096/iterations:200000/threads:4 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/0/4096/iterations:200000/threads:1 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:4 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:1 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 jemalloc

Environment Variables:

LD_PRELOAD=libjemalloc.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/0/4096/iterations:200000/threads:4 tbbProxy

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/0/4096/iterations:200000/threads:1 tbbProxy

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:4 tbbProxy

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:1 tbbProxy

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 tbbProxy

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 tbbProxy

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 tbbProxy

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 tbbProxy

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 tbbProxy

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 tbbProxy

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc
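
Each entry in the listing above pairs a benchmark name with optional environment variables (for example `LD_PRELOAD`) and a command line. The following is a minimal sketch of how one such entry might be reproduced locally. It is not the CI harness itself. The helper names `build_invocation`, `parse_csv`, and `run_entry` are hypothetical, and the parser assumes the CSV output begins with a header row whose first column is `name`.

```python
import csv
import io
import os
import subprocess


def build_invocation(binary, preload=None, name_filter=None):
    """Build (argv, env) for one entry: binary path, optional LD_PRELOAD, optional filter."""
    argv = [binary, "--benchmark_format=csv"]
    if name_filter:
        # Same flag as in the listing, e.g. --benchmark_filter=glibc
        argv.append(f"--benchmark_filter={name_filter}")
    env = dict(os.environ)  # copy so the parent environment is not modified
    if preload:
        env["LD_PRELOAD"] = preload
    return argv, env


def parse_csv(text):
    """Return the CSV rows as dicts, skipping any lines printed before the header.

    Assumption: the header row starts with "name,".
    """
    lines = text.splitlines()
    start = next((i for i, line in enumerate(lines) if line.startswith("name,")), None)
    if start is None:
        return []
    return list(csv.DictReader(io.StringIO("\n".join(lines[start:]))))


def run_entry(binary, preload=None, name_filter=None):
    """Run the benchmark binary once and return its parsed CSV rows."""
    argv, env = build_invocation(binary, preload, name_filter)
    result = subprocess.run(argv, env=env, capture_output=True, text=True, check=True)
    return parse_csv(result.stdout)
```

Under these assumptions, an entry such as the `jemalloc` variants would correspond to `run_entry(path_to_umf_benchmark, preload="libjemalloc.so", name_filter="glibc")`, where `path_to_umf_benchmark` stands for the local build of the `umf-benchmark` binary.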

@martygrant
Contributor

Unified Runtime -> intel/llvm Repo Move Notice

Information

The source code of Unified Runtime has been moved to intel/llvm under the unified-runtime top-level directory,
and all future development will now be carried out there. This was done in intel/llvm#17043.

The code will be mirrored to oneapi-src/unified-runtime and the specification will continue to be hosted at oneapi-src.github.io/unified-runtime.

The contribution guide has been updated with new instructions for contributing to Unified Runtime.

PR Migration

All open PRs including this one will be labelled auto-close and shall be automatically closed after 30 days.
To allow for some breathing space, this automation will not be enabled until next week (27/02/2025).

Should you wish to continue with your PR you will need to migrate it to intel/llvm.
We have provided a script to help automate this process.


This is an automated comment.

@martygrant
Contributor

Unified Runtime -> intel/llvm Repo Move Notice

Following on from the previous notice, we have now enabled workflows to automatically label and close PRs because the Unified Runtime source code has moved to intel/llvm.

This PR has now been marked with the auto-close label and will be automatically closed after 30 days.

Please review the previous notice for more information, including assistance with migrating your PR to intel/llvm.

Should there be a reason for this PR to remain open, manually remove the auto-close label.


This is an automated comment.

Contributor

Automatic PR Closure Notice

Information

This PR has been closed automatically. It was marked with the auto-close label 30 days ago as part of the Unified Runtime source code migration to the intel/llvm repository - intel/llvm#17043.

All Unified Runtime development should be done in intel/llvm; details can be found in the updated contribution guide.
This repository will continue to exist as a mirror and will host the specification documentation.

Next Steps

Should you wish to re-open this PR, it must be moved to intel/llvm. We have provided a script to help automate this process. Otherwise, no action is required.


This is an automated comment.

Labels
auto-close ci/cd Continuous integration/delivery
3 participants