
Commit c7a1236

chore(dev/benchmarks): Reorganize benchmarks such that they can build/run against previous versions (#398)
I imagine there are a few ways to go about this, but I found moving the benchmarks to their own subdirectory and using `FetchContent` to build against various versions/source checkouts to be an intuitive way to do this. This also nicely separates benchmark-related CMake from non-benchmark related CMake and provides a nice way to benchmark locally against a few previous versions (via build presets). If we add more benchmarks in the future (or discover a flaw in an existing benchmark), it also provides a nice way to retrospectively run them against previous releases.

I've added a more verbose description of the setup to the benchmarks README, but the general idea is:

- Benchmarks are documented using Doxygen, which is really good at parsing documentation. Reading the XML is a bit of a pain but is better than undocumented or difficult-to-locate benchmarks and better than parsing source files yourself.
- Configurations are CMake build presets, and CMake handles pulling a previous or local nanoarrow using `FetchContent`. This means that the only action needed on release to update the report is to add a configure preset.
- The provided `benchmark-run-all.sh` effectively reuses build directories for minimal rebuilding during benchmark development.
- The report is a [Quarto](https://quarto.org) document that renders to markdown. It is not the flashiest of reports but gets the job done. It could be replaced by something like [conbench](https://github.com/conbench/conbench) in the future.

Example report in details below:

<details>

# Benchmark Report

## Configurations

These benchmarks were run with the following configurations:

| preset_name | preset_description |
|:------------|:-------------------------------------------------|
| local | Uses the nanoarrow C sources from this checkout. |
| v0.4.0 | Uses the nanoarrow C sources from the 0.4.0 release. |

## Summary

A quick and dirty summary of benchmark results between this checkout and the last released version.

| benchmark_label | v0.4.0 | local | change | pct_change |
|:----------------------------------------------------------------------------|---------:|---------:|--------:|-----------:|
| [ArrayViewGetIntUnsafeInt16](#arrayviewgetintunsafeint16) | 635.33µs | 631.47µs | 1ns | -0.6% |
| [ArrayViewGetIntUnsafeInt32](#arrayviewgetintunsafeint32) | 635.96µs | 636.71µs | 753.7ns | 0.1% |
| [ArrayViewGetIntUnsafeInt64](#arrayviewgetintunsafeint64) | 669.22µs | 680.5µs | 11.3µs | 1.7% |
| [ArrayViewGetIntUnsafeInt64CheckNull](#arrayviewgetintunsafeint64checknull) | 1.03ms | 1.21ms | 178.7µs | 17.4% |
| [ArrayViewGetIntUnsafeInt8](#arrayviewgetintunsafeint8) | 948.13µs | 946.34µs | 1ns | -0.2% |
| [SchemaInitWideStruct](#schemainitwidestruct) | 1.04ms | 1.02ms | 1ns | -2.1% |
| [SchemaViewInitWideStruct](#schemaviewinitwidestruct) | 106.08µs | 104.56µs | 1ns | -1.4% |

## ArrowArrayView-related benchmarks

Benchmarks for consuming ArrowArrays using the ArrowArrayViewXXX() functions.

### ArrayViewGetIntUnsafeInt8

Use ArrowArrayViewGetIntUnsafe() to consume an int8 array.

[View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/c-more-benchmarks/dev/benchmarks/c/array_benchmark.cc#L108-L110)

| preset_name | iterations | real_time | cpu_time | items_per_second |
|:------------|-----------:|----------:|---------:|-----------------:|
| local | 746 | 946µs | 945µs | 1,058,678,610 |
| v0.4.0 | 745 | 948µs | 947µs | 1,056,345,018 |

### ArrayViewGetIntUnsafeInt16

Use ArrowArrayViewGetIntUnsafe() to consume an int16 array.

[View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/c-more-benchmarks/dev/benchmarks/c/array_benchmark.cc#L113-L115)

| preset_name | iterations | real_time | cpu_time | items_per_second |
|:------------|-----------:|----------:|---------:|-----------------:|
| local | 1115 | 631µs | 630µs | 1,586,161,276 |
| v0.4.0 | 1110 | 635µs | 634µs | 1,576,482,853 |

### ArrayViewGetIntUnsafeInt32

Use ArrowArrayViewGetIntUnsafe() to consume an int32 array.

[View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/c-more-benchmarks/dev/benchmarks/c/array_benchmark.cc#L118-L120)

| preset_name | iterations | real_time | cpu_time | items_per_second |
|:------------|-----------:|----------:|---------:|-----------------:|
| local | 1106 | 637µs | 636µs | 1,572,865,930 |
| v0.4.0 | 1116 | 636µs | 635µs | 1,574,396,587 |

### ArrayViewGetIntUnsafeInt64

Use ArrowArrayViewGetIntUnsafe() to consume an int64 array.

[View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/c-more-benchmarks/dev/benchmarks/c/array_benchmark.cc#L123-L125)

| preset_name | iterations | real_time | cpu_time | items_per_second |
|:------------|-----------:|----------:|---------:|-----------------:|
| local | 1036 | 680µs | 680µs | 1,471,241,907 |
| v0.4.0 | 1039 | 669µs | 668µs | 1,496,471,266 |

### ArrayViewGetIntUnsafeInt64CheckNull

Use ArrowArrayViewGetIntUnsafe() to consume an int64 array (checking for nulls)

[View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/c-more-benchmarks/dev/benchmarks/c/array_benchmark.cc#L128-L130)

| preset_name | iterations | real_time | cpu_time | items_per_second |
|:------------|-----------:|----------:|---------:|-----------------:|
| local | 581 | 1.21ms | 1.2ms | 830,641,968 |
| v0.4.0 | 697 | 1.03ms | 1.02ms | 976,185,007 |

## Schema-related benchmarks

Benchmarks for producing and consuming ArrowSchema.

### SchemaInitWideStruct

Benchmark ArrowSchema creation for very wide tables. Simulates part of the process of creating a very wide table with a simple column type (integer).

[View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/c-more-benchmarks/dev/benchmarks/c/schema_benchmark.cc#L45-L56)

| preset_name | iterations | real_time | cpu_time | items_per_second |
|:------------|-----------:|----------:|---------:|-----------------:|
| local | 684 | 1.02ms | 1.02ms | 9,788,166 |
| v0.4.0 | 686 | 1.04ms | 1.04ms | 9,606,888 |

### SchemaViewInitWideStruct

Benchmark ArrowSchema parsing for very wide tables. Simulates part of the process of consuming a very wide table. Typically the ArrowSchemaViewInit() is done by ArrowArrayViewInit() but uses a similar pattern.

[View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/c-more-benchmarks/dev/benchmarks/c/schema_benchmark.cc#L78-L91)

| preset_name | iterations | real_time | cpu_time | items_per_second |
|:------------|-----------:|----------:|---------:|-----------------:|
| local | 6753 | 105µs | 104µs | 95,812,784 |
| v0.4.0 | 6762 | 106µs | 106µs | 94,630,337 |

</details>

---------

Co-authored-by: Jacob Wujciak-Jens <[email protected]>
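The `pct_change` column in the summary table appears to be the relative difference between the `local` and `v0.4.0` timings. A minimal sketch of that arithmetic follows, assuming the formula `(local - baseline) / baseline * 100` rounded to one decimal place; the function name `pct_change` is hypothetical and is not taken from the Quarto report source.

```shell
# Assumption: pct_change = (new - base) / base * 100, rounded to one decimal.
# Both arguments must be expressed in the same unit (for example, microseconds).
pct_change() {
  awk -v base="$1" -v new="$2" \
    'BEGIN { printf "%.1f%%\n", (new - base) / base * 100 }'
}
```

With the timings from the summary table, `pct_change 635.33 631.47` prints `-0.6%` and `pct_change 669.22 680.5` prints `1.7%`.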
1 parent 9075dfa commit c7a1236

14 files changed: +926 -42 lines changed

.github/workflows/benchmarks.yaml

Lines changed: 6 additions & 14 deletions
@@ -25,9 +25,11 @@ on:
     branches:
       - main
     paths:
-      - 'CMakeLists.txt'
       - '.github/workflows/benchmarks.yaml'
-      - 'src/**/*_benchmark.cc'
+      - 'dev/benchmarks/**'
+
+permissions:
+  contents: read
 
 jobs:
   benchmarks:
@@ -36,17 +38,7 @@ jobs:
 
     steps:
       - uses: actions/checkout@v4
-
-      - name: Build nanoarrow
-        run: |
-          mkdir build && cd build
-          cmake .. -DNANOARROW_BUILD_BENCHMARKS=ON -DCMAKE_BUILD_TYPE=Release
-          cmake --build .
-
       - name: Run benchmarks
         run: |
-          cd build
-          for f in $(ls | grep -e "_benchmark"); do
-            echo "::group::$(basename ${f})"
-            ./${f}
-          done
+          cd dev/benchmarks
+          ./benchmark-run-all.sh

CMakeLists.txt

Lines changed: 2 additions & 13 deletions
@@ -20,6 +20,7 @@ cmake_minimum_required(VERSION 3.14)
 
 if(NOT DEFINED CMAKE_C_STANDARD)
   set(CMAKE_C_STANDARD 99)
+  set(CMAKE_C_STANDARD_REQUIRED ON)
 endif()
 
 set(NANOARROW_VERSION "0.5.0-SNAPSHOT")
@@ -260,17 +261,5 @@ if(NANOARROW_BUILD_TESTS)
 endif()
 
 if(NANOARROW_BUILD_BENCHMARKS)
-  # benchmark requires at least C++11
-  if(NOT DEFINED CMAKE_CXX_STANDARD)
-    set(CMAKE_CXX_STANDARD 11)
-  endif()
-
-  add_subdirectory("thirdparty/benchmark")
-
-  add_executable(schema_benchmark src/nanoarrow/schema_benchmark.cc)
-  add_executable(array_benchmark src/nanoarrow/array_benchmark.cc)
-
-  target_link_libraries(schema_benchmark PRIVATE nanoarrow benchmark::benchmark_main)
-  target_link_libraries(array_benchmark PRIVATE nanoarrow benchmark::benchmark_main)
-
+  add_subdirectory(dev/benchmarks)
 endif()

thirdparty/benchmark/CMakeLists.txt renamed to dev/benchmarks/.gitignore

Lines changed: 3 additions & 11 deletions
@@ -6,7 +6,7 @@
 # "License"); you may not use this file except in compliance
 # with the License. You may obtain a copy of the License at
 #
-# http://www.apache.org/licenses/LICENSE-2.0
+# http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing,
 # software distributed under the License is distributed on an
@@ -15,13 +15,5 @@
 # specific language governing permissions and limitations
 # under the License.
 
-include(FetchContent)
-
-set(BENCHMARK_ENABLE_TESTING OFF)
-
-fetchcontent_declare(benchmark
-  URL https://github.com/google/benchmark/archive/refs/tags/v1.8.3.zip
-  URL_HASH SHA256=abfc22e33e3594d0edf8eaddaf4d84a2ffc491ad74b6a7edc6e7a608f690e691
-)
-
-fetchcontent_makeavailable(benchmark)
+.Rhistory
+benchmark-report.md

dev/benchmarks/CMakeLists.txt

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+message(STATUS "Building using CMake version: ${CMAKE_VERSION}")
+cmake_minimum_required(VERSION 3.14)
+include(FetchContent)
+
+project(nanoarrow_benchmarks)
+
+if(NOT DEFINED CMAKE_C_STANDARD)
+  set(CMAKE_C_STANDARD 99)
+  set(CMAKE_C_STANDARD_REQUIRED ON)
+endif()
+
+if(NOT DEFINED CMAKE_CXX_STANDARD)
+  set(CMAKE_CXX_STANDARD 11)
+  set(CMAKE_CXX_STANDARD_REQUIRED ON)
+endif()
+
+set(NANOARROW_BENCHMARK_VERSION
+    ""
+    CACHE STRING "nanoarrow version to benchmark")
+set(NANOARROW_BENCHMARK_SOURCE_URL
+    ""
+    CACHE STRING "URL or local path of the nanoarrow sources to benchmark")
+
+# Avoids a warning about timestamps on downloaded files (prefer new policy
+# if available; CMP0135 was introduced in CMake 3.24)
+if(${CMAKE_VERSION} VERSION_GREATER_EQUAL "3.24")
+  cmake_policy(SET CMP0135 NEW)
+endif()
+
+# Use google/benchmark
+set(BENCHMARK_ENABLE_TESTING OFF)
+fetchcontent_declare(benchmark
+  URL https://github.com/google/benchmark/archive/refs/tags/v1.8.3.zip
+  URL_HASH SHA256=abfc22e33e3594d0edf8eaddaf4d84a2ffc491ad74b6a7edc6e7a608f690e691
+)
+fetchcontent_makeavailable(benchmark)
+
+if(IS_DIRECTORY "${NANOARROW_BENCHMARK_SOURCE_URL}")
+  fetchcontent_declare(nanoarrow SOURCE_DIR "${NANOARROW_BENCHMARK_SOURCE_URL}")
+  fetchcontent_makeavailable(nanoarrow)
+elseif(NOT NANOARROW_BENCHMARK_SOURCE_URL STREQUAL "")
+  fetchcontent_declare(nanoarrow URL "${NANOARROW_BENCHMARK_SOURCE_URL}")
+  fetchcontent_makeavailable(nanoarrow)
+endif()
+
+# Check that either the parent scope or this CMakeLists.txt defines a nanoarrow target
+if(NOT TARGET nanoarrow)
+  message(FATAL_ERROR "nanoarrow target not found (missing -DNANOARROW_BENCHMARK_SOURCE_URL option?)"
+  )
+endif()
+
+# Add + link benchmarks
+add_executable(schema_benchmark c/schema_benchmark.cc)
+add_executable(array_benchmark c/array_benchmark.cc)
+
+target_link_libraries(schema_benchmark PRIVATE nanoarrow benchmark::benchmark_main)
+target_link_libraries(array_benchmark PRIVATE nanoarrow benchmark::benchmark_main)
+
+# This lets all benchmarks run via ctest -VV when this is the top-level project
+include(CTest)
+enable_testing()
+add_test(NAME schema_benchmark COMMAND schema_benchmark
+         --benchmark_out=schema_benchmark.json)
+add_test(NAME array_benchmark COMMAND array_benchmark
+         --benchmark_out=array_benchmark.json)

dev/benchmarks/CMakePresets.json

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+{
+  "version": 3,
+  "cmakeMinimumRequired": {
+    "major": 3,
+    "minor": 21,
+    "patch": 0
+  },
+  "configurePresets": [
+    {
+      "name": "base",
+      "hidden": true,
+      "cacheVariables": {
+        "CMAKE_BUILD_TYPE": "Release",
+        "CMAKE_EXPORT_COMPILE_COMMANDS": "ON"
+      }
+    },
+    {
+      "name": "local",
+      "displayName": "local",
+      "description": "Uses the nanoarrow C sources from this checkout.",
+      "inherits": [
+        "base"
+      ],
+      "cacheVariables": {
+        "NANOARROW_BENCHMARK_SOURCE_URL": "${sourceDir}/../.."
+      }
+    },
+    {
+      "name": "v0.4.0",
+      "displayName": "v0.4.0",
+      "description": "Uses the nanoarrow C sources from the 0.4.0 release.",
+      "inherits": [
+        "base"
+      ],
+      "cacheVariables": {
+        "NANOARROW_BENCHMARK_SOURCE_URL": "https://github.com/apache/arrow-nanoarrow/archive/refs/tags/apache-arrow-nanoarrow-0.4.0.zip"
+      }
+    }
+  ]
+}
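Both presets differ only in the value of `NANOARROW_BENCHMARK_SOURCE_URL`, so the same configuration can be expressed on the command line without adding a preset. The sketch below prints such a configure command for a given release. It assumes that other releases use the same archive naming as the `v0.4.0` preset, which is not confirmed by this commit; the function names are hypothetical.

```shell
# Assumption: every release archive follows the naming used by the v0.4.0 preset.
nanoarrow_release_url() {
  version="$1"
  echo "https://github.com/apache/arrow-nanoarrow/archive/refs/tags/apache-arrow-nanoarrow-${version}.zip"
}

# Print (rather than run) a configure command equivalent to a release preset.
# Intended to be run from dev/benchmarks; build/v<version> is an arbitrary choice.
configure_command_for_release() {
  version="$1"
  url="$(nanoarrow_release_url "${version}")"
  echo "cmake -S . -B build/v${version} -DCMAKE_BUILD_TYPE=Release -DNANOARROW_BENCHMARK_SOURCE_URL=${url}"
}
```

Printing the command keeps the sketch free of side effects; the output can be inspected before it is run by hand.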

dev/benchmarks/README.md

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+<!---
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Benchmarking nanoarrow
+
+This subdirectory contains benchmarks and tools to run them. This is currently
+only implemented for the C library but may expand to include the R and Python
+bindings. The structure is as follows:
+
+- Benchmarks are documented inline using [Doxygen](https://www.doxygen.nl/).
+- Configurations are CMake build presets, and CMake handles pulling a previous
+or local nanoarrow using `FetchContent`. Benchmarks are run using `ctest`.
+- There is a bare-bones report written as a [Quarto](https://quarto.org)
+document that renders to markdown.
+
+You can run benchmarks for a single configuration (e.g., `local`) with:
+
+```shell
+mkdir build && cd build
+cmake .. --preset local
+cmake --build .
+ctest
+```
+
+The provided `benchmark-run-all.sh` creates (or reuses, if they are already
+present) build directories in the form `build/<preset>` for each preset
+and runs `ctest`.
+
+You can build a full report by running:
+
+```shell
+./benchmark-run-all.sh
+cd apidoc && doxygen && cd ..
+quarto render benchmark-report.qmd
+```
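The README describes `benchmark-run-all.sh` as creating or reusing a `build/<preset>` directory for each preset, but the script itself is not shown in this part of the diff. The following is a hypothetical sketch of that loop, not the script from this commit: it assumes that `cmake --list-presets` prints each visible preset as a quoted name followed by its display name, and it prints the commands instead of running them.

```shell
# Hypothetical sketch; not the benchmark-run-all.sh shipped in this commit.

# Read `cmake --list-presets` output on stdin and print one preset name per
# line. Assumes each preset is listed as:   "name" - display name
parse_preset_names() {
  sed -n 's/^[[:space:]]*"\([^"]*\)".*$/\1/p'
}

# Print the commands for one preset. The configure step is emitted only when
# build/<preset> does not exist yet, so existing build directories are reused.
plan_preset() {
  preset="$1"
  build_dir="build/${preset}"
  if [ ! -d "${build_dir}" ]; then
    echo "cmake -S . -B ${build_dir} --preset ${preset}"
  fi
  echo "cmake --build ${build_dir}"
  echo "ctest --test-dir ${build_dir}"
}

# Example (run from dev/benchmarks): print the plan for every visible preset.
# cmake --list-presets | parse_preset_names | while read -r p; do plan_preset "$p"; done
```

Skipping the configure step when the directory already exists is what keeps rebuilds small during benchmark development: `cmake --build` then recompiles only what changed.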

dev/benchmarks/apidoc/.gitignore

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+xml
