
Commit 790c255

Split benchmarks articles into separate files
1 parent 9abad31 commit 790c255

5 files changed: +133 −131 lines changed

developer/src/SUMMARY.md

Lines changed: 3 additions & 1 deletion
```diff
@@ -4,7 +4,9 @@
 
 - [Architecture](architecture/README.md)
 - [Performance](performance/README.md)
-  - [Benchmarking](performance/benchmarking.md)
+  - [Benchmarks](performance/benchmarks/README.md)
+    - [Main Methods](performance/benchmarks/main_methods.md)
+    - [Async Functions](performance/benchmarks/async_runtime.md)
 - [Plugin System](plugin/README.md)
   - [Life Sessions](plugin/life_sessions.md)
   - [Decisions Context](plugin/decisions/decisions-context.md)
```

developer/src/performance/benchmarking.md

Lines changed: 0 additions & 130 deletions
This file was deleted.
developer/src/performance/benchmarks/README.md

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
# Benchmarks

This section contains various benchmarking findings for Chipmunk. Please refer to the child pages for detailed information and insights.
developer/src/performance/benchmarks/async_runtime.md

Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
# Benchmarking Asynchronous Functions

Asynchronous functions run within an async runtime, which can introduce overhead, especially when benchmarking small, simple functions (e.g., using mocks).

[Criterion](https://github.com/bheisler/criterion.rs) supports asynchronous benchmarking and allows users to configure the runtime. It also provides settings that can reduce the overhead introduced by the async runtime.

## Configuring Criterion with Async Runtimes

To minimize noise from the runtime, consider the following:

### Use a New Runtime Instance for Each Iteration

Asynchronous runtimes configure themselves during initialization based on the current system state, which can cause the runtime overhead to accumulate in one direction. Using a new runtime for each iteration helps distribute this overhead evenly.

Example using Tokio:

```rust
// Using one runtime for all iterations lets its setup overhead
// skew all measurements in the same direction.
let runner = tokio::runtime::Runtime::new().unwrap();
bencher
    .to_async(&runner)
    .iter(|| async { /* benchmarked future */ });

// Using a new runtime per iteration distributes the overhead.
bencher
    .to_async(tokio::runtime::Runtime::new().unwrap())
    .iter(|| async { /* benchmarked future */ });
```
### Increase Warm-up Time

Allow the system to stabilize by increasing the warm-up time. This reduces the impact of runtimes that configure themselves poorly while the system is still cold.

### Additional Configurations

- **Increase sample size and measurement time**: Reduces noise and outliers.
- **Lower significance level**: Helps with noisy benchmarks by reducing false positive change reports.
- **Raise noise threshold**: Reduces false positives in reported performance changes.

### Example Configuration
```rust
criterion_group! {
    ...
    config = Criterion::default()
        // Warm-up time allows stable spawning of multiple async runtimes.
        .warm_up_time(Duration::from_secs(10))
        // Increased measurement time and sample size reduce noise.
        .measurement_time(Duration::from_secs(20))
        .sample_size(200)
        // Settings to reduce noise in the results.
        .significance_level(0.01)
        .noise_threshold(0.03);
    ...
}
```
### Mocking Async Traits

When benchmarking generic functions over async traits, avoid calling `Future::poll()` on the mocks so the async machinery doesn't dominate the measurement. This can be achieved with a non-async inner function, marked `#[inline(never)]`, inside an async trait method marked `#[inline(always)]`: the async wrapper is inlined away, while the inner function stays un-inlined to mimic the cost of an actual trait implementation.

Example:

```rust
use std::hint::black_box;

trait FooTrait {
    async fn bar(&self) -> usize;
}

struct MockFoo;

impl FooTrait for MockFoo {
    #[inline(always)]
    async fn bar(&self) -> usize {
        #[inline(never)]
        fn inner_bar() -> usize {
            black_box(0)
        }

        inner_bar()
    }
}
```
This avoids calling `Future::poll()` when invoking the async function:

```rust
async fn some_fun(foo: impl FooTrait) {
    // With `MockFoo`, this await completes immediately:
    // `MockFoo::bar()` is inlined, and since `inner_bar()` isn't async,
    // the call compiles down to a direct invocation of `inner_bar()`
    // without any future being polled.
    let val = foo.bar().await;
}
```
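
As an illustration (not part of the committed file), here is a minimal sketch of how such a mock could be wired into an async Criterion benchmark, combining the runtime-per-iteration advice with the mocking technique. It assumes Criterion's `async_tokio` feature and the Tokio crate; the function and benchmark names are hypothetical:

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

trait FooTrait {
    async fn bar(&self) -> usize;
}

struct MockFoo;

impl FooTrait for MockFoo {
    #[inline(always)]
    async fn bar(&self) -> usize {
        #[inline(never)]
        fn inner_bar() -> usize {
            black_box(0)
        }

        inner_bar()
    }
}

// Hypothetical generic function under benchmark.
async fn some_fun(foo: impl FooTrait) -> usize {
    foo.bar().await
}

fn bench_some_fun(c: &mut Criterion) {
    c.bench_function("some_fun_mocked", |b| {
        // A fresh runtime distributes the runtime setup overhead.
        b.to_async(tokio::runtime::Runtime::new().unwrap())
            .iter(|| async { black_box(some_fun(MockFoo).await) });
    });
}

criterion_group!(benches, bench_some_fun);
criterion_main!(benches);
```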
developer/src/performance/benchmarks/main_methods.md

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
# Main Benchmarking Methods

* In general, there are two main methods for benchmarking: the first measures the execution time and the second estimates CPU cycles.
* Rust has official benchmarking support, which is used to benchmark the standard library itself. However, this support is currently available on nightly Rust only (a minimal sketch of it follows this list).
* The Rust ecosystem has crates supporting both methods, providing an interface similar to the `Bencher` from the standard library. This makes it easier to write benchmarks for both methods together, or to migrate from one to the other (or to `Bencher` itself, once it is supported on stable Rust).
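
As an illustration (not part of this commit), a minimal sketch of the nightly-only standard library interface that these crates mirror:

```rust
// Requires a nightly toolchain (`cargo +nightly bench`).
#![feature(test)]
extern crate test;

use test::{black_box, Bencher};

#[bench]
fn bench_sum(b: &mut Bencher) {
    // `iter` runs the closure repeatedly and measures its wall clock time.
    b.iter(|| black_box((0..1000u64).sum::<u64>()));
}
```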
## Wall Clock Time Method

* This method measures the elapsed time between starting and finishing a task using the system's wall clock.
* Benchmarks must run many times to produce accurate results, and the results will vary between host environments.
* This method isn't suitable for CI setups that save benchmark results from the master branch and compare them against another branch, because the virtual machine state can change between runs. For the best accuracy, the benchmarks for master and for the branch need to run in the same job, so they are compared in the same environment.
* The crate [criterion](https://github.com/bheisler/criterion.rs) provides an interface similar to the standard library's `Bencher` while running on stable Rust, and adds more features such as graphs and finer control over the benchmarks themselves (see the sketch after this list).
* Benchmarks for specific parts of the app can be written with mocks if needed, similar to unit tests, providing a way to benchmark specific functions in isolation.
* This method doesn't require any external tooling and runs cross-platform.
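
As an illustration (not part of this commit), a minimal Criterion wall clock benchmark might look like the following; the function and benchmark names are hypothetical:

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

// Hypothetical function under test.
fn parse_line(line: &str) -> usize {
    line.split(',').count()
}

fn bench_parse_line(c: &mut Criterion) {
    c.bench_function("parse_line", |b| {
        // The closure runs many times; Criterion reports statistics
        // over the measured wall clock times.
        b.iter(|| parse_line(black_box("a,b,c,d")));
    });
}

criterion_group!(benches, bench_parse_line);
criterion_main!(benches);
```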
## CPU Cycles Method

* This method estimates and compares CPU cycles using [Valgrind Callgrind](https://valgrind.org/docs/manual/cl-manual.html).
* Each benchmark runs only once, and the results are independent from the host state, which makes this method reliable in CI pipelines.
* It provides high-precision measurements: because it counts CPU cycles, any difference shows up as a clear number.
* Currently, the crate [Iai-Callgrind](https://github.com/iai-callgrind/iai-callgrind/tree/main?tab=readme-ov-file) supports this kind of benchmark in Rust (see the sketch after this list).
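
As an illustration (not part of this commit), a minimal sketch following Iai-Callgrind's library-benchmark style; the names are hypothetical and Valgrind must be installed:

```rust
use iai_callgrind::{library_benchmark, library_benchmark_group, main};
use std::hint::black_box;

// Hypothetical function under test.
fn fibonacci(n: u64) -> u64 {
    match n {
        0 | 1 => 1,
        _ => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

#[library_benchmark]
fn bench_fibonacci() -> u64 {
    // Runs once under Callgrind; instruction counts are collected
    // instead of wall clock time.
    black_box(fibonacci(black_box(20)))
}

library_benchmark_group!(name = fib_group; benchmarks = bench_fibonacci);

main!(library_benchmark_groups = fib_group);
```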
### Cons:

* This method can run only where [Valgrind](https://valgrind.org/) runs (currently Linux and macOS).
* CPU cycles don't necessarily match how much time the process took, so timed benchmarks are still needed alongside it.
* `Iai-Callgrind` currently doesn't have as much recognition and support from the Rust community as `Criterion`; however, it has great documentation and the library is under active development.
