Motivation
A frequent request on scala/scala PRs (particularly for collections changes) is that the changes be benchmarked; however, contributors face many obstacles when running benchmarks on their personal computers, to the point where many, perhaps most, results would generously be classified as "questionable".
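For concreteness, the kind of measurement being requested is usually a small JMH micro-benchmark of the changed operation. Below is a minimal sketch (the class, operation, and parameter values are hypothetical, not taken from any particular PR):

```scala
package bench

import org.openjdk.jmh.annotations._

// Hypothetical benchmark of a single collections operation across input sizes.
@State(Scope.Benchmark)
class ListAppendBenchmark {
  @Param(Array("10", "1000", "100000"))
  var size: Int = _

  var xs: List[Int] = _

  @Setup(Level.Trial)
  def setup(): Unit = {
    xs = List.tabulate(size)(identity)
  }

  // Measures appending one element to an immutable List of `size` elements.
  @Benchmark
  def append(): List[Int] = xs :+ size
}
```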
Background
The following table lists some common causes of performance/timing variance, and whether each type of machine avoids them.
| Cause of variance | Laptop | Overclocked Desktop | Normally-clocked Desktop |
|---|---|---|---|
| Boost clock speed change | ❌¹ | ❌ | ✔ |
| Thermal throttling | ❌ | ❌ | ❓² |
| Background tasks | ❌ | ❌ | ❌ |
¹ While it is theoretically possible to disable overclocking/boost-clocking on a laptop, the CPU may still clock down in response to even brief changes in battery or power state.
² A normally-clocked desktop with good ventilation and cooling shouldn't thermally throttle, but neither of those is guaranteed in a person's home (sometimes cats sit on computers, for example).
The only type of machine that avoids even some of these issues is a normally-clocked desktop, and not everyone has one (many of us only have laptops).
Additionally, personal computers almost certainly have background tasks (if not foreground tasks) running at all times. Benchmarks can take a long time to run, and even if someone can manage not to use their computer for an hour or two while benchmarks run, they probably don't want to close their web browser, 3+ chat applications (which are all Electron, so basically also web browsers), and half a dozen other running programs and services. If they can't spare potentially multiple hours of their computer being tied up, it's even worse, with foreground tasks taking arbitrary and inconsistent CPU time.
Ideal Setup
For benchmarking to be reliable, it should be done on a dedicated machine that runs nothing else, where no cron/scheduled jobs ever run while a benchmark is running.
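Even on such a machine, some run-to-run variance comes from the JVM itself (JIT warmup, GC), so the harness configuration still matters. Here is a sketch of JMH settings that help with that; the specific values are illustrative assumptions, not a recommendation:

```scala
import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations._

// Illustrative settings: multiple forks average out per-JVM JIT/GC luck, and
// generous warmup lets the code reach a steady state before measurement.
// None of this compensates for an unreliable host machine.
@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(3)
@Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
@State(Scope.Benchmark)
class StringConcatBenchmark {
  @Benchmark
  def build(): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < 1000) { sb.append(i); i += 1 }
    sb.toString
  }
}
```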
How do we reliably benchmark library changes?