Document and improve (a lot) lintcheck --perf #14194
Introducing a new chapter to the book, known as "Benchmarking Clippy". It explains the benchmarking capabilities of lintcheck --perf and gives a concrete example of how to benchmark and compare a PR with master.
- Now lintcheck perf deletes the target directory after benchmarking; benchmarking with a cache isn't very useful or telling of any precise outcome.
- Support for benchmarking several times without having to do a cargo clean. Now we can benchmark a PR and master (or a single change in the same commit) without having to move the perf.data files into an external directory.
- Compress perf.data to allow for multiple stacks while occupying much less space.
Maybe we should integrate this into the CI as well. If this is costly (I haven't had a chance to try it yet), we may even be able to watch the comments and react to certain commands, as we would do with a bot.
r? @Alexendoo I won't get to review this any time soon. And I think Alex is the better reviewer for this anyway.
The first `perf.data` will not have any numbers appended, but any subsequent
benchmark will be written to `perf.data.number` with a number growing for 0.
All benchmarks are compressed so that you can
Looks like the end of this sentence is missing! :)
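The naming scheme in the quoted passage (an unnumbered `perf.data` first, then `perf.data.0`, `perf.data.1`, and so on) could be sketched roughly as follows. This is only an illustration, not the PR's actual code: `next_perf_data_path` and its `exists` parameter are hypothetical names, and the existence check is passed in as a closure so the logic can be exercised without touching the filesystem.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not taken from the PR): returns the first output
/// path in `dir` for which `exists` reports false, trying `perf.data`
/// first and then `perf.data.0`, `perf.data.1`, ...
fn next_perf_data_path(dir: &Path, exists: impl Fn(&Path) -> bool) -> PathBuf {
    let first = dir.join("perf.data");
    if !exists(first.as_path()) {
        return first;
    }
    let mut n: u32 = 0;
    loop {
        let candidate = dir.join(format!("perf.data.{n}"));
        if !exists(candidate.as_path()) {
            return candidate;
        }
        n += 1;
    }
}
```

In real use the closure would presumably be `|p| p.exists()`, so each run picks the next free file name instead of overwriting an earlier benchmark.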
cmd.args(&[
"record",
"-e",
"instructions", // Only count instructions
"-g", // Enable call-graph, useful for flamegraphs and produces richer reports
"--quiet", // Do not tamper with lintcheck's normal output
"--compression-level=22",
Does this not slow down the capture? zstd 22 is pretty slow
lintcheck/src/main.rs
cmd.args(&[
"record",
"-e",
"instructions", // Only count instructions
"-g", // Enable call-graph, useful for flamegraphs and produces richer reports
"--quiet", // Do not tamper with lintcheck's normal output
"--compression-level=22",
"--freq=97", // Slow down program to capture all events
This reduces the sampling frequency considerably, meaning fewer captured events.
There is no way to capture all the events with perf record, as it is a sampling profiler.
Mmmm I was under the impression that perf would throttle down the program to match the frequency, like it's said in the man page.
-F, --freq=
Profile at this frequency. Use max to use the currently maximum
allowed frequency, i.e. the value in the
kernel.perf_event_max_sample_rate sysctl. Will throttle down to the
currently maximum allowed frequency. See --strict-freq.
--strict-freq
Fail if the specified frequency can’t be used.
I'm not sure what the solution is. 4000 (the default frequency) loses a lot of the data on my machine, but I'm sure that other machines can handle it fine. (And the same would happen with any other frequency.)
Maybe adding a --perf --frequency=<integer> would be the best approach, or an environment variable? What do you think?
It means that the sampling frequency will be clamped to the value of sysctl kernel.perf_event_max_sample_rate, or be an error if --strict-freq is passed.
If a reproducible count is desired, you would need to use something like cachegrind instead of a sampling profiler.
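The behaviour described in this comment can be modelled with a small function. This is a sketch of the rule as stated here (clamp to the maximum rate, or fail under `--strict-freq`), not perf's actual implementation; `effective_freq` and its parameters are hypothetical names.

```rust
/// Hypothetical model of the rule described above: a requested sampling
/// frequency above the allowed maximum is either clamped to that maximum
/// or rejected when strict mode is requested.
fn effective_freq(requested: u32, max_rate: u32, strict: bool) -> Result<u32, String> {
    if requested <= max_rate {
        // Within the limit: used as requested.
        Ok(requested)
    } else if strict {
        // Corresponds to passing --strict-freq: refuse instead of clamping.
        Err(format!(
            "requested frequency {requested} exceeds the maximum sample rate {max_rate}"
        ))
    } else {
        // Default: silently fall back to the maximum allowed rate.
        Ok(max_rate)
    }
}
```

Either way, the program being profiled is not slowed down to match the frequency; only the number of samples taken per second changes.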
I've changed the frequency to 3000, as it's something that my PC can handle and I think I will be one of the people that most use does of
For a follow-up, what works well with perf is going to vary by device, so it's probably worth having them passed in rather than lintcheck having its own defaults.
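One way that follow-up could look, assuming the settings arrive from the caller (a CLI flag or an environment variable) instead of being hard-coded. This is a rough sketch: `parse_freq`, `perf_record_args`, and the `LINTCHECK_PERF_FREQ` variable mentioned below are hypothetical and not part of lintcheck; only the perf flags themselves are taken from the diff above.

```rust
/// Hypothetical: parses a user-supplied frequency, ignoring invalid input.
fn parse_freq(raw: Option<&str>) -> Option<u32> {
    raw.and_then(|s| s.trim().parse().ok())
}

/// Hypothetical: builds the `perf record` arguments from the diff above,
/// adding `--freq` and `--compression-level` only when the caller supplies
/// them, so perf's own defaults apply otherwise.
fn perf_record_args(freq: Option<u32>, compression_level: Option<u8>) -> Vec<String> {
    let mut args: Vec<String> = ["record", "-e", "instructions", "-g", "--quiet"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    if let Some(level) = compression_level {
        args.push(format!("--compression-level={level}"));
    }
    if let Some(freq) = freq {
        args.push(format!("--freq={freq}"));
    }
    args
}
```

A caller might feed it with something like `parse_freq(std::env::var("LINTCHECK_PERF_FREQ").ok().as_deref())` and pass the result to `Command::args`, leaving the choice of frequency to whoever knows what their machine can handle.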
In #14116 we added a benchmarking option for Lintcheck. This commit adds a new chapter to the book AND improves that option into a more usable state.
It's recommended to review one commit at a time.
Document how to benchmark with lintcheck --perf
Several improvements on lintcheck perf (desc.)
Now lintcheck perf deletes target directory after benchmarking,
benchmarking with a cache isn't very useful or telling of any
precise outcome.
Support for benchmarking several times without having to do
a cargo clean.
Compress perf.data
changelog: none