Enhance threads stacks collection and sampling frequency on Linux #191
Signed-off-by: Francesco Vigliaturo <[email protected]>
The thread that signals each tid was calling sleep after every individual signal. This was an unintended bug. The correct behaviour is to send the signals to all tids as quickly as possible and only then sleep.
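The corrected loop shape described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `sampling_round`, `send_signal` and `sleep` are hypothetical names, and the two callbacks are injected so the sketch does not depend on any platform-specific call.

```rust
use std::time::Duration;

/// One sampling round: signal every tid back to back, then sleep once.
/// `send_signal` and `sleep` are hypothetical stand-ins for the real
/// signalling and sleeping calls.
fn sampling_round<S, P>(tids: &[i32], interval: Duration, mut send_signal: S, mut sleep: P)
where
    S: FnMut(i32),
    P: FnMut(Duration),
{
    for &tid in tids {
        // No sleep inside the loop: sleeping here would stretch a round
        // to roughly tids.len() * interval and lower the effective
        // per-thread sampling frequency.
        send_signal(tid);
    }
    // Sleep exactly once per round, after all signals have been sent.
    sleep(interval);
}
```

With three tids, a round issues three signals followed by a single sleep, whereas the buggy variant interleaved a sleep after each signal.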
…to avoid race conditions
For the normal situation (e.g. a frequency of 99 with a fast enough CPU), this patch doesn't differ much from the original. Then I tried this PR in a simpler situation (the prime_number example) with a higher frequency (999). It becomes worse:
What I have done
https://github.com/tikv/pprof-rs/blob/master/src/profiler.rs#L334-L336 There is already a sample time in the signal handler. I append this sample time into the
Then, handle the
Result
For the master pprof-rs, this graph is "smooth" everywhere:
For the pprof-rs with this PR, the line "jumps" in some places: (zoomed)
I didn't analyze it with LSM or other statistical methods, as the difference is visible by eye.
Others
I have other thoughts about this idea. As you have described, the main benefit brought by the real-time signal is that no signal is lost. Does that mean some stack traces are duplicated (because the queued signal handlers run back to back, they would all capture the same stack trace)? Duplication and missing samples are both errors, and I don't think duplication is better 🤔.
After all, this PR is a really good attempt at the interaction between real-time signal semantics and profiling. I'll also study more and do some experiments with https://github.com/thoughtspot/threadstacks to see where the problem actually is. Thanks 🍻.
Hi @YangKeao, many thanks for taking a look at this and for benchmarking it. When you say that at the same frequency you didn't see much better results and at a higher frequency it actually got worse, do you mean in terms of samples collected across time and threads, or are you referring to the plot shown above, which plots only the timestamps at which samples were collected?
I'm not sure that using
In principle it shouldn't, since we fire the signal at the specific
I'm available if you want to discuss it further. Thanks again for the valuable feedback!
@YangKeao can I ask a couple more questions to contextualize the tests/experiments?
Thanks
This PR tries to address inconsistencies in how samples are collected across time and threads (somewhat related to #177).
Currently, to collect a sample (stack), we rely on the SIGPROF signal, which is a standard signal. This has the following two limitations:
The SIGPROF signal is directed to the process and is then handled by one of its threads, hence we only collect the stack of the thread that happens to be handling the signal at that moment.
The changes in this PR try to address both of the above-mentioned limitations by making use of real-time signals and per-thread delivery.
Real-Time Signal
An important distinction here is that real-time signals are queued. If multiple instances of a real-time signal are sent to a process, the signal is delivered multiple times (hence we're not going to waste a potential "sampling round").
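The difference in bookkeeping can be illustrated with a toy model. This is only a sketch of the semantics, not kernel or pprof-rs code, and `PendingSignals` with its methods is a hypothetical name: a standard signal behaves like a single pending flag, while a real-time signal keeps one queue entry per send.

```rust
use std::collections::VecDeque;

/// Toy model of signals that arrive while the target cannot handle
/// them yet. It mirrors the semantics only, not any real implementation.
#[derive(Default)]
struct PendingSignals {
    /// Standard signal: one pending flag, repeated sends collapse.
    standard_pending: bool,
    /// Real-time signal: one queue entry per send (here, the round number).
    realtime_queue: VecDeque<u64>,
}

impl PendingSignals {
    fn send_standard(&mut self) {
        self.standard_pending = true;
    }

    fn send_realtime(&mut self, round: u64) {
        self.realtime_queue.push_back(round);
    }

    /// Delivers everything that is pending and returns
    /// (standard deliveries, real-time deliveries).
    fn deliver_all(&mut self) -> (usize, usize) {
        let standard = if std::mem::take(&mut self.standard_pending) { 1 } else { 0 };
        let realtime = self.realtime_queue.drain(..).count();
        (standard, realtime)
    }
}
```

In this model, three sends of each kind before a delivery yield one standard delivery and three real-time deliveries. On Linux the number of queued signals is bounded by a per-user limit (RLIMIT_SIGPENDING), so the queue is not unbounded in practice.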
per-thread delivery
Instead of having a timer generate a SIGPROF signal that is handled by one of the threads, we propose having an external thread which, at a given frequency, collects the tid (kernel thread id) of each running thread and sends the real-time signal to each of them.
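A rough sketch of one such round is shown below. It assumes the tids are discovered by listing /proc/self/task, which is one common way to do it on Linux; the description above does not say how this PR collects them. `current_tids`, `signal_all_threads` and the `send_signal` callback are hypothetical names, and the callback stands in for whatever per-thread signalling call is used (for example a wrapper around tgkill).

```rust
use std::fs;
use std::io;

/// Lists the kernel thread ids (tids) of the current process by reading
/// /proc/self/task, where Linux exposes one directory per thread.
/// Assumption: this discovery method is for illustration only.
fn current_tids() -> io::Result<Vec<i32>> {
    let mut tids = Vec::new();
    for entry in fs::read_dir("/proc/self/task")? {
        let name = entry?.file_name();
        // Directory names are decimal tids; skip anything unparsable.
        if let Some(tid) = name.to_str().and_then(|s| s.parse::<i32>().ok()) {
            tids.push(tid);
        }
    }
    tids.sort_unstable();
    Ok(tids)
}

/// One round of per-thread delivery. `send_signal` is a hypothetical
/// callback. A thread may exit between the listing and the send, so a
/// failed send is skipped rather than treated as fatal.
fn signal_all_threads<F>(mut send_signal: F) -> io::Result<usize>
where
    F: FnMut(i32) -> io::Result<()>,
{
    let mut delivered = 0;
    for tid in current_tids()? {
        if send_signal(tid).is_ok() {
            delivered += 1;
        }
    }
    Ok(delivered)
}
```

The external sampling thread would call `signal_all_threads` once per interval and then sleep. Re-listing the tids every round lets newly spawned threads be picked up without explicit registration, at the cost of a directory read per round.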