-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detailed consensus data correlated together. #5302
base: develop
Are you sure you want to change the base?
Conversation
4accd3d
to
cc9a392
Compare
Rebased against a more recent commit. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #5302 +/- ##
=========================================
+ Coverage 77.9% 78.1% +0.2%
=========================================
Files 791 790 -1
Lines 68006 67910 -96
Branches 8346 8227 -119
=========================================
+ Hits 52967 53028 +61
+ Misses 15039 14882 -157
|
cc9a392
to
be1c95c
Compare
message. Allows quick correlation of events that previously were either not logged or, if logged, strewn across multiple lines, making correlation difficult. The Heartbeat Timer and consensus ledger accept processing each have this capability. Also guarantees that log entries will be written if the node is a validator, regardless of log severity level. Otherwise, the level of these messages is at INFO severity.
be1c95c
to
caedb46
Compare
@@ -34,38 +34,55 @@ shouldCloseLedger( | |||
std::chrono::milliseconds openTime, // Time waiting to close this ledger | |||
std::chrono::milliseconds idleInterval, | |||
ConsensusParms const& parms, | |||
beast::Journal j) | |||
beast::Journal j, | |||
std::unique_ptr<std::stringstream> const& clog) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of adding a new parameter to pass clog, should we extend beast::Journal to support streamed logging? Otherwise, we have to pass as a parameter two loggers now. It is our version of beast and we are free to modify it. We might have to switch to shared_ptr since beast::Journal is passed by value, but otherwise, it should be possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A single journal object won't work because the ConsensusLogger crosses through multiple objects and free functions, each with possibly distinct journal objects. This does not replace the existing journal objects, but adds additional logging. In some cases, existing logging could be subsumed by this, but in other cases, what's logged should stay being logged, such as warnings or errors.
{ | ||
std::string header_; | ||
beast::Journal j_; | ||
std::unique_ptr<std::stringstream> ss_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need std::unique_ptr
if the clog is passed by reference everywhere?
I don't see a need for transfer of ownership.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unique pointer is mainly so the CLOG macro (in https://github.com/XRPLF/rippled/pull/5302/files#diff-e6faaf2ceab5905a3aeb3755b92c06f0ee967f612208b2b1f10b5dd963a574bf) can be used to short-circuit if there's no object. This avoids string processing of <<. This is the same idea as the JLOG macro (which is right above CLOG).
Some of the functions are called for reasons other than the heartbeat timer, which is another reason the unique pointer could be empty. In this case, it probably makes sense to log other code paths, not just the 2 I included (heartbeat timer and doAccept). There are other consensus activities that aren't included in this currently, but I think they're the most essential as a starting point. I want to be careful not to make this into some kind of "log everything" project, and stick instead with the most essential. I think if more things are identified as useful that they can be included, and remove the non-useful. But mainly it's a starting point for trouble-shooting, not necessarily a guarantee that it identifies all possible problems.
Combine multiple related debug log data points into a single
message. Allows quick correlation of events that
previously were either not logged or, if logged, strewn across multiple lines, making correlation difficult. The Heartbeat Timer and consensus ledger accept processing each have this capability.
Also guarantees that log entries will be written if the node is a validator, regardless of log severity level. Otherwise, the level of these messages is at INFO severity.
High Level Overview of Change
Enable logging of multiple data points so that they are emitted to the debug log at the same time. Validators will have these messages always emitted, regardless
of other configuration of debug log severity level.
Context of Change
The context is to facilitate data collection and troubleshooting. Currently, this much
of the data in the PR is not being logged. And for that which is being logged, it's
written all over the debug log on multiple lines. This makes it difficult to correlate
consensus-related events.
And because this change emits the logs automatically for validators, no special configuration of severity level needs to be made.
Type of Change
.gitignore
, formatting, dropping support for older tooling)API Impact
libxrpl
change (any change that may affectlibxrpl
or dependents oflibxrpl
)New, verbose log entries are created. They can be grepped with the string "ConsensusLogger". They are emitted approximately every second, with an additional entry for each consensus "accept" phase. It increases logging requirements. Estimated 5MB/hr.
To see the log messages, grep for:
"ConsensusLogger: Heartbeat Timer" and "ConsensusLogger: onAccept"
If configured as a validator, all debug log severity levels will cause these messages to be emitted (though I'm not sure if kDisabled is over-ridden). To test, configure at warning, error, or fatal, and the log messages should still be emitted. They'll still say they're NFO (info) level, but they bypass the filter. If not configured as a validator, verify that the messages only appear for "info" level or more verbose.
Before committing to this, the community should be informed of the new feature since it will require more disk space, at least for validator operators or people with very terse logging configured. However, the disk requirements are not very large: roughly 5MB per hour. This is < 1GB/week, which is quite small for the benefit.