Fixed checking if append_entries_request batches are already present in follower log
#25018
base: dev
Conversation
force-pushed from d7a60fa to 153a9c2
```cpp
reply.last_dirty_log_index = adjusted_prev_log_index;
// limit the last flushed offset as the adjusted_prev_log_index
// may have not yet been flushed.
```
Does it mean `_flushed_offset` may store an offset that has not been flushed? In this case, could you add a comment to `_flushed_offset` to explain what it actually denotes?
The `adjusted_prev_log_index` may be greater than `_flushed_offset`.
Yeah, but why can't we reply with `_flushed_offset` if it is larger than `adjusted_prev_log_index`?
We do not want the leader to see a flushed offset that is larger than the last log offset.
So is `reply.last_flushed_log_index` the latest flushed log index that matches the leader's log?
No, not really. This check is there to uphold the invariant `flushed_offset <= log_end_offset`.
What is `log_end_offset` here? Is it a field of any structure?
`reply.last_flushed_log_index = std::min(adjusted_prev_log_index, _flushed_offset);`

Does this run a risk of `last_flushed_log_index` moving backwards, as seen by the leader? (Which I think would have implications for the leader's commit index computation.) Perhaps we should just set `reply.last_flushed_log_index = _flushed_offset;` and let the `_flushed_offset` computation be monotonic.
CI test results:
test results on build#61563
test results on build#61682
src/v/raft/tests/raft_fixture.h
Outdated
```cpp
private:
    model::node_id _id;
    model::revision_id _revision;
    prefix_logger _logger;
    ss::sstring _base_directory;
    config::mock_property<size_t> _max_inflight_requests{16};
    config::mock_property<size_t> _max_queued_bytes{1_MiB};
    config::mock_property<size_t> _default_recovery_read_size{32_KiB};
```
any reason we change it for existing tests?
No particular reason, I will make sure it is the same as before.
Why is that? Anything wrong with the macro? AFAIK it's meant to work with both gtest and boost.
LGTM modulo one question; it took me a bit to digest the change, and I had to dig up Alexey's change that added these checks. Would be nice to get a blessing from @ztlpn too.
When a follower receives an append entries request whose `prev_log_index` is smaller than its own `prev_log_index`, it validates whether the batches from the request match its own log (by checking each batch offset and the corresponding term). If that is the case, the batches are skipped to prevent truncation of valid batches and avoid data loss. The validation of already matching batches was broken if they happened to be at the beginning of the log. In this case the `prev_log_index` is not initialized, being negative. This case was not correctly handled by the logic calculating the next offset when checking matching batches. That led to a situation in which a range of batches starting with 0 never matched. Fixed the issue by correctly adjusting the `prev_log_index` if it is uninitialized. Signed-off-by: Michał Maślanka <[email protected]>
When a follower receives an append entries request with a vector of records that are all present in its own log, and their offsets and terms match, it should reply with success and the correct `last_dirty_log_index`. This way the leader, instead of moving the follower's `next_offset` backward, can start the recovery process and deliver the batches which the follower is missing. Signed-off-by: Michał Maślanka <[email protected]>
force-pushed from e67b4df to 12b3024
Signed-off-by: Michał Maślanka <[email protected]>
Assertions triggered in a function body are not propagated to the test itself. Change the method to throw an exception in case of a timeout instead of using an assertion. Signed-off-by: Michał Maślanka <[email protected]>
Signed-off-by: Michał Maślanka <[email protected]>
The reply interceptor allows the test creator to modify or drop a reply that is about to be processed by the RPC requester. This allows tests to take more control over the Raft protocol behavior and to test rare edge cases which might be hard to trigger otherwise. Signed-off-by: Michał Maślanka <[email protected]>
force-pushed from 12b3024 to 5c4c17b
Signed-off-by: Michał Maślanka <[email protected]>
force-pushed from 5c4c17b to ed45488
A few questions as I'm not sure I understand the test.
```cpp
std::ranges::copy(
  _nodes | std::views::keys
    | std::views::filter(
      [leader_id](model::node_id id) { return id != leader_id; }),
  std::back_inserter(followers));
```
nit: use `copy_if`?
```cpp
/**
 * Recover communication and wait for the intercept to trigger
 */
new_leader_node.reset_dispatch_handlers();
```
This will enable the new leader to send vote requests to the old leader. I guess it won't anyway, as it has been elected already. Do we need this?
```diff
@@ -395,7 +395,8 @@ class raft_fixture
   chunked_vector<model::record_batch> make_batches(
     size_t batch_count,
     size_t batch_record_count,
-    size_t record_payload_size) {
+    size_t record_payload_size,
+    model::term_id term = model::term_id(0)) {
```
did you decide to keep it just in case it is needed in future?
```cpp
/**
 * Recover communication and wait for the intercept to trigger
 */
new_leader_node.reset_dispatch_handlers();
co_await reply_intercepted.wait([&] { return intercept_count > 5; });
```
We don't produce anything after the second election. What are the 20+ messages that are replicated from the new leader to the old one?
Background

When a follower receives an append entries request whose `prev_log_index` is smaller than its own `prev_log_index`, it validates whether the batches from the request match its own log (by checking each batch offset and the corresponding term). If that is the case, the batches are skipped to prevent truncation of valid batches and avoid data loss.

Negative `append_entries_request::prev_log_index`

The validation of already matching batches was broken if they happened to be at the beginning of the log. In this case the `prev_log_index` is not initialised. This case was not correctly handled by the logic calculating the next offset when checking matching batches.

Replying with success when all request batches match

When a follower receives an append entries request with a vector of records that are all present in its own log, and their offsets and terms match, it should reply with success and the correct `last_dirty_log_index`. This way the leader, instead of moving the follower's `next_offset` backward, can start the recovery process and deliver the batches which the follower is missing.

Backports Required
Release Notes
Bug Fixes