Reapply PR#31: optimize retry pool #5113
base: master
Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc. that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.
@@ -298,9 +308,17 @@ impl SendTransactionService {
{
// take a lock of retry_transactions and move the batch to the retry set.
let mut retry_transactions = retry_transactions.lock().unwrap();
let transactions_to_retry = transactions.len();
let mut transactions_to_retry: usize = 0;
`transactions_to_retry` changed semantics: it is now the number of transactions that have not yet reached the retry limit.
@@ -369,6 +388,17 @@ impl SendTransactionService {
stats,
);
stats_report.report();

// to send transactions as soon as possible we adjust retry interval
retry_interval_ms = retry_interval_ms_default
This is just `retry_interval_ms_default - (ms since last send)`. It prevents sleeping longer than `retry_interval_ms_default`, because some time has already passed between the last send and the current moment.
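The adjustment described above can be sketched as a saturating subtraction. This is a minimal illustration, not the code from the PR: `next_retry_sleep` and its parameters are hypothetical names, and it uses `Duration` where the PR tracks milliseconds.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper (not part of the PR): how long the retry thread
/// should sleep so that it never waits longer than the default interval
/// measured from the last send.
fn next_retry_sleep(
    retry_interval_default: Duration,
    last_send: Instant,
    now: Instant,
) -> Duration {
    // Time already spent since the last send; zero if `now` is earlier.
    let elapsed = now.saturating_duration_since(last_send);
    // Saturates at zero when more than one full interval has passed.
    retry_interval_default.saturating_sub(elapsed)
}
```

With a 2000 ms default and 500 ms elapsed since the last send, the sketch sleeps for 1500 ms; once the elapsed time exceeds the default, it returns zero and the next pass runs immediately.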
Thanks for this fix, @fanatid! I understand this PR, but really wish it was 3 PRs that each did one of the things in the description. Don't bother splitting it up now, but just allow me to lodge a complaint that it would have been easier to review as three tiny changes.
Unrelated to this PR: there are things I don't understand about this subsystem, that maybe the reader can help me work through.
- I don't understand why we send transactions in batches. The actual sender (eg. the QUIC sender) sends them in a loop, one at a time, which doesn't sound like a batch to me. Why not just send each transaction immediately upon receiving it or observing its retry interval elapse?
- Why do we call `send_transactions_in_batch` in two places? I feel like this code could be made much simpler if we had a single queue, whose entries were roughly of the shape `(send_deadline, transaction_info)`, and a queue processor.
  - Retrying a transaction would involve decrementing its remaining retries and throwing it on the end of the queue with a new `send_deadline`.
  - Expiring a transaction would involve doing nothing. Just don't send it or re-add it to the queue.
  - The queue processor could sleep until the next send deadline, then consume all entries of the queue whose send deadline has been exceeded.
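The single-queue idea above could look roughly like the sketch below. Everything here is an assumption for illustration: `TxInfo` and `SendQueue` are hypothetical stand-ins, time is a plain millisecond counter, and a min-heap keyed on the deadline replaces "the end of the queue" so that a newly received transaction is not stuck behind retries with later deadlines.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical stand-in for the service's `TransactionInfo`.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct TxInfo {
    id: u64,
    retries_remaining: usize,
}

/// Single queue of `(send_deadline_ms, transaction_info)` entries,
/// ordered so that the earliest deadline is popped first.
struct SendQueue {
    retry_interval_ms: u64,
    entries: BinaryHeap<Reverse<(u64, TxInfo)>>,
}

impl SendQueue {
    fn new(retry_interval_ms: u64) -> Self {
        Self { retry_interval_ms, entries: BinaryHeap::new() }
    }

    fn push(&mut self, send_deadline_ms: u64, tx: TxInfo) {
        self.entries.push(Reverse((send_deadline_ms, tx)));
    }

    /// The processor would sleep until this deadline (None: queue is empty).
    fn next_deadline_ms(&self) -> Option<u64> {
        self.entries.peek().map(|Reverse((deadline, _))| *deadline)
    }

    /// Pops every entry whose deadline has passed and returns them for
    /// sending. Entries with retries left are re-queued with a new
    /// deadline; the rest expire simply by not being re-added.
    fn process_due(&mut self, now_ms: u64) -> Vec<TxInfo> {
        let mut due = Vec::new();
        while let Some(Reverse((deadline, _))) = self.entries.peek() {
            if *deadline > now_ms {
                break;
            }
            let Reverse((_, tx)) = self.entries.pop().expect("peeked entry exists");
            due.push(tx);
        }
        let next_deadline_ms = now_ms + self.retry_interval_ms;
        for tx in &due {
            if tx.retries_remaining > 0 {
                let retry = TxInfo { id: tx.id, retries_remaining: tx.retries_remaining - 1 };
                self.push(next_deadline_ms, retry);
            }
        }
        due
    }
}
```

In this sketch a caller pushes new transactions with a deadline of "now", sleeps until `next_deadline_ms()`, and sends whatever `process_due` returns. Whether this fits the real service (leader lookups, batching for the QUIC sender, stats) is left open.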
fn get_max_retries(
    &self,
    default_max_retries: Option<usize>,
    service_max_retries: usize,
) -> Option<usize> {
    self.max_retries
        .or(default_max_retries)
        .map(|max_retries| max_retries.min(service_max_retries))
}
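To make the precedence in `get_max_retries` concrete, here is the same expression as a free function. `resolve_max_retries` is a hypothetical name used only for this illustration; the per-transaction value is passed in explicitly instead of being read from `self`.

```rust
/// Hypothetical free-function version of the method above.
/// The per-transaction value wins over the default, and whichever is
/// chosen is capped by the service-wide maximum. If neither is set,
/// the result stays `None` and no cap is applied.
fn resolve_max_retries(
    tx_max_retries: Option<usize>,
    default_max_retries: Option<usize>,
    service_max_retries: usize,
) -> Option<usize> {
    tx_max_retries
        .or(default_max_retries)
        .map(|max_retries| max_retries.min(service_max_retries))
}
```

For example, a transaction asking for 10 retries under a service cap of 5 resolves to `Some(5)`, while a transaction with no value and a default of 3 resolves to `Some(3)`.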
I'm not actually that bullish on extracting this logic because I think it makes things harder to read, but if we're going to then we should catch all of the other places this happens in this file (eg. L463-466).
I think you could delete a ton of code if, instead of tracking `max_retries` and `retries`, you changed `TransactionInfo` to track `retries_remaining`.
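A rough sketch of what tracking `retries_remaining` might look like follows. `RetryBudget` is a hypothetical type, not the real `TransactionInfo`, and it assumes that `None` means no explicit retry limit was configured.

```rust
/// Hypothetical replacement for the `max_retries` / `retries` pair.
/// Assumption: `None` means no explicit retry limit.
#[derive(Debug, Clone, PartialEq)]
struct RetryBudget {
    retries_remaining: Option<usize>,
}

impl RetryBudget {
    /// Resolves the limit once, when the transaction is received:
    /// per-transaction value, else the default, capped by the service maximum.
    fn new(
        tx_max_retries: Option<usize>,
        default_max_retries: Option<usize>,
        service_max_retries: usize,
    ) -> Self {
        let retries_remaining = tx_max_retries
            .or(default_max_retries)
            .map(|max_retries| max_retries.min(service_max_retries));
        Self { retries_remaining }
    }

    /// Returns true if another retry may be sent, consuming one unit of
    /// the budget. Returns false once the budget is exhausted.
    fn try_consume(&mut self) -> bool {
        match &mut self.retries_remaining {
            None => true,
            Some(0) => false,
            Some(n) => {
                *n -= 1;
                true
            }
        }
    }
}
```

With this shape the retry loop only needs a single `try_consume()` check per transaction, and the comparison of `retries` against a recomputed maximum disappears.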
Before landing this, can someone flesh out the PR description a bit? I'm not sure that future readers will know what ‘transactions not included in the retry pool on full utilization’ means (I don't).
Problem
Transactions not included in the retry pool on full utilization
This PR adds back #31
Summary of Changes
TransactionInfo