Reapply PR#31: optimize retry pool #5113

Open
wants to merge 1 commit into
base: master

KirillLykov

Problem

Transactions not included in the retry pool on full utilization

This PR adds back #31.

Summary of Changes

  • do not insert transactions with zero max_retries into the retry pool
  • remove transactions that have reached max_retries in the same iteration of the loop
  • dynamically select the sleep time between iterations based on last_sent_time in TransactionInfo

@KirillLykov KirillLykov requested a review from fanatid March 2, 2025 19:28
@KirillLykov KirillLykov added the v2.2 Backport to v2.2 branch label Mar 3, 2025

mergify bot commented Mar 3, 2025

Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.

@@ -298,9 +308,17 @@ impl SendTransactionService {
 {
     // take a lock of retry_transactions and move the batch to the retry set.
     let mut retry_transactions = retry_transactions.lock().unwrap();
-    let transactions_to_retry = transactions.len();
+    let mut transactions_to_retry: usize = 0;
Author

The semantics of transactions_to_retry changed: it is now the number of transactions that haven't reached the retry limit.
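
For readers without the full diff, here is a minimal sketch of the counting described above. It is not the code from this PR: `TxInfo`, `move_to_retry_pool`, and the `u64` signature key are hypothetical stand-ins, and the real loop likely applies further checks that are omitted here.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-in for the service's TransactionInfo.
#[derive(Debug, Clone)]
struct TxInfo {
    max_retries: Option<usize>,
}

/// Moves a batch into the retry pool and returns how many transactions
/// were actually queued for retry. Transactions whose retry limit is
/// zero are skipped, so the count can be smaller than the batch size.
fn move_to_retry_pool(
    retry_pool: &mut HashMap<u64, TxInfo>,
    batch: Vec<(u64, TxInfo)>,
) -> usize {
    let mut transactions_to_retry: usize = 0;
    for (signature, info) in batch {
        // A transaction with zero max_retries would never be resent,
        // so it is not worth a slot in the pool.
        if info.max_retries == Some(0) {
            continue;
        }
        // Count only transactions that were not already in the pool.
        if retry_pool.insert(signature, info).is_none() {
            transactions_to_retry += 1;
        }
    }
    transactions_to_retry
}
```

With a batch of three transactions where one has `max_retries: Some(0)`, this sketch returns 2 rather than the batch length 3, which is the semantic change the comment refers to.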

@@ -369,6 +388,17 @@ impl SendTransactionService {
stats,
);
stats_report.report();

// to send transactions as soon as possible we adjust retry interval
retry_interval_ms = retry_interval_ms_default
Author

@KirillLykov KirillLykov Mar 3, 2025

This is just retry_interval_ms_default - (ms since last send). It prevents sleeping longer than retry_interval_ms_default, because some time has already passed between the moment we sent and the current moment.
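
A minimal sketch of that calculation, assuming the sleep is derived from one `last_sent_time` value. The function name `next_retry_sleep` and its parameters are hypothetical, and the hunk above is truncated, so the PR's actual expression may differ.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: how long to sleep before the next retry pass.
/// `last_sent_time` is when the relevant transaction was last sent,
/// or `None` if nothing is waiting in the retry pool.
fn next_retry_sleep(
    retry_interval_default: Duration,
    last_sent_time: Option<Instant>,
    now: Instant,
) -> Duration {
    match last_sent_time {
        // Subtract the time already elapsed since the send. Both
        // operations saturate at zero, so the result is never negative
        // and never exceeds the default interval.
        Some(sent) => {
            retry_interval_default.saturating_sub(now.saturating_duration_since(sent))
        }
        None => retry_interval_default,
    }
}
```

For a default interval of 2000 ms and a transaction sent 500 ms ago, the sketch sleeps 1500 ms. Once more than 2000 ms have passed, it returns a zero duration, so the retry happens immediately.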

@KirillLykov KirillLykov marked this pull request as ready for review March 3, 2025 18:50
@KirillLykov KirillLykov requested a review from steveluscher March 3, 2025 18:50

@steveluscher steveluscher left a comment

Thanks for this fix, @fanatid! I understand this PR, but really wish it was 3 PRs that each did one of the things in the description. Don't bother splitting it up now, but just allow me to lodge a complaint that it would have been easier to review as three tiny changes.


Unrelated to this PR: there are things I don't understand about this subsystem, that maybe the reader can help me work through.

  • I don't understand why we send transactions in batches. The actual sender (eg. the QUIC sender) sends them in a loop, one at a time, which doesn't sound like a batch to me. Why not just send each transaction immediately upon receiving it or observing its retry interval elapse?
  • Why do we call send_transactions_in_batch in two places? I feel like this code could be made much simpler if we had a single queue, whose entries were roughly of the shape (send_deadline, transaction_info), and a queue processor.
    • Retrying a transaction would involve decrementing its remaining retries and throwing it on the end of the queue with a new send_deadline.
    • Expiring a transaction would involve doing nothing. Just don't send it or re-add it to the queue.
    • The queue processor could sleep until the next send deadline, then consume all entries of the queue whose send deadline has been exceeded.
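
The single-queue idea above could be sketched roughly as follows. This is an illustration, not code from the repository: `QueueEntry`, `SendQueue`, and the `u64` signature are hypothetical. It also swaps in a min-heap ordered by `send_deadline` for the plain queue, so that fresh and retried entries can be mixed without assuming deadlines arrive in order. Expiry checks are left out.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::time::{Duration, Instant};

// Hypothetical queue entry. The derived ordering compares
// `send_deadline` first because it is the first field.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct QueueEntry {
    send_deadline: Instant,
    signature: u64, // stand-in for the real transaction info
    retries_remaining: usize,
}

struct SendQueue {
    // `Reverse` turns the max-heap into a min-heap on the deadline.
    heap: BinaryHeap<Reverse<QueueEntry>>,
    retry_interval: Duration,
}

impl SendQueue {
    fn new(retry_interval: Duration) -> Self {
        Self { heap: BinaryHeap::new(), retry_interval }
    }

    fn push(&mut self, signature: u64, retries_remaining: usize, send_deadline: Instant) {
        self.heap.push(Reverse(QueueEntry { send_deadline, signature, retries_remaining }));
    }

    /// Sends every entry whose deadline has passed. An entry with
    /// retries left is re-queued with a new deadline; one without is
    /// simply dropped.
    fn process_due(&mut self, now: Instant, mut send: impl FnMut(u64)) {
        loop {
            let due = matches!(self.heap.peek(), Some(Reverse(e)) if e.send_deadline <= now);
            if !due {
                break;
            }
            let Reverse(entry) = self.heap.pop().expect("peeked entry exists");
            send(entry.signature);
            if entry.retries_remaining > 0 {
                self.push(
                    entry.signature,
                    entry.retries_remaining - 1,
                    now + self.retry_interval,
                );
            }
        }
    }

    /// How long the processor may sleep before the next entry is due.
    fn time_until_next(&self, now: Instant) -> Option<Duration> {
        self.heap
            .peek()
            .map(|Reverse(e)| e.send_deadline.saturating_duration_since(now))
    }
}
```

A processor built on this sketch would loop over `time_until_next`, sleep for the returned duration, then call `process_due`. A real implementation would also need to drop confirmed or expired transactions and be woken when new ones arrive.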

Comment on lines +100 to +108
fn get_max_retries(
    &self,
    default_max_retries: Option<usize>,
    service_max_retries: usize,
) -> Option<usize> {
    self.max_retries
        .or(default_max_retries)
        .map(|max_retries| max_retries.min(service_max_retries))
}

I'm not actually that bullish on extracting this logic because I think it makes things harder to read, but if we're going to then we should catch all of the other places this happens in this file (eg. L463-466).

I think you could delete a ton of code if, instead of tracking max_retries and retries, you changed TransactionInfo to track retries_remaining.
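
One possible shape for the `retries_remaining` suggestion, as a hedged sketch. `CountdownTxInfo` and `try_consume_retry` are hypothetical names, and treating `None` as "no limit on retries" is an assumption about the intended semantics. The limit is resolved once at construction, using the same `or`/`min` logic as `get_max_retries` above.

```rust
// Hypothetical replacement for the max_retries + retries pair.
struct CountdownTxInfo {
    // Assumption: `None` means no limit on the number of retries.
    retries_remaining: Option<usize>,
}

impl CountdownTxInfo {
    /// Resolves the effective limit once, when the transaction is
    /// accepted, instead of on every pass over the retry pool.
    fn new(
        max_retries: Option<usize>,
        default_max_retries: Option<usize>,
        service_max_retries: usize,
    ) -> Self {
        let retries_remaining = max_retries
            .or(default_max_retries)
            .map(|limit| limit.min(service_max_retries));
        Self { retries_remaining }
    }

    /// Returns true if the transaction may be sent again, consuming
    /// one retry from the budget when a limit applies.
    fn try_consume_retry(&mut self) -> bool {
        match &mut self.retries_remaining {
            None => true,
            Some(0) => false,
            Some(n) => {
                *n -= 1;
                true
            }
        }
    }
}
```

With this shape, the "zero max_retries" case from the PR description falls out naturally: such a transaction starts with `Some(0)` and `try_consume_retry` returns false on the first call.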

@steveluscher

Before landing this, can someone flesh out the PR description a bit? I'm not sure that future readers will know what ‘transactions not included in the retry pool on full utilization’ means (I don't).
