fix: try using a longer randomization period for retrying messages (#5416)

### Description

We have so many messages with similar retry counts that now even by
spreading them over 1h we still starve new messages. This PR increases
the random period to 6h, hoping to more evenly distribute retries of old
messages.

Some data to back this up: after prepare failures start spiking,
a diff also appears in the prep queue, indicating starvation.
![Screenshot_2025-02-10_at_12 03
53](https://github.com/user-attachments/assets/b12d5502-391d-4985-a2ea-47f92c771cc8)
![Screenshot_2025-02-10_at_12 03
44](https://github.com/user-attachments/assets/f7193adf-01df-453d-aeba-a5002c283662)
![Screenshot_2025-02-10_at_12 03
18](https://github.com/user-attachments/assets/5bdf27cb-51ea-439a-ab98-4f310214b7ca)
daniel-savu authored Feb 10, 2025
1 parent c17c18a commit 23e1f94
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions rust/main/agents/relayer/src/msg/pending_message.rs
@@ -657,10 +657,10 @@ impl PendingMessage {
 let hour: u64 = 60 * 60;
 // To be extra safe, `max` to make sure it's at least 1 hour.
 let target = hour.max((num_retries - 47) as u64 * hour);
-// Schedule it at some random point in the next hour to
+// Schedule it at some random point in the next 6 hours to
 // avoid scheduling messages with the same # of retries
-// at the exact same time.
-target + (rand::random::<u64>() % hour)
+// at the exact same time and starve new messages.
+target + (rand::random::<u64>() % (6 * hour))
 }
 }))
 }
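
To make the effect of the change concrete, here is a minimal standalone sketch of the changed expression. It is not the relayer's code. The helper name `backoff_with_jitter` and its `jitter_hours` parameter are hypothetical. The `random` argument stands in for `rand::random::<u64>()` so the example needs no external crate. `saturating_sub` is an added safety assumption for retry counts below 47, which the hunk shown above does not handle.

```rust
/// Seconds in one hour, matching `hour` in the diff.
const HOUR: u64 = 60 * 60;

/// Hypothetical helper mirroring the changed expression.
/// `random` stands in for `rand::random::<u64>()`.
/// `jitter_hours` must be non-zero, otherwise the modulo panics.
fn backoff_with_jitter(num_retries: u32, random: u64, jitter_hours: u64) -> u64 {
    // Base delay: at least one hour, plus one hour per retry past 47.
    // `saturating_sub` is an assumption of this sketch, not part of the commit.
    let target = HOUR.max(u64::from(num_retries.saturating_sub(47)) * HOUR);
    // Random offset in [0, jitter_hours * HOUR) seconds.
    target + (random % (jitter_hours * HOUR))
}

fn main() {
    // Same retry count, different random draws: compare the old 1h window
    // with the new 6h window.
    for random in [0u64, 12_345, 987_654_321] {
        let before = backoff_with_jitter(50, random, 1);
        let after = backoff_with_jitter(50, random, 6);
        println!("random={random}: 1h window -> {before}s, 6h window -> {after}s");
    }
}
```

With the wider window, messages that share a retry count are spread over six times as many possible retry instants. That is the mechanism the description relies on to reduce starvation of new messages.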