-
Notifications
You must be signed in to change notification settings - Fork 221
Move replica banning to its own task #181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
src/pool.rs
Outdated
// We check if banning this address will result in all replica being banned | ||
// If so, we unban all replicas instead | ||
let pool_banned_addresses = self.ban_reporter.banlist( | ||
self.settings.name.clone(), | ||
self.settings.user.username.clone(), | ||
); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the check for "All replicas are banned" to be in ban
to avoid making the super hot is_banned
method slower. Under normal operations, we rarely ever call ban
but we call is_banned
all the time.
src/bans.rs
Outdated
|
||
/// Send statistics to the task keeping track of stats. | ||
fn send(&self, event: BanManagerEvent) { | ||
let result = self.channel_to_worker.try_send(event.clone()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this send
instead of try_send
. We really don't want to ever lose this message.
src/bans.rs
Outdated
|
||
pub fn report_failed_checkout(&self, pool_name: String, username: String, address: Address) { | ||
let event = BanManagerEvent::Ban { | ||
address: address, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to create some kind of structure uniquely identifying a "pool", if we don't have one already. Passing around strings is fine, but they are arbitrarily large and error-prone.
src/bans.rs
Outdated
|
||
impl Default for BanManager { | ||
fn default() -> BanManager { | ||
let (channel_to_worker, _rx) = channel(1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking out loud:
This channel can get saturated during incidents. Imagine we have 30k clients that together produce 50k QPS. If a replica goes down, all of a sudden we'll get 50k banning events which will quickly saturate this channel. The async worker task may not even get scheduled to ban the replica.
With an RW lock, one of the clients will ban, and all the others will read the banlist and see that it's banned immediately. RW locks are probably more expensive than ArcSwap (something we should actually validate), but they are more effective than channels I think (something we should investigate).
I think the middle ground can be a Mutex for banning, and an ArcSwap for reading the list, so only one task gets to ban at a time instead of a thundering herd (pseudo-code):
fn ban() {
if (is_banned()) return;
let guard = ban_mutex.lock();
let new_ban_list = ...;
arc_swap.set(new_ban_list);
}
fn is_banned() {
let ban_list = (*arc_swap);
ban_list.contains_key(address);
}
The thesis that this architecture allows for other agents than clients to ban seems shaky. Another agent, e.g. an admin command, can easily use the pool.ban
and pool.unban
methods, for example, and it would require very little changes to the code and no changes to the arch.
This PR moves banning logic to be owned an async task.
The async task will receives events from various sources that want to request a replica be banned.
This design allows for agents other than clients to ban replicas. This should allows us to introduce manual instance banning or banning based on periodic health checks.
We also replace the RWLock around banlists and use ArcSwap instead. This should make Banlist reads more or less lock-free