Move replica banning to its own task #181

drdrsh · 2022-09-27T00:07:40Z

This PR moves the banning logic into its own async task.

The async task receives events from any source that wants to request that a replica be banned.

This design allows agents other than clients to ban replicas. This should allow us to introduce manual instance banning or banning based on periodic health checks.

We also replace the RwLock around banlists with ArcSwap. This should make banlist reads more or less lock-free.

drdrsh · 2022-09-27T00:11:02Z

src/pool.rs

+        // We check if banning this address will result in all replica being banned
+        // If so, we unban all replicas instead
+        let pool_banned_addresses = self.ban_reporter.banlist(
+            self.settings.name.clone(),
+            self.settings.user.username.clone(),
+        );



I moved the "all replicas are banned" check into ban to avoid slowing down the very hot is_banned method. Under normal operation we rarely call ban, but we call is_banned all the time.
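
A minimal sketch of that trade-off, using a hypothetical, simplified `BanList` built on std types (not the PR's actual code): the "would this ban every replica?" check lives in the rarely called `ban`, so `is_banned` stays a single set lookup.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified ban list; the real pool tracks more state.
struct BanList {
    replicas: Vec<String>,
    banned: HashSet<String>,
}

impl BanList {
    fn new(replicas: Vec<String>) -> Self {
        BanList { replicas, banned: HashSet::new() }
    }

    /// Hot path: one set lookup, no "all banned" bookkeeping.
    fn is_banned(&self, address: &str) -> bool {
        self.banned.contains(address)
    }

    /// Cold path: if this ban would leave no replica available,
    /// unban all replicas instead of adding the new ban.
    fn ban(&mut self, address: &str) {
        let would_ban_all = self
            .replicas
            .iter()
            .all(|r| r.as_str() == address || self.banned.contains(r));
        if would_ban_all {
            self.banned.clear();
        } else {
            self.banned.insert(address.to_string());
        }
    }
}
```

With two replicas, banning the first one sticks, while banning the second one clears the list so that both become available again.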

levkk · 2022-09-27T15:58:10Z

src/bans.rs

+
+    /// Send statistics to the task keeping track of stats.
+    fn send(&self, event: BanManagerEvent) {
+        let result = self.channel_to_worker.try_send(event.clone());


Let's make this send instead of try_send. We really don't want to ever lose this message.
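
To illustrate the difference, here is a small sketch that uses the standard library's bounded channel as a stand-in (an assumption; the PR's channel type may differ). `try_send` returns an error and drops the event when the buffer is full, while `send` waits for space. The function name `demo` and the capacity of 1 are made up for the example.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};
use std::thread;

/// Returns (events the worker received, events dropped by try_send).
fn demo() -> (usize, usize) {
    // Capacity 1 so the buffer fills after a single event.
    let (tx, rx) = sync_channel::<u32>(1);

    tx.try_send(1).expect("first event fits in the buffer");
    let dropped = match tx.try_send(2) {
        Err(TrySendError::Full(_)) => 1, // buffer full: event 2 is lost
        _ => 0,
    };

    // The worker starts draining; a blocking send waits for space instead
    // of giving up.
    let worker = thread::spawn(move || rx.iter().count());
    tx.send(3).expect("worker is alive");
    drop(tx); // close the channel so the worker's iterator ends

    (worker.join().unwrap(), dropped)
}
```

The worker ends up with events 1 and 3; event 2 is the one a ban request would lose under `try_send`.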

levkk · 2022-09-27T16:32:50Z

src/bans.rs

+
+    pub fn report_failed_checkout(&self, pool_name: String, username: String, address: Address) {
+        let event = BanManagerEvent::Ban {
+            address: address,


We might want to create some kind of structure uniquely identifying a "pool", if we don't have one already. Passing around strings is fine, but they are arbitrarily large and error-prone.
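
One possible shape for such a structure, as a sketch only: `PoolId`, its fields, and `banned_count` are hypothetical names, not taken from the codebase. It addresses the error-prone part (two positional strings that can be swapped), not the size of the strings themselves.

```rust
use std::collections::HashMap;

/// Hypothetical key type; the names are illustrative, not from the PR.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct PoolId {
    pool_name: String,
    username: String,
}

impl PoolId {
    fn new(pool_name: &str, username: &str) -> Self {
        PoolId {
            pool_name: pool_name.to_string(),
            username: username.to_string(),
        }
    }
}

/// Ban lists keyed by pool: callers pass one typed value instead of two
/// loose strings.
fn banned_count(bans: &HashMap<PoolId, Vec<String>>, pool: &PoolId) -> usize {
    bans.get(pool).map_or(0, |addresses| addresses.len())
}
```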

levkk · 2022-10-09T18:32:58Z

src/bans.rs

+
+impl Default for BanManager {
+    fn default() -> BanManager {
+        let (channel_to_worker, _rx) = channel(1000);


Thinking out loud:

This channel can get saturated during incidents. Imagine we have 30k clients that together produce 50k QPS. If a replica goes down, all of a sudden we'll get 50k banning events which will quickly saturate this channel. The async worker task may not even get scheduled to ban the replica.

With an RW lock, one of the clients will ban, and all the others will read the banlist and see that it's banned immediately. RW locks are probably more expensive than ArcSwap (something we should actually validate), but they are more effective than channels I think (something we should investigate).

I think the middle ground can be a Mutex for banning, and an ArcSwap for reading the list, so only one task gets to ban at a time instead of a thundering herd (pseudo-code):

    fn ban() {
        if (is_banned()) return;
        let guard = ban_mutex.lock();
        let new_ban_list = ...;
        arc_swap.set(new_ban_list);
    }

    fn is_banned() {
        let ban_list = (*arc_swap);
        ban_list.contains_key(address);
    }
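
A compilable sketch of the same idea, under stated assumptions: to stay within the standard library, an `RwLock<Arc<BanList>>` stands in for `ArcSwap` (readers clone the `Arc` and release the lock right away), and `Bans`, `BanList`, and `Address` are hypothetical names. The second `is_banned` check under the mutex is an addition to the pseudo-code, so that tasks queued behind the first banner do not rebuild the list again.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};
use std::time::Instant;

/// Hypothetical stand-ins for the pooler's own types.
type Address = String;
type BanList = HashMap<Address, Instant>;

struct Bans {
    /// Stand-in for ArcSwap<BanList>: holds the current snapshot.
    current: RwLock<Arc<BanList>>,
    /// Serializes writers so only one task rebuilds the list at a time.
    ban_mutex: Mutex<()>,
}

impl Bans {
    fn new() -> Self {
        Bans {
            current: RwLock::new(Arc::new(HashMap::new())),
            ban_mutex: Mutex::new(()),
        }
    }

    /// Clone the Arc to get a cheap, immutable snapshot of the list.
    fn snapshot(&self) -> Arc<BanList> {
        let snap: Arc<BanList> = self.current.read().unwrap().clone();
        snap
    }

    fn is_banned(&self, address: &str) -> bool {
        self.snapshot().contains_key(address)
    }

    /// Returns true only for the call that actually performed the ban.
    fn ban(&self, address: &str) -> bool {
        if self.is_banned(address) {
            return false; // fast path: someone already banned it
        }
        let _guard = self.ban_mutex.lock().unwrap();
        if self.is_banned(address) {
            return false; // banned while we waited for the mutex
        }
        // Copy the current list, add the ban, and publish the new snapshot.
        let mut next = (*self.snapshot()).clone();
        next.insert(address.to_string(), Instant::now());
        *self.current.write().unwrap() = Arc::new(next);
        true
    }
}
```

Whether this beats a plain RwLock around the banlist is the open question raised above; the sketch only shows the control flow, not a measured result.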

The thesis that this architecture allows agents other than clients to ban seems shaky. Another agent, e.g. an admin command, can easily use the pool.ban and pool.unban methods, which would require very few changes to the code and no changes to the architecture.

drdrsh added 2 commits September 26, 2022 12:28

wip

67a89c3

resolve

077977f

drdrsh commented Sep 27, 2022

add ban.rs

6d587bf

levkk reviewed Sep 27, 2022

drdrsh added 3 commits October 8, 2022 10:21

merge with main

5f78a53

Optimize ban checks

7ca8f53

Address comments

94a00de

drdrsh changed the title from "Mostafa ban task" to "Move replica banning to its own task" on Oct 9, 2022

Add test for the unban all replicas case

27e246e

levkk reviewed Oct 9, 2022

drdrsh added 2 commits October 10, 2022 13:12

Remove channel

b56bad7

fmt

04c704b
