Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exponential backoff on reconnection attempts #403

Merged
merged 7 commits into from
Aug 11, 2024

Conversation

justuswilhelm
Copy link
Contributor

@justuswilhelm justuswilhelm commented Jan 2, 2024

DEPENDS ON #402

DEPENDS ON implementation of a state machine, or other means to track if connection is failing.

The problem: reconnecting with a default delay of 1s can result in a lot of additional traffic on a server.

Scenario:

  1. N users are connecting to an endpoint using websockets with sarus
  2. All N users lose authorization to access said websocket
  3. All N users will now try to reconnect to this endpoint with a 403 returning every second
  4. Server load is increased

The solution is to implement exponential backoff. Please refer to the readme for an exact description.

Question to @paulbjensen: If a connection is failing, I need to track the amount of failed connection attempts. To do that, I first need to know if a connection is failing. A state machine is usually a good idea to track the current state of a connection. I will prepare another to PR to lay the groundwork for this PR here. Do you have any other idea on how to implement this/is there anything else I can use in the code?

@paulbjensen
Copy link
Member

Exponential backoff if definitely a good idea. I'm all for this.

My initial thoughts are that that there needs to be some kind of internal log within the Sarus class that tracks all the actions taken by the instance (attempt to connect, connect success, connection closes, etc).

Then, the log has the historical context of what has been happening and when. We then have an option to enable exponential backoff in Sarus, and when this is enabled, it will use the log to analyse the connection history and determine how much to wait until the next connection retry.

In theory it just needs to modify the value of retryConnectionDelay.

I also realise that we might want to have fine-grained control over the exponential back-off strategy (e.g. how much to increase the wait by, whether to set an upper limit that the exponential back-off wait can reach).

Good feature to add 👍

@justuswilhelm
Copy link
Contributor Author

Thanks for the feedback 😄

Yes, keeping track of the internal state is my line of thinking as well. #404 is supposed to achieve this by modelling Sarus as a FSM.

Regarding the configurability: Backoff rate and upper limit are configurable and required at the moment. It could be useful to make some of these parameters not required in the future, but in order to get the feature out there, working and tested it's useful to have a first working version out soon. I intend to use this feature as soon as possible in Projectify (https://github.com/jwp-consulting/projectify-frontend)

@paulbjensen
Copy link
Member

@justuswilhelm hi, sorry for taking so long to get back on this. I'm happy for this PR to get merged. And I will be taking a look at the other PR now.

@justuswilhelm justuswilhelm force-pushed the exp-backoff branch 6 times, most recently from 037e01b to 568b9de Compare April 20, 2024 09:30
@justuswilhelm
Copy link
Contributor Author

Hi @paulbjensen, thanks for merging my other PRs :)

Can you please review this PR? Divided over a few commits I did the following:

  • Change state type to hold additional values
  • Add the backoff calculation and test it
  • Track failed connection attempts, use them for exp. calculation if enabled

The state tracking is a bit complicated since there was already a state machine lurking in Sarus. The functions connect(), reconnect(), disconnect(), onclose(), onopen() already serve as springboards for state transitions.

I hope the test coverage is sufficient, please let me know if you have any feedback.

By turning state from simple string literals to full objects, we can add
annotations that are specific to one state.

Use case: When retrying connections, we could just add a
connectionAttempt?: number; property, but then we would have to check if
it is undefined or not, and if we are not reconnecting currently and
connectionAttempt? is set, does that mean the Sarus client entered an
invalid state?

By guaranteeing certain properties to only ever be present when the
state property has the right kind, we can avoid a whole class of bugs
just by relying on the type system.
Previously, it said "Function", but this was probably meant in reference
to the callbacks called further below. Refering to the MDN docs on
WebSocket Interfaces [^1], one can see that there are three possible events:

- Event,
- CloseEvent, and
- MessageEvent

[^1][https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API#interfaces]
Allow configuring and enabling exponential backoff for failing
connections, instead of relying on reconnections in regular intervals.

Exponential backoff can be configured in order to lighten server load if
multiple clients are reconnecting.

The new configuration parameter is called `exponentialBackoff` and
requires two parameters `backoffRate` and `backoffLimit`.
This is used to calculate the exponentially increasing delay. The
function is used to calculate the reconnection delay - but does not keep
track of the current reconnection attempt. Keeping track of the current
reconnection attempt is reserved for the next few commits.
Track how many times Sarus has already tried to (re)connect.
When exponential backoff is enabled, use the stored number of connection
attempts to calculate the exponential delay.
It turns out we don't need it here after all
@justuswilhelm justuswilhelm changed the title Work in progress: Exponential backoff on reconnection attempts Exponential backoff on reconnection attempts Jun 1, 2024
@justuswilhelm
Copy link
Contributor Author

I've rebased the master branch on this branch, as there was a merge conflict in src/index.ts

@justuswilhelm justuswilhelm marked this pull request as ready for review July 22, 2024 00:31
@justuswilhelm
Copy link
Contributor Author

Hi, sorry if I catch you during a busy time. Please let me know if anything else is needed to make this PR mergable. Thanks again for all the input so far.

@paulbjensen
Copy link
Member

Sorry, I'll review this PR this weekend.

@paulbjensen paulbjensen merged commit 0ca73d9 into anephenix:master Aug 11, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants