Exponential backoff on reconnection attempts #403

Merged 7 commits on Aug 11, 2024
62 changes: 62 additions & 0 deletions README.md
@@ -352,6 +352,68 @@ If you think that there is a potential case for you ending up queuing at least
wrap `sarus.send` function calls in a try/catch statement, so as to handle
those messages, should they occur.

### Exponential backoff

Configure exponential backoff like so:

```typescript
import Sarus from '@anephenix/sarus';

const sarus = new Sarus({
  url: 'wss://ws.anephenix.com',
  exponentialBackoff: {
    // Exponential factor: with a rate of 2, the delay doubles after each
    // failed attempt, giving 1 s, 2 s, 4 s, and so on
    backoffRate: 2,
    // Never wait more than 2000 ms (2 s); the limit is in milliseconds
    backoffLimit: 2000,
  },
});
```

When a connection attempt repeatedly fails, increasing the delay
exponentially between each subsequent reconnection attempt is called
[Exponential backoff](https://en.wikipedia.org/wiki/Exponential_backoff). The
idea is that if reconnection attempts after 1 second and after 2 seconds have
both failed, it is not worth trying again after just one more second: the
probability that a third attempt succeeds is most likely no higher than it was
for the first two. Increasing the delay between attempts therefore factors in
the assumption that a connection is not more likely to succeed when it is
probed repeatedly at regular intervals.

This reduces the load on the client, on the server, and on the network. For
a client, fewer WebSocket connection attempts mean less work for the client
and less traffic on its network connection. For the server, if WebSocket
requests fail on the server side, fewer retries mean less load spent handling
requests that keep failing. This matters because a failure rarely affects a
single client: if, for example, a server refuses WebSocket connections from one
client, other clients may well be unable to connect either, so the retries of
all affected clients add up.
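To put rough numbers on the client-side savings, the sketch below counts how many reconnection attempts a client starts while a server stays unreachable for 60 s, once with a fixed 1 s retry interval and once with truncated exponential backoff (1 s initial delay, rate 2, capped at 8 s). The helper `attemptsWithin`, the 60 s outage, and the 8 s cap are illustrative assumptions chosen for this example; none of them is part of Sarus.

```typescript
// Hypothetical helper: counts the reconnection attempts that start within
// `windowMs` of the connection dropping, assuming every attempt fails.
// `delayFor(n)` returns the wait (ms) before the attempt that follows n failures.
function attemptsWithin(
  windowMs: number,
  delayFor: (failedAttempts: number) => number,
): number {
  let elapsed = 0;
  let attempts = 0;
  while (true) {
    elapsed += delayFor(attempts);
    // The next attempt would start after the outage window ends
    if (elapsed > windowMs) {
      return attempts;
    }
    attempts += 1;
  }
}

// Fixed 1 s retry interval
const fixed = attemptsWithin(60_000, () => 1000);
// Truncated exponential backoff: 1 s initial delay, rate 2, capped at 8 s
const backoff = attemptsWithin(60_000, (n) =>
  Math.min(1000 * Math.pow(2, n), 8000),
);

console.log(fixed, backoff); // 60 9
```

Under these assumptions the fixed interval produces 60 attempts and the backoff schedule 9 (after 1, 3, 7, 15, 23, 31, 39, 47 and 55 s); across many affected clients, that difference is what the server sees as load.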

Sarus implements _truncated exponential backoff_: the reconnection delay is
capped by a second parameter, `backoffLimit`, and never exceeds it. How fast the
delay grows is determined by `backoffRate`. If `backoffRate` is 2 and the
initial delay is 1 s, the delays will be 1 s, 2 s, 4 s, and so on, until
`backoffLimit` is reached. The delays and `backoffLimit` are in milliseconds.
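The capped schedule can be sketched as a small function. `backoffDelay` and `failuresUntilCap` below are hypothetical helpers written for illustration; they are not Sarus's exported `calculateRetryDelayFactor`, whose internals may differ. The 1000 ms initial delay is an assumption taken from the `Exponential backoff delay` test in this PR, and the sketch reproduces the delays that test expects for a rate of 2 and a limit of 8000 ms.

```typescript
// Hypothetical type mirroring the shape of the `exponentialBackoff` option
interface BackoffParams {
  backoffRate: number;
  backoffLimit: number; // milliseconds
}

// Delay in ms before the next attempt after `failedAttempts` consecutive
// failures: initialDelay * backoffRate^failedAttempts, truncated at backoffLimit
function backoffDelay(
  params: BackoffParams,
  initialDelay: number,
  failedAttempts: number,
): number {
  return Math.min(
    initialDelay * Math.pow(params.backoffRate, failedAttempts),
    params.backoffLimit,
  );
}

// Number of failed attempts after which the delay first reaches the cap
function failuresUntilCap(params: BackoffParams, initialDelay: number): number {
  let n = 0;
  while (backoffDelay(params, initialDelay, n) < params.backoffLimit) {
    // With a rate of 1 or less the delay never grows, so the cap is never hit
    if (params.backoffRate <= 1) {
      return Infinity;
    }
    n += 1;
  }
  return n;
}

const params: BackoffParams = { backoffRate: 2, backoffLimit: 8000 };
const schedule = [0, 1, 2, 3, 4].map((n) => backoffDelay(params, 1000, n));
console.log(schedule); // [ 1000, 2000, 4000, 8000, 8000 ]
console.log(failuresUntilCap(params, 1000)); // 3
```

With the example configuration at the top of this section (`backoffRate: 2`, `backoffLimit: 2000`) and the same assumed 1000 ms initial delay, `failuresUntilCap` returns 1: the delay reaches the 2 s cap after a single failed attempt, so a larger `backoffLimit` is needed before delays of 4 s and more can occur.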

The algorithm for reconnection looks like this in pseudocode:

```javascript
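// Configurable parameters (all delays in milliseconds)
const initialDelay = 1000;
const backoffRate = 2;
// The maximum delay will be 400 s
const backoffLimit = 400 * 1000;

// Resolve after `ms` milliseconds
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `tryToConnect` resolves to true once a connection is established
async function reconnect(tryToConnect) {
  let failedAttempts = 0;
  let connected = false;
  while (!connected) {
    // initialDelay * backoffRate^failedAttempts, truncated at backoffLimit
    const delay = Math.min(
      initialDelay * Math.pow(backoffRate, failedAttempts),
      backoffLimit,
    );
    await sleep(delay);
    connected = await tryToConnect();
    if (!connected) {
      failedAttempts += 1;
    }
  }
}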
```

### Advanced options

Sarus has a number of other options that you can pass to the client during
88 changes: 88 additions & 0 deletions __tests__/index/retryConnectionDelay.test.ts
@@ -1,6 +1,8 @@
// File Dependencies
import Sarus from "../../src/index";
import { WS } from "jest-websocket-mock";
import { calculateRetryDelayFactor } from "../../src/index";
import type { ExponentialBackoffParams } from "../../src/index";

const url = "ws://localhost:1234";

@@ -61,3 +63,89 @@ describe("retry connection delay", () => {
});
});
});

describe("Exponential backoff delay", () => {
describe("with rate 2, backoffLimit 8000 ms", () => {
// The initial delay shall be 1 s
const initialDelay = 1000;
const exponentialBackoff: ExponentialBackoffParams = {
backoffRate: 2,
// We put the ceiling at exactly 8000 ms
backoffLimit: 8000,
};
const attempts: [number, number][] = [
[1000, 0],
[2000, 1],
[4000, 2],
[8000, 3],
[8000, 4],
];
it("will never be more than 8000 ms with rate set to 2", () => {
attempts.forEach(([delay, failedAttempts]) => {
expect(
calculateRetryDelayFactor(
exponentialBackoff,
initialDelay,
failedAttempts,
),
).toBe(delay);
});
});

it("should delay reconnection attempts exponentially", async () => {
// We need to convince TypeScript here that "WebSocket" is a valid key of
// `global`, probably because its typings don't declare WebSocket as part of
// global (the index key is missing)
const webSocketSpy = jest.spyOn(global, "WebSocket" as any);
webSocketSpy.mockImplementation(() => {});
const setTimeoutSpy = jest.spyOn(global, "setTimeout");
const sarus = new Sarus({ url, exponentialBackoff });
expect(sarus.state).toStrictEqual({
kind: "connecting",
failedConnectionAttempts: 0,
});
let instance: WebSocket;
// Get the first WebSocket instance, and ...
[instance] = webSocketSpy.mock.instances;
if (!instance.onopen) {
throw new Error();
}
// tell the sarus instance that it is open, and ...
instance.onopen(new Event("open"));
if (!instance.onclose) {
throw new Error();
}
// close it immediately
instance.onclose(new CloseEvent("close"));
expect(sarus.state).toStrictEqual({
kind: "closed",
});

let cb: Sarus["connect"];
// We iteratively call sarus.connect() and let it fail, seeing
// if it reaches 8000 as a delay and stays there
attempts.forEach(([delay, failedAttempts]) => {
const call =
setTimeoutSpy.mock.calls[setTimeoutSpy.mock.calls.length - 1];
if (!call) {
throw new Error();
}
// Make sure that setTimeout was called with the correct delay
expect(call[1]).toBe(delay);
cb = call[0];
cb();
// Get the most recent WebSocket instance
instance =
webSocketSpy.mock.instances[webSocketSpy.mock.instances.length - 1];
if (!instance.onclose) {
throw new Error();
}
instance.onclose(new CloseEvent("close"));
expect(sarus.state).toStrictEqual({
kind: "connecting",
failedConnectionAttempts: failedAttempts + 1,
});
});
});
});
});
38 changes: 25 additions & 13 deletions __tests__/index/state.test.ts
@@ -15,27 +15,39 @@ describe("state machine", () => {

// In the beginning, the state is "connecting"
const sarus: Sarus = new Sarus(sarusConfig);
expect(sarus.state).toBe("connecting");
// Since Sarus jumps into connecting directly, 1 connection attempt is made
// right in the beginning, but none have failed
expect(sarus.state).toStrictEqual({
kind: "connecting",
failedConnectionAttempts: 0,
});

// We wait until we are connected, and see a "connected" state
await server.connected;
expect(sarus.state).toBe("connected");
expect(sarus.state.kind).toBe("connected");

// When the connection drops, the state will be "closed"
server.close();
await server.closed;
expect(sarus.state).toBe("closed");

// Restart server
server = new WS(url);
expect(sarus.state).toStrictEqual({
kind: "closed",
});

// We wait a while, and the status is "connecting" again
await delay(1);
expect(sarus.state).toBe("connecting");
// In the beginning, no connection attempts have been made, since in the
// case of a closed connection, we wait a bit until we try to connect again.
expect(sarus.state).toStrictEqual({
kind: "connecting",
failedConnectionAttempts: 0,
});

// We restart the server and let the Sarus instance reconnect:
server = new WS(url);

// When we connect in our mock server, we are "connected" again
await server.connected;
expect(sarus.state).toBe("connected");
expect(sarus.state.kind).toBe("connected");

// Cleanup
server.close();
@@ -46,23 +58,23 @@ describe("state machine", () => {

// Same initial state transition as above
const sarus: Sarus = new Sarus(sarusConfig);
expect(sarus.state).toBe("connecting");
expect(sarus.state.kind).toBe("connecting");
await server.connected;
expect(sarus.state).toBe("connected");
expect(sarus.state.kind).toBe("connected");

// The user can disconnect and the state will be "disconnected"
sarus.disconnect();
expect(sarus.state).toBe("disconnected");
expect(sarus.state.kind).toBe("disconnected");
await server.closed;

// The user can now reconnect, and the state will be "connecting", and then
// "connected" again
sarus.connect();
expect(sarus.state).toBe("connecting");
expect(sarus.state.kind).toBe("connecting");
await server.connected;
// XXX for some reason the test will fail without waiting 10 ms here
await delay(10);
expect(sarus.state).toBe("connected");
expect(sarus.state.kind).toBe("connected");
server.close();
});
});