In the current implementation of Pegasus, reading from secondary replicas can lead to inconsistencies, so only primary replicas are used for reads. However, certain scenarios, such as load balancing or hotspot writes, can cause instability on the primary replica. To address this, we aim to enable reads from secondary replicas when the primary is unstable. While this approach sacrifices some strong consistency, it helps to alleviate long-tail latency in read requests and improves overall system availability. The **backup request** mechanism is designed to facilitate this functionality.
+
In the current implementation of Pegasus, reading from secondary replicas can cause inconsistencies, so Pegasus reads from primary replicas by default. However, in certain situations (such as load balancing or hotspot writes), the primary can become unstable. Therefore, we want to read from a secondary when the primary is unstable, sacrificing some strong consistency to reduce the tail latency of read requests and improve system availability. The backup request feature is designed to achieve this.
# Design and Implementation
-
Implementing backup requests is relatively straightforward. For read operations (write operations currently do not support backup requests), the client sends a request to the primary and waits for a specified delay (typically the p999 read latency). If no response is received within this time, the client randomly selects a secondary replica and sends it a backup request. Whichever response arrives first is used.
-
We recommend using p999 as the delay for sending secondary requests, as the purpose of the backup request operation is to eliminate long-tail latency rather than to improve cluster performance. Setting this value too low can result in an overwhelming number of backup requests, thereby significantly increasing the overall system load. For example, if the delay is set to p50, 50% of the requests would be sent to secondary replicas, causing a 50% increase in system load.
+
The principle behind backup requests is relatively simple. For read operations (write operations currently do not support backup requests), the client sends a request to the primary; if no response has returned after a certain delay (usually the p999 latency), it randomly selects a secondary and sends a backup request to it. The response that returns first is the one that is processed.
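The flow described above can be sketched in a few lines of Java. This is only an illustration of the idea, not the Pegasus client's actual code: the class `BackupRequestSketch`, the method `readWithBackup`, and the use of `Supplier` to stand in for "send a read to this replica" are all hypothetical names chosen for the example. For simplicity, the first outcome to arrive wins, whether it is a value or an error.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch; not part of the Pegasus Java client.
final class BackupRequestSketch {

    /**
     * Sends a read to the primary. If no response has arrived after delayMs,
     * also sends the read to one randomly chosen secondary. The returned
     * future completes with whichever outcome arrives first.
     */
    static <T> CompletableFuture<T> readWithBackup(
            Supplier<CompletableFuture<T>> primary,
            List<Supplier<CompletableFuture<T>>> secondaries,
            long delayMs) {
        CompletableFuture<T> result = new CompletableFuture<>();
        primary.get().whenComplete((value, error) -> finish(result, value, error));

        // Mirrors the documented rule: a delay <= 0 disables backup requests.
        if (delayMs <= 0 || secondaries.isEmpty()) {
            return result;
        }

        // Runs the task once, delayMs milliseconds from now (Java 9+).
        Executor timer = CompletableFuture.delayedExecutor(delayMs, TimeUnit.MILLISECONDS);
        timer.execute(() -> {
            if (result.isDone()) {
                return; // the primary answered in time; no backup request needed
            }
            int pick = ThreadLocalRandom.current().nextInt(secondaries.size());
            secondaries.get(pick).get()
                    .whenComplete((value, error) -> finish(result, value, error));
        });
        return result;
    }

    // Completing an already completed future is a no-op, so the first outcome wins.
    private static <T> void finish(CompletableFuture<T> result, T value, Throwable error) {
        if (error != null) {
            result.completeExceptionally(error);
        } else {
            result.complete(value);
        }
    }
}
```

Note that the primary request is not cancelled when the backup request is sent; both stay in flight, which is why each backup request counts as extra load on the cluster.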
-
# How to Use
-
In Pegasus Java Client v2.0.0, we introduced an interface that allows users to enable the backup request feature for a specific table. The method is defined as follows:
+
We recommend choosing the p999 latency as the delay before sending a backup request to a secondary, because backup requests are intended to eliminate tail latency, not to improve cluster performance. If the value is set too low, the large number of backup requests will increase the pressure on the cluster. For example, if p50 is chosen as the delay, 50% of read requests will also be sent to a secondary, and the system load will increase by 50%.
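The arithmetic behind this recommendation is simple enough to write down. The helper below is a hypothetical illustration (the class and method names are not from Pegasus): if the delay equals the latency at a given percentile, the remaining share of reads exceeds the delay and triggers a backup request.

```java
// Hypothetical helper; illustrates the load arithmetic only.
final class BackupRequestOverhead {

    /**
     * Expected fraction of reads that trigger a backup request when the delay
     * is set to the latency at the given percentile (0 to 100).
     */
    static double extraReadFraction(double percentile) {
        if (percentile < 0 || percentile > 100) {
            throw new IllegalArgumentException("percentile must be in [0, 100]");
        }
        return (100.0 - percentile) / 100.0;
    }
}
```

Under this model, `extraReadFraction(50)` is 0.5 (50% more reads), while `extraReadFraction(99.9)` is roughly 0.001 (about 0.1% more reads).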
+
# How to Use
+
In Pegasus Java client v2.0.0, we added an interface that enables the backup request feature for a specific table. The method is declared as follows:
```java
public PegasusTableInterface openTable(String tableName, int backupRequestDelayMs) throws PException;
```
-
Compared to the previous version of the `openTable` interface, we’ve added the `backupRequestDelayMs` parameter. This parameter defines the delay time in milliseconds: if a request sent to the primary replica does not receive a response within `backupRequestDelayMs`, a backup request will be sent to a secondary replica. Notice that setting `backupRequestDelayMs <= 0` disables the backup request feature.
+
Compared to the old version of the `openTable` interface, we have added a `backupRequestDelayMs` parameter. This parameter is the delay described above: if a request sent to the primary has not returned a response after `backupRequestDelayMs` milliseconds, a backup request is sent to a secondary. Note that `backupRequestDelayMs <= 0` disables the backup request feature.
+
+
In addition, in the old version of the `openTable` interface, the backup request feature is disabled by default.
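One way to pick a value for `backupRequestDelayMs` is to compute a high percentile from read latencies you have already measured. The sketch below assumes you have such samples in milliseconds; the class `BackupDelayChooser`, the method `percentileMs`, and the nearest-rank percentile definition are choices made for this example, not something the Pegasus client provides.

```java
import java.util.Arrays;

// Hypothetical helper; not part of the Pegasus Java client.
final class BackupDelayChooser {

    /**
     * Returns the nearest-rank percentile (0 to 100) of the given latency
     * samples in milliseconds. The input array is not modified.
     */
    static int percentileMs(long[] latenciesMs, double percentile) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no latency samples");
        }
        if (percentile < 0 || percentile > 100) {
            throw new IllegalArgumentException("percentile must be in [0, 100]");
        }
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        // Nearest rank: the smallest sample with at least `percentile` percent
        // of all samples at or below it.
        int rank = (int) Math.ceil(percentile * sorted.length / 100.0);
        int index = Math.min(Math.max(rank, 1), sorted.length) - 1;
        return (int) Math.min(sorted[index], Integer.MAX_VALUE);
    }
}

// Possible use with the interface above (client and table name are placeholders):
//   int delayMs = BackupDelayChooser.percentileMs(measuredReadLatenciesMs, 99.9);
//   PegasusTableInterface table = client.openTable("my_table", delayMs);
```

Measuring the latencies with backup requests disabled keeps the chosen delay tied to the primary's own behavior.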
-
In previous versions of the openTable interface, the backup request mechanism was disabled by default.
+
# Benchmark
-
# Performance Testing
-
The table below compares the performance between enabling and disabling backup requests. We used the p999 latency of read without backup requests (138 ms) as the delay for triggering backup requests. The data shows that enabling backup requests has **no significant impact** on the p999 latency for `get` requests, but the p9999 latency is **reduced by several times**.
+
The following table compares performance with the backup request feature enabled and disabled. We used the p999 latency of read requests without backup requests (138 ms) as the backup request delay. The data shows that after enabling backup requests, the p999 latency of `get` requests **remains almost unchanged**, while the p9999 latency is **reduced by several times**.
-
Additionally, since the delay is set to the p999 value, approximately one out of every thousand requests triggers a backup request. This results in an additional request load (i.e., the overhead of enabling backup requests) of approximately 0.1%. Similarly, setting the `backupRequestDelayMs` to p99 can further reduce the p999 latency, which may increase the additional read request load by around 1%.
+
In addition, since the delay is set to the p999 latency, about 1 in 1000 requests triggers a backup request, so the additional request volume (i.e., the overhead of enabling backup requests) is about 0.1%. By the same reasoning, if you want to reduce the p999 latency, you can set `backupRequestDelayMs` to the p99 latency, which increases read traffic by about 1%.