
Commit bbb65b4

[RayService] Add RayService High Availability Test Doc (#1986)
1 parent 4199879 commit bbb65b4

3 files changed: 301 additions and 0 deletions
Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,139 @@
# RayService high availability

RayService provides high availability (HA) to ensure that the service keeps serving requests without failures while it scales up, scales down, or upgrades the RayService configuration (zero-downtime upgrade).

## Quickstart

### Step 1: Create a Kubernetes cluster with Kind

```sh
kind create cluster --image=kindest/node:v1.24.0
```

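Optionally, verify that the cluster is reachable before continuing. `kind-kind` is the default context name that Kind creates for a cluster started without `--name`:

```sh
# Confirm the control plane is reachable and the node is Ready.
kubectl cluster-info --context kind-kind
kubectl get nodes
```
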
### Step 2: Install the KubeRay operator

Follow the instructions in [this document](/helm-chart/kuberay-operator/README.md) to install the latest stable KubeRay operator, or follow the instructions in [DEVELOPMENT.md](/ray-operator/DEVELOPMENT.md) to install the nightly KubeRay operator.

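For reference, a typical stable install via the Helm chart looks like the following sketch; the chart version shown is only an example, so pin whichever stable version you intend to use:

```sh
# Add the KubeRay Helm repository and install the operator.
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm install kuberay-operator kuberay/kuberay-operator --version 1.1.1

# The operator Pod should reach the Running state.
kubectl get pods
```
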
### Step 3: Create a RayService and a Locust cluster

```sh
# Path: kuberay/
kubectl apply -f ./ray-operator/config/samples/ray-service.high-availability-locust.yaml
kubectl get pod
# NAME                                        READY   STATUS    RESTARTS   AGE
# kuberay-operator-64b4fc5946-zbfqd           1/1     Running   0          72s
# locust-cluster-head-6clr5                   1/1     Running   0          38s
# rayservice-ha-raycluster-pfh8b-head-58xkr   2/2     Running   0          36s
```

The [ray-service.high-availability-locust.yaml](/ray-operator/config/samples/ray-service.high-availability-locust.yaml) manifest contains several Kubernetes objects (you can inspect them with the commands after this list):

- A RayService with both Serve autoscaling and Pod autoscaling enabled.
- A RayCluster that acts as a Locust cluster to simulate users sending requests.
- A ConfigMap containing a locustfile that defines the request pattern: it starts low, spikes, and then drops.

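For example, after the manifest is applied you can list and inspect these objects as follows:

```sh
# List the custom resources and the ConfigMap created by the manifest.
kubectl get rayservice,raycluster,configmap

# Peek at the locustfile that drives the request pattern.
kubectl get configmap locustfile-config -o yaml
```
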
### Step 4: Use the Locust cluster to simulate users sending requests

```sh
# Open a new terminal and log in to the Locust cluster.
kubectl exec -it $(kubectl get pods -o=name | grep locust-cluster-head) -- bash

# Install Locust and download locust_runner.py.
# locust_runner.py helps distribute the Locust workers across the RayCluster.
pip install locust && wget https://raw.githubusercontent.com/ray-project/serve_workloads/main/microbenchmarks/locust_runner.py

# Start sending requests to the RayService.
python locust_runner.py -f /locustfile/locustfile.py --host http://rayservice-ha-serve-svc:8000
```

### Step 5: Verify high availability during scaling up and down

The Locust cluster sends requests to the RayService at a low rate at first, then spikes, and finally drops back down. This traffic pattern triggers the RayService to scale up and down. You can verify high availability by watching the Ray Pods and the failure rate shown in the Locust terminal.

```sh
watch -n 1 "kubectl get pod"
# Stage 1: Low request rate.
# NAME                                                 READY   STATUS     RESTARTS   AGE
# rayservice-ha-raycluster-pfh8b-head-58xkr            2/2     Running    0          78s
# rayservice-ha-raycluster-pfh8b-worker-worker-rd22n   0/1     Init:0/1   0          9s

# Stage 2: High request rate.
# rayservice-ha-raycluster-pfh8b-head-58xkr            2/2     Running    0          113s
# rayservice-ha-raycluster-pfh8b-worker-worker-7thjv   0/1     Init:0/1   0          4s
# rayservice-ha-raycluster-pfh8b-worker-worker-nt98j   0/1     Init:0/1   0          4s
# rayservice-ha-raycluster-pfh8b-worker-worker-rd22n   1/1     Running    0          44s

# Stage 3: Low request rate.
# NAME                                                 READY   STATUS        RESTARTS   AGE
# rayservice-ha-raycluster-pfh8b-head-58xkr            2/2     Running       0          3m38s
# rayservice-ha-raycluster-pfh8b-worker-worker-7thjv   0/1     Terminating   0          109s
# rayservice-ha-raycluster-pfh8b-worker-worker-nt98j   0/1     Terminating   0          109s
# rayservice-ha-raycluster-pfh8b-worker-worker-rd22n   1/1     Running       0          2m29s
```

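You can also correlate this Pod-level view with Serve's own view of the application. A hedged example using the `serve status` CLI inside the head Pod (adjust the grep pattern if your Pod names differ, and note that this assumes the serve CLI is available in the container image):

```sh
# Show Serve application, deployment, and replica states from the head Pod.
kubectl exec -it $(kubectl get pods -o=name | grep rayservice-ha-raycluster | grep head) -- serve status
```
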
Let's walk through how KubeRay and Ray ensure high availability during scaling, using the example above.

In this example, the RayService is configured so that:

- Each node can host at most one Serve replica.
- The initial number of Serve replicas is zero.
- Following best practices, no workloads are scheduled on the head node.

With these settings, the scale-up sequence is as follows (see the sketch after this list):

1. KubeRay creates a new worker Pod. Because no Serve replica is running on it yet, the Pod's readiness probe fails and its endpoint is not added to the Serve service.
2. Ray then schedules a new Serve replica onto the newly created worker Pod. Once the replica is running, the readiness probe passes and the endpoint is added to the Serve service.

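To see this endpoint bookkeeping directly, watch the Endpoints object behind the Serve service while the load ramps up. This sketch assumes the Serve service name used by the sample above, `rayservice-ha-serve-svc`:

```sh
# A worker Pod's IP only appears here after its readiness probe passes,
# that is, after a Serve replica is actually running on the Pod.
kubectl get endpoints rayservice-ha-serve-svc -w
```
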
When Serve replicas scale down:

1. The proxy actor on the worker Pod that is scaling down changes its state to `draining`. Its readiness probe starts failing immediately, and the endpoint begins to be removed from the Serve service. Because the removal takes some time, incoming requests may still be routed to this worker Pod for a short period.
2. While draining, the proxy actor keeps forwarding those incoming requests. The proxy actor only transitions to the `drained` state and is removed once both of the following conditions are met:
    - There are no ongoing requests.
    - The minimum draining time has elapsed, which is controlled by the environment variable `RAY_SERVE_PROXY_MIN_DRAINING_PERIOD_S`.
3. Once the worker Pod becomes idle, KubeRay removes it from the cluster.

In addition, removing an endpoint from the Serve service does not affect requests that are already in flight. Together, these mechanisms ensure high availability during scale-down.

> Note: the default value of `RAY_SERVE_PROXY_MIN_DRAINING_PERIOD_S` is 30 seconds. You can change it to fit your Kubernetes cluster.

### Step 6: Verify high availability during an upgrade

The Locust cluster keeps sending requests for 600 seconds. Before those 600 seconds are up, upgrade the RayService configuration by adding a new environment variable. This triggers a rolling update. You can verify high availability by watching the Ray Pods and the failure rate shown in the Locust terminal.

```sh
kubectl patch rayservice rayservice-ha --type='json' -p='[
  {
    "op": "add",
    "path": "/spec/rayClusterConfig/headGroupSpec/template/spec/containers/0/env",
    "value": [
      {
        "name": "RAY_SERVE_PROXY_MIN_DRAINING_PERIOD_S",
        "value": "30"
      }
    ]
  }
]'

watch -n 1 "kubectl get pod"
# Stage 1: A new head Pod is created.
# NAME                                                 READY   STATUS    RESTARTS   AGE
# rayservice-ha-raycluster-nhs7v-head-z6xkn            1/2     Running   0          4s
# rayservice-ha-raycluster-pfh8b-head-58xkr            2/2     Running   0          4m30s
# rayservice-ha-raycluster-pfh8b-worker-worker-rd22n   1/1     Running   0          3m21s

# Stage 2: The old head Pod terminates after the new head Pod is ready and the Kubernetes service is fully updated.
# NAME                                                 READY   STATUS        RESTARTS   AGE
# rayservice-ha-raycluster-nhs7v-head-z6xkn            2/2     Running       0          91s
# rayservice-ha-raycluster-nhs7v-worker-worker-jplrp   0/1     Init:0/1      0          3s
# rayservice-ha-raycluster-pfh8b-head-58xkr            2/2     Terminating   0          5m57s
# rayservice-ha-raycluster-pfh8b-worker-worker-rd22n   1/1     Terminating   0          4m48s
```

Whenever a new configuration is applied, the KubeRay operator always creates a new RayCluster with the new configuration and then removes the old RayCluster. The rolling update proceeds as follows (see the sketch after this list):

1. KubeRay creates a new RayCluster with the new configuration. At this point, all requests are still served by the old RayCluster.
2. After the new RayCluster and the Serve applications on it are ready, KubeRay updates the Serve service to redirect traffic to the new RayCluster. Because updating the Kubernetes service takes time, traffic is served by both the old and the new RayCluster for a short period.
3. After the Serve service is fully updated, KubeRay removes the old RayCluster, and traffic is served entirely by the new RayCluster.

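One way to observe the traffic switch is to watch which Pods the Serve service selects during the upgrade. This is a sketch that uses the sample's service name; the exact selector labels may vary across KubeRay versions:

```sh
# The selector flips from the old RayCluster's Pods to the new one
# once the new Serve applications are ready.
watch -n 1 "kubectl get svc rayservice-ha-serve-svc -o jsonpath='{.spec.selector}'"

# The Endpoints object shows which Pods actually back the service at any moment.
kubectl get endpoints rayservice-ha-serve-svc
```
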
### Step 7: Examine the Locust results

In the Locust terminal, you can see that the failure rate is 0.00%.

```sh
# fails |
|-------------|
0(0.00%) |
|-------------|
0(0.00%) |
```

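Beyond the Locust numbers, you can also check the status that the KubeRay operator reports on the RayService object; the exact status fields may vary slightly across KubeRay versions:

```sh
# Human-readable status, events, and the active/pending RayCluster names.
kubectl describe rayservice rayservice-ha
```
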
### Step 8: Clean up

```sh
kubectl delete -f ./ray-operator/config/samples/ray-service.high-availability-locust.yaml
kind delete cluster
```

Lines changed: 161 additions & 0 deletions
@@ -0,0 +1,161 @@
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: rayservice-ha
spec:
  serveConfigV2: |
    proxy_location: EveryNode
    applications:
      - name: no_ops
        route_prefix: /
        import_path: microbenchmarks.no_ops:app_builder
        args:
          num_forwards: 0
        runtime_env:
          working_dir: https://github.com/ray-project/serve_workloads/archive/a2e2405f3117f1b4134b6924b5f44c4ff0710c00.zip
        deployments:
          - name: NoOp
            autoscaling_config:
              initial_replicas: 0
              min_replicas: 0
              max_replicas: 5
              upscale_delay_s: 3
              downscale_delay_s: 60
              metrics_interval_s: 2
              look_back_period_s: 10
            max_replicas_per_node: 1
            ray_actor_options:
              num_cpus: 1
  rayClusterConfig:
    rayVersion: '2.9.0' # should match the Ray version in the image of the containers
    enableInTreeAutoscaling: true
    autoscalerOptions:
      idleTimeoutSeconds: 1
    ######################headGroupSpecs#################################
    # Ray head pod template.
    headGroupSpec:
      # The `rayStartParams` are used to configure the `ray start` command.
      # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
      # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
      rayStartParams:
        num-cpus: "0"
        dashboard-host: '0.0.0.0'
      #pod template
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.9.0
              resources:
                limits:
                  cpu: 2
                  memory: 2Gi
                requests:
                  cpu: 2
                  memory: 2Gi
              ports:
                - containerPort: 6379
                  name: gcs-server
                - containerPort: 8265 # Ray dashboard
                  name: dashboard
                - containerPort: 10001
                  name: client
                - containerPort: 8000
                  name: serve
    workerGroupSpecs:
      # the pod replicas in this group typed worker
      - replicas: 0
        minReplicas: 0
        maxReplicas: 5
        groupName: worker
        # The `rayStartParams` are used to configure the `ray start` command.
        # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
        # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
        rayStartParams: {}
        #pod template
        template:
          spec:
            containers:
              - name: ray-worker # must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name', or '123-abc'
                image: rayproject/ray:2.9.0
                lifecycle:
                  preStop:
                    exec:
                      command: ["/bin/sh","-c","ray stop"]
                resources:
                  limits:
                    cpu: 1
                    memory: 2Gi
                  requests:
                    cpu: 1
                    memory: 2Gi
---
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: locust-cluster
spec:
  rayVersion: '2.9.0'
  headGroupSpec:
    rayStartParams:
      dashboard-host: '0.0.0.0'
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.9.0
            resources:
              limits:
                cpu: 3
                memory: 4Gi
              requests:
                cpu: 3
                memory: 4Gi
            ports:
              - containerPort: 6379
                name: gcs-server
              - containerPort: 8265
                name: dashboard
              - containerPort: 10001
                name: client
            volumeMounts:
              - mountPath: /locustfile
                name: locustfile-volume
        volumes:
          - name: locustfile-volume
            configMap:
              name: locustfile-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: locustfile-config
data:
  locustfile.py: |
    from locust import FastHttpUser, task, constant, LoadTestShape
    import os

    class ConstantUser(FastHttpUser):
        wait_time = constant(float(os.environ.get("LOCUS_WAIT_TIME", "1")))
        network_timeout = None
        connection_timeout = None
        @task
        def hello_world(self):
            self.client.post("/")

    # Derived from https://github.com/locustio/locust/blob/master/examples/custom_shape/stages.py
    class StagesShape(LoadTestShape):
        stages = [
            {"duration": 30, "users": 10, "spawn_rate": 10},
            {"duration": 60, "users": 120, "spawn_rate": 10},
            {"duration": 600, "users": 10, "spawn_rate": 10},
        ]
        def tick(self):
            run_time = self.get_run_time()
            for stage in self.stages:
                if run_time < stage["duration"]:
                    tick_data = (stage["users"], stage["spawn_rate"])
                    return tick_data
            return None
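
As a side note, the locustfile can be sanity-checked outside Kubernetes as well. A minimal sketch, assuming Locust is installed locally, the script is saved as `locustfile.py`, and some HTTP server accepts `POST /` on port 8000:

```sh
# Run the locustfile headlessly; the StagesShape class drives the user count
# and ends the test once tick() returns None (after 600 seconds).
locust -f locustfile.py --host http://127.0.0.1:8000 --headless
```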

tests/test_sample_raycluster_yamls.py

Lines changed: 1 addition & 0 deletions
@@ -65,6 +65,7 @@ def parse_args():
         'ray-cluster.tpu-v4-singlehost.yaml': 'Skip this test because it requires TPU resources.',
         'ray-cluster.tpu-v4-multihost.yaml' : 'Skip this test because it requires TPU resources',
         'ray-cluster.gke-bucket.yaml': 'Skip this test because it requires GKE and k8s service accounts.',
+        'ray-service.high-availability-locust.yaml': 'Skip this test because the RayCluster here is only used for testing RayService.',
     }

     rs = RuleSet([HeadPodNameRule(), EasyJobRule(), HeadSvcRule()])
