Commit 285a2a0

add pod scheduling example (#210)
Co-authored-by: Takuto Suzuki <[email protected]>
1 parent 82b7f8b commit 285a2a0

2 files changed

+273
-0
lines changed

scheduling/pod-scheduling/README.md

+157
@@ -0,0 +1,157 @@
# Configure Workload Scheduling

In this workflow scenario, you'll deploy Confluent Platform to a multi-zone Kubernetes cluster with pod workload scheduling configured.

One way to get optimal performance out of Confluent components is to control how the component pods are scheduled on Kubernetes nodes. For example, you can configure pods not to be scheduled on the same node as other resource-intensive applications, to be scheduled on dedicated nodes, or to be scheduled on the nodes with the most suitable hardware.

Read more about Pod Scheduling in [Confluent Docs][1].

## Set the current tutorial directory

Set the tutorial directory to this scenario's folder under the directory where you downloaded
the tutorial files:

```
export TUTORIAL_HOME=<Tutorial directory>/scheduling/pod-scheduling
```

## Deploy Confluent for Kubernetes

This workflow scenario assumes you are using the namespace `confluent`. The `kubectl` commands in this tutorial omit `-n confluent`, so they rely on `confluent` being the default namespace of your current context.

Add the Confluent Helm chart repository:

```
helm repo add confluentinc https://packages.confluent.io/helm
```

Install Confluent for Kubernetes using Helm:

```
helm upgrade --install operator confluentinc/confluent-for-kubernetes -n confluent
```

Check that the Confluent for Kubernetes pod comes up and is running:

```
kubectl get pods
```

## Configure

### Set Required or Preferred

Two types of affinity rules are supported:

1. `requiredDuringSchedulingIgnoredDuringExecution`
2. `preferredDuringSchedulingIgnoredDuringExecution`

A required rule must be satisfied for the pod to be scheduled. A preferred rule is honored when possible, but the pod is still scheduled if no node satisfies it.

In `confluent-platform.yml`, `requiredDuringSchedulingIgnoredDuringExecution` is used.

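The two rule types also differ in shape: the preferred form wraps each term in a `podAffinityTerm` and gives it a `weight` between 1 and 100. The sketch below prints a preferred variant of a ZooKeeper anti-affinity rule as a YAML fragment. It is an illustration, not part of `confluent-platform.yml`; the helper name `preferred_zookeeper_rule` and the weight of 100 are assumptions, while the `app` label and the `topology.kubernetes.io/zone` key are the ones used in this tutorial.

```shell
# Hypothetical helper: print a "preferred" (soft) anti-affinity fragment.
# The weight value is an assumed example; valid weights range from 1 to 100.
preferred_zookeeper_rule() {
  cat <<'EOF'
podAntiAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
  - weight: 100
    podAffinityTerm:
      labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - zookeeper
      topologyKey: topology.kubernetes.io/zone
EOF
}
```

Calling `preferred_zookeeper_rule` prints the fragment, which could stand in for the `podAntiAffinity` block under `podTemplate.affinity` if a soft rule suits your cluster better.
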
### Set the labelSelector

`labelSelector` is used to find matching pods by label. To see a pod's labels, describe the pod:

```
> kubectl describe pod/<pod-name>

# The Labels section lists the pod's key-value pairs.

Labels:       app=zookeeper
              clusterId=confluent
              confluent-platform=true
              controller-revision-hash=zookeeper-75c664dbbf
              platform.confluent.io/type=zookeeper
              statefulset.kubernetes.io/pod-name=zookeeper-0
              type=zookeeper
```

To apply this rule only to ZooKeeper pods, set:

```
- labelSelector:
    matchExpressions:
    - key: app
      operator: In
      values:
      - zookeeper
```

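Before relying on a selector, it can help to confirm which pods it matches. A minimal sketch, assuming the `confluent` namespace from this scenario; `pods_matching` is a hypothetical helper name, not part of Confluent for Kubernetes:

```shell
# Hypothetical helper: list the names of pods that a label selector matches.
# Usage: pods_matching <namespace> <selector>
pods_matching() {
  kubectl get pods -n "$1" -l "$2" -o name
}
```

For example, `pods_matching confluent 'app in (zookeeper)'` uses kubectl's set-based selector syntax, which mirrors the `In` operator of the rule. For a single value, `app=zookeeper` selects the same pods.
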
### Get the topologyKey

The topology key is the node label that defines the domain, such as a host or a zone, within which pods are co-located or kept apart. The available labels depend on the Kubernetes distribution and cloud provider. To find the correct value, list the nodes:

```
> kubectl get node

NAME                                STATUS   ROLES   AGE     VERSION
aks-agentpool-18842733-vmss000001   Ready    agent   7h59m   v1.24.6
aks-agentpool-18842733-vmss000003   Ready    agent   154m    v1.24.6
aks-agentpool-18842733-vmss000004   Ready    agent   153m    v1.24.6
```

Then pick a node and describe it:

```
> kubectl describe node/aks-agentpool-18842733-vmss000001

Labels:             agentpool=agentpool
                    beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=Standard_A8m_v2
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=japaneast
                    failure-domain.beta.kubernetes.io/zone=japaneast-2
                    kubernetes.azure.com/agentpool=agentpool
                    kubernetes.azure.com/cluster=MC_taku-test_taku-test_japaneast
                    kubernetes.azure.com/kubelet-identity-client-id=f8663bd6-a676-4b9f-8cb7-f57bc43a92a7
                    kubernetes.azure.com/mode=system
                    kubernetes.azure.com/node-image-version=AKSUbuntu-1804containerd-2023.01.10
                    kubernetes.azure.com/os-sku=Ubuntu
                    kubernetes.azure.com/role=agent
                    kubernetes.azure.com/storageprofile=managed
                    kubernetes.azure.com/storagetier=Standard_LRS
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=aks-agentpool-18842733-vmss000001
                    kubernetes.io/os=linux
                    kubernetes.io/role=agent
                    node-role.kubernetes.io/agent=
                    node.kubernetes.io/instance-type=Standard_A8m_v2
                    storageprofile=managed
                    storagetier=Standard_LRS
                    topology.disk.csi.azure.com/zone=japaneast-2
                    topology.kubernetes.io/region=japaneast
                    topology.kubernetes.io/zone=japaneast-2
```

Select a key to use, for example, `topology.kubernetes.io/zone`.

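A required anti-affinity rule that matches a component's own pods allows at most one of those pods per topology domain, so the chosen key needs at least as many distinct values across the nodes as the component has replicas. The sketch below counts nodes per value of a candidate key. `nodes_per_domain` is a hypothetical helper, and it assumes every node carries the label: a node without it leaves the last column empty and would be counted under its version string instead.

```shell
# Hypothetical helper: count nodes per value of a node label.
# Usage: nodes_per_domain <label-key>
# -L adds the label's value as the last column of each row.
nodes_per_domain() {
  kubectl get nodes --no-headers -L "$1" \
    | awk '{ count[$NF]++ } END { for (d in count) print d, count[d] }' \
    | sort
}
```

For example, `nodes_per_domain topology.kubernetes.io/zone` prints one line per zone with its node count. With three replicas and a required rule, fewer than three lines suggests that some pods would stay Pending.
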
## Deploy Kafka Components

To deploy Kafka components:

```
kubectl apply -f $TUTORIAL_HOME/confluent-platform.yml
```

If the changes aren't applied to pods that are already running, run `kubectl delete pod/<pod-name>` to restart individual pods.

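To check the effect of the rules after deployment, you can list which node each pod landed on. A minimal sketch; `pod_nodes` is a hypothetical helper, and the namespace and selector in the usage example are the ones from `confluent-platform.yml`:

```shell
# Hypothetical helper: print the pod name and node name for every pod
# that matches a label selector.
# Usage: pod_nodes <namespace> <selector>
pod_nodes() {
  kubectl get pods -n "$1" -l "$2" --no-headers \
    -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName
}
```

For example, run `pod_nodes confluent app=kafka`. With the zone-based rule, each listed node should sit in a different zone, which you can cross-check against the node labels.
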
## Tear Down

To tear down the components, run:

```
kubectl delete -f $TUTORIAL_HOME/confluent-platform.yml
```

## Troubleshooting

1. Pod is in a Pending state.

   Run `kubectl describe pod/<pod-name>` and check the events to see why the pod has not been assigned to a node.

2. Pod is not assigned to a node due to `volume node affinity conflict`.

   Check the Persistent Volumes and Persistent Volume Claims (`kubectl get pv` and `kubectl get pvc`). Most likely, multiple volumes already belong to the same node or zone, which conflicts with the anti-affinity rule. A quick but forceful fix is to tear down the deployment with `kubectl delete -f`, remove any remaining pods, PVCs, and PVs, and then deploy again.

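The scheduler's reasons for leaving a pod Pending are also recorded as events. A minimal sketch for listing only the events of one pod; `pod_events` is a hypothetical helper name, and the namespace in the usage example is the one assumed by this scenario:

```shell
# Hypothetical helper: list the events recorded for a single pod,
# sorted by the time they were last seen.
# Usage: pod_events <namespace> <pod-name>
pod_events() {
  kubectl get events -n "$1" \
    --field-selector "involvedObject.name=$2" \
    --sort-by=.lastTimestamp
}
```

For example, run `pod_events confluent <pod-name>`. A `FailedScheduling` event typically states which constraint could not be met, such as an anti-affinity rule or a volume node affinity conflict.
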
scheduling/pod-scheduling/confluent-platform.yml

+116
@@ -0,0 +1,116 @@
---
apiVersion: platform.confluent.io/v1beta1
kind: Zookeeper
metadata:
  name: zookeeper
  namespace: confluent
spec:
  replicas: 3
  image:
    application: confluentinc/cp-zookeeper:7.3.0
    init: confluentinc/confluent-init-container:2.5.0
  dataVolumeCapacity: 10Gi
  logVolumeCapacity: 10Gi
  podTemplate:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - zookeeper
          topologyKey: topology.kubernetes.io/zone
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
    podSecurityContext:
      fsGroup: 1000
      runAsUser: 1000
      runAsNonRoot: true
---
apiVersion: platform.confluent.io/v1beta1
kind: Kafka
metadata:
  name: kafka
  namespace: confluent
spec:
  replicas: 3
  image:
    application: confluentinc/cp-server:7.3.0
    init: confluentinc/confluent-init-container:2.5.0
  dataVolumeCapacity: 100Gi
  metricReporter:
    enabled: true
  podTemplate:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - kafka
          topologyKey: topology.kubernetes.io/zone
    resources:
      requests:
        cpu: 200m
        memory: 512Mi
    podSecurityContext:
      fsGroup: 1000
      runAsUser: 1000
      runAsGroup: 1000
      runAsNonRoot: true
---
apiVersion: platform.confluent.io/v1beta1
kind: Connect
metadata:
  name: connect
  namespace: confluent
spec:
  replicas: 1
  image:
    application: confluentinc/cp-server-connect:7.3.0
    init: confluentinc/confluent-init-container:2.5.0
  dependencies:
    kafka:
      bootstrapEndpoint: kafka:9071
---
apiVersion: platform.confluent.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb
  namespace: confluent
spec:
  replicas: 1
  image:
    application: confluentinc/cp-ksqldb-server:7.3.0
    init: confluentinc/confluent-init-container:2.5.0
  dataVolumeCapacity: 10Gi
---
apiVersion: platform.confluent.io/v1beta1
kind: ControlCenter
metadata:
  name: controlcenter
  namespace: confluent
spec:
  replicas: 1
  image:
    application: confluentinc/cp-enterprise-control-center:7.3.0
    init: confluentinc/confluent-init-container:2.5.0
  dataVolumeCapacity: 10Gi
  dependencies:
    schemaRegistry:
      url: http://schemaregistry.confluent.svc.cluster.local:8081
    ksqldb:
    - name: ksqldb
      url: http://ksqldb.confluent.svc.cluster.local:8088
    connect:
    - name: connect
      url: http://connect.confluent.svc.cluster.local:8083
  podTemplate:
    resources:
      requests:
