Commit be7bdfa

Update PyFlink Kubernetes documentation adding FAQ
1 parent 9102f25 commit be7bdfa

1 file changed

+23
-14
lines changed

k8s/README.md

Lines changed: 23 additions & 14 deletions
@@ -1,4 +1,4 @@
-# Runs PyFlink jobs on Kubernetes
+# Run PyFlink jobs on Kubernetes
 
 In this example, we'd like to give a simple example to show how to run PyFlink jobs on Kubernetes in application mode.
 It has been documented clearly in Flink's [official documentation](https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/resource-providers/native_kubernetes/) about how to work with Kubernetes.
@@ -57,8 +57,6 @@ Note: Make sure to publish the Docker image to a repository which is accessible
 
 ### Submit PyFlink jobs
 
-#### Submit PyFlink on host machine
-
 1) Download Flink distribution, e.g. for Flink 1.14.4, it's available in https://www.apache.org/dyn/closer.lua/flink/flink-1.14.4/flink-1.14.4-bin-scala_2.11.tgz
 
 2) Extract it
@@ -71,11 +69,12 @@ tar zxvf flink-1.14.4-bin-scala_2.11.tgz
 cd flink-1.14.4
 ./bin/flink run-application \
 --target kubernetes-application \
---parallelism 8 \
+--parallelism 2 \
 -Dkubernetes.cluster-id=word-count \
--Dtaskmanager.memory.process.size=4096m \
+-Djobmanager.memory.process.size=1024m \
+-Dtaskmanager.memory.process.size=1024m \
 -Dkubernetes.taskmanager.cpu=2 \
--Dtaskmanager.numberOfTaskSlots=4 \
+-Dtaskmanager.numberOfTaskSlots=2 \
 -Dkubernetes.container.image=pyflink_wc:latest \
 -Dkubernetes.rest-service.exposed.type=ClusterIP \
 -py /opt/flink/usrlib/word_count.py
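
The hunk above lowers both `--parallelism` and `taskmanager.numberOfTaskSlots`. As a rough sketch of why that changes the number of TaskManager pods, the helper below computes the ceiling of parallelism divided by slots per TaskManager. It assumes Flink's default slot sharing, so that the job needs as many slots as its parallelism; `taskmanager_count` is a hypothetical helper name, not part of Flink or kubectl.

```shell
#!/usr/bin/env bash
# Sketch: estimate how many TaskManager pods a job needs.
# Assumption: default slot sharing, so the job needs `parallelism` slots in total.

# taskmanager_count PARALLELISM SLOTS_PER_TM
# Prints ceil(PARALLELISM / SLOTS_PER_TM) using integer arithmetic.
taskmanager_count() {
  local parallelism=$1 slots=$2
  echo $(( (parallelism + slots - 1) / slots ))
}

taskmanager_count 8 4   # settings before this commit
taskmanager_count 2 2   # settings after this commit
```

Under that assumption, the old settings give two TaskManagers and the new settings give one, which matches the pod listing change later in this diff.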
@@ -103,9 +102,8 @@ If everything runs normally, you should see outputs like the following:
 NAMESPACE NAME READY STATUS RESTARTS AGE
 default word-count-5f5d44b598-zg5z8 1/1 Running 0 90s
 default word-count-taskmanager-1-1 0/1 Pending 0 59s
-default word-count-taskmanager-1-2 0/1 Pending 0 59s
 ```
-Among them, the JobManager runs in the pod `word-count-5f5d44b598-zg5z8 ` and the TaskManager runs in the pods `word-count-taskmanager-1-1` and `word-count-taskmanager-1-2`.
+Among them, the JobManager runs in the pod `word-count-5f5d44b598-zg5z8 ` and the TaskManager runs in the pod `word-count-taskmanager-1-1`.
 
 If the pods are not running normally, you could check the logs of the pods, e.g. checking the log of the JM as following:
 ```shell
@@ -123,16 +121,27 @@ kubectl port-forward service/word-count-rest 8081
 ```
 Then you could access Flink's Web UI of the job via `http://127.0.0.1:8081`.
 
-You could refer to Flink's [official documentation](https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/resource-providers/native_kubernetes/#accessing-flinks-web-ui) on more details.
+You could refer to Flink's [official documentation](https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/resource-providers/native_kubernetes/#accessing-flinks-web-ui) for more details.
 
 ### Cancel the jobs
 
-You could either cancel the job through Flink's Web UI or via CLI commands as following:
+You could either cancel the job through Flink's Web UI or REST API.
+
+## FAQ
+
+### 0/1 nodes are available: 1 Insufficient memory
 
+If the pods of the TaskManagers are always running in `PENDING` status after a long while, you could use the following command to see what happens:
 ```shell
-# list jobs:
-./bin/flink list --target kubernetes-application -Dkubernetes.cluster-id=word-count
+kubectl describe pod word-count-taskmanager-1-1
+```
 
-# cancel jobs:
-./bin/flink cancel --target kubernetes-application -Dkubernetes.cluster-id=word-count
+If see outputs like the following, it means that the memory of the kubernetes cluster is insufficient:
+```shell
+Events:
+Type Reason Age From Message
+---- ------ ---- ---- -------
+Warning FailedScheduling 16s (x2 over 16s) default-scheduler 0/1 nodes are available: 1 Insufficient memory.
 ```
+
+You need to configure the kubernetes cluster with more memory.
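
To put the new FAQ entry in numbers, here is a minimal sketch that adds up the memory the updated command asks for and compares it with a node's allocatable memory. It assumes each pod requests its configured process size, treats Flink's `m` suffix as MiB, assumes one TaskManager pod as in the pod listing of this diff, and ignores system pods. The helper names and the 1900 MiB node size are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: rough lower bound on the memory the job asks Kubernetes for.
# Assumption: each pod requests its configured process size; system pods are ignored.

# total_memory_mib JM_MIB TM_MIB TM_COUNT
# Prints JobManager memory plus TaskManager memory times the TaskManager count.
total_memory_mib() {
  local jm=$1 tm=$2 tm_count=$3
  echo $(( jm + tm * tm_count ))
}

# fits_on_node REQUESTED_MIB ALLOCATABLE_MIB
# Prints "fits" when the request is within the allocatable amount, else "insufficient".
fits_on_node() {
  if [ "$1" -le "$2" ]; then echo "fits"; else echo "insufficient"; fi
}

requested=$(total_memory_mib 1024 1024 1)   # values from the updated command
echo "requested: ${requested} MiB"
fits_on_node "$requested" 1900              # 1900 is a hypothetical node size
```

The real allocatable figure for a node is listed in the `Allocatable` section of `kubectl describe node <node-name>`; substitute it for the hypothetical value when checking a cluster.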
