# Running PyFlink jobs on Kubernetes

This example shows how to run PyFlink jobs on Kubernetes in application mode.
Working with Kubernetes is clearly documented in Flink's [official documentation](https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/resource-providers/native_kubernetes/).
Everything there also applies to PyFlink jobs. It's strongly advised to read that documentation carefully before going through the following example.

## Preparation

### Set up a Kubernetes cluster

If there is no Kubernetes cluster available for use, you need to set one up first. Take a look at [how to set up a Kubernetes cluster](https://kubernetes.io/docs/setup/) for more details.

You can verify that you have the required permissions by running `kubectl auth can-i <list|create|edit|delete> pods`, which should print `yes`, e.g.
```shell
kubectl auth can-i create pods
```
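
To check all four verbs in one go, a small shell loop like the following can be used (just a convenience wrapper around the same command; not required for the rest of this example):

```shell
# Check list/create/edit/delete permissions on pods in one go
for verb in list create edit delete; do
  echo -n "$verb pods: "
  kubectl auth can-i "$verb" pods
done
```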

Then, you could run the following command:
```shell
kubectl get pods -A
```
If the output looks like the following, the Kubernetes cluster is running and kubectl is configured correctly,
and you can proceed to the next section:
```shell
kube-system   coredns-f9fd979d6-96xql                    1/1     Running   0          7m41s
kube-system   coredns-f9fd979d6-h9q5v                    1/1     Running   0          7m41s
kube-system   etcd-docker-desktop                        1/1     Running   0          6m44s
kube-system   kube-apiserver-docker-desktop              1/1     Running   0          6m47s
kube-system   kube-controller-manager-docker-desktop     1/1     Running   0          6m42s
kube-system   kube-proxy-94f22                           1/1     Running   0          7m41s
kube-system   kube-scheduler-docker-desktop              1/1     Running   0          6m39s
kube-system   storage-provisioner                        1/1     Running   0          7m6s
kube-system   vpnkit-controller                          1/1     Running   0          7m5s
```

### Build a Docker image with PyFlink installed

Running PyFlink jobs requires PyFlink to be available in the containers on all cluster nodes. Official Flink Docker images with PyFlink pre-installed are currently not provided,
so you need to build one yourself as follows.

```shell
docker build -t pyflink:1.14.4 -f docker/Dockerfile .
```
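
The exact Dockerfile lives in `docker/Dockerfile` of this repository. As a rough sketch of what such an image typically contains (following the PyFlink Docker instructions in Flink's documentation; the base image tag and package versions here are assumptions), it installs Python and PyFlink on top of the official Flink image:

```dockerfile
# Sketch of a PyFlink base image; the actual docker/Dockerfile in this repo may differ.
FROM flink:1.14.4

# Install Python 3, which PyFlink needs at runtime
RUN apt-get update -y && \
    apt-get install -y python3 python3-pip python3-dev && \
    rm -rf /var/lib/apt/lists/* && \
    ln -s /usr/bin/python3 /usr/bin/python

# Install PyFlink matching the Flink version of the base image
RUN pip3 install apache-flink==1.14.4
```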

## Execute PyFlink jobs

### Create a custom image containing the PyFlink job to execute and, if needed, its dependencies

Application mode requires that the user code is bundled together with the Flink image. See [Application Mode](https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/resource-providers/native_kubernetes/#application-mode) for more details.
So you need to build a custom image with the PyFlink job code bundled into it.

```shell
docker build -t pyflink_wc -f docker/Dockerfile.job .
```
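
The exact contents of `docker/Dockerfile.job` live in this repository. As a hypothetical sketch, such a job image typically extends the PyFlink base image built above and copies the job script (plus any extra dependencies) into `/opt/flink/usrlib`, which is where the submit command below expects to find it:

```dockerfile
# Sketch of a job image; the actual docker/Dockerfile.job in this repo may differ.
FROM pyflink:1.14.4

# Bundle the PyFlink job script into the image where the submit command expects it
RUN mkdir -p /opt/flink/usrlib
COPY word_count.py /opt/flink/usrlib/word_count.py
```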

Note: If the Kubernetes cluster is not a local test cluster, make sure to publish the Docker image to a registry that is accessible from the Kubernetes cluster.
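
For example, tagging and pushing the image could look like the following (the registry name is a placeholder; when using a registry, also point `kubernetes.container.image` in the submit command at the pushed name):

```shell
# my-registry.example.com is a placeholder; use a registry your cluster can pull from
docker tag pyflink_wc:latest my-registry.example.com/pyflink_wc:latest
docker push my-registry.example.com/pyflink_wc:latest
```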

### Submit PyFlink jobs

#### Submit PyFlink jobs from the host machine

1) Download the Flink distribution, e.g. Flink 1.14.4 is available at https://www.apache.org/dyn/closer.lua/flink/flink-1.14.4/flink-1.14.4-bin-scala_2.11.tgz

2) Extract it:
```shell
tar zxvf flink-1.14.4-bin-scala_2.11.tgz
```

3) Submit PyFlink jobs:
```shell
cd flink-1.14.4
./bin/flink run-application \
    --target kubernetes-application \
    --parallelism 8 \
    -Dkubernetes.cluster-id=word-count \
    -Dtaskmanager.memory.process.size=4096m \
    -Dkubernetes.taskmanager.cpu=2 \
    -Dtaskmanager.numberOfTaskSlots=4 \
    -Dkubernetes.container.image=pyflink_wc:latest \
    -Dkubernetes.rest-service.exposed.type=ClusterIP \
    -py /opt/flink/usrlib/word_count.py
```

Note:
- More Kubernetes-specific configuration options can be found [here](https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/config/#kubernetes)
- You can override configuration options set in `conf/flink-conf.yaml` via `-Dkey=value`

If you see output like the following, the job should have been submitted successfully:
```shell
2022-04-24 17:08:32,603 INFO org.apache.flink.kubernetes.utils.KubernetesUtils [] - Kubernetes deployment requires a fixed port. Configuration blob.server.port will be set to 6124
2022-04-24 17:08:32,603 INFO org.apache.flink.kubernetes.utils.KubernetesUtils [] - Kubernetes deployment requires a fixed port. Configuration taskmanager.rpc.port will be set to 6122
2022-04-24 17:08:33,289 WARN org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Please note that Flink client operations(e.g. cancel, list, stop, savepoint, etc.) won't work from outside the Kubernetes cluster since 'kubernetes.rest-service.exposed.type' has been set to ClusterIP.
2022-04-24 17:08:33,302 INFO org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Create flink application cluster word-count successfully, JobManager Web Interface: http://word-count-rest.default:8081
```

You can verify the pod status as follows:
```shell
kubectl get pods -A | grep word-count
```

If everything runs normally, you should see output like the following:
```shell
NAMESPACE   NAME                           READY   STATUS    RESTARTS   AGE
default     word-count-5f5d44b598-zg5z8    1/1     Running   0          90s
default     word-count-taskmanager-1-1     0/1     Pending   0          59s
default     word-count-taskmanager-1-2     0/1     Pending   0          59s
```
Among them, the JobManager runs in the pod `word-count-5f5d44b598-zg5z8` and the TaskManagers run in the pods `word-count-taskmanager-1-1` and `word-count-taskmanager-1-2`.

If the pods are not running normally, you can check their logs, e.g. check the JobManager log as follows:
```shell
kubectl logs word-count-5f5d44b598-zg5z8
```

See the [Flink documentation](https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/cli/#submitting-pyflink-jobs) for more details on how to submit PyFlink jobs.

### Accessing Flink’s Web UI

Flink’s Web UI and REST endpoint can be exposed in several ways via the `kubernetes.rest-service.exposed.type` configuration option.
Since it's set to `ClusterIP` in this example, Flink’s Web UI can be accessed in the following way:
```shell
kubectl port-forward service/word-count-rest 8081
```
Then you can access the job's Web UI via `http://127.0.0.1:8081`.
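
With the port-forward active, the same endpoint also serves Flink's REST API, so you can, for example, list the running jobs from the command line (`/jobs` is part of Flink's standard REST API):

```shell
# List jobs and their status via the forwarded REST endpoint
curl http://127.0.0.1:8081/jobs
```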

Refer to Flink's [official documentation](https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/resource-providers/native_kubernetes/#accessing-flinks-web-ui) for more details.

### Cancel the jobs

You can cancel the job either through Flink's Web UI or via CLI commands as follows (the job id to cancel can be taken from the `list` output or the Web UI):

```shell
# list jobs:
./bin/flink list --target kubernetes-application -Dkubernetes.cluster-id=word-count

# cancel a job:
./bin/flink cancel --target kubernetes-application -Dkubernetes.cluster-id=word-count <jobId>
```