# Installing on Kind with the Kepler Operator

## Requirements

Before you begin, make sure you have:

- `kubectl` installed
- A local clone of the `kepler-operator` [repository](https://github.com/sustainable-computing-io/kepler-operator)
- A target Kubernetes cluster. You can use Kind to set up a [local cluster for testing](#run-a-kind-cluster-locally), or run directly against a remote cluster. Note that the controller automatically uses the current context in your kubeconfig file. Run `kubectl cluster-info` to check which cluster that is.
- A user with `kubeadmin` or `cluster-admin` privileges.

### Run a Kind cluster locally

```sh
cd kepler-operator
make cluster-up CLUSTER_PROVIDER='kind' CI_DEPLOY=true GRAFANA_ENABLE=true

kubectl get pods -n monitoring

# Example output:
# grafana-b88df6989-km7c6                1/1   Running   0   48m
# prometheus-k8s-0                       2/2   Running   0   46m
# prometheus-operator-6bd88c8bdf-9f69h   2/2   Running   0   48m
```

## Run kepler-operator

- You can deploy kepler-operator using the image published on quay.io.

```sh
make deploy IMG=quay.io/sustainable_computing_io/kepler-operator:latest
kubectl config set-context --current --namespace=monitoring
kubectl apply -k config/samples/
```

- Run `kubectl get pods -n monitoring` to verify that the `kepler-exporter` pod has been deployed.

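The verification step above can also be scripted. The following is a minimal sketch, not part of the Kepler tooling: `all_pods_running` is a hypothetical helper that reads a pod listing on standard input and succeeds only if every pod reports `Running`. It assumes the default `kubectl get pods` column layout, where the third column is the pod status.

```shell
# Hypothetical helper (not shipped with kepler-operator).
# Reads `kubectl get pods --no-headers` output on stdin and returns 0 only
# when at least one pod is listed and every pod's STATUS column is "Running".
all_pods_running() {
  awk '$3 != "Running" { bad = 1 } END { exit (bad || NR == 0) }'
}

# Usage, against a live cluster:
#   kubectl get pods -n monitoring --no-headers | all_pods_running \
#     && echo "all pods are running"
```

An empty listing is treated as a failure so that a wrong namespace is not mistaken for a healthy deployment.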
## Set up the Grafana dashboard

Setting `GRAFANA_ENABLE=true` deploys `kube-prometheus` in the `monitoring` namespace.
Use the following command to forward port 3000 and access the Grafana UI.

```sh
kubectl port-forward svc/grafana 3000:3000 -n monitoring
```

> Then open [http://localhost:3000](http://localhost:3000) in your browser.

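To confirm from the command line that the port-forward works, a sketch like the one below may help. It assumes the port-forward above is still running in another terminal and relies on Grafana's `/api/health` endpoint. The function name `grafana_healthy` is hypothetical.

```shell
# Hypothetical helper: succeeds when Grafana's health endpoint reports
# that its database is reachable.
# Assumption: Grafana is reachable at http://localhost:3000 unless another
# base URL is passed as the first argument.
grafana_healthy() {
  curl -fsS "${1:-http://localhost:3000}/api/health" \
    | grep -q '"database": *"ok"'
}

# Usage:
#   grafana_healthy && echo "Grafana is up"
```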
### Service Monitor

For `kube-prometheus` to scrape the `kepler-exporter` service endpoint, you need to configure a service monitor.

> Note: By default, `kube-prometheus` does not scrape services outside the `monitoring` namespace. If Kepler is deployed outside `monitoring`, [see the steps below](#scrape-all-namespaces).

```sh
# The heredoc delimiter is quoted so the shell does not expand $1 in the manifest.
kubectl apply -n monitoring -f - <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kepler-exporter
    sustainable-computing.io/app: kepler
  name: monitor-kepler-exporter
spec:
  endpoints:
  - interval: 3s
    port: http
    relabelings:
    # Use the node name as the "instance" label of each scraped series.
    - action: replace
      regex: (.*)
      replacement: $1
      sourceLabels:
      - __meta_kubernetes_pod_node_name
      targetLabel: instance
    scheme: http
  jobLabel: app.kubernetes.io/name
  namespaceSelector:
    # Match services in any namespace.
    any: true
  selector:
    matchLabels:
      app.kubernetes.io/component: exporter
      app.kubernetes.io/name: kepler-exporter
EOF
```

### Grafana Dashboard

Configure Grafana as follows:

- Log in to [localhost:3000](http://localhost:3000). The default username/password is `admin:admin`.
- Import the default [dashboard](https://raw.githubusercontent.com/sustainable-computing-io/kepler/main/grafana-dashboards/Kepler-Exporter.json).

### Uninstall the operator

Uninstall with the following command:

```sh
make undeploy
```

[See here](https://github.com/sustainable-computing-io/kepler-operator#getting-started) for how to run the Kepler operator on a Kind cluster.

## Troubleshooting

### Scrape all namespaces

By default, kube-prometheus does not scrape all namespaces. This is governed by RBAC.
The following configuration of the `prometheus-k8s` clusterrole allows kube-prometheus to scrape all namespaces.

```sh
kubectl describe clusterrole prometheus-k8s
Name:         prometheus-k8s
Labels:       app.kubernetes.io/component=prometheus
              app.kubernetes.io/instance=k8s
              app.kubernetes.io/name=prometheus
              app.kubernetes.io/part-of=kube-prometheus
              app.kubernetes.io/version=2.45.0
Annotations:  <none>
PolicyRule:
  Resources                    Non-Resource URLs  Resource Names  Verbs
  ---------                    -----------------  --------------  -----
  endpoints                    []                 []              [get list watch]
  pods                         []                 []              [get list watch]
  services                     []                 []              [get list watch]
  ingresses.networking.k8s.io  []                 []              [get list watch]
                               [/metrics]         []              [get]
  nodes/metrics                []                 []              [get]
```

- To customize Prometheus when creating the [local cluster](#run-a-kind-cluster-locally), refer to the kube-prometheus documentation, [Customizing Kube-Prometheus](https://github.com/prometheus-operator/kube-prometheus/blob/main/docs/customizing.md).

- Make sure you have applied [this jsonnet](https://github.com/prometheus-operator/kube-prometheus/blob/main/docs/customizations/monitoring-all-namespaces.md) so that Prometheus scrapes all namespaces.
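
One way to check whether the RBAC rules above are in effect is to ask the API server on behalf of the Prometheus service account. This is a sketch under assumptions: `check_prometheus_rbac` is a hypothetical helper, the service account is assumed to be `prometheus-k8s` in the `monitoring` namespace, and your own user needs permission to impersonate it (which `cluster-admin` has).

```shell
# Hypothetical helper: prints, for each resource Prometheus needs for
# service discovery, whether the Prometheus service account may list it
# in every namespace.
# Assumption: the service account is monitoring/prometheus-k8s.
check_prometheus_rbac() {
  for resource in endpoints pods services; do
    printf '%s: ' "$resource"
    kubectl auth can-i list "$resource" --all-namespaces \
      --as=system:serviceaccount:monitoring:prometheus-k8s || true
  done
}

# Usage:
#   check_prometheus_rbac
```

If any line reports `no`, the clusterrole has most likely not been customized for all namespaces yet.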