diff --git a/docs/installation/community-operator.zh.md b/docs/installation/community-operator.zh.md
new file mode 100644
index 00000000..74dd3b20
--- /dev/null
+++ b/docs/installation/community-operator.zh.md
@@ -0,0 +1,55 @@
+# Kepler Community Operator on OpenShift
+
+## Requirements
+
+Make sure you have:
+
+- An OCP 4.13 cluster
+- A user with `kubeadmin` or `cluster-admin` privileges
+- The `oc` CLI
+- A local clone of the `kepler-operator` [repository](https://github.com/sustainable-computing-io/kepler-operator)
+
+```sh
+git clone https://github.com/sustainable-computing-io/kepler-operator.git
+cd kepler-operator
+```
+---
+## Install the operator from OperatorHub
+
+1. Select Operators > OperatorHub and search for `Kepler`. Click `Install`.
+![](../fig/ocp_installation/operator_installation_ocp_1.png)
+
+2. Allow the installation to proceed.
+![](../fig/ocp_installation/operator_installation_ocp_7.png)
+
+3. Create a Kepler custom resource.
+![](../fig/ocp_installation/operator_installation_ocp_2.png)
+> Note: the current OCP console may show a JavaScript error (expected to be fixed in 4.13.5), but it does not affect the remaining steps. The fix is already available in the 4.13.0-0.nightly-2023-07-08-165124 build of the OCP console.
+
+---
+## Install the Grafana operator
+
+### Deploy the Grafana operator
+
+The current API bearer token needs to be updated in the `GrafanaDataSource` manifest so that the `Grafana DataSource` can authenticate with Prometheus. The following commands update the manifest and deploy the Grafana operator in the `kepler-operator-system` namespace:
+
+```sh
+BEARER_TOKEN=$(oc whoami --show-token)
+hack/dashboard/openshift/deploy-grafana.sh
+```
+> Note: the script must be run from the top-level directory, so make sure you are in the `kepler-operator` root; you can get there with `cd $(git rev-parse --show-toplevel)`.
+
+### Access the Grafana console
+Configure Networking > Routes.
+![](../fig/ocp_installation/operator_installation_ocp_5.png)
+
+### Grafana dashboard
+Log in to the Grafana dashboard with the credentials `kepler:kepler`.
+![](../fig/ocp_installation/operator_installation_ocp_6.png)
+
+---
+
+## Troubleshooting
+
+> Note: if the data source reports a problem, check that the API token has been updated correctly.
+
+![](../fig/ocp_installation/operator_installation_ocp_3.png)
diff --git a/docs/installation/kepler-helm.zh.md b/docs/installation/kepler-helm.zh.md
new file mode 100644
index 00000000..49530530
--- /dev/null
+++ b/docs/installation/kepler-helm.zh.md
@@ -0,0 +1,62 @@
+# Deploying Kepler with a Helm chart
+
+The Kepler Helm chart is available on [GitHub](https://github.com/sustainable-computing-io/kepler-helm-chart/tree/main) and [ArtifactHub](https://artifacthub.io/packages/helm/kepler/kepler).
+
+## Install Helm
+As a prerequisite, you must install [Helm](https://helm.sh) before you can use the Helm chart to install Kepler.
+See the Helm [documentation](https://helm.sh/docs/) for installation instructions.
+
+## Add the Kepler Helm repository
+
+Run:
+
+```bash
+helm repo add kepler https://sustainable-computing-io.github.io/kepler-helm-chart
+```
+
+You can find the latest version with:
+
+```bash
+helm search repo kepler
+```
+
+You can run the following command to preview and inspect the manifests that would be installed:
+
+```bash
+helm install kepler kepler/kepler --namespace kepler --create-namespace --dry-run --devel
+```
+
+## Install Kepler
+
+Run:
+
+```bash
+helm install kepler kepler/kepler --namespace kepler --create-namespace
+```
+
+> You may need to adjust [values.yaml](https://github.com/sustainable-computing-io/kepler-helm-chart/blob/main/chart/kepler/values.yaml) to match your environment.
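+For example, a minimal override file might look like the sketch below (the keys come from the parameter table in this document; the tag value is a placeholder, not a specific recommended release):
+
+```yaml
+# values.yaml: example overrides (sketch)
+image:
+  tag: "some-tag"          # placeholder; substitute a real Kepler image tag
+  pullPolicy: IfNotPresent
+service:
+  port: 9102
+```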
+Apply your changes with:
+
+```bash
+helm install kepler kepler/kepler --values values.yaml --namespace kepler --create-namespace
+```
+
+The table below lists the configurable parameters and their default values.
+
+Parameter|Description|Default
+---|---|---
+global.namespace|Kubernetes namespace for Kepler|kepler
+image.repository|Repository for the Kepler image|quay.io/sustainable_computing_io/kepler
+image.pullPolicy|Pull policy for Kepler|Always
+image.tag|Image tag for the Kepler image|latest
+serviceAccount.name|Service account name for Kepler|kepler-sa
+service.type|Kepler service type|ClusterIP
+service.port|Kepler service exposed port|9102
+
+## Uninstall Kepler
+You can uninstall Kepler with:
+```bash
+helm uninstall kepler --namespace kepler
+```
diff --git a/docs/installation/kepler-operator.zh.md b/docs/installation/kepler-operator.zh.md
new file mode 100644
index 00000000..39ce1de5
--- /dev/null
+++ b/docs/installation/kepler-operator.zh.md
@@ -0,0 +1,134 @@
+# Installing Kepler on Kind with the Kepler operator
+
+## Requirements
+
+Before you start, make sure you have:
+
+- `kubectl`
+- A local clone of the `kepler-operator` [repository](https://github.com/sustainable-computing-io/kepler-operator)
+- A target Kubernetes cluster. You can use Kind to spin up a simple [local cluster for testing](#run-a-kind-cluster-locally), or run against a remote cluster. Note that the controller automatically uses the current context in your kubeconfig file; you can check it with `kubectl cluster-info`.
+- A user with `kubeadmin` or `cluster-admin` privileges
+
+### Run a Kind cluster locally
+
+```sh
+cd kepler-operator
+make cluster-up CLUSTER_PROVIDER='kind' CI_DEPLOY=true GRAFANA_ENABLE=true
+
+kubectl get pods -n monitoring
+
+grafana-b88df6989-km7c6                1/1   Running   0   48m
+prometheus-k8s-0                       2/2   Running   0   46m
+prometheus-operator-6bd88c8bdf-9f69h   2/2   Running   0   48m
+```
+
+## Start the kepler-operator
+- You can deploy the kepler-operator using the image on quay.io.
+```sh
+make deploy IMG=quay.io/sustainable_computing_io/kepler-operator:latest
+kubectl config set-context --current --namespace=monitoring
+kubectl apply -k config/samples/
+```
+- Verify that the `kepler-exporter` pod is deployed with `kubectl get pods -n monitoring`.
+
+## Set up the Grafana dashboard
+
+Use `GRAFANA_ENABLE=true` to have `kube-prometheus` deployed in the `monitoring` namespace.
+Access the Grafana UI on port 3000 with:
+
+```sh
+kubectl port-forward svc/grafana 3000:3000 -n monitoring
+```
+
+> Then open [http://localhost:3000](http://localhost:3000).
+
+### Service Monitor
+
+For `kube-prometheus` to scrape the `kepler-exporter` service endpoints, you need to configure a service monitor.
+
+> Note: by default, `kube-prometheus` does not scrape services outside the `monitoring` namespace. If Kepler is deployed outside the `monitoring` namespace, [see the steps below](#scrape-all-namespaces).
+
+```
+kubectl apply -n monitoring -f - << EOF
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  labels:
+    app.kubernetes.io/component: exporter
+    app.kubernetes.io/name: kepler-exporter
+    sustainable-computing.io/app: kepler
+  name: monitor-kepler-exporter
+spec:
+  endpoints:
+  - interval: 3s
+    port: http
+    relabelings:
+    - action: replace
+      regex: (.*)
+      replacement: $1
+      sourceLabels:
+      - __meta_kubernetes_pod_node_name
+      targetLabel: instance
+    scheme: http
+  jobLabel: app.kubernetes.io/name
+  namespaceSelector:
+    any: true
+  selector:
+    matchLabels:
+      app.kubernetes.io/component: exporter
+      app.kubernetes.io/name: kepler-exporter
+EOF
+```
+
+### Grafana Dashboard
+
+Configure Grafana with the following steps:
+
+- Log in at [localhost:3000](http://localhost:3000); the default username/password is `admin:admin`
+- Import the default [dashboard](https://raw.githubusercontent.com/sustainable-computing-io/kepler/main/grafana-dashboards/Kepler-Exporter.json)
+
+![](../fig/ocp_installation/kind_grafana.png)
+
+### Uninstall the operator
+Uninstall with:
+```sh
+make undeploy
+```
+
+[See here](https://github.com/sustainable-computing-io/kepler-operator#getting-started) for getting the Kepler operator running on a Kind cluster.
+
+## Troubleshooting
+
+### Scrape all namespaces
+
+By default, kube-prometheus does not scrape all namespaces; this is enforced through RBAC.
+The configuration of the `prometheus-k8s` clusterrole below allows kube-prometheus to scrape all namespaces.
+
+```sh
+oc describe clusterrole prometheus-k8s
+Name:         prometheus-k8s
+Labels:       app.kubernetes.io/component=prometheus
+              app.kubernetes.io/instance=k8s
+              app.kubernetes.io/name=prometheus
+              app.kubernetes.io/part-of=kube-prometheus
+              app.kubernetes.io/version=2.45.0
+Annotations:
+PolicyRule:
+  Resources                    Non-Resource URLs  Resource Names  Verbs
+  ---------                    -----------------  --------------  -----
+  endpoints                    []                 []              [get list watch]
+  pods                         []                 []              [get list watch]
+  services                     []                 []              [get list watch]
+  ingresses.networking.k8s.io  []                 []              [get list watch]
+                               [/metrics]         []              [get]
+  nodes/metrics                []                 []              [get]
+```
+
+- To customize Prometheus when creating the [local cluster](#run-a-kind-cluster-locally), refer to the
+kube-prometheus documentation: [Customizing Kube-Prometheus](https://github.com/prometheus-operator/kube-prometheus/blob/main/docs/customizing.md)
+
+- Make sure you apply [this jsonnet](https://github.com/prometheus-operator/kube-prometheus/blob/main/docs/customizations/monitoring-all-namespaces.md) so that Prometheus scrapes all namespaces.
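+If the `prometheus-k8s` clusterrole in your cluster does not yet carry the cluster-wide rules shown above, the access the jsonnet customization grants is roughly the following (a sketch only; the resource name `prometheus-k8s-scrape-all` is hypothetical, and the `prometheus-k8s` service account in `monitoring` assumes the kube-prometheus defaults):
+
+```yaml
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: prometheus-k8s-scrape-all   # hypothetical name
+rules:
+  # Service discovery and scraping across every namespace
+  - apiGroups: [""]
+    resources: ["services", "endpoints", "pods"]
+    verbs: ["get", "list", "watch"]
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRoleBinding
+metadata:
+  name: prometheus-k8s-scrape-all   # hypothetical name
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: ClusterRole
+  name: prometheus-k8s-scrape-all
+subjects:
+  - kind: ServiceAccount
+    name: prometheus-k8s            # kube-prometheus default SA
+    namespace: monitoring
+```
+
+Apply it with `kubectl apply -f <file>.yaml`.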