This is the full operator and contributor guide for KubeLens AI.
| UI 1 | UI 2 | UI 3 |
|---|---|---|
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
- Node.js 20+
- npm 10+
- Go 1.25+ (for backend tests/local backend)
- Python 3.12+ (for predictor tests)
- Docker Desktop (for image/compose workflows)
kubectl(for Kubernetes deploy/use)- Optional:
kustomize,helm
npm install
npm run devEndpoints:
- Frontend UI:
http://localhost:5173 - Backend API root:
http://localhost:3000/api - OpenAPI:
http://localhost:3000/api/openapi.yaml
Default behavior is safe demo mode: read-focused, no write operations.
Copy base config:
cp .env.example .envCore variables:
APP_MODE=demo # dev | demo | prod
DEV_MODE=false
KUBECONFIG_DATA=
KUBECONFIG_CONTEXTS=
AUTH_ENABLED=false
AUTH_TOKENS=
WRITE_ACTIONS_ENABLED=false
PREDICTOR_BASE_URL=
PREDICTOR_SHARED_SECRET=
ASSISTANT_PROVIDER=none
ASSISTANT_API_BASE_URL=
ASSISTANT_API_KEY=
ASSISTANT_MODEL=
ASSISTANT_RAG_ENABLED=true
Recommended profile defaults:
demo: UI demos, mock/safe behavior, no writes.dev: engineering workflow, optional auth, optional write testing.prod: authentication required, strict controls, read-only by default.
Single cluster via base64 kubeconfig:
PowerShell:
$bytes = [System.IO.File]::ReadAllBytes("$HOME\.kube\config")
$env:KUBECONFIG_DATA = [Convert]::ToBase64String($bytes)
npm run devLinux/macOS:
export KUBECONFIG_DATA=$(base64 -w 0 ~/.kube/config)
npm run devMulti-cluster:
KUBECONFIG_CONTEXTS=prod:<base64>,staging:<base64>,dev:<base64>
Verification:
kubectl cluster-info
kubectl get nodes
kubectl top nodes
kubectl top pods -AIf kubectl top fails, install/fix Metrics Server before expecting full metrics in the UI.
Use auth + explicit write gate together.
APP_MODE=dev
DEV_MODE=true
AUTH_ENABLED=true
AUTH_TOKENS=viewer:viewer:viewer-token,operator:operator:operator-token,admin:admin:admin-token
WRITE_ACTIONS_ENABLED=true
Protected write examples:
- Pod restart/delete/create
- Deployment scale/restart/rollback/apply
- Node cordon/uncordon/drain
- Remediation proposal execution
If either auth or write gate is not enabled, writes are blocked.
Suggested operator path:
- Start on Dashboard for cluster health and risk overview.
- Open Diagnostics for deterministic issues/evidence/recommendations.
- Use Nodes, Pods, and Deployments to inspect and act.
- Create/track Incidents and remediation approvals.
- Use Memory, Playbooks, and Postmortems to capture learning.
- Use Assistant for guided troubleshooting with deterministic context.
Major areas and expected actions:
- Dashboard: KPIs, utilization trends, restart hotspots.
- Diagnostics: investigate findings with evidence and severity.
- Predictions: see risk-scored incident candidates.
- Pods: view logs/events and perform controlled pod actions.
- Nodes: maintenance workflows including drain previews/execution.
- Deployments: scale/restart/rollback and inspect rollout health.
- Events + Audit: follow live and historical operational traces.
- Incidents + Remediation: triage, approve, execute, and close loops.
- Risk Guard: evaluate manifest risk before apply.
- Shift Brief: handoff-ready operations summary.
- Resource Catalog: deep inventory across apps/network/storage/RBAC.
Detailed feature reference: docs/FEATURES.md
npm run docker:build:predictor
npm run docker:run:predictorPREDICTOR_BASE_URL=http://localhost:8001
PREDICTOR_SHARED_SECRET=your-shared-secret
If predictor is unavailable, backend falls back to deterministic local prediction logic.
OpenAI-compatible provider example:
ASSISTANT_PROVIDER=openai_compatible
ASSISTANT_API_BASE_URL=https://api.openai.com/v1
ASSISTANT_API_KEY=...
ASSISTANT_MODEL=gpt-4o
ASSISTANT_RAG_ENABLED=true
Local Ollama example:
ollama pull llama3.2
ollama pull nomic-embed-textASSISTANT_PROVIDER=openai_compatible
ASSISTANT_API_BASE_URL=http://localhost:11434/v1
ASSISTANT_API_KEY=ollama
ASSISTANT_MODEL=llama3.2
ASSISTANT_EMBEDDING_MODEL=nomic-embed-text
ASSISTANT_EMBEDDING_BASE_URL=http://localhost:11434/v1
ALERTMANAGER_WEBHOOK_URL=
SLACK_WEBHOOK_URL=
PAGERDUTY_ROUTING_KEY=
CHATOPS_SLACK_WEBHOOK_URL=
CHATOPS_NOTIFY_INCIDENTS=true
CHATOPS_NOTIFY_REMEDIATIONS=true
CHATOPS_NOTIFY_POSTMORTEMS=true
CHATOPS_NOTIFY_ASSISTANT_FINDINGS=false
npm run lint
npm run test:web
npm run test:go
npm run test:predictor
npm run test:e2e
npm run buildBackend CI parity command:
npm run ci:backendnpm run docker:up
npm run docker:downUse this when you want a local packaged runtime quickly.
Available overlays:
k8s/overlays/devk8s/overlays/demok8s/overlays/prodk8s/overlays/tracingk8s/overlays/observability
Deploy:
kubectl apply -k k8s/overlays/dev
kubectl apply -k k8s/overlays/demo
kubectl apply -k k8s/overlays/prod
kubectl apply -k k8s/overlays/tracing
kubectl apply -k k8s/overlays/observabilityProduction secret flow:
cp k8s/secret.example.yaml k8s/secret.yaml
# fill values (especially AUTH_TOKENS)
kubectl apply -f k8s/secret.yaml
kubectl apply -k k8s/overlays/prodPost-deploy verification:
kubectl -n kubernetes-operations-dashboard get pods
kubectl -n kubernetes-operations-dashboard get svc
kubectl -n kubernetes-operations-dashboard logs deploy/k8s-ops-dashboard --tail=100Port-forward examples:
# app
kubectl -n kubernetes-operations-dashboard port-forward svc/k8s-ops-dashboard 3000:3000
# jaeger (if tracing overlay)
kubectl -n kubernetes-operations-dashboard port-forward svc/k8s-ops-jaeger 16686:16686
# grafana (if observability overlay)
kubectl -n kubernetes-operations-dashboard port-forward svc/k8s-ops-grafana 3001:3000Update and rollback:
# apply new manifest version
kubectl apply -k k8s/overlays/prod
# rollback deployment
kubectl -n kubernetes-operations-dashboard rollout undo deploy/k8s-ops-dashboard
kubectl -n kubernetes-operations-dashboard rollout status deploy/k8s-ops-dashboardUninstall:
kubectl delete -k k8s/overlays/prodMore deployment details: k8s/README.md
Install:
helm install kubelens ./helm/kubelensUpgrade:
helm upgrade kubelens ./helm/kubelensRollback and uninstall:
helm rollback kubelens <REVISION>
helm uninstall kubelensDaily contributor flow:
- Sync main branch.
- Create feature/fix branch.
- Make focused changes and update docs.
- Run local quality gates.
- Commit with clear scope.
- Open PR and pass CI.
- Merge after review.
Commands:
git checkout main
git pull origin main
git checkout -b feat/<short-topic>
git add .
git commit -m "feat: short summary"
git push -u origin feat/<short-topic>PR checklist:
- Behavior validated locally.
README.mdand/or docs updated if behavior/config changed.docs/FEATURES.mdupdated for user-facing changes.- Tests added/updated for risky behavior.
- Changelog updated when required by release policy.
Current CI workflow (.github/workflows/ci.yml) runs:
- Release discipline checks (
verify:release,verify:changelog,verify:openapi) - Frontend lint/test/build
- Backend CI + focused package coverage
- Predictor lint/tests
- E2E suite (Playwright Chromium + Firefox)
- Kustomize + kubeconform manifest validation
- Security checks (Trivy, Hadolint)
- Docker image build + smoke tests
Release + CD workflow (.github/workflows/release-supply-chain.yml) runs:
- On release tags (
v*): build/push signed dashboard + predictor images, generate SBOM attestations, then deploy todevandstagingvia Helm. - On manual dispatch: deploy an existing tag to a selected environment (
dev,staging, orprod). - Production deployments are controlled through GitHub Environment protections/approvals.
Required GitHub Environment secret per deploy target:
KUBE_CONFIG_B64(base64 kubeconfig for that environment)
Default Helm targets in auto-CD:
dev: namespacekubernetes-operations-dashboard-dev, releasekubelensstaging: namespacekubernetes-operations-dashboard-staging, releasekubelens
Release hygiene:
- Keep
package.jsonversion, Docker tags, and manifests aligned. - Keep
CHANGELOG.mdupdated. - Do not bypass failing CI on protected branches.
- Liveness:
GET /api/healthz - Readiness:
GET /api/readyz - Runtime status:
GET /api/runtime - OpenAPI contract:
GET /api/openapi.yaml - JSON telemetry:
GET /api/metrics - Prometheus telemetry:
GET /api/metrics/prometheus - Streams:
GET /api/stream(SSE),GET /api/stream/ws(WebSocket)
Full API details: docs/api.md
403on writes: role orWRITE_ACTIONS_ENABLEDis blocking the action.401predictor calls:PREDICTOR_SHARED_SECRETmismatch between backend and predictor.- Startup fails in
prod: auth config is incomplete. - Metrics show
N/A: Metrics Server missing or unhealthy. - Assistant blank/error responses: check provider, key, model, and base URL.
- Kubernetes deploy issues: render manifests locally first with
kubectl kustomize.














