# Troubleshooting

This guide helps you diagnose and resolve common issues with the KAgent Hook Controller.
## Quick Diagnostics

```bash
# Check if controller is running
kubectl get pods -n kagent -l app=khook

# Check controller logs
kubectl logs -n kagent deployment/khook --tail=100

# Check hook resources
kubectl get hooks -A

# Check Kagent API credentials
kubectl get secret kagent-credentials -o yaml

# Verify CRD installation
kubectl get crd hooks.kagent.dev

# Check RBAC permissions
kubectl auth can-i get events --as=system:serviceaccount:kagent:khook
```

## Events Not Being Detected

**Symptoms:**
- Hook is created successfully
- Events are occurring in the cluster
- No Kagent API calls are being made
- Hook status shows no active events
**Diagnostic Steps:**

```bash
# Check if events are being generated
kubectl get events --field-selector involvedObject.kind=Pod --sort-by='.lastTimestamp'

# Verify hook configuration
kubectl describe hook your-hook-name

# Check controller logs for event processing
kubectl logs -n kagent deployment/khook | grep "event-processing"
```

**Common Causes & Solutions:**
1. **Controller not watching the namespace:**

   ```bash
   # Check if controller has RBAC permissions for the namespace
   kubectl auth can-i get events --namespace=your-namespace --as=system:serviceaccount:kagent:khook
   ```

2. **Event type mismatch:**

   - Verify that the `eventType` in your hook matches actual Kubernetes events
   - Check event reasons one at a time (comma-separated field selectors are ANDed, so `reason=Killing,reason=Failed` matches nothing):

   ```bash
   kubectl get events --field-selector reason=Killing
   kubectl get events --field-selector reason=Failed
   ```

3. **Hook in wrong namespace:**

   - Ensure the hook is in the same namespace as the pods you want to monitor
   - Or use cluster-wide monitoring if configured
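For reference, a hook that satisfies the checks above might look like the following sketch. This is an assumption about the schema, not an authoritative example: only `eventType` is mentioned in this guide, so verify field names and the served API version against your installed CRD.

```yaml
# Hypothetical Hook manifest -- verify the real schema with:
#   kubectl explain hooks --recursive
#   kubectl get crd hooks.kagent.dev -o jsonpath='{.spec.versions[*].name}'
apiVersion: kagent.dev/v1        # check the served version on the hooks.kagent.dev CRD
kind: Hook
metadata:
  name: your-hook-name
  namespace: your-namespace      # same namespace as the pods being monitored
spec:
  eventType: pod-restart         # must match an event type the controller recognizes
```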
## Kagent API Connection Failures

**Symptoms:**
- Events are being detected
- Controller logs show API connection errors
- Hook status shows failed API calls
**Diagnostic Steps:**

```bash
# Check API credentials
kubectl get secret kagent-credentials -o jsonpath='{.data.api-key}' | base64 -d

# Test API connectivity
kubectl exec -n kagent deployment/khook -- \
  curl -v -H "Authorization: Bearer $KAGENT_API_KEY" $KAGENT_BASE_URL/health

# Check controller logs for API errors
kubectl logs -n kagent deployment/khook | grep "kagent-api"
```

**Common Causes & Solutions:**
1. **Invalid API credentials:**

   ```bash
   # Update credentials
   kubectl create secret generic kagent-credentials \
     --from-literal=api-key=your-correct-key \
     --from-literal=base-url=https://correct-url.com \
     --dry-run=client -o yaml | kubectl apply -f -

   # Restart controller to pick up new credentials
   kubectl rollout restart deployment/khook -n kagent
   ```
2. **Network connectivity issues:**

   ```bash
   # Check DNS resolution
   kubectl exec -n kagent deployment/khook -- nslookup api.kagent.dev

   # Check firewall/network policies
   kubectl get networkpolicies -A
   ```
3. **API endpoint unreachable:**

   - Verify that the `KAGENT_BASE_URL` is correct
   - Check if the Kagent service is running
   - Validate SSL certificates if using HTTPS
## Duplicate Event Processing

**Symptoms:**
- Same event triggers multiple Kagent calls within 10 minutes
- Hook status shows duplicate active events
- Excessive API calls in logs
**Diagnostic Steps:**

```bash
# Check active events in hook status
kubectl get hook your-hook-name -o jsonpath='{.status.activeEvents}' | jq .

# Check controller restart count
kubectl get pods -n kagent -l app=khook

# Verify leader election is working
kubectl logs -n kagent deployment/khook | grep "leader"
```

**Common Causes & Solutions:**
1. **Controller restarts causing memory loss:**

   ```bash
   # Check for frequent restarts
   kubectl describe pod -n kagent -l app=khook

   # Increase memory limits if needed
   kubectl patch deployment khook -n kagent -p '{"spec":{"template":{"spec":{"containers":[{"name":"manager","resources":{"limits":{"memory":"512Mi"}}}]}}}}'
   ```
2. **Multiple controller instances without leader election:**

   ```bash
   # Ensure only one controller is leader
   kubectl logs -n kagent deployment/khook | grep "successfully acquired lease"

   # Check replica count
   kubectl get deployment khook -n kagent
   ```
3. **Clock skew issues:**

   ```bash
   # Check system time on controller
   kubectl exec -n kagent deployment/khook -- date

   # Compare with cluster time
   kubectl get nodes -o jsonpath='{.items[0].status.conditions[?(@.type=="Ready")].lastTransitionTime}'
   ```
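The 10-minute deduplication window mentioned above can be illustrated with a small sketch. The `is_duplicate` helper and the way state is held here are illustrative only, not the controller's actual implementation:

```shell
#!/bin/sh
# Illustrative deduplication check: an event that recurs within the
# 10-minute window should not trigger another Kagent API call.
DEDUP_WINDOW=600  # seconds (10 minutes)

# is_duplicate LAST_SEEN_EPOCH NOW_EPOCH
# Succeeds (exit 0) when the new occurrence falls inside the window.
is_duplicate() {
  last=$1
  now=$2
  [ $((now - last)) -lt "$DEDUP_WINDOW" ]
}

if is_duplicate 1000 1300; then echo "duplicate: suppress API call"; fi
if ! is_duplicate 1000 1700; then echo "new event: call Kagent API"; fi
```

This is why a controller restart can cause duplicates: if the `last seen` timestamps live only in memory, they are lost on restart and every event looks new again.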
## High Memory Usage

**Symptoms:**
- Controller pod consuming excessive memory
- OOMKilled events for controller pod
- Slow event processing
**Diagnostic Steps:**

```bash
# Monitor memory usage
kubectl top pod -n kagent -l app=khook

# Check active events across all hooks
kubectl get hooks -A -o jsonpath='{range .items[*]}{.metadata.name}: {.status.activeEvents}{"\n"}{end}'

# Check for memory leaks in logs
kubectl logs -n kagent deployment/khook | grep -i "memory\|leak\|gc"
```

**Solutions:**
1. **Increase resource limits:**

   ```bash
   kubectl patch deployment khook -n kagent -p '{"spec":{"template":{"spec":{"containers":[{"name":"manager","resources":{"limits":{"memory":"1Gi","cpu":"500m"},"requests":{"memory":"256Mi","cpu":"100m"}}}]}}}}'
   ```
2. **Clean up stale events:**

   ```bash
   # Restart controller to clean up memory
   kubectl rollout restart deployment/khook -n kagent
   ```
## RBAC Permission Errors

**Symptoms:**
- Controller logs show RBAC permission errors
- Cannot watch events or update hook status
- "Forbidden" errors in logs
**Diagnostic Steps:**

```bash
# Check current permissions
kubectl auth can-i get events --as=system:serviceaccount:kagent:khook
kubectl auth can-i update hooks --as=system:serviceaccount:kagent:khook

# Verify ClusterRole and ClusterRoleBinding
kubectl get clusterrole khook -o yaml
kubectl get clusterrolebinding khook -o yaml
```

**Solutions:**
1. **Apply correct RBAC:**

   ```bash
   kubectl apply -f config/rbac/
   ```
2. **Verify service account:**

   ```bash
   kubectl get serviceaccount khook -n kagent
   ```
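For orientation, the permissions that the diagnostic checks above exercise look roughly like the following sketch. The exact rule set is an assumption; the authoritative source is the `config/rbac/` manifests shipped with your khook release.

```yaml
# Illustrative ClusterRole covering the permissions checked above.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: khook
rules:
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["kagent.dev"]
    resources: ["hooks"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["kagent.dev"]
    resources: ["hooks/status"]
    verbs: ["get", "update", "patch"]
```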
## Debug Logging

Enable debug logging for detailed troubleshooting:

```bash
# Enable debug logging
kubectl set env deployment/khook -n kagent LOG_LEVEL=debug

# Watch debug logs
kubectl logs -n kagent deployment/khook -f | grep DEBUG
```

## Slow Event Processing

**Symptoms:**
- Long delays between event occurrence and Kagent API calls
- High CPU usage on controller
**Solutions:**
1. **Increase controller resources:**

   ```bash
   kubectl patch deployment khook -n kagent -p '{"spec":{"template":{"spec":{"containers":[{"name":"manager","resources":{"limits":{"cpu":"1000m"},"requests":{"cpu":"200m"}}}]}}}}'
   ```
2. **Optimize hook configurations:**

   - Reduce the number of event types per hook
   - Use more specific event filtering
   - Minimize prompt template complexity
## Excessive API Usage

**Symptoms:**
- Kagent API rate limiting
- High network usage
- API timeout errors
**Solutions:**
1. **Implement backoff strategies:**

   - The controller automatically implements exponential backoff
   - Check logs for retry attempts
2. **Optimize hook configurations:**

   - Consolidate similar hooks
   - Use appropriate deduplication timeouts
   - Review event type selections
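The exponential backoff mentioned above follows a standard pattern: double the wait after each failure until a retry succeeds or a cap is hit. A minimal sketch of that pattern (the `retry_with_backoff` helper is illustrative, not part of the controller):

```shell
#!/bin/sh
# Illustrative exponential backoff: retry a command, doubling the delay
# after each failure, up to a maximum number of attempts.
retry_with_backoff() {
  max_attempts=$1; shift
  delay=1
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Example: a command that succeeds on its third invocation
n=0
flaky() { n=$((n + 1)); [ "$n" -ge 3 ]; }
retry_with_backoff 5 flaky && echo "succeeded after $n attempts"
```

Recognizing this shape in the controller logs (retry delays of 1s, 2s, 4s, ...) tells you the backoff is working and the API errors are transient rather than permanent.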
## Getting Help

Collect comprehensive logs for support:

```bash
# Controller logs
kubectl logs -n kagent deployment/khook --previous > controller-logs.txt

# Hook status
kubectl get hooks -A -o yaml > hooks-status.yaml

# Events
kubectl get events -A --sort-by='.lastTimestamp' > cluster-events.txt

# System info
kubectl version > cluster-info.txt
kubectl get nodes -o wide >> cluster-info.txt
```

**Support channels:**

- GitHub Issues: kagent-hook-controller/issues
- Community Forum: community.kagent.dev
- Documentation: docs.kagent.dev
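The collected files can be packaged into a single archive before attaching them to an issue; a small sketch (the `make_support_bundle` helper is hypothetical, not a khook tool):

```shell
#!/bin/sh
# Bundle diagnostic files into one timestamped archive for support.
# Prints the archive name on success.
make_support_bundle() {
  out="khook-support-$(date +%Y%m%d-%H%M%S).tar.gz"
  tar -czf "$out" "$@" || return 1
  echo "$out"
}

# Usage:
#   make_support_bundle controller-logs.txt hooks-status.yaml \
#     cluster-events.txt cluster-info.txt
```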
Please include:
- Controller version and Kubernetes version
- Hook configuration (sanitized)
- Controller logs (last 100 lines)
- Steps to reproduce the issue
- Expected vs actual behavior