Troubleshooting Guide

This guide helps you diagnose and resolve common issues with the KAgent Hook Controller.

Quick Diagnostics

Check Controller Status

# Check if controller is running
kubectl get pods -n kagent -l app=khook

# Check controller logs
kubectl logs -n kagent deployment/khook --tail=100

# Check hook resources
kubectl get hooks -A

Verify Configuration

# Check Kagent API credentials (the secret must live in the controller's namespace)
kubectl get secret kagent-credentials -n kagent -o yaml

# Verify CRD installation
kubectl get crd hooks.kagent.dev

# Check RBAC permissions
kubectl auth can-i get events --as=system:serviceaccount:kagent:khook

Common Issues

1. Hook Not Processing Events

Symptoms:

- Hook is created successfully
- Events are occurring in the cluster
- No Kagent API calls are being made
- Hook status shows no active events

Diagnostic Steps:

# Check if events are being generated
kubectl get events --field-selector involvedObject.kind=Pod --sort-by='.lastTimestamp'

# Verify hook configuration
kubectl describe hook your-hook-name

# Check controller logs for event processing
kubectl logs -n kagent deployment/khook | grep "event-processing"

Common Causes & Solutions:

Controller not watching the namespace:

# Check if controller has RBAC permissions for the namespace
kubectl auth can-i get events --namespace=your-namespace --as=system:serviceaccount:kagent:khook

Event type mismatch:
- Verify that the eventType in your hook matches actual Kubernetes events
- Check event reasons one selector at a time (comma-separated field selectors are ANDed, so reason=Killing,reason=Failed matches nothing): kubectl get events --field-selector reason=Killing

Hook in wrong namespace:
- Ensure the hook is in the same namespace as the pods you want to monitor
- Or use cluster-wide monitoring if configured
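Field selectors cannot express "reason is Killing or Failed" in a single query, so checking several reasons at once needs client-side filtering. The following is a minimal sketch, not part of the controller: `filter_reasons` is a hypothetical helper name, and it assumes the three-column layout produced by the usage command in the comments.

```shell
# Keep only event lines whose second column (the reason) is in the given list.
# Input format per line: "<namespace> <reason> <object>".
filter_reasons() {
  # $1 is a comma-separated list such as "Killing,Failed"
  awk -v reasons="$1" '
    BEGIN { n = split(reasons, r, ","); for (i = 1; i <= n; i++) want[r[i]] = 1 }
    ($2 in want) { print }
  '
}

# Usage (requires cluster access):
#   kubectl get events -A --no-headers \
#     -o custom-columns=NS:.metadata.namespace,REASON:.reason,OBJECT:.involvedObject.name \
#     | filter_reasons "Killing,Failed"
```

If the filtered list is empty while pods are visibly failing, the reasons configured in the hook probably do not match what the cluster emits.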

2. Kagent API Connection Failures

Symptoms:

- Events are being detected
- Controller logs show API connection errors
- Hook status shows failed API calls

Diagnostic Steps:

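# Check API credentials (prints the decoded key to your terminal)
kubectl get secret kagent-credentials -n kagent -o jsonpath='{.data.api-key}' | base64 -d

# Test API connectivity
# The single quotes make the variables expand inside the container rather than in your local shell.
# Assumes the controller image provides sh and curl.
kubectl exec -n kagent deployment/khook -- \
  sh -c 'curl -v -H "Authorization: Bearer $KAGENT_API_KEY" "$KAGENT_BASE_URL/health"'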

# Check controller logs for API errors
kubectl logs -n kagent deployment/khook | grep "kagent-api"

Common Causes & Solutions:

Invalid API credentials:

# Update credentials
kubectl create secret generic kagent-credentials -n kagent \
  --from-literal=api-key=your-correct-key \
  --from-literal=base-url=https://correct-url.com \
  --dry-run=client -o yaml | kubectl apply -f -

# Restart controller to pick up new credentials
kubectl rollout restart deployment/khook -n kagent

Network connectivity issues:

# Check DNS resolution
kubectl exec -n kagent deployment/khook -- nslookup api.kagent.dev

# Check firewall/network policies
kubectl get networkpolicies -A

API endpoint unreachable:
- Verify the KAGENT_BASE_URL is correct
- Check if the Kagent service is running
- Validate SSL certificates if using HTTPS
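
The URL and certificate checks above can be sketched as a pair of shell helpers. This is an illustration under assumptions: `url_host` and `check_tls` are hypothetical names, `openssl` must be installed where you run it, and the endpoint must be reachable from that machine (an in-cluster URL will not be reachable from a workstation).

```shell
# Extract the host[:port] part of a URL,
# e.g. https://api.kagent.dev/health -> api.kagent.dev
url_host() {
  printf '%s\n' "$1" | sed -E 's#^[a-zA-Z][a-zA-Z0-9+.-]*://##; s#[/?].*$##'
}

# Print the subject and validity dates of the certificate served by the endpoint.
# Port 443 is assumed when the URL carries no explicit port.
check_tls() {
  host=$(url_host "$1")
  case "$host" in
    *:*) target=$host ;;
    *)   target=$host:443 ;;
  esac
  openssl s_client -connect "$target" -servername "${host%%:*}" </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -dates
}

# Usage:
#   check_tls "https://api.kagent.dev"
```

A notAfter date in the past, or a subject that does not match the host in KAGENT_BASE_URL, points to a certificate problem rather than a network one.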

3. Events Not Being Deduplicated

Symptoms:

- Same event triggers multiple Kagent calls within 10 minutes
- Hook status shows duplicate active events
- Excessive API calls in logs

Diagnostic Steps:

# Check active events in hook status
kubectl get hook your-hook-name -o jsonpath='{.status.activeEvents}' | jq .

# Check controller restart count
kubectl get pods -n kagent -l app=khook

# Verify leader election is working
kubectl logs -n kagent deployment/khook | grep "leader"
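
To confirm that calls really repeat inside the 10-minute window, it can help to check timestamps mechanically. The sketch below is hypothetical: `find_duplicates` is not a controller feature, and the input format (epoch seconds followed by an event key, sorted by time) is an assumption, because the controller's log format is not documented here and you would need to extract those two fields yourself.

```shell
# Input: lines of "<epoch-seconds> <event-key>", sorted by time.
# Output: lines whose key already appeared less than $1 seconds earlier
# (default 600, matching the 10-minute window described above).
find_duplicates() {
  awk -v window="${1:-600}" '
    ($2 in last) && ($1 - last[$2] < window) { print }
    { last[$2] = $1 }
  '
}

# Usage, with a file you prepared from the controller logs:
#   find_duplicates 600 < api-calls.txt
```

Any line printed is a call that deduplication should have suppressed.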

Common Causes & Solutions:

Controller restarts causing memory loss:

# Check for frequent restarts
kubectl describe pod -n kagent -l app=khook

# Increase memory limits if needed
kubectl patch deployment khook -n kagent -p '{"spec":{"template":{"spec":{"containers":[{"name":"manager","resources":{"limits":{"memory":"512Mi"}}}]}}}}'

Multiple controller instances without leader election:

# Ensure only one controller is leader
kubectl logs -n kagent deployment/khook | grep "successfully acquired lease"

# Check replica count
kubectl get deployment khook -n kagent

Clock skew issues:

# Check system time on controller (UTC)
kubectl exec -n kagent deployment/khook -- date -u

# Compare with the most recent node heartbeat (approximate: heartbeats can lag by several minutes)
kubectl get nodes -o jsonpath='{.items[0].status.conditions[?(@.type=="Ready")].lastHeartbeatTime}'
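
To turn two timestamps into a skew figure, a small helper is enough. This is a sketch that assumes GNU `date` (the `-d` flag does not behave the same on macOS or BusyBox); `skew_seconds` is a hypothetical name, and the usage example compares the pod against your workstation clock, which is itself assumed to be synchronized.

```shell
# Absolute difference in seconds between two timestamps (GNU date syntax).
skew_seconds() {
  a=$(date -u -d "$1" +%s) || return 1
  b=$(date -u -d "$2" +%s) || return 1
  d=$((a - b))
  if [ "$d" -lt 0 ]; then
    d=$((0 - d))
  fi
  echo "$d"
}

# Usage (requires cluster access and a date binary in the controller image):
#   pod_time=$(kubectl exec -n kagent deployment/khook -- date -u +%Y-%m-%dT%H:%M:%SZ)
#   skew_seconds "$pod_time" "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
```

A result of a few seconds is normal, since it includes the round trip of the exec call. Differences of minutes are worth investigating.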

4. High Memory Usage

Symptoms:

- Controller pod consuming excessive memory
- OOMKilled events for controller pod
- Slow event processing

Diagnostic Steps:

# Monitor memory usage
kubectl top pod -n kagent -l app=khook

# Check active events across all hooks
kubectl get hooks -A -o jsonpath='{range .items[*]}{.metadata.name}: {.status.activeEvents}{"\n"}{end}'

# Check for memory leaks in logs
kubectl logs -n kagent deployment/khook | grep -i "memory\|leak\|gc"

Solutions:

Increase resource limits:

kubectl patch deployment khook -n kagent -p '{
  "spec": {
    "template": {
      "spec": {
        "containers": [{
          "name": "manager",
          "resources": {
            "limits": {"memory": "1Gi", "cpu": "500m"},
            "requests": {"memory": "256Mi", "cpu": "100m"}
          }
        }]
      }
    }
  }
}'

Clean up stale events:

# Restart controller to clear in-memory event state
# (this also resets deduplication state, so recently handled events may trigger again)
kubectl rollout restart deployment/khook -n kagent

5. Permission Denied Errors

Symptoms:

- Controller logs show RBAC permission errors
- Cannot watch events or update hook status
- "Forbidden" errors in logs

Diagnostic Steps:

# Check current permissions
kubectl auth can-i get events --as=system:serviceaccount:kagent:khook
kubectl auth can-i update hooks --as=system:serviceaccount:kagent:khook

# Verify ClusterRole and ClusterRoleBinding
kubectl get clusterrole khook -o yaml
kubectl get clusterrolebinding khook -o yaml

Solutions:

Apply correct RBAC:
```
kubectl apply -f config/rbac/
```

Verify service account:

kubectl get serviceaccount khook -n kagent
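
Checking permissions one command at a time is tedious, so the checks can be generated in bulk. The sketch below only prints `kubectl auth can-i` commands and leaves running them to you. `rbac_checks` is a hypothetical helper, and the list of verbs is an assumption based on the symptoms above, not the controller's actual ClusterRole.

```shell
# Print one "kubectl auth can-i" command per verb/resource pair
# for the given service account (defaults to the controller's).
rbac_checks() {
  sa=${1:-system:serviceaccount:kagent:khook}
  for check in "get events" "list events" "watch events" "get hooks" "update hooks"; do
    echo "kubectl auth can-i $check --as=$sa"
  done
}

# Usage (requires cluster access); -x echoes each command before its yes/no answer:
#   rbac_checks | sh -x
```

Any "no" in the output identifies the rule to add to the ClusterRole.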

Debug Mode

Enable debug logging for detailed troubleshooting:

# Enable debug logging
kubectl set env deployment/khook -n kagent LOG_LEVEL=debug

# Watch debug logs
kubectl logs -n kagent deployment/khook -f | grep -i debug

Performance Issues

Slow Event Processing

Symptoms:

- Long delays between event occurrence and Kagent API calls
- High CPU usage on controller

Solutions:

Increase controller resources:

kubectl patch deployment khook -n kagent -p '{
  "spec": {
    "template": {
      "spec": {
        "containers": [{
          "name": "manager",
          "resources": {
            "limits": {"cpu": "1000m"},
            "requests": {"cpu": "200m"}
          }
        }]
      }
    }
  }
}'

Optimize hook configurations:
- Reduce number of event types per hook
- Use more specific event filtering
- Minimize prompt template complexity

High API Call Volume

Symptoms:

- Kagent API rate limiting
- High network usage
- API timeout errors

Solutions:

Rely on backoff:
- The controller automatically applies exponential backoff to failed calls
- Check logs for retry attempts

Optimize hook configurations:
- Consolidate similar hooks
- Use appropriate deduplication timeouts
- Review event type selections
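
When reading retry attempts in the logs, it helps to know what an exponential backoff schedule looks like. The sketch below shows the general technique only: the base delay, the doubling factor, and the 60-second cap are assumptions, not the controller's documented parameters, and `backoff_delay` is a hypothetical name.

```shell
# Delay in seconds before retry number $1 (1-based).
# Starts at base ($2, default 1), doubles per attempt, capped at cap ($3, default 60).
backoff_delay() {
  attempt=$1
  base=${2:-1}
  cap=${3:-60}
  delay=$base
  i=1
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$cap" ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt "$cap" ]; then
    delay=$cap
  fi
  echo "$delay"
}

# Print the schedule for the first six retries:
#   for n in 1 2 3 4 5 6; do backoff_delay "$n"; done
```

Gaps between retries in the logs that grow roughly like this schedule indicate backoff is working. Retries at a constant short interval suggest it is not.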

Getting Help

Log Collection

Collect comprehensive logs for support:

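# Controller logs (current container)
kubectl logs -n kagent deployment/khook > controller-logs.txt

# Logs from the previous container instance (only available if the pod has restarted)
kubectl logs -n kagent deployment/khook --previous > controller-logs-previous.txt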

# Hook status
kubectl get hooks -A -o yaml > hooks-status.yaml

# Events
kubectl get events -A --sort-by='.lastTimestamp' > cluster-events.txt

# System info
kubectl version > cluster-info.txt
kubectl get nodes -o wide >> cluster-info.txt

Support Channels

- GitHub Issues: kagent-hook-controller/issues
- Community Forum: community.kagent.dev
- Documentation: docs.kagent.dev

Before Reporting Issues

Please include:

- Controller version and Kubernetes version
- Hook configuration (sanitized)
- Controller logs (last 100 lines)
- Steps to reproduce the issue
- Expected vs actual behavior
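
Sanitizing a hook configuration by hand is error-prone, so a small filter can do a first pass. This is a sketch under assumptions: `redact_secrets` is a hypothetical helper, and the key names it matches (api-key, apiKey, token, password) are guesses at what sensitive fields look like, not fields defined by the Hook CRD.

```shell
# Replace the values of sensitive-looking YAML keys with a placeholder.
# Only single-line "key: value" pairs are handled.
redact_secrets() {
  sed -E 's/^([[:space:]]*-?[[:space:]]*(api-key|apiKey|token|password)[[:space:]]*:[[:space:]]*).+$/\1REDACTED/'
}

# Usage (requires cluster access):
#   kubectl get hook your-hook-name -o yaml | redact_secrets > hook-config.yaml
```

Review the output before sharing it, since prompt templates and annotations can contain sensitive details that no key-based filter will catch.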
