Skip to content

diablinux/pod-limit-checker

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

6 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Pod Limit Checker

Pod Limit Checker is a Go-based command-line tool designed for Kubernetes administrators to identify and fix resource limit configuration issues in their clusters. The tool scans all pods across namespaces, detects missing CPU and memory limits, analyzes current resource usage patterns, and provides intelligent, usage-based recommendations for optimal limit configuration.

Table of Contents

  1. Problem Statement & Motivation

  2. Architecture & Design Decisions

  3. Code Structure

  4. Key Features & Implementation Details

  5. Usage Examples

  6. Installation & Setup

  7. Future Enhancements

  8. Industry Relevance


Problem Statement & Motivation

The Challenge

In Kubernetes production environments, one of the most common misconfigurations is the absence of resource limits on pods. This leads to several critical issues:

  1. Resource Starvation: A single pod without limits can consume all available node resources, causing other pods to be evicted or fail

  2. Unpredictable Performance: Without limits, pods can experience variable performance depending on what else is running on the node

  3. Cost Inefficiency: In cloud environments, unconstrained resource usage leads to unnecessary costs

  4. Security Risks: Resource exhaustion attacks become easier when limits aren't enforced

Why This Project Was Created

As a Kubernetes cluster administrator, I frequently encountered:

  • Production incidents caused by runaway resource consumption

  • Difficulty identifying which pods lacked limits in large clusters

  • The need for data-driven recommendations rather than guesswork

  • Lack of tools that combined limit detection with usage-based analysis

Industry Context

According to the Kubernetes best practices and the 12-factor app methodology, applications should declare their resource requirements. Major cloud providers (AWS, GCP, Azure) and Kubernetes security frameworks like CIS Kubernetes Benchmarks specifically require resource limits as a security control.


Architecture & Design Decisions

High-Level Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   K8s API       β”‚    β”‚   Metrics API   β”‚    β”‚   Configuration β”‚
β”‚   - Pods        │◄────   - Usage       │◄────   - kubeconfig  β”‚
β”‚   - Namespaces  β”‚    β”‚   - Metrics     β”‚    β”‚   - Flags       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚                       β”‚                     β”‚
         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                 β”‚
                  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                  β”‚      Pod Limit Checker      β”‚
                  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
                  β”‚  β”‚       Analyzer         β”‚ β”‚
                  β”‚  β”‚  - Resource Analysis   β”‚ β”‚
                  β”‚  β”‚  - Risk Assessment     β”‚ β”‚
                  β”‚  β”‚  - Recommendations     β”‚ β”‚
                  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
                  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
                  β”‚  β”‚       Reporter         β”‚ β”‚
                  β”‚  β”‚  - Table Output        β”‚ β”‚
                  β”‚  β”‚  - JSON/YAML Export    β”‚ β”‚
                  β”‚  β”‚  - Verbose Details     β”‚ β”‚
                  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
                  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                 β”‚
                  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                  β”‚        Output Formats       β”‚
                  β”‚  β”Œβ”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”  β”‚
                  β”‚  β”‚Tableβ”‚  β”‚JSON β”‚  β”‚YAML β”‚  β”‚
                  β”‚  β””β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”˜  β”‚
                  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Design Decisions

  1. Modular Architecture: Separated into analyzer, reporter, and Kubernetes client packages for testability and maintainability

  2. Idempotent Operations: The tool only reads data, never modifies cluster state

  3. Progressive Enhancement: Works with or without metrics server, providing appropriate suggestions for each scenario

  4. Human-Readable Output: Uses emojis and color-coded risk levels for quick visual assessment

  5. Programmable Interface: JSON/YAML output for integration with other tools or automation pipelines

Algorithm Design

The recommendation algorithm is based on the following formula:

Recommended Limit = Current Usage Γ— Multiplier + Minimum Buffer

Where:
- CPU Limit Multiplier: 2.5x
- CPU Request Multiplier: 1.2x
- Memory Limit Multiplier: 2.5x
- Memory Request Multiplier: 1.2x
- Minimum CPU Limit: 100m
- Minimum CPU Request: 50m
- Minimum Memory Limit: 128Mi
- Minimum Memory Request: 64Mi

This algorithm is derived from industry best practices:

  • 2.5x multiplier for limits: Provides buffer for traffic spikes while preventing resource exhaustion

  • 1.2x multiplier for requests: Ensures baseline performance without over-provisioning

  • Minimum values: Prevent unreasonably small limits that could cause pod eviction


Code Structure

pod-limit-checker/
β”œβ”€β”€ main.go                    # Entry point
β”œβ”€β”€ cmd/
β”‚   └── check.go              # Command-line interface and flag parsing
β”œβ”€β”€ pkg/
β”‚   β”œβ”€β”€ kubernetes/
β”‚   β”‚   └── client.go         # K8s API client initialization
β”‚   β”œβ”€β”€ analyzer/
β”‚   β”‚   └── analyzer.go       # Core analysis logic
β”‚   └── reporter/
β”‚       └── reporter.go       # Output formatting and reporting
β”œβ”€β”€ go.mod                    # Dependency management
└── README.md                 # User documentation

Key Components

  1. main.go - Application Entry Point
  • Minimal main function that delegates to command execution

  • Error handling and exit code management

  1. cmd/check.go - CLI Implementation
  • Flag parsing with comprehensive options

  • Context management with timeout for API calls

  • Error propagation with user-friendly messages

  • Kubeconfig resolution with fallback to default locations

  1. pkg/kubernetes/client.go - API Integration
  • Dual client setup: Core API + Metrics API

  • Config loading from kubeconfig or in-cluster configuration

  • Error handling for connectivity issues

  • Cross-platform support for Windows/Linux/macOS

  1. pkg/analyzer/analyzer.go - Business Logic
  • Pod analysis with risk level calculation

  • Resource calculation using Kubernetes resource.Quantity

  • Usage-based recommendations with intelligent algorithms

  • State management for different analysis scenarios

  1. pkg/reporter/reporter.go - Output Management
  • Multi-format output (table, JSON, YAML)

  • Progressive disclosure Lessons Learnedwith verbose mode

  • Visual indicators with emojis and formatting

  • Context-aware suggestions based on available data


Key Features & Implementation Details

1. Intelligent Limit Detection

// Detects if any resource limits are missing
func hasMissingLimits(container v1.Container) bool {
    _, hasCPU := container.Resources.Limits[v1.ResourceCPU]
    _, hasMemory := container.Resources.Limits[v1.ResourceMemory]
    return !hasCPU || !hasMemory
}

Why this matters: Some pods have partial limits (e.g., CPU but no memory), which is still a risk.

2. Risk Assessment Algorithm

func calculateRiskLevel(container v1.Container, usage *ResourceUsage) string {
    if len(container.Resources.Limits) == 0 {
        return "HIGH"  // No limits at all
    }
    
    // Check for partial limits
    _, hasCPU := container.Resources.Limits[v1.ResourceCPU]
    _, hasMemory := container.Resources.Limits[v1.ResourceMemory]
    if !hasCPU || !hasMemory {
        return "MEDIUM"  // Partial configuration
    }
    
    // Check usage against limits
    if usage != nil && isHighUtilization(container, usage) {
        return "MEDIUM"  // Limits may be too tight
    }
    
    return "LOW"  // Properly configured
}

Implementation Insight: Three-tier risk model allows for prioritization of fixes.

3. Usage-Based Recommendations

func generateRecommendations(usage *ResourceUsage) Recommendations {
    // Calculate based on actual usage patterns
    cpuLimit := max(usage.CPU.MilliValue() * 5 / 2, 100)
    cpuRequest := max(usage.CPU.MilliValue() * 6 / 5, 50)
    
    return Recommendations{
        CPULimit:    fmt.Sprintf("%dm", cpuLimit),
        CPURequest:  fmt.Sprintf("%dm", cpuRequest),
        MemoryLimit: calculateMemoryLimit(usage.Memory),
    }
}

Algorithm Choice: The 2.5x multiplier for limits is based on:

  • Google's SRE book recommendations for headroom

  • AWS Well-Architected Framework buffer recommendations

  • Empirical data from production workloads

4. Graceful Degradation

if podMetrics, err := analyzer.GetPodMetrics(ctx); err != nil {
    log.Printf("Metrics unavailable: %v", err)
    // Continue with basic analysis without usage data
    results = analyzer.AnalyzeWithoutMetrics(pods)
} else {
    // Full analysis with usage data
    results = analyzer.AnalyzeWithMetrics(pods, podMetrics)
}

Design Principle: The tool should provide value even when metrics server isn't available.


Usage Examples

Basic Usage

# Check all namespaces with default settings
./pod-limit-checker

# Check specific namespace
./pod-limit-checker --namespace production

# Output in JSON for automation
./pod-limit-checker --output json | jq '.[] | select(.RiskLevel == "HIGH")'

Advanced Scenarios

# Verbose output with all details
./pod-limit-checker --namespace staging --verbose

# Generate YAML patches for automation
./pod-limit-checker --namespace kubernetes-dashboard --output yaml --quiet |   yq eval '.[0].exampleyaml'

Real-World Workflow

# 1. Initial assessment
./pod-limit-checker > initial-report.txt

# 2. Focus on high-risk namespaces
./pod-limit-checker --namespace customer-facing --verbose

# 3. Generate specific recommendations
./pod-limit-checker --output json --quiet |   jq -r '.[] | "\(.PodName) \(.ContainerName)|CPU:\(.RecommendedCPULimit)|Memory:\(.RecommendedMemoryLimit)"' |   column -t -s '|'

# 4. Apply fixes (manual step)
# Use the provided YAML examples to update deployments

Installation & Setup

Prerequisites

  • Go 1.21+

  • Kubernetes cluster v1.34.2+

  • Metrics Server installed

  • kubeconfig with appropriate RBAC permissions

Installation Steps

# Clone and build
git clone https://github.com/diablinux/pod-limit-checker.git
cd pod-limit-checker
go build -o pod-limit-checker

# Or install globally
go install ./...

# Verify installation
pod-limit-checker --help

RBAC Configuration

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-limit-checker
rules:
- apiGroups: [""]
  resources: ["pods", "namespaces"]
  verbs: ["list", "get"]
- apiGroups: ["metrics.k8s.io"]
  resources: ["pods"]
  verbs: ["list", "get"]

Container Deployment

FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY . .
RUN go build -o pod-limit-checker

FROM alpine:latest
RUN apk --no-cache add ca-certificates
COPY --from=builder /app/pod-limit-checker /usr/local/bin/
ENTRYPOINT ["pod-limit-checker"]

Future Enhancements

Planned Features

  1. Auto-remediation Mode: Generate and apply patches automatically
./pod-limit-checker --auto-fix --dry-run
./pod-limit-checker --auto-fix --apply
  1. Historical Analysis: Track resource usage patterns over time
type HistoricalAnalysis struct {
    Trend          string  // "increasing", "decreasing", "stable"
    PeakUsage      float64
    AverageUsage   float64
    Recommendation string
}
  1. Cost Estimation: Calculate potential cost savings
./pod-limit-checker --cost-estimate --provider aws
# Output: Estimated monthly savings: $2,500
  1. Integration with CI/CD: Pre-deployment validation
# GitHub Actions workflow
- name: Validate Resource Limits
  uses: pod-limit-checker/action@v1
  with:
    fail-on-high-risk: true
  1. Multi-cluster Support: Analyze across multiple clusters
./pod-limit-checker --clusters prod,staging,dev

Industry Relevance

This project addresses several key industry trends:

  1. FinOps: Cloud cost optimization through proper resource management

  2. GitOps: Integration with deployment pipelines for policy enforcement

  3. Observability: Combining configuration analysis with runtime metrics

  4. Security: Implementing Kubernetes security best practices


References & Inspiration

  1. Kubernetes Documentation: Resource Management Best Practices

  2. Google SRE Book: Error budgets and resource planning

  3. AWS Well-Architected Framework: Reliability pillar recommendations

  4. CIS Kubernetes Benchmarks: Security controls for resource limits

  5. Open Source Projects: kube-bench, kube-score, popeye for inspiration


"In resource management, as in life, boundaries create freedom." - Anonymous Kubernetes Admin

About

Pod Limit Checker is a Go-based command-line tool designed for Kubernetes administrators to identify and fix resource limit configuration issues in their clusters.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors