Pod Limit Checker

Pod Limit Checker is a Go-based command-line tool designed for Kubernetes administrators to identify and fix resource limit configuration issues in their clusters. The tool scans all pods across namespaces, detects missing CPU and memory limits, analyzes current resource usage patterns, and provides intelligent, usage-based recommendations for optimal limit configuration.

Resource Starvation: A single pod without limits can consume all available node resources, causing other pods to be evicted or fail
Unpredictable Performance: Without limits, pods can experience variable performance depending on what else is running on the node
Cost Inefficiency: In cloud environments, unconstrained resource usage leads to unnecessary costs
Security Risks: Resource exhaustion attacks become easier when limits aren't enforced

Why This Project Was Created

As a Kubernetes cluster administrator, I frequently encountered:

Production incidents caused by runaway resource consumption
Difficulty identifying which pods lacked limits in large clusters
The need for data-driven recommendations rather than guesswork
Lack of tools that combined limit detection with usage-based analysis

Industry Context

According to the Kubernetes best practices and the 12-factor app methodology, applications should declare their resource requirements. Major cloud providers (AWS, GCP, Azure) and Kubernetes security frameworks like CIS Kubernetes Benchmarks specifically require resource limits as a security control.

Architecture & Design Decisions

High-Level Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   K8s API       │    │   Metrics API   │    │   Configuration │
│   - Pods        │◄───┤   - Usage       │◄───┤   - kubeconfig  │
│   - Namespaces  │    │   - Metrics     │    │   - Flags       │
└────────┬────────┘    └─────────┬───────┘    └────────┬────────┘
         │                       │                     │
         └───────────────────────┼─────────────────────┘
                                 │
                  ┌──────────────▼──────────────┐
                  │      Pod Limit Checker      │
                  │  ┌────────────────────────┐ │
                  │  │       Analyzer         │ │
                  │  │  - Resource Analysis   │ │
                  │  │  - Risk Assessment     │ │
                  │  │  - Recommendations     │ │
                  │  └────────────────────────┘ │
                  │  ┌────────────────────────┐ │
                  │  │       Reporter         │ │
                  │  │  - Table Output        │ │
                  │  │  - JSON/YAML Export    │ │
                  │  │  - Verbose Details     │ │
                  │  └────────────────────────┘ │
                  └──────────────┬──────────────┘
                                 │
                  ┌──────────────▼──────────────┐
                  │        Output Formats       │
                  │  ┌─────┐  ┌─────┐  ┌─────┐  │
                  │  │Table│  │JSON │  │YAML │  │
                  │  └─────┘  └─────┘  └─────┘  │
                  └─────────────────────────────┘

Design Decisions

Modular Architecture: Separated into analyzer, reporter, and Kubernetes client packages for testability and maintainability
Idempotent Operations: The tool only reads data, never modifies cluster state
Progressive Enhancement: Works with or without metrics server, providing appropriate suggestions for each scenario
Human-Readable Output: Uses emojis and color-coded risk levels for quick visual assessment
Programmable Interface: JSON/YAML output for integration with other tools or automation pipelines

Algorithm Design

The recommendation algorithm is based on the following formula:

Recommended Limit = Current Usage × Multiplier + Minimum Buffer

Where:
- CPU Limit Multiplier: 2.5x
- CPU Request Multiplier: 1.2x
- Memory Limit Multiplier: 2.5x
- Memory Request Multiplier: 1.2x
- Minimum CPU Limit: 100m
- Minimum CPU Request: 50m
- Minimum Memory Limit: 128Mi
- Minimum Memory Request: 64Mi

This algorithm is derived from industry best practices:

2.5x multiplier for limits: Provides buffer for traffic spikes while preventing resource exhaustion
1.2x multiplier for requests: Ensures baseline performance without over-provisioning
Minimum values: Prevent unreasonably small limits that could cause pod eviction

Code Structure

pod-limit-checker/
├── main.go                    # Entry point
├── cmd/
│   └── check.go              # Command-line interface and flag parsing
├── pkg/
│   ├── kubernetes/
│   │   └── client.go         # K8s API client initialization
│   ├── analyzer/
│   │   └── analyzer.go       # Core analysis logic
│   └── reporter/
│       └── reporter.go       # Output formatting and reporting
├── go.mod                    # Dependency management
└── README.md                 # User documentation

Key Components

main.go - Application Entry Point

Minimal main function that delegates to command execution
Error handling and exit code management

cmd/check.go - CLI Implementation

Flag parsing with comprehensive options
Context management with timeout for API calls
Error propagation with user-friendly messages
Kubeconfig resolution with fallback to default locations

pkg/kubernetes/client.go - API Integration

Dual client setup: Core API + Metrics API
Config loading from kubeconfig or in-cluster configuration
Error handling for connectivity issues
Cross-platform support for Windows/Linux/macOS

pkg/analyzer/analyzer.go - Business Logic

Pod analysis with risk level calculation
Resource calculation using Kubernetes resource.Quantity
Usage-based recommendations with intelligent algorithms
State management for different analysis scenarios

pkg/reporter/reporter.go - Output Management

Multi-format output (table, JSON, YAML)
Progressive disclosure Lessons Learnedwith verbose mode
Visual indicators with emojis and formatting
Context-aware suggestions based on available data

Key Features & Implementation Details

1. Intelligent Limit Detection

// Detects if any resource limits are missing
func hasMissingLimits(container v1.Container) bool {
    _, hasCPU := container.Resources.Limits[v1.ResourceCPU]
    _, hasMemory := container.Resources.Limits[v1.ResourceMemory]
    return !hasCPU || !hasMemory
}

Why this matters: Some pods have partial limits (e.g., CPU but no memory), which is still a risk.

2. Risk Assessment Algorithm

func calculateRiskLevel(container v1.Container, usage *ResourceUsage) string {
    if len(container.Resources.Limits) == 0 {
        return "HIGH"  // No limits at all
    }
    
    // Check for partial limits
    _, hasCPU := container.Resources.Limits[v1.ResourceCPU]
    _, hasMemory := container.Resources.Limits[v1.ResourceMemory]
    if !hasCPU || !hasMemory {
        return "MEDIUM"  // Partial configuration
    }
    
    // Check usage against limits
    if usage != nil && isHighUtilization(container, usage) {
        return "MEDIUM"  // Limits may be too tight
    }
    
    return "LOW"  // Properly configured
}

Implementation Insight: Three-tier risk model allows for prioritization of fixes.

3. Usage-Based Recommendations

func generateRecommendations(usage *ResourceUsage) Recommendations {
    // Calculate based on actual usage patterns
    cpuLimit := max(usage.CPU.MilliValue() * 5 / 2, 100)
    cpuRequest := max(usage.CPU.MilliValue() * 6 / 5, 50)
    
    return Recommendations{
        CPULimit:    fmt.Sprintf("%dm", cpuLimit),
        CPURequest:  fmt.Sprintf("%dm", cpuRequest),
        MemoryLimit: calculateMemoryLimit(usage.Memory),
    }
}

Algorithm Choice: The 2.5x multiplier for limits is based on:

Google's SRE book recommendations for headroom
AWS Well-Architected Framework buffer recommendations
Empirical data from production workloads

4. Graceful Degradation

if podMetrics, err := analyzer.GetPodMetrics(ctx); err != nil {
    log.Printf("Metrics unavailable: %v", err)
    // Continue with basic analysis without usage data
    results = analyzer.AnalyzeWithoutMetrics(pods)
} else {
    // Full analysis with usage data
    results = analyzer.AnalyzeWithMetrics(pods, podMetrics)
}

Design Principle: The tool should provide value even when metrics server isn't available.

Usage Examples

Basic Usage

# Check all namespaces with default settings
./pod-limit-checker

# Check specific namespace
./pod-limit-checker --namespace production

# Output in JSON for automation
./pod-limit-checker --output json | jq '.[] | select(.RiskLevel == "HIGH")'

Advanced Scenarios

# Verbose output with all details
./pod-limit-checker --namespace staging --verbose

# Generate YAML patches for automation
./pod-limit-checker --namespace kubernetes-dashboard --output yaml --quiet |   yq eval '.[0].exampleyaml'

Real-World Workflow

# 1. Initial assessment
./pod-limit-checker > initial-report.txt

# 2. Focus on high-risk namespaces
./pod-limit-checker --namespace customer-facing --verbose

# 3. Generate specific recommendations
./pod-limit-checker --output json --quiet |   jq -r '.[] | "\(.PodName) \(.ContainerName)|CPU:\(.RecommendedCPULimit)|Memory:\(.RecommendedMemoryLimit)"' |   column -t -s '|'

# 4. Apply fixes (manual step)
# Use the provided YAML examples to update deployments

Installation & Setup

Prerequisites

Go 1.21+
Kubernetes cluster v1.34.2+
Metrics Server installed
kubeconfig with appropriate RBAC permissions

Installation Steps

# Clone and build
git clone https://github.com/diablinux/pod-limit-checker.git
cd pod-limit-checker
go build -o pod-limit-checker

# Or install globally
go install ./...

# Verify installation
pod-limit-checker --help

RBAC Configuration

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-limit-checker
rules:
- apiGroups: [""]
  resources: ["pods", "namespaces"]
  verbs: ["list", "get"]
- apiGroups: ["metrics.k8s.io"]
  resources: ["pods"]
  verbs: ["list", "get"]

Container Deployment

FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY . .
RUN go build -o pod-limit-checker

FROM alpine:latest
RUN apk --no-cache add ca-certificates
COPY --from=builder /app/pod-limit-checker /usr/local/bin/
ENTRYPOINT ["pod-limit-checker"]

Future Enhancements

Planned Features

Auto-remediation Mode: Generate and apply patches automatically

./pod-limit-checker --auto-fix --dry-run
./pod-limit-checker --auto-fix --apply

Historical Analysis: Track resource usage patterns over time

type HistoricalAnalysis struct {
    Trend          string  // "increasing", "decreasing", "stable"
    PeakUsage      float64
    AverageUsage   float64
    Recommendation string
}

Cost Estimation: Calculate potential cost savings

./pod-limit-checker --cost-estimate --provider aws
# Output: Estimated monthly savings: $2,500

Integration with CI/CD: Pre-deployment validation

# GitHub Actions workflow
- name: Validate Resource Limits
  uses: pod-limit-checker/action@v1
  with:
    fail-on-high-risk: true

Multi-cluster Support: Analyze across multiple clusters

./pod-limit-checker --clusters prod,staging,dev

Industry Relevance

This project addresses several key industry trends:

FinOps: Cloud cost optimization through proper resource management
GitOps: Integration with deployment pipelines for policy enforcement
Observability: Combining configuration analysis with runtime metrics
Security: Implementing Kubernetes security best practices

References & Inspiration

Kubernetes Documentation: Resource Management Best Practices
Google SRE Book: Error budgets and resource planning
AWS Well-Architected Framework: Reliability pillar recommendations
CIS Kubernetes Benchmarks: Security controls for resource limits
Open Source Projects: kube-bench, kube-score, popeye for inspiration

"In resource management, as in life, boundaries create freedom." - Anonymous Kubernetes Admin

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
cmd		cmd
k8s		k8s
pkg		pkg
.gitignore		.gitignore
Dockerfile		Dockerfile
Makefile		Makefile
README.md		README.md
go.mod		go.mod
go.sum		go.sum
main.go		main.go

Folders and files

Latest commit

History

Repository files navigation

Pod Limit Checker

Table of Contents

Problem Statement & Motivation

The Challenge

Why This Project Was Created

Industry Context

Architecture & Design Decisions

High-Level Architecture

Design Decisions

Algorithm Design

Code Structure

Key Components

Key Features & Implementation Details

1. Intelligent Limit Detection

2. Risk Assessment Algorithm

3. Usage-Based Recommendations

4. Graceful Degradation

Usage Examples

Basic Usage

Advanced Scenarios

Real-World Workflow

Installation & Setup

Prerequisites

Installation Steps

RBAC Configuration

Container Deployment

Future Enhancements

Planned Features

Industry Relevance

References & Inspiration

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages