Pod Limit Checker is a Go-based command-line tool designed for Kubernetes administrators to identify and fix resource limit configuration issues in their clusters. The tool scans all pods across namespaces, detects missing CPU and memory limits, analyzes current resource usage patterns, and provides intelligent, usage-based recommendations for optimal limit configuration.
- Problem Statement & Motivation
- Architecture & Design Decisions
- Code Structure
- Key Features & Implementation Details
- Usage Examples
- Installation & Setup
- Future Enhancements
- Industry Relevance

## Problem Statement & Motivation
In Kubernetes production environments, one of the most common misconfigurations is the absence of resource limits on pods. This leads to several critical issues:
- Resource Starvation: A single pod without limits can consume all available node resources, causing other pods to be evicted or fail
- Unpredictable Performance: Without limits, pods can experience variable performance depending on what else is running on the node
- Cost Inefficiency: In cloud environments, unconstrained resource usage leads to unnecessary costs
- Security Risks: Resource exhaustion attacks become easier when limits aren't enforced
As a Kubernetes cluster administrator, I frequently encountered:
- Production incidents caused by runaway resource consumption
- Difficulty identifying which pods lacked limits in large clusters
- The need for data-driven recommendations rather than guesswork
- Lack of tools that combined limit detection with usage-based analysis
According to Kubernetes best practices and the twelve-factor app methodology, applications should declare their resource requirements. Major cloud providers (AWS, GCP, Azure) and Kubernetes security frameworks such as the CIS Kubernetes Benchmarks specifically require resource limits as a security control.
## Architecture & Design Decisions

```
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│    K8s API      │   │   Metrics API   │   │  Configuration  │
│  - Pods         ├───┤  - Usage        ├───┤  - kubeconfig   │
│  - Namespaces   │   │  - Metrics      │   │  - Flags        │
└────────┬────────┘   └────────┬────────┘   └────────┬────────┘
         │                     │                     │
         └─────────────────────┼─────────────────────┘
                               │
               ┌───────────────┴───────────────┐
               │       Pod Limit Checker       │
               │  ┌─────────────────────────┐  │
               │  │        Analyzer         │  │
               │  │  - Resource Analysis    │  │
               │  │  - Risk Assessment      │  │
               │  │  - Recommendations      │  │
               │  └─────────────────────────┘  │
               │  ┌─────────────────────────┐  │
               │  │        Reporter         │  │
               │  │  - Table Output         │  │
               │  │  - JSON/YAML Export     │  │
               │  │  - Verbose Details      │  │
               │  └─────────────────────────┘  │
               └───────────────┬───────────────┘
                               │
               ┌───────────────┴───────────────┐
               │        Output Formats         │
               │   ┌─────┐  ┌─────┐  ┌─────┐   │
               │   │Table│  │JSON │  │YAML │   │
               │   └─────┘  └─────┘  └─────┘   │
               └───────────────────────────────┘
```
- Modular Architecture: Separated into analyzer, reporter, and Kubernetes client packages for testability and maintainability
- Idempotent Operations: The tool only reads data, never modifies cluster state
- Progressive Enhancement: Works with or without metrics server, providing appropriate suggestions for each scenario
- Human-Readable Output: Uses emojis and color-coded risk levels for quick visual assessment
- Programmable Interface: JSON/YAML output for integration with other tools or automation pipelines
The recommendation algorithm is based on the following formula:

    Recommended Limit = max(Current Usage × Multiplier, Minimum)

Where:

- CPU Limit Multiplier: 2.5x
- CPU Request Multiplier: 1.2x
- Memory Limit Multiplier: 2.5x
- Memory Request Multiplier: 1.2x
- Minimum CPU Limit: 100m
- Minimum CPU Request: 50m
- Minimum Memory Limit: 128Mi
- Minimum Memory Request: 64Mi
This algorithm is derived from industry best practices:

- 2.5x multiplier for limits: Provides buffer for traffic spikes while preventing resource exhaustion
- 1.2x multiplier for requests: Ensures baseline performance without over-provisioning
- Minimum values: Prevent unreasonably small limits that could cause pod eviction
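As a worked example, the CPU arithmetic above can be sketched in a few lines of Go. The function names are illustrative, not the tool's actual code; values are in millicores:

```go
package main

import "fmt"

// recommendCPULimitMilli applies the 2.5x limit multiplier with a 100m floor.
func recommendCPULimitMilli(usageMilli int64) int64 {
	limit := usageMilli * 5 / 2 // 2.5x multiplier
	if limit < 100 {            // 100m minimum
		limit = 100
	}
	return limit
}

// recommendCPURequestMilli applies the 1.2x request multiplier with a 50m floor.
func recommendCPURequestMilli(usageMilli int64) int64 {
	req := usageMilli * 6 / 5 // 1.2x multiplier
	if req < 50 {             // 50m minimum
		req = 50
	}
	return req
}

func main() {
	// A container observed at 200m CPU gets a 500m limit and a 240m request.
	fmt.Printf("limit=%dm request=%dm\n", recommendCPULimitMilli(200), recommendCPURequestMilli(200))
	// A nearly idle container (10m) is clamped to the minimums.
	fmt.Printf("limit=%dm request=%dm\n", recommendCPULimitMilli(10), recommendCPURequestMilli(10))
}
```

The first call prints `limit=500m request=240m`; the second is clamped to the floors, `limit=100m request=50m`.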
## Code Structure

```
pod-limit-checker/
├── main.go                 # Entry point
├── cmd/
│   └── check.go            # Command-line interface and flag parsing
├── pkg/
│   ├── kubernetes/
│   │   └── client.go       # K8s API client initialization
│   ├── analyzer/
│   │   └── analyzer.go     # Core analysis logic
│   └── reporter/
│       └── reporter.go     # Output formatting and reporting
├── go.mod                  # Dependency management
└── README.md               # User documentation
```

main.go - Application Entry Point
- Minimal main function that delegates to command execution
- Error handling and exit code management
cmd/check.go - CLI Implementation

- Flag parsing with comprehensive options
- Context management with timeout for API calls
- Error propagation with user-friendly messages
- Kubeconfig resolution with fallback to default locations
pkg/kubernetes/client.go - API Integration

- Dual client setup: Core API + Metrics API
- Config loading from kubeconfig or in-cluster configuration
- Error handling for connectivity issues
- Cross-platform support for Windows/Linux/macOS
pkg/analyzer/analyzer.go - Business Logic

- Pod analysis with risk level calculation
- Resource calculation using Kubernetes resource.Quantity
- Usage-based recommendations with intelligent algorithms
- State management for different analysis scenarios
pkg/reporter/reporter.go - Output Management

- Multi-format output (table, JSON, YAML)
- Progressive disclosure with verbose mode
- Visual indicators with emojis and formatting
- Context-aware suggestions based on available data
## Key Features & Implementation Details

```go
// Detects if any resource limits are missing
func hasMissingLimits(container v1.Container) bool {
	_, hasCPU := container.Resources.Limits[v1.ResourceCPU]
	_, hasMemory := container.Resources.Limits[v1.ResourceMemory]
	return !hasCPU || !hasMemory
}
```

Why this matters: Some pods have partial limits (e.g., CPU but no memory), which is still a risk.
```go
func calculateRiskLevel(container v1.Container, usage *ResourceUsage) string {
	if len(container.Resources.Limits) == 0 {
		return "HIGH" // No limits at all
	}

	// Check for partial limits
	_, hasCPU := container.Resources.Limits[v1.ResourceCPU]
	_, hasMemory := container.Resources.Limits[v1.ResourceMemory]
	if !hasCPU || !hasMemory {
		return "MEDIUM" // Partial configuration
	}

	// Check usage against limits
	if usage != nil && isHighUtilization(container, usage) {
		return "MEDIUM" // Limits may be too tight
	}

	return "LOW" // Properly configured
}
```

Implementation Insight: The three-tier risk model allows for prioritization of fixes.
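`isHighUtilization` is referenced above but not shown. A plausible standalone sketch follows; the simplified signature (raw millicore values rather than `v1.Container` and `resource.Quantity`) and the 80% threshold are assumptions, not the tool's actual code:

```go
package main

import "fmt"

// isHighUtilization reports whether observed usage is at or above 80%
// of the configured limit, using integer math to avoid float rounding.
func isHighUtilization(usedMilli, limitMilli int64) bool {
	if limitMilli == 0 {
		return false // no limit to compare against
	}
	return usedMilli*100 >= limitMilli*80
}

func main() {
	fmt.Println(isHighUtilization(450, 500)) // 90% of the limit: true
	fmt.Println(isHighUtilization(200, 500)) // 40% of the limit: false
}
```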
```go
func generateRecommendations(usage *ResourceUsage) Recommendations {
	// Calculate based on actual usage patterns
	cpuLimit := max(usage.CPU.MilliValue()*5/2, 100)
	cpuRequest := max(usage.CPU.MilliValue()*6/5, 50)

	return Recommendations{
		CPULimit:    fmt.Sprintf("%dm", cpuLimit),
		CPURequest:  fmt.Sprintf("%dm", cpuRequest),
		MemoryLimit: calculateMemoryLimit(usage.Memory),
	}
}
```

Algorithm Choice: The 2.5x multiplier for limits is based on:

- Google's SRE book recommendations for headroom
- AWS Well-Architected Framework buffer recommendations
- Empirical data from production workloads
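`calculateMemoryLimit` is referenced above but not shown. This sketch applies the same 2.5x multiplier and 128Mi floor to a usage value in bytes; the function name, byte-based signature, and round-up-to-whole-MiB behavior are assumptions:

```go
package main

import "fmt"

// calculateMemoryLimitMi derives a memory limit string from observed
// usage in bytes: 2.5x multiplier, 128Mi floor, rounded up to whole MiB.
func calculateMemoryLimitMi(usageBytes int64) string {
	const mi = 1024 * 1024
	limitBytes := usageBytes * 5 / 2 // 2.5x multiplier
	if limitBytes < 128*mi {         // 128Mi minimum
		limitBytes = 128 * mi
	}
	limitMi := (limitBytes + mi - 1) / mi // round up to the nearest MiB
	return fmt.Sprintf("%dMi", limitMi)
}

func main() {
	fmt.Println(calculateMemoryLimitMi(100 * 1024 * 1024)) // 100Mi usage -> 250Mi
	fmt.Println(calculateMemoryLimitMi(20 * 1024 * 1024))  // low usage clamped -> 128Mi
}
```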
```go
if podMetrics, err := analyzer.GetPodMetrics(ctx); err != nil {
	log.Printf("Metrics unavailable: %v", err)
	// Continue with basic analysis without usage data
	results = analyzer.AnalyzeWithoutMetrics(pods)
} else {
	// Full analysis with usage data
	results = analyzer.AnalyzeWithMetrics(pods, podMetrics)
}
```

Design Principle: The tool should provide value even when the metrics server isn't available.
## Usage Examples

```bash
# Check all namespaces with default settings
./pod-limit-checker

# Check specific namespace
./pod-limit-checker --namespace production

# Output in JSON for automation
./pod-limit-checker --output json | jq '.[] | select(.RiskLevel == "HIGH")'

# Verbose output with all details
./pod-limit-checker --namespace staging --verbose

# Generate YAML patches for automation
./pod-limit-checker --namespace kubernetes-dashboard --output yaml --quiet | yq eval '.[0].exampleyaml'
```

A typical remediation workflow:

```bash
# 1. Initial assessment
./pod-limit-checker > initial-report.txt

# 2. Focus on high-risk namespaces
./pod-limit-checker --namespace customer-facing --verbose

# 3. Generate specific recommendations
./pod-limit-checker --output json --quiet | jq -r '.[] | "\(.PodName) \(.ContainerName)|CPU:\(.RecommendedCPULimit)|Memory:\(.RecommendedMemoryLimit)"' | column -t -s '|'

# 4. Apply fixes (manual step)
# Use the provided YAML examples to update deployments
```
## Installation & Setup

Prerequisites:

- Go 1.21+
- Kubernetes cluster v1.34.2+
- Metrics Server installed
- kubeconfig with appropriate RBAC permissions
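The ClusterRole in the setup below also needs a matching binding to take effect for a service identity. This is a minimal sketch; the ServiceAccount name and namespace are assumptions, not part of the project:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pod-limit-checker
subjects:
  - kind: ServiceAccount
    name: pod-limit-checker   # assumed ServiceAccount name
    namespace: default        # assumed namespace
roleRef:
  kind: ClusterRole
  name: pod-limit-checker
  apiGroup: rbac.authorization.k8s.io
```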
```bash
# Clone and build
git clone https://github.com/diablinux/pod-limit-checker.git
cd pod-limit-checker
go build -o pod-limit-checker

# Or install globally
go install ./...

# Verify installation
pod-limit-checker --help
```

RBAC configuration:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-limit-checker
rules:
  - apiGroups: [""]
    resources: ["pods", "namespaces"]
    verbs: ["list", "get"]
  - apiGroups: ["metrics.k8s.io"]
    resources: ["pods"]
    verbs: ["list", "get"]
```

Container image build:

```dockerfile
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY . .
RUN go build -o pod-limit-checker

FROM alpine:latest
RUN apk --no-cache add ca-certificates
COPY --from=builder /app/pod-limit-checker /usr/local/bin/
ENTRYPOINT ["pod-limit-checker"]
```

## Future Enhancements

- Auto-remediation Mode: Generate and apply patches automatically
  ```bash
  ./pod-limit-checker --auto-fix --dry-run
  ./pod-limit-checker --auto-fix --apply
  ```

- Historical Analysis: Track resource usage patterns over time

  ```go
  type HistoricalAnalysis struct {
      Trend          string // "increasing", "decreasing", "stable"
      PeakUsage      float64
      AverageUsage   float64
      Recommendation string
  }
  ```

- Cost Estimation: Calculate potential cost savings

  ```bash
  ./pod-limit-checker --cost-estimate --provider aws
  # Output: Estimated monthly savings: $2,500
  ```

- Integration with CI/CD: Pre-deployment validation

  ```yaml
  # GitHub Actions workflow
  - name: Validate Resource Limits
    uses: pod-limit-checker/action@v1
    with:
      fail-on-high-risk: true
  ```

- Multi-cluster Support: Analyze across multiple clusters

  ```bash
  ./pod-limit-checker --clusters prod,staging,dev
  ```

## Industry Relevance

This project addresses several key industry trends:
- FinOps: Cloud cost optimization through proper resource management
- GitOps: Integration with deployment pipelines for policy enforcement
- Observability: Combining configuration analysis with runtime metrics
- Security: Implementing Kubernetes security best practices
References:

- Kubernetes Documentation: Resource Management Best Practices
- Google SRE Book: Error budgets and resource planning
- AWS Well-Architected Framework: Reliability pillar recommendations
- CIS Kubernetes Benchmarks: Security controls for resource limits
- Open Source Projects: kube-bench, kube-score, and popeye for inspiration
"In resource management, as in life, boundaries create freedom." - Anonymous Kubernetes Admin