Objective
Deploy multi-architecture NodePool infrastructure to support hephy-builder CI/CD workloads without disrupting existing cluster operations.
Current Infrastructure Challenge
Our current CI/CD builds are limited by single-architecture runner availability. To enable true multi-arch builds (AMD64 + ARM64), we need dedicated infrastructure that:
- Supports both architectures efficiently
- Isolates CI/CD workloads from production applications
- Uses cost-effective spot instances for ephemeral builds
- Scales automatically based on demand
Proposed Solution: Multi-Architecture Spot NodePool
Infrastructure Design
# New NodePool: multiarch-spot
spec:
  disruption:
    consolidateAfter: 30s # Fast consolidation for ephemeral builds
    consolidationPolicy: WhenEmpty
  template:
    metadata:
      labels:
        lifecycle: Ec2Spot
        intent: cicd-builds
    spec:
      taints:
        - key: cicd-builds
          value: "true"
          effect: NoSchedule
      nodeClassRef:
        # Karpenter v1 expects `group: karpenter.k8s.aws` here in place of apiVersion
        apiVersion: karpenter.k8s.aws/v1
        kind: EC2NodeClass
        name: multiarch-spot-nodeclass
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        # m6i/m6a are x86_64-only; arm64 nodes also need Graviton types
        # in this list (e.g. m6g.large, m6g.xlarge)
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m6i.large", "m6i.xlarge", "m6a.large", "m6a.xlarge"]
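Because the NodePool taints its nodes, a build pod only lands there if it tolerates cicd-builds=true:NoSchedule. A node selector on kubernetes.io/arch then picks the architecture. The following is a minimal sketch rather than a manifest from this repository; the pod name and image are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-arm64-build            # hypothetical name
spec:
  restartPolicy: Never
  nodeSelector:
    kubernetes.io/arch: arm64           # use amd64 for the x86_64 variant
    intent: cicd-builds                 # label set by the NodePool template
  tolerations:
    - key: cicd-builds                  # matches the NodePool taint
      operator: Equal
      value: "true"
      effect: NoSchedule
  containers:
    - name: build
      image: example.invalid/builder:latest   # placeholder image
      command: ["uname", "-m"]          # prints the node architecture
```

A pending pod like this is what prompts Karpenter to provision a matching spot node. Once the pod finishes and the node is empty, the WhenEmpty policy removes it after the 30s consolidateAfter window.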
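Implementation Plan
Phase 1: Infrastructure Deployment
- multiarch-spot-nodeclass EC2NodeClass
- multiarch-spot NodePool with proper taints/tolerations
Phase 2: GitLab Runner Deployment
- redacted-sandbox-amd64 tag
- redacted-sandbox-arm64 tag
Phase 3: Pipeline Integration
Phase 4: Production Readiness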
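For the pipeline integration phase, jobs can be routed to an architecture through runner tags. The sketch below assumes one runner is registered per architecture with the tags redacted-sandbox-amd64 and redacted-sandbox-arm64. The build script path is hypothetical.

```yaml
# .gitlab-ci.yml (sketch)
stages:
  - build

# Shared job body; hidden jobs (leading dot) are not run directly.
.build-template:
  stage: build
  script:
    - uname -m            # confirm which architecture the job landed on
    - ./build.sh          # hypothetical build entry point

build-amd64:
  extends: .build-template
  tags:
    - redacted-sandbox-amd64

build-arm64:
  extends: .build-template
  tags:
    - redacted-sandbox-arm64
```

Keeping the job body in one template means the two architecture jobs differ only in their tag, which makes drift between them less likely.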
Technical Specifications
NodePool Configuration
- Capacity Type: Spot instances (cost optimization)
- Architectures: AMD64 + ARM64 dual support
- Instance Types: m6i/m6a family (balanced compute/memory); these are x86_64-only, so ARM64 capacity additionally needs a Graviton family such as m6g
- Scaling: Automatic based on demand
- Taints: Dedicated for CI/CD workloads
Security & Isolation
- Network Isolation: Same VPC, isolated subnet (optional)
- Workload Isolation: Taints prevent non-CI workload scheduling
- IAM Permissions: Minimal required permissions for ECR/S3 access
- Security Groups: Restricted access for build operations
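The subnet, security group, and IAM choices above are attached to nodes through the EC2NodeClass that the NodePool references. The following is a rough sketch assuming the Karpenter v1 API. The role name and the discovery tag value are placeholders, not values taken from this cluster.

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: multiarch-spot-nodeclass
spec:
  amiSelectorTerms:
    - alias: al2023@latest                    # assumption: Amazon Linux 2023 AMIs
  role: KarpenterNodeRole-example             # hypothetical node IAM role (ECR/S3 access)
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: example-cluster   # hypothetical tag value
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: example-cluster   # hypothetical tag value
```

If the optional isolated subnet is used, a dedicated tag on that subnet in subnetSelectorTerms would keep build nodes out of the production subnets.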
Cost Optimization
- Spot Instances: 60-70% cost savings vs on-demand
- Fast Consolidation: 30s empty node termination
- Right-sizing: Build-optimized instance types
- Auto-scaling: Zero cost when no builds running
Success Criteria
Functional Requirements
Operational Requirements
Performance Targets
- Node Provisioning: < 2 minutes from request to ready
- Build Performance: Comparable to current single-arch builds
- Cost Efficiency: < 70% of on-demand equivalent costs
- Availability: 99%+ uptime for build operations
Dependencies & Prerequisites
Infrastructure Access
- Crossplane cluster with NodePool management permissions
- AWS IAM permissions for Karpenter operations
- VPC and subnet configuration for additional nodes
- ECR registry access for both architectures
Configuration Files
- Update multiarch-spot-nodepool.yaml with final specifications
- Create GitLab runner deployment manifests
- Configure runner authentication and ECR credentials
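One possible shape for the runner manifests, assuming the official GitLab Runner Helm chart and the Kubernetes executor (neither is confirmed by this issue), is a values file per architecture. The URL and helper image below are placeholders, and runner authentication is omitted on purpose.

```yaml
# values-arm64.yaml (sketch; a values-amd64.yaml would differ only in the arch selector)
gitlabUrl: https://gitlab.example.invalid/    # placeholder URL
runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        namespace = "{{.Release.Namespace}}"
        image = "alpine:3.20"                 # placeholder default job image
        # Send build pods to arm64 nodes from the multiarch-spot NodePool
        [runners.kubernetes.node_selector]
          "kubernetes.io/arch" = "arm64"
        # Tolerate the cicd-builds=true:NoSchedule taint on those nodes
        [runners.kubernetes.node_tolerations]
          "cicd-builds=true" = "NoSchedule"
```

The selector and toleration here apply to the job pods the runner creates, which are the pods that need to reach the tainted spot nodes.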
Testing Infrastructure
- Sample multi-arch builds for validation
- Monitoring and observability stack
- Cost tracking and reporting tools
Reference Materials
Existing Configuration
- multiarch-spot-nodepool.yaml: Current NodePool specification draft
- attic/crossplane-node-pool-objects.yaml: Reference implementation
- attic/gitlab-runner-*.yaml: Runner deployment examples
Documentation
- Karpenter NodePool configuration guide
- GitLab runner Kubernetes deployment
- AWS Spot instance best practices
Priority: High - This infrastructure enables the core multi-architecture vision of hephy-builder and removes current build limitations.
Timeline: Target completion within 1-2 weeks for full multi-arch CI/CD capability.