Modern GitOps deployment structure using Talos OS, ArgoCD, and Cilium, with Proxmox virtualization
A GitOps-driven Kubernetes cluster using Talos OS (secure, immutable Linux for K8s), ArgoCD, and Cilium, with integrated Cloudflare Tunnel for secure external access. Built for both home lab and small production environments.
- Prerequisites
- Architecture
- Quick Start
- Verification
- Documentation
- Hardware Stack
- Scaling
- Troubleshooting
- Contributing
- License
- Proxmox VMs or bare metal (see hardware below)
- Domain configured in Cloudflare
- 1Password account for secrets management
- Talosctl and Talhelper installed
- `kubectl` installed locally
- `cloudflared` installed locally
```mermaid
graph TD
subgraph "Argo CD Projects"
IP[Infrastructure Project] --> IAS[Infrastructure ApplicationSet]
MP[Monitoring Project] --> MAS[Monitoring ApplicationSet]
AP[Applications Project] --> AAS[Applications ApplicationSet]
AIP[AI Project] --> AIAS[AI ApplicationSet]
end
subgraph "Infrastructure Components"
IAS --> N[Networking]
IAS --> S[Storage]
IAS --> C[Controllers]
IAS --> DB[Database]
N --> Cilium
N --> Cloudflared
N --> Gateway
S --> Longhorn
S --> VolumeSnapshots
C --> CertManager
C --> ExternalSecrets
DB --> CloudNativePG
end
subgraph "Monitoring Stack"
MAS --> Prometheus
MAS --> Grafana
MAS --> AlertManager
MAS --> Loki
end
subgraph "User Applications"
AAS --> Home[Home Apps]
AAS --> Media[Media Apps]
AAS --> Dev[Dev Tools]
AAS --> Privacy[Privacy Apps]
Home --> Frigate
Home --> WyzeBridge
Media --> Plex
Media --> Jellyfin
Dev --> Kafka
Dev --> Temporal
Privacy --> SearXNG
Privacy --> LibReddit
end
subgraph "AI Applications"
AIAS --> Ollama
AIAS --> ComfyUI
end
style IP fill:#f9f,stroke:#333,stroke-width:2px
style AP fill:#f9f,stroke:#333,stroke-width:2px
style MP fill:#f9f,stroke:#333,stroke-width:2px
style AIP fill:#f9f,stroke:#333,stroke-width:2px
style IAS fill:#bbf,stroke:#333,stroke-width:2px
style MAS fill:#bbf,stroke:#333,stroke-width:2px
style AAS fill:#bbf,stroke:#333,stroke-width:2px
style AIAS fill:#bbf,stroke:#333,stroke-width:2px
```
- Three-Tier Architecture: Separate tiers for infrastructure, monitoring, and applications
- Sync Waves: Controlled deployment order via ArgoCD
- Declarative GitOps: All cluster state managed in Git
- GPU Integration: Full NVIDIA GPU support via Talos system extensions and the NVIDIA GPU Operator (see the sketch after this list)
- Zero SSH: All node management via the `talosctl` API
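How the NVIDIA extensions end up on a node is not reproduced in this README; the fragment below is only a sketch of what a GPU node entry in the Talhelper config could look like, assuming Talhelper's Image Factory `schematic` support and the official `siderolabs/nonfree-kmod-nvidia` and `siderolabs/nvidia-container-toolkit` extensions. Hostname, IP, and disk are placeholders.

```yaml
# Hypothetical fragment of iac/talos/talconfig.yaml for a GPU node.
# The schematic asks the Talos Image Factory to bake the NVIDIA kernel
# module and container toolkit extensions into the node image.
nodes:
  - hostname: gpu-node-01          # placeholder
    ipAddress: 192.168.1.30        # placeholder
    controlPlane: false
    installDisk: /dev/nvme0n1      # placeholder
    schematic:
      customization:
        systemExtensions:
          officialExtensions:
            - siderolabs/nonfree-kmod-nvidia
            - siderolabs/nvidia-container-toolkit
```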
```bash
# On your workstation
brew install talosctl sops yq kubectl
brew install budimanjojo/tap/talhelper
# Or see Talos/Talhelper docs for Linux/Windows
```
```bash
cd iac/talos
# Edit talconfig.yaml for your cluster topology

# Generate secrets (encrypted with SOPS)
talhelper gensecret > talsecret.sops.yaml
sops -e -i talsecret.sops.yaml

# Generate node configs
talhelper genconfig
```
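The repository's actual `talconfig.yaml` isn't reproduced here; the sketch below only shows the shape of a minimal Talhelper cluster definition, assuming one control plane node and one worker. Names, IPs, versions, and disks are placeholders.

```yaml
# Hypothetical iac/talos/talconfig.yaml — all values are illustrative
clusterName: home-cluster
talosVersion: v1.7.6            # placeholder
kubernetesVersion: v1.30.3      # placeholder
endpoint: https://192.168.1.10:6443
nodes:
  - hostname: cp-01
    ipAddress: 192.168.1.10
    controlPlane: true
    installDisk: /dev/sda
  - hostname: worker-01
    ipAddress: 192.168.1.20
    controlPlane: false
    installDisk: /dev/sda
```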
- Boot each VM/host with its generated Talos machine config (PXE, ISO, or cloud-init)
- Use `talosctl` to bootstrap the control plane:
```bash
# Set kubeconfig and talosconfig env vars
export TALOSCONFIG=./clusterconfig/talosconfig
export KUBECONFIG=./clusterconfig/kubeconfig

# Apply the generated config to each node (nodes start in maintenance mode)
talosctl apply-config --insecure --nodes <node-ip> --file clusterconfig/<node>.yaml

# Bootstrap etcd
# (Run ONCE, against a single control plane node)
talosctl bootstrap --nodes <control-plane-ip>

# Fetch the kubeconfig once the control plane is up
talosctl kubeconfig ./clusterconfig/kubeconfig --nodes <control-plane-ip>
```
```bash
# Install the Gateway API CRDs (standard + experimental channels)
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.2.0/standard-install.yaml
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.2.0/experimental-install.yaml
```
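The networking tier itself is deployed by the ApplicationSets below, but for orientation, here is a hedged sketch of the kind of Gateway and HTTPRoute these CRDs enable. It assumes Cilium's Gateway API support is enabled (which creates the `cilium` GatewayClass); resource names, namespaces, and hostnames are placeholders.

```yaml
# Illustrative only — names and hostnames are placeholders
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: internal
  namespace: networking
spec:
  gatewayClassName: cilium
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: grafana
  namespace: monitoring
spec:
  parentRefs:
    - name: internal
      namespace: networking
  hostnames:
    - grafana.example.com
  rules:
    - backendRefs:
        - name: grafana
          port: 80
```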
With the CRDs in place, we can now bootstrap Argo CD. This is a two-step process.
First, we deploy Argo CD itself. This `Application` manifest tells Argo CD how to manage its own installation and upgrades directly from this Git repository. This is the "app of apps" pattern.
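The real manifest lives at `infrastructure/argocd-app.yaml`; the sketch below only illustrates the shape of such a self-managing Application. The repo URL and source path are placeholders, not this repository's actual values.

```yaml
# Illustrative sketch of a self-managing Argo CD Application
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: argocd
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/<you>/<this-repo>.git   # placeholder
    targetRevision: main
    path: infrastructure/controllers/argocd             # placeholder
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```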
```bash
# Apply the Argo CD application. It will self-manage from this point on.
kubectl apply -f infrastructure/argocd-app.yaml
```
Second, we deploy the root ApplicationSet. This `ApplicationSet` automatically discovers and deploys all the other ApplicationSets in this repository (for infrastructure, monitoring, etc.), creating a fully GitOps-driven deployment.
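The actual root ApplicationSet is `infrastructure/root-appset.yaml`; as a hedged sketch of the pattern, one way to express "deploy every `*-appset.yaml` in each top-level directory" is a list generator like the one below. The repo URL, element names, and include glob are assumptions for illustration.

```yaml
# Illustrative root ApplicationSet — not the repository's actual manifest
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: root
  namespace: argocd
spec:
  generators:
    - list:
        elements:
          - name: infrastructure
            path: infrastructure
          - name: monitoring
            path: monitoring
          - name: my-apps
            path: my-apps
  template:
    metadata:
      name: '{{name}}-appsets'
    spec:
      project: default
      source:
        repoURL: https://github.com/<you>/<this-repo>.git   # placeholder
        targetRevision: main
        path: '{{path}}'
        directory:
          include: '*appset.yaml'   # pick up only the child ApplicationSets
      destination:
        server: https://kubernetes.default.svc
        namespace: argocd
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```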
```bash
# Apply the root ApplicationSet. This will deploy everything else.
kubectl apply -f infrastructure/root-appset.yaml
```
From this point on, every component of your cluster is managed via Git. Any changes pushed to the `main` branch will be automatically synced by Argo CD.
```bash
# Create required namespaces
kubectl create namespace 1passwordconnect
kubectl create namespace external-secrets

# Generate 1Password Connect credentials
# This command creates 1password-credentials.json
op connect server create

# Base64-encode the credentials file (the Connect deployment expects this format)
cat 1password-credentials.json | base64 | tr -d '\n' > 1password-credentials.base64

# Connect token used by the operator and External Secrets
export CONNECT_TOKEN="your-1password-connect-token"

# Create required secrets
kubectl create secret generic 1password-credentials \
  --from-file=1password-credentials.json=1password-credentials.base64 \
  --namespace 1passwordconnect

kubectl create secret generic 1password-operator-token \
  --from-literal=token=$CONNECT_TOKEN \
  --namespace 1passwordconnect

kubectl create secret generic 1passwordconnect \
  --from-literal=token=$CONNECT_TOKEN \
  --namespace external-secrets
```
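How External Secrets consumes these credentials is defined by the manifests in this repo. As orientation only, here is a hedged sketch of a `ClusterSecretStore` using the External Secrets Operator's 1Password Connect provider, plus an `ExternalSecret` that reads from it. The store name, Connect service URL, vault name, item, and field are assumptions, not this repository's real values.

```yaml
# Illustrative only — vault, item, and service names are placeholders
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: 1password
spec:
  provider:
    onepassword:
      connectHost: http://onepassword-connect.1passwordconnect.svc.cluster.local:8080
      vaults:
        homelab: 1                       # vault name -> priority
      auth:
        secretRef:
          connectTokenSecretRef:
            name: 1passwordconnect
            namespace: external-secrets
            key: token
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: example-app
  namespace: default
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: 1password
  target:
    name: example-app-secret
  data:
    - secretKey: password
      remoteRef:
        key: example-app     # 1Password item title (placeholder)
        property: password   # field within the item (placeholder)
```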
- Three-tier architecture separating infrastructure, monitoring, and applications
- Sync waves ensure proper deployment order (see the annotation sketch below)
- Simple directory patterns without complex include/exclude logic
- All components managed through just three top-level ApplicationSets
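Argo CD orders resources and Applications by the `argocd.argoproj.io/sync-wave` annotation: lower waves sync first. A minimal example, with the resource name and wave number chosen purely for illustration:

```yaml
# Lower waves sync first, e.g. networking before the workloads that need it
apiVersion: v1
kind: Namespace
metadata:
  name: networking                          # illustrative
  annotations:
    argocd.argoproj.io/sync-wave: "-1"      # illustrative wave number
```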
- No SSH: All management via the `talosctl` API
- Immutable OS: No package manager, no shell
- Declarative: All config in Git, applied via Talhelper/Talosctl
- System Extensions: GPU, storage, and other drivers enabled via config
- SOPS: Used for encrypting Talos secrets (example rule below)
- No plaintext secrets in Git
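The repository's actual SOPS configuration isn't shown in this README; below is a minimal `.sops.yaml` sketch assuming age encryption. The recipient key is a placeholder, and the repo may instead use PGP or a cloud KMS.

```yaml
# Hypothetical .sops.yaml — the age recipient is a placeholder
creation_rules:
  - path_regex: talsecret\.sops\.ya?ml$
    age: "age1<your-public-key>"
```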
```bash
# Check Talos node health
talosctl health --nodes <node-ip>

# Check Kubernetes core components
kubectl get pods -A
cilium status

# Check ArgoCD
kubectl get application -A
kubectl get pods -n argocd

# Check secrets
kubectl get pods -n 1passwordconnect
kubectl get externalsecret -A
```
- View Documentation Online: the full documentation website
- Local Documentation: browse the docs/ directory in this repository (see the repository structure below)
```text
🖧 Compute
├── AMD Threadripper 2950X (16c/32t)
├── 128GB ECC DDR4 RAM
├── 2× NVIDIA RTX 3090 24GB
└── Google Coral TPU

💾 Storage
├── 4TB ZFS RAID-Z2
├── NVMe OS Drive
└── Longhorn/Local Path Storage for K8s

🌐 Network
├── 2.5Gb Networking
├── Firewalla Gold
└── Internal DNS Resolution
```
While this setup uses a single node, you can add worker nodes for additional compute capacity (a talconfig sketch follows the table):
| Scaling Type | Description | Benefits |
|---|---|---|
| Single Node | All workloads on one server | Simplified storage, easier management |
| Worker Nodes | Add compute-only nodes | Increased capacity without storage complexity |
| Multi-Master | High availability control plane | Production-grade resilience |
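Adding a worker is mostly a matter of appending a node entry to the Talhelper config sketched earlier and re-running `talhelper genconfig`. A hedged fragment, with hostname, IP, and disk as placeholders:

```yaml
# Hypothetical addition to iac/talos/talconfig.yaml
nodes:
  # ...existing control plane / worker entries...
  - hostname: worker-02        # placeholder
    ipAddress: 192.168.1.21    # placeholder
    controlPlane: false
    installDisk: /dev/sda      # placeholder
```

After regenerating, apply the new node's config with `talosctl apply-config --insecure --nodes <new-node-ip> --file clusterconfig/<node>.yaml`, as in the Quick Start.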
```text
.
├── infrastructure/                            # Infrastructure components
│   ├── controllers/                           # Kubernetes controllers
│   │   └── argocd/                            # ArgoCD configuration and projects
│   ├── networking/                            # Network configurations
│   ├── storage/                               # Storage configurations
│   └── infrastructure-components-appset.yaml  # Main infrastructure ApplicationSet
├── monitoring/                                # Monitoring components
│   ├── k8s-monitoring/                        # Kubernetes monitoring stack
│   └── monitoring-components-appset.yaml      # Main monitoring ApplicationSet
├── my-apps/                                   # User applications
│   ├── ai/                                    # AI-related applications
│   ├── media/                                 # Media applications
│   ├── development/                           # Development tools
│   ├── external/                              # External service integrations
│   ├── home/                                  # Home automation apps
│   ├── privacy/                               # Privacy-focused applications
│   └── myapplications-appset.yaml             # Main applications ApplicationSet
└── docs/                                      # Documentation
    ├── argocd.md                              # ArgoCD setup and workflow
    ├── network.md                             # Network configuration
    ├── security.md                            # Security setup
    ├── storage.md                             # Storage configuration
    └── external-services.md                   # External services setup
```
| Issue Type | Troubleshooting Steps |
|---|---|
| Talos Node Issues | • `talosctl health` • Check Talos logs: `talosctl logs -n <node-ip> -k` |
| Network Issues | • Check Cilium status • Verify Gateway API • Test DNS resolution |
| Storage Issues | • Verify PV binding • Check Longhorn/Local PV logs • Validate node affinity |
| ArgoCD Issues | • Check application sync status • Review application logs |
| Secrets Issues | • Check External Secrets Operator logs • Verify 1Password Connect status |
| GPU Issues | • Check GPU node labels • Verify NVIDIA Operator pods • Check `nvidia-smi` on GPU nodes (see the smoke-test sketch below) |
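For the GPU case, a quick way to confirm the driver and runtime are wired up is a throwaway pod. A hedged sketch, assuming the GPU Operator has created the `nvidia` RuntimeClass; the CUDA image tag is illustrative.

```yaml
# Illustrative GPU smoke test — delete the pod afterwards
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  runtimeClassName: nvidia            # created by the NVIDIA GPU Operator
  containers:
    - name: nvidia-smi
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04   # illustrative tag
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
```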
If you need to remove all existing applications to rebuild:
```bash
# Remove finalizers from all applications
kubectl get applications -n argocd -o name | xargs -I{} kubectl patch {} -n argocd --type json -p '[{"op": "remove","path": "/metadata/finalizers"}]'

# Delete all applications
kubectl delete applications --all -n argocd

# For stuck ApplicationSets
kubectl get applicationsets -n argocd -o name | xargs -I{} kubectl patch {} -n argocd --type json -p '[{"op": "remove","path": "/metadata/finalizers"}]'
kubectl delete applicationsets --all -n argocd

# Only then apply the new structure in order
kubectl apply -f infrastructure/argocd-app.yaml
kubectl apply -f infrastructure/root-appset.yaml
```
- Fork the repository
- Create a feature branch
- Submit a pull request
MIT License - See LICENSE for details