|
| 1 | +--- |
| 2 | +layout: post |
| 3 | +section-type: post |
| 4 | +title: 'Building a Talos Kubernetes Homelab with Terraform on Proxmox' |
| 5 | +category: tech |
| 6 | +tags: ['homelab', 'talos', 'kubernetes', 'terraform', 'proxmox', 'devops'] |
| 7 | +--- |
| 8 | + |
| 9 | +# Trying to automate Talos Linux deployment with Terraform |
| 10 | + |
| 11 | +So I heard about Talos Linux from the interwebs and figured I'd give it a shot in my homelab. Something about an immutable OS just for Kubernetes? Sure, why not. |
| 12 | + |
| 13 | +My goal: get a Talos cluster running on Proxmox using only Terraform. No manual clicking around; I'm too lazy. Should be simple enough, just like my Neovim config (which I use, btw). |
| 14 | + |
| 15 | +## What Even Is Talos Linux? |
| 16 | + |
| 17 | +[Talos Linux](https://www.talos.dev/) is basically a stripped-down Linux that only runs Kubernetes. The weird parts: |
| 18 | + |
| 19 | +- [No SSH access](https://www.talos.dev/v1.10/introduction/what-is-talos/#design-principles) (takes some getting used to in a homelab where you want to control everything) |
| 20 | +- [Read-only filesystem](https://www.talos.dev/v1.10/introduction/what-is-talos/#immutable) |
| 21 | +- Everything happens through [APIs](https://www.talos.dev/v1.10/learn-more/talosctl/) |
| 22 | +- More secure because there's less to attack |
| 23 | + |
| 24 | +The basic idea: less stuff = fewer problems. Makes sense, I guess. (Unless it's Neovim.) |
| 25 | + |
| 26 | +## My Setup |
| 27 | + |
| 28 | +Running this on a Proxmox server in my basement. Nothing fancy: |
| 29 | + |
| 30 | +- 1 control plane node (4 CPU, 8GB RAM) |
| 31 | +- 2 worker nodes (4 CPU, 12GB RAM each) |
| 32 | +- Static IPs because DHCP in my homelab is... flaky |
| 33 | + |
| 34 | +Terraform for everything because clicking through UIs gets old. |
| 35 | + |
| 36 | +## The Terraform Approach I Settled On |
| 37 | + |
| 38 | +After trying a bunch of different approaches (and failing), here's what worked: |
| 39 | + |
| 40 | +- Proxmox provider for creating VMs |
| 41 | +- [Talos provider](https://registry.terraform.io/providers/siderolabs/talos/latest/docs) for configuring the cluster |
| 42 | +- OnePassword provider for secrets (I use 1Password for everything) |
| 43 | +- Local provider for saving kubeconfig files |
| 44 | + |
| 45 | +Not saying this is the best way. Just what finally worked. |
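
A minimal sketch of how those providers could be declared. The registry sources are the public ones (`bpg/proxmox` is the Proxmox provider linked in the resources at the end of this post); the `null` entry is an assumption added because the examples further down use `null_resource`. Version pins are deliberately left out, so add whatever constraints your setup needs.

```hcl
terraform {
  required_providers {
    # Creates the VM template and the cluster VMs
    proxmox = {
      source = "bpg/proxmox"
    }
    # Generates machine secrets/configs and talks to the Talos API
    talos = {
      source = "siderolabs/talos"
    }
    # Reads secrets out of 1Password
    onepassword = {
      source = "1Password/onepassword"
    }
    # Writes the kubeconfig to disk
    local = {
      source = "hashicorp/local"
    }
    # Assumed: needed for the null_resource provisioners shown later
    null = {
      source = "hashicorp/null"
    }
  }
}
```
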
| 46 | + |
| 47 | +## The Version Gotcha That Cost Me Hours |
| 48 | + |
| 49 | +This one burned me: Talos changed their image types between versions. Version 1.7.x has "nocloud" images that work with cloud-init. Version 1.8.0+? They switched to ["metal" images](https://www.talos.dev/v1.10/advanced/metal-network-configuration/) that don't. |
| 50 | + |
| 51 | +Why does this matter? If you want static IPs through Terraform, you need [cloud-init support](https://www.talos.dev/v1.7/talos-guides/install/cloud-platforms/). Without it, you're stuck doing manual network config. So much for automation. |
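
To make that dependency concrete: with the bpg Proxmox provider (an assumption, based on the provider linked at the end of this post), a static IP reaches the VM through the `initialization` block, which Proxmox hands to the guest as a cloud-init drive. The resource name is hypothetical and the IDs and addresses are borrowed from the examples later in this post; this is a sketch, not the repo's actual module code.

```hcl
resource "proxmox_virtual_environment_vm" "talos_node" {
  name      = "homelab-cp"
  node_name = "proxmox"
  vm_id     = 200

  # Clone from the Talos template VM
  clone {
    vm_id = 9200
  }

  # Proxmox turns this block into a cloud-init drive attached to the VM.
  # An image without cloud-init support would simply ignore it.
  initialization {
    datastore_id = "local-lvm"

    ip_config {
      ipv4 {
        address = "10.0.20.10/24"
        gateway = "10.0.20.1"
      }
    }
  }
}
```
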
| 52 | + |
| 53 | +Sticking with 1.7.6: |
| 54 | + |
| 55 | +```hcl |
| 56 | +variable "talos_version" { |
| 57 | + description = "Talos Linux version to use" |
| 58 | + type = string |
| 59 | + default = "1.7.6" # Latest 1.7.x for nocloud support |
| 60 | +} |
| 61 | +``` |
| 62 | +*[View in repo](https://github.com/TechDufus/home.io/blob/main/terraform/proxmox/modules/talos-template/variables.tf#L21-L25)* |
| 63 | + |
| 64 | +There's probably a better way with newer versions. Haven't found it. |
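
Since the whole setup hinges on staying on 1.7.x, one optional guard is a `validation` block on the variable so a stray version bump fails at plan time. This is a sketch of a variation on the variable above, not what the repo does:

```hcl
variable "talos_version" {
  description = "Talos Linux version to use"
  type        = string
  default     = "1.7.6"

  # Reject anything that is not a 1.7.x release
  validation {
    condition     = can(regex("^1\\.7\\.[0-9]+$", var.talos_version))
    error_message = "This setup relies on nocloud images, so talos_version must be a 1.7.x release."
  }
}
```

Passing something like `-var talos_version=1.8.0` would then stop with that error message instead of building a template that ignores the static IP settings.
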
| 65 | + |
| 66 | +## The Template Approach |
| 67 | + |
| 68 | +I split my Terraform into modules. The template module creates a reusable VM template: |
| 69 | + |
| 70 | +```hcl |
| 71 | +module "talos_template" { |
| 72 | + source = "../../modules/talos-template" |
| 73 | + |
| 74 | + template_vm_id = 9200 |
| 75 | + talos_version = "1.7.6" |
| 76 | + proxmox_node = "proxmox" |
| 77 | + vm_storage_pool = "local-lvm" # Adjust for your Proxmox storage |
| 78 | +} |
| 79 | +``` |
| 80 | +*[View template module](https://github.com/TechDufus/home.io/tree/main/terraform/proxmox/modules/talos-template)* |
| 81 | + |
| 82 | +The node module clones from the template: |
| 83 | + |
| 84 | +```hcl |
| 85 | +module "control_plane" { |
| 86 | + source = "../../modules/talos-node" |
| 87 | + |
| 88 | + template_vm_id = module.talos_template.template_id |
| 89 | + node_name = "homelab-cp" |
| 90 | + node_role = "controlplane" |
| 91 | + vm_id = 200 |
| 92 | + |
| 93 | + # Static IP configuration |
| 94 | + ip_address = "10.0.20.10" |
| 95 | + subnet_mask = 24 |
| 96 | + gateway = "10.0.20.1" |
| 97 | + |
| 98 | + # Required Talos configuration |
| 99 | + cluster_name = var.cluster_name |
| 100 | + talos_client_config = talos_machine_secrets.cluster.client_configuration |
| 101 | + machine_config = data.talos_machine_configuration.control_plane.machine_configuration |
| 102 | +} |
| 103 | +``` |
| 104 | +*[View node module](https://github.com/TechDufus/home.io/tree/main/terraform/proxmox/modules/talos-node)* |
| 105 | + |
| 106 | +Works reliably enough. |
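
The `talos_machine_secrets.cluster` and `data.talos_machine_configuration.control_plane` references in the node module call come from the Talos provider. A rough sketch of what those declarations might look like, assuming the control plane IP from the example doubles as the cluster endpoint; the real definitions live in the linked repo and may differ.

```hcl
# Generates the cluster's certificates and tokens once and keeps them in state
resource "talos_machine_secrets" "cluster" {}

# Renders a control plane machine config from those secrets
data "talos_machine_configuration" "control_plane" {
  cluster_name     = var.cluster_name
  cluster_endpoint = "https://10.0.20.10:6443" # assumed: single control plane node
  machine_type     = "controlplane"
  machine_secrets  = talos_machine_secrets.cluster.machine_secrets
  talos_version    = "v1.7.6"
}

# Bootstraps etcd on the control plane node.
# The node needs to have received its machine config before this can succeed.
resource "talos_machine_bootstrap" "cluster" {
  client_configuration = talos_machine_secrets.cluster.client_configuration
  node                 = "10.0.20.10"
}
```
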
| 107 | + |
| 108 | +## The File Upload Problem |
| 109 | + |
| 110 | +The Talos images are roughly 1.2GB once decompressed. Uploading through the Proxmox API? Timeouts. Failures. Pain. |
| 111 | + |
| 112 | +I ended up using SSH and rsync: |
| 113 | + |
| 114 | +```bash |
| 115 | +# Download and decompress locally |
| 116 | +curl -fsSL "https://github.com/siderolabs/talos/releases/download/v${var.talos_version}/nocloud-amd64.raw.xz" \ |
| 117 | + -o "/tmp/talos-${var.talos_version}-nocloud-amd64.raw.xz" |
| 118 | +xz -d "/tmp/talos-${var.talos_version}-nocloud-amd64.raw.xz" |
| 119 | + |
| 120 | +# Upload via rsync with progress |
| 121 | +rsync -avz --progress "/tmp/talos-${var.talos_version}-nocloud-amd64.raw" \ |
| 122 | + root@${var.proxmox_node}:/tmp/ |
| 123 | +``` |
| 124 | +*[View full upload logic](https://github.com/TechDufus/home.io/blob/main/terraform/proxmox/modules/talos-template/main.tf#L58-L127)* |
| 125 | + |
| 126 | +Not pretty but it works. |
| 127 | + |
| 128 | +## Getting kubectl and talosctl Working |
| 129 | + |
| 130 | +Got the Terraform to automatically configure [kubectl](https://kubernetes.io/docs/reference/kubectl/) and [talosctl](https://www.talos.dev/v1.10/learn-more/talosctl/). It merges the new cluster config with my existing kubeconfig: |
| 131 | + |
| 132 | +```hcl |
| 133 | +resource "null_resource" "kubeconfig_merge" { |
| 134 | + provisioner "local-exec" { |
| 135 | + command = <<-EOT |
| 136 | + # Backup existing config (learned this one the hard way) |
| 137 | + cp ~/.kube/config ~/.kube/config.backup.$(date +%Y%m%d_%H%M%S) |
| 138 | + |
| 139 | + # Merge configs |
| 140 | + KUBECONFIG="$HOME/.kube/config:${path.root}/kubeconfig" \ |
| 141 | + kubectl config view --flatten > /tmp/kubeconfig.merged |
| 142 | + |
| 143 | + # Validate merge worked, then replace |
| 144 | + if kubectl --kubeconfig=/tmp/kubeconfig.merged config get-contexts | grep -q "${var.cluster_name}"; then |
| 145 | + mv /tmp/kubeconfig.merged ~/.kube/config |
| 146 | + fi |
| 147 | + EOT |
| 148 | + } |
| 149 | +} |
| 150 | +``` |
| 151 | +*[View full kubeconfig merge logic](https://github.com/TechDufus/home.io/blob/main/terraform/proxmox/environments/dev/main.tf#L295-L388)* |
| 152 | + |
| 153 | +That backup step? Added after I nuked my kubeconfig. Don't skip it. |
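
The `${path.root}/kubeconfig` file that gets merged has to come from somewhere. One way to produce it is to pair the Talos provider with the local provider. This sketch assumes a recent Talos provider release where `talos_cluster_kubeconfig` is a resource (older releases expose it as a data source), and the resource names are illustrative:

```hcl
# Fetches the kubeconfig from the control plane over the Talos API.
# This only works once the cluster has been bootstrapped.
resource "talos_cluster_kubeconfig" "cluster" {
  client_configuration = talos_machine_secrets.cluster.client_configuration
  node                 = "10.0.20.10" # control plane IP from the earlier example
}

# Writes it next to the root module with owner-only permissions
resource "local_sensitive_file" "kubeconfig" {
  content         = talos_cluster_kubeconfig.cluster.kubeconfig_raw
  filename        = "${path.root}/kubeconfig"
  file_permission = "0600"
}
```

Using `local_sensitive_file` rather than `local_file` keeps the kubeconfig contents out of plan output.
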
| 154 | + |
| 155 | +## What's Running on It Now |
| 156 | + |
| 157 | +Got a pretty full stack running now: |
| 158 | + |
| 159 | +- **[MetalLB](https://metallb.universe.tf/)** for load balancer services (bare metal needs help with LoadBalancer types) |
| 160 | +- **[Prometheus](https://prometheus.io/)** and **[Grafana](https://grafana.com/)** for monitoring (because you can't manage what you can't see) |
| 161 | +- **[cert-manager](https://cert-manager.io/)** for TLS certificates (Let's Encrypt automation) |
| 162 | +- **[Istio](https://istio.io/)** service mesh (probably overkill but it's cool) |
| 163 | +- **[CloudNativePG](https://cloudnative-pg.io/)** for PostgreSQL (way better than managing databases manually) |
| 164 | +- **[Local Path Provisioner](https://github.com/rancher/local-path-provisioner)** for storage (just local volumes but it works) |
| 165 | + |
| 166 | +Turns out once you get the base cluster working, adding stuff with Helm is pretty straightforward. |
| 167 | + |
| 168 | +## Performance Tweaks I Found |
| 169 | + |
| 170 | +Made a few tweaks based on random blog posts. Not sure if they help, but they don't hurt: |
| 171 | + |
| 172 | +```hcl |
| 173 | +# CPU type for better container performance (supposedly) |
| 174 | +cpu_type = "x86-64-v2-AES" |
| 175 | +
|
| 176 | +# Disable memory ballooning |
| 177 | +balloon = 0 |
| 178 | +``` |
| 179 | +*[View performance configs](https://github.com/TechDufus/home.io/blob/main/terraform/proxmox/modules/talos-template/main.tf#L112-L116)* |
| 180 | + |
| 181 | +The memory ballooning thing: Proxmox can reclaim memory from a running VM, and Kubernetes schedules pods assuming a node's memory stays put. Turning ballooning off avoids that mismatch. |
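
For context, here is roughly where those two settings would land if the module wraps the bpg Proxmox provider's VM resource. That is an assumption: whatever `cpu_type` and `balloon` map to in the repo may look different. In that provider, ballooning is controlled by `memory.floating`, where `0` disables the balloon device. The resource and VM names are hypothetical.

```hcl
resource "proxmox_virtual_environment_vm" "talos_worker" {
  name      = "homelab-worker-1" # hypothetical name
  node_name = "proxmox"

  cpu {
    cores = 4
    type  = "x86-64-v2-AES"
  }

  memory {
    dedicated = 12288 # 12GB, matching the worker sizing in "My Setup"
    floating  = 0     # 0 = no balloon device
  }
}
```
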
| 182 | + |
| 183 | +## Things That Take Getting Used To |
| 184 | + |
| 185 | +Talos does things differently than traditional Linux: |
| 186 | + |
| 187 | +- No SSH access feels wrong at first. Yeah, security, but debugging means going through the API with `talosctl` instead of poking around in a shell |
| 188 | +- [Certificate stuff](https://www.talos.dev/v1.10/kubernetes-guides/configuration/certificate-rotation/) just... happens automatically. Nice but takes trust |
| 189 | +- Immutable filesystem is cool in theory but changes how you think about troubleshooting |
| 190 | + |
| 191 | +## What I'd Do Differently |
| 192 | + |
| 193 | +Next time: |
| 194 | + |
| 195 | +- Start with one node to understand Talos first |
| 196 | +- Actually read the docs before diving in |
| 197 | +- Maybe try [k3s](https://k3s.io/) first since it's simpler |
| 198 | +- Set up monitoring from the start |
| 199 | + |
| 200 | +## Is This Overkill for a Homelab? |
| 201 | + |
| 202 | +Absolutely. Could do most of this with [k3s](https://k3s.io/) or [Docker Compose](https://docs.docker.com/compose/). But where's the fun in that? |
| 203 | + |
| 204 | +## Wrapping Up |
| 205 | + |
| 206 | +Getting Talos running with Terraform was more work than expected. Version issues, file upload problems, and Talos complexity made this a multi-weekend project. |
| 207 | + |
| 208 | +But it works! Can tear down with `terraform destroy` and rebuild with [`terraform apply`](https://developer.hashicorp.com/terraform/cli/commands/apply). Not perfect, probably better ways to do it, but it's mine. |
| 209 | + |
| 210 | +If you try something similar: |
| 211 | +- Start simple |
| 212 | +- Talos [1.7.x vs 1.8.x](https://www.talos.dev/v1.8/introduction/changelog-1.8/) matters for cloud-init |
| 213 | +- Back up your kubeconfig |
| 214 | +- Expect frustration |
| 215 | + |
| 216 | +## Helpful Resources |
| 217 | + |
| 218 | +Docs that actually helped: |
| 219 | + |
| 220 | +- [Talos Linux Documentation](https://www.talos.dev/docs/) - Pretty good once you get the basics |
| 221 | +- [Proxmox Provider Docs](https://registry.terraform.io/providers/bpg/proxmox/latest/docs) - Essential |
| 222 | +- [Talos Provider Docs](https://registry.terraform.io/providers/siderolabs/talos/latest/docs) - Sparse but necessary |
| 223 | +- [Complete Terraform Code](https://github.com/TechDufus/home.io/tree/main/terraform/proxmox) - The full working implementation |
| 224 | + |
| 225 | +If you build something similar, let me know what worked. Still figuring this out. |
| 226 | + |