Commit 91e79cf

feat: add new blog post about Talos Kubernetes homelab
Add comprehensive guide on building a Talos Kubernetes homelab on Proxmox using Terraform infrastructure as code.
1 parent 777f164 commit 91e79cf

5 files changed

+250
-0
lines changed
Lines changed: 226 additions & 0 deletions
@@ -0,0 +1,226 @@
---
layout: post
section-type: post
title: 'Building a Talos Kubernetes Homelab with Terraform on Proxmox'
category: tech
tags: ['homelab', 'talos', 'kubernetes', 'terraform', 'proxmox', 'devops']
---

# Trying to automate Talos Linux deployment with Terraform

So I heard about Talos Linux from the interwebs and figured I'd give it a shot in my homelab. Something about an immutable OS just for Kubernetes? Sure, why not.

My goal: get a Talos cluster running on Proxmox using only Terraform. No manual clicking around, I'm too lazy. Should be simple enough, just like my neovim config (which I use, btw).

## What Even Is Talos Linux?

[Talos Linux](https://www.talos.dev/) is basically a stripped-down Linux that only runs Kubernetes. The weird parts:

- [No SSH access](https://www.talos.dev/v1.10/introduction/what-is-talos/#design-principles) (takes some getting used to in a homelab where you want to control everything)
- [Read-only filesystem](https://www.talos.dev/v1.10/introduction/what-is-talos/#immutable)
- Everything happens through [APIs](https://www.talos.dev/v1.10/learn-more/talosctl/)
- More secure because there's less to attack

The basic idea: less stuff = less problems. Makes sense, I guess. (Unless it's Neovim)

## My Setup

Running this on a Proxmox server in my basement. Nothing fancy:

- 1 control plane node (4 CPU, 8GB RAM)
- 2 worker nodes (4 CPU, 12GB RAM each)
- Static IPs because DHCP in my homelab is... flaky

Terraform for everything because clicking through UIs gets old.

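A rough sketch of how those three nodes could be written down as data in Terraform. The `nodes` map, the worker names, and the `cluster_memory_gb` output are assumptions for illustration, not taken from the linked repo.

```hcl
locals {
  # Hypothetical node inventory matching the sizes listed above.
  nodes = {
    "homelab-cp"       = { role = "controlplane", cores = 4, memory_mb = 8192 }
    "homelab-worker-1" = { role = "worker", cores = 4, memory_mb = 12288 }
    "homelab-worker-2" = { role = "worker", cores = 4, memory_mb = 12288 }
  }

  # Total RAM the Proxmox host has to set aside for the cluster.
  total_memory_mb = sum([for n in values(local.nodes) : n.memory_mb])
}

output "cluster_memory_gb" {
  value = local.total_memory_mb / 1024
}
```

A map like this can feed a `for_each` on the node module, so adding a worker is one more entry instead of another copy-pasted module block.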
## The Terraform Approach I Settled On

After trying a bunch of different approaches (and failing), here's what worked:

- Proxmox provider for creating VMs
- [Talos provider](https://registry.terraform.io/providers/siderolabs/talos/latest/docs) for configuring the cluster
- 1Password provider for secrets (I use 1Password for everything)
- Local provider for saving kubeconfig files

Not saying this is the best way. Just what finally worked.

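For reference, that provider list maps onto a `required_providers` block along these lines. This is a sketch: the registry addresses are the ones linked in this post (`bpg/proxmox`, `siderolabs/talos`) plus the usual addresses for the 1Password and local providers, and version constraints are left out because the post doesn't state which versions were used.

```hcl
terraform {
  required_providers {
    # Creates the template and node VMs
    proxmox = {
      source = "bpg/proxmox"
    }
    # Generates machine configs and bootstraps the cluster
    talos = {
      source = "siderolabs/talos"
    }
    # Pulls secrets out of 1Password
    onepassword = {
      source = "1Password/onepassword"
    }
    # Writes kubeconfig files to disk
    local = {
      source = "hashicorp/local"
    }
  }
}
```

In a real setup you'd want to pin each provider with a `version` constraint so an upgrade doesn't surprise you mid-rebuild.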
## The Version Gotcha That Cost Me Hours

This one burned me: the Talos images changed between versions. The 1.7.x releases ship "nocloud" images that work with cloud-init. From 1.8.0 on, as far as I could tell, the release downloads only give you ["metal" images](https://www.talos.dev/v1.10/advanced/metal-network-configuration/), which don't.

Why does this matter? If you want static IPs through Terraform, you need [cloud-init support](https://www.talos.dev/v1.7/talos-guides/install/cloud-platforms/), because that's how Proxmox hands the network config to the VM. Without it, you're stuck doing manual network config. So much for automation.

Sticking with 1.7.6:

```hcl
variable "talos_version" {
  description = "Talos Linux version to use"
  type        = string
  default     = "1.7.6" # Pinned to 1.7.x for nocloud support
}
```
*[View in repo](https://github.com/TechDufus/home.io/blob/main/terraform/proxmox/modules/talos-template/variables.tf#L21-L25)*

There's probably a better way with newer versions. Haven't found it.

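To show where cloud-init comes in, here's a minimal sketch of a static IP on a cloned VM. It assumes the `bpg/proxmox` provider listed under Helpful Resources; the resource name `talos_node` is made up, and the actual node module may wire this up differently.

```hcl
resource "proxmox_virtual_environment_vm" "talos_node" {
  name      = "homelab-cp"
  node_name = "proxmox"
  vm_id     = 200

  # Clone from the Talos template VM
  clone {
    vm_id = 9200
  }

  # Proxmox writes this into the cloud-init drive attached to the VM.
  # A nocloud Talos image reads it on boot; a metal image ignores it.
  initialization {
    ip_config {
      ipv4 {
        address = "10.0.20.10/24"
        gateway = "10.0.20.1"
      }
    }
  }
}
```

The `initialization` block is the whole reason the image type matters: with an image that doesn't read cloud-init data, the block still applies cleanly but the node never picks up the address.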
## The Template Approach

I split my Terraform into modules. The template module creates a reusable VM template:

```hcl
module "talos_template" {
  source = "../../modules/talos-template"

  template_vm_id  = 9200
  talos_version   = "1.7.6"
  proxmox_node    = "proxmox"
  vm_storage_pool = "local-lvm" # Adjust for your Proxmox storage
}
```
*[View template module](https://github.com/TechDufus/home.io/tree/main/terraform/proxmox/modules/talos-template)*

The node module clones from the template:

```hcl
module "control_plane" {
  source = "../../modules/talos-node"

  template_vm_id = module.talos_template.template_id
  node_name      = "homelab-cp"
  node_role      = "controlplane"
  vm_id          = 200

  # Static IP configuration
  ip_address  = "10.0.20.10"
  subnet_mask = 24
  gateway     = "10.0.20.1"

  # Required Talos configuration
  cluster_name        = var.cluster_name
  talos_client_config = talos_machine_secrets.cluster.client_configuration
  machine_config      = data.talos_machine_configuration.control_plane.machine_configuration
}
```
*[View node module](https://github.com/TechDufus/home.io/tree/main/terraform/proxmox/modules/talos-node)*

Works reliably enough.

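The node module call references `talos_machine_secrets.cluster` and `data.talos_machine_configuration.control_plane` without showing them. A minimal sketch of what those could look like with the Talos provider follows. The `cluster_name` default, the endpoint URL, and the apply/bootstrap resources are assumptions for illustration, since in the real repo the node module may handle the apply step itself.

```hcl
variable "cluster_name" {
  type    = string
  default = "homelab" # hypothetical name
}

# Cluster-wide secrets (CA, tokens) plus the client config talosctl uses
resource "talos_machine_secrets" "cluster" {
  talos_version = "v1.7.6"
}

# Renders the control plane machine config from those secrets
data "talos_machine_configuration" "control_plane" {
  cluster_name     = var.cluster_name
  cluster_endpoint = "https://10.0.20.10:6443"
  machine_type     = "controlplane"
  machine_secrets  = talos_machine_secrets.cluster.machine_secrets
  talos_version    = "v1.7.6"
}

# Pushes the config to the node over the Talos API (no SSH involved)
resource "talos_machine_configuration_apply" "control_plane" {
  client_configuration        = talos_machine_secrets.cluster.client_configuration
  machine_configuration_input = data.talos_machine_configuration.control_plane.machine_configuration
  node                        = "10.0.20.10"
}

# Bootstraps etcd once, on the first control plane node
resource "talos_machine_bootstrap" "cluster" {
  depends_on = [talos_machine_configuration_apply.control_plane]

  client_configuration = talos_machine_secrets.cluster.client_configuration
  node                 = "10.0.20.10"
}
```

Workers would get a second `talos_machine_configuration` data source with `machine_type = "worker"`, reusing the same secrets.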
## The File Upload Problem

The Talos images are 1.2GB compressed. Uploading through the Proxmox API? Timeouts. Failures. Pain.

I ended up using SSH and rsync:

```bash
# Lifted from the Terraform config: ${var.*} is filled in by Terraform
# before the shell runs, so swap in real values to run this by hand.

# Download and decompress locally
curl -fsSL "https://github.com/siderolabs/talos/releases/download/v${var.talos_version}/nocloud-amd64.raw.xz" \
  -o "/tmp/talos-${var.talos_version}-nocloud-amd64.raw.xz"
xz -d "/tmp/talos-${var.talos_version}-nocloud-amd64.raw.xz"

# Upload via rsync with progress
rsync -avz --progress "/tmp/talos-${var.talos_version}-nocloud-amd64.raw" \
  root@${var.proxmox_node}:/tmp/
```
*[View full upload logic](https://github.com/TechDufus/home.io/blob/main/terraform/proxmox/modules/talos-template/main.tf#L58-L127)*

Not pretty but it works.

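Those commands have to run from somewhere in the Terraform graph. One way to do that is a `null_resource` with a `local-exec` provisioner, sketched below. The resource name, the `triggers` map, and the `image` shell variable are assumptions, and the linked upload logic is longer than this.

```hcl
variable "talos_version" {
  type    = string
  default = "1.7.6"
}

variable "proxmox_node" {
  type    = string
  default = "proxmox"
}

resource "null_resource" "upload_talos_image" {
  # Re-run the upload only when the Talos version changes
  triggers = {
    talos_version = var.talos_version
  }

  provisioner "local-exec" {
    command = <<-EOT
      set -eu
      image="/tmp/talos-${var.talos_version}-nocloud-amd64.raw"

      # Download and decompress locally (-f overwrites a stale copy)
      curl -fsSL "https://github.com/siderolabs/talos/releases/download/v${var.talos_version}/nocloud-amd64.raw.xz" -o "$image.xz"
      xz -d -f "$image.xz"

      # Copy to the Proxmox host over SSH
      rsync -avz --progress "$image" "root@${var.proxmox_node}:/tmp/"
    EOT
  }
}
```

Only `${...}` is interpolated by Terraform, so plain shell variables like `$image` pass through to the shell untouched.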
## Getting kubectl and talosctl Working

Got the Terraform to automatically configure [kubectl](https://kubernetes.io/docs/reference/kubectl/) and [talosctl](https://www.talos.dev/v1.10/learn-more/talosctl/). It merges the new cluster config with my existing kubeconfig:

```hcl
resource "null_resource" "kubeconfig_merge" {
  provisioner "local-exec" {
    command = <<-EOT
      # Backup existing config (learned this one the hard way)
      cp ~/.kube/config ~/.kube/config.backup.$(date +%Y%m%d_%H%M%S)

      # Merge configs
      KUBECONFIG="$HOME/.kube/config:${path.root}/kubeconfig" \
        kubectl config view --flatten > /tmp/kubeconfig.merged

      # Validate merge worked, then replace
      if kubectl --kubeconfig=/tmp/kubeconfig.merged config get-contexts | grep -q "${var.cluster_name}"; then
        mv /tmp/kubeconfig.merged ~/.kube/config
      fi
    EOT
  }
}
```
*[View full kubeconfig merge logic](https://github.com/TechDufus/home.io/blob/main/terraform/proxmox/environments/dev/main.tf#L295-L388)*

That backup step? Added after I nuked my kubeconfig. Don't skip it.

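The merge step expects a `kubeconfig` file sitting in `path.root`. A minimal sketch of how that file (and a matching talosconfig) could be produced with the Talos and local providers follows. It assumes a recent Talos provider where `talos_cluster_kubeconfig` is a resource (older releases expose it as a data source), and the cluster name and IP are placeholders.

```hcl
resource "talos_machine_secrets" "cluster" {}

# Fetches the admin kubeconfig from the control plane node
resource "talos_cluster_kubeconfig" "cluster" {
  client_configuration = talos_machine_secrets.cluster.client_configuration
  node                 = "10.0.20.10"
}

# Builds the talosconfig that talosctl reads
data "talos_client_configuration" "cluster" {
  cluster_name         = "homelab"
  client_configuration = talos_machine_secrets.cluster.client_configuration
  endpoints            = ["10.0.20.10"]
}

# local_sensitive_file keeps the contents out of plan output
resource "local_sensitive_file" "kubeconfig" {
  content         = talos_cluster_kubeconfig.cluster.kubeconfig_raw
  filename        = "${path.root}/kubeconfig"
  file_permission = "0600"
}

resource "local_sensitive_file" "talosconfig" {
  content         = data.talos_client_configuration.cluster.talos_config
  filename        = "${path.root}/talosconfig"
  file_permission = "0600"
}
```

With the files on disk, `kubectl --kubeconfig ./kubeconfig get nodes` and `talosctl --talosconfig ./talosconfig -n 10.0.20.10 version` work without touching your global configs.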
## What's Running on It Now

Got a pretty full stack on it:

- **[MetalLB](https://metallb.universe.tf/)** for load balancer services (bare metal needs help with LoadBalancer types)
- **[Prometheus](https://prometheus.io/)** and **[Grafana](https://grafana.com/)** for monitoring (because you can't manage what you can't see)
- **[cert-manager](https://cert-manager.io/)** for TLS certificates (Let's Encrypt automation)
- **[Istio](https://istio.io/)** service mesh (probably overkill but it's cool)
- **[CloudNativePG](https://cloudnative-pg.io/)** for PostgreSQL (way better than managing databases manually)
- **[Local Path Provisioner](https://github.com/rancher/local-path-provisioner)** for storage (just local volumes but it works)

Turns out once you get the base cluster working, adding stuff with Helm is pretty straightforward.

## Performance Tweaks I Found

Made a few tweaks based on random blog posts. Not sure if they help, but they don't hurt:

```hcl
# CPU type for better container performance (supposedly)
cpu_type = "x86-64-v2-AES"

# Disable memory ballooning
balloon = 0
```
*[View performance configs](https://github.com/TechDufus/home.io/blob/main/terraform/proxmox/modules/talos-template/main.tf#L112-L116)*

The memory ballooning thing: ballooning lets Proxmox take memory back from a running VM, and Kubernetes doesn't expect a node's memory to shrink underneath it.

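The snippet above shows the settings as bare attributes. If you're on the `bpg/proxmox` provider, the same two tweaks would sit in the `cpu` and `memory` blocks of the VM resource, roughly like this. The block layout is that provider's schema, not necessarily what the linked template module uses, and the resource name is made up.

```hcl
resource "proxmox_virtual_environment_vm" "talos_node" {
  name      = "homelab-cp"
  node_name = "proxmox"

  cpu {
    cores = 4
    type  = "x86-64-v2-AES" # exposes AES instructions to the guest
  }

  memory {
    dedicated = 8192 # MB, fixed allocation
    floating  = 0    # 0 leaves ballooning off
  }
}
```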
## Things That Take Getting Used To

Talos does things differently than traditional Linux:

- No SSH access feels wrong at first. Yeah, security, but debugging means learning new approaches, since everything goes through `talosctl`
- [Certificate stuff](https://www.talos.dev/v1.10/kubernetes-guides/configuration/certificate-rotation/) just... happens automatically. Nice but takes trust
- Immutable filesystem is cool in theory but changes how you think about troubleshooting

## What I'd Do Differently

Next time:

- Start with one node to understand Talos first
- Actually read the docs before diving in
- Maybe try [k3s](https://k3s.io/) first since it's simpler
- Set up monitoring from the start

## Is This Overkill for a Homelab?

Absolutely. Could do most of this with [k3s](https://k3s.io/) or [Docker Compose](https://docs.docker.com/compose/). But where's the fun in that?

## Wrapping Up

Getting Talos running with Terraform was more work than expected. Version issues, file upload problems, and Talos complexity made this a multi-weekend project.

But it works! Can tear down and rebuild with [`terraform apply`](https://developer.hashicorp.com/terraform/cli/commands/apply). Not perfect, probably better ways to do it, but it's mine.

If you try something similar:

- Start simple
- Talos [1.7.x vs 1.8.x](https://www.talos.dev/v1.8/introduction/changelog-1.8/) matters for cloud-init
- Back up your kubeconfig
- Expect frustration

## Helpful Resources

Docs that actually helped:

- [Talos Linux Documentation](https://www.talos.dev/docs/) - Pretty good once you get the basics
- [Proxmox Provider Docs](https://registry.terraform.io/providers/bpg/proxmox/latest/docs) - Essential
- [Talos Provider Docs](https://registry.terraform.io/providers/siderolabs/talos/latest/docs) - Sparse but necessary
- [Complete Terraform Code](https://github.com/TechDufus/home.io/tree/main/terraform/proxmox) - The full working implementation

If you build something similar, let me know what worked. Still figuring this out.

tags/alos.html

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
---
layout: tag
section-type: tag
title: alos
---
## Tag

tags/erraform.html

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
---
layout: tag
section-type: tag
title: erraform
---
## Tag

tags/homelab.html

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
---
layout: tag
section-type: tag
title: homelab
---
## Tag

tags/proxmox.html

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
---
layout: tag
section-type: tag
title: proxmox
---
## Tag
