You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
What would be the most likely cause for my node failing to be registered as ready, even though it's actually ready in my cluster (confirmed)?
msg="Join command for work-7bd10622-9f90-4c84-9669-1ac05187409d executed successfully" host=kproximate-worker-557b6bccbb-gbgz2 component=worker
msg="Waiting for work-7bd10622-9f90-4c84-9669-1ac05187409d to join kubernetes cluster" host=kproximate-worker-557b6bccbb-gbgz2 component=worker
msg="Scale up event failed" host=kproximate-worker-557b6bccbb-gbgz2 component=worker error="timed out waiting for work-7bd10622-9f90-4c84-9669-1ac05187409d to join kubernetes cluster"
Manually joining a node with the join command works perfectly fine.
I tried increasing both wait parameters and sifted through the codebase to find the potential issue, but haven't been able to pinpoint the likely cause of it.
Curious to hear your thoughts on the matter.
~ Reference template ~
My node template in OpenTofu (confirmed to work over manual join command execution):
resource "proxmox_virtual_environment_file" "cloud_init_template" {
node_name = "pve1"
datastore_id = "cephfs"
content_type = "snippets"
source_raw {
file_name = "cloud-init-k3s-template.yaml"
data = templatefile("k3s/machine-config/k8s-worker.yaml.tftpl", {
hostname = "k3s-template"
username = var.vm_user
password = var.vm_password
pub-key = var.pub_key
})
}
lifecycle {
prevent_destroy = true
}
}
resource "proxmox_virtual_environment_vm" "k3s_template" {
depends_on = [proxmox_virtual_environment_download_file.debian_12_generic_image]
node_name = "pve1"
vm_id = var.worker_template_id
name = "k3s-worker-template"
machine = "q35"
scsi_hardware = "virtio-scsi-single"
bios = "ovmf"
on_boot = false
started = false
agent {
enabled = true
}
cpu {
cores = var.worker_template_cpu
type = "host"
}
memory {
dedicated = var.worker_template_memory
}
network_device {
bridge = "vmbr0"
}
disk {
datastore_id = "vm-disks"
file_id = proxmox_virtual_environment_download_file.debian_12_generic_image.id
interface = "scsi0"
cache = "writethrough"
discard = "on"
ssd = true
size = var.worker_template_disk_size
iothread = true
}
boot_order = ["scsi0"]
operating_system {
type = "l26"
}
initialization {
datastore_id = "vm-disks"
user_data_file_id = proxmox_virtual_environment_file.cloud_init_template.id
dns {
domain = "."
servers = ["1.1.1.1", "8.8.8.8"]
}
ip_config {
ipv4 {
address = "dhcp"
}
}
}
}
resource "null_resource" "convert_to_template" {
depends_on = [proxmox_virtual_environment_vm.k3s_template]
provisioner "local-exec" {
interpreter = ["/bin/bash", "-c"]
command = <<EOT
echo "Starting VM..."
curl --insecure --fail \
-X POST \
-H "Authorization: PVEAPIToken=root@pam!key={{key}}" \
"${var.proxmox.endpoint}/api2/json/nodes/pve1/qemu/${proxmox_virtual_environment_vm.k3s_template.vm_id}/status/start"
echo "Waiting for VM to start..."
sleep 30
echo "Waiting for QEMU agent..."
while true; do
echo "Checking QEMU agent status..."
status=$(curl --insecure --fail -s \
-H "Authorization: PVEAPIToken=root@pam!key={{key}}" \
"${var.proxmox.endpoint}/api2/json/nodes/pve1/qemu/${proxmox_virtual_environment_vm.k3s_template.vm_id}/agent/get-fsinfo")
if [ -n "$status" ] && [ "$status" != "not-ready" ]; then
echo "QEMU agent is available - proceeding with template conversion"
break
fi
echo "QEMU agent not ready, waiting 10 seconds..."
sleep 10
done
echo "Stopping VM..."
curl --insecure --fail \
-X POST \
-H "Authorization: PVEAPIToken=root@pam!key={{key}}" \
"${var.proxmox.endpoint}/api2/json/nodes/pve1/qemu/${proxmox_virtual_environment_vm.k3s_template.vm_id}/status/stop"
echo "Waiting for VM to stop..."
sleep 20
echo "Converting to template..."
curl --insecure --fail \
-X POST \
-H "Authorization: PVEAPIToken=root@pam!key={{key}}" \
"${var.proxmox.endpoint}/api2/json/nodes/pve1/qemu/${proxmox_virtual_environment_vm.k3s_template.vm_id}/template"
if [ $? -eq 0 ]; then
echo "Successfully converted to template"
else
echo "Failed to convert to template"
exit 1
fi
EOT
}
}
What would be the most likely cause for my node failing to be registered as ready, even though it's actually ready in my cluster (confirmed)?
Manually joining a node with the join command works perfectly fine.
I tried increasing both wait parameters and sifted through the codebase to find the potential issue, but haven't been able to pinpoint the likely cause of it.
Curious to hear your thoughts on the matter.
~ Reference template ~
My node template in OpenTofu (confirmed to work over manual join command execution):
With the following cloud-init:
The text was updated successfully, but these errors were encountered: