
[zosv3light] dhcp-zos losing carrier on deployment #2531

Open
@coesensbert


Describe the bug

The zosv3light node 7403, running on a Hetzner dedicated server, ran fine until I deployed a workload via Terraform: a full VM with Mycelium and Yggdrasil (planetary) enabled.
I saw exactly the same behavior on two other nodes. Once deployed, the VM works over Mycelium for a few minutes and then becomes unreachable; however, the issue does not seem to be related to Mycelium. The Loki logs show that dhcp-zos lost its carrier and therefore removed its default routes, among other things.
As a result, the node becomes unreachable and the workload cannot be removed via Terraform. I have to reboot the ZOS node and then remove the deployment. If I wait a few minutes after the reboot, the same pattern repeats until the deployment is removed. Deploying the same Terraform configuration on other nodes does not cause this issue.

To Reproduce

Steps to reproduce the behavior:

1. Deploy the Terraform configuration below on mainnet node 7403.
2. Watch the Loki logs: https://mon.grid.tf/explore?orgId=1&left=%7B%22datasource%22:%22Loki-ZOS%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bfarm%3D%5C%223997%5C%22,network%3D%5C%22production%5C%22,node%3D%5C%225C8DMBKpg88NM1XRS91ET9BYo26bo1YoVPyMQMeXRhijyFVm%5C%22%7D%22%7D%5D,%22range%22:%7B%22from%22:%22now-5m%22,%22to%22:%22now%22%7D%7D
3. Watch the node metrics stop: https://metrics.grid.tf/d/rYdddlPWkfqwf/zos-host-metrics?orgId=2&refresh=30s&var-network=production&var-farm=3997&var-node=5C8DMBKpg88NM1XRS91ET9BYo26bo1YoVPyMQMeXRhijyFVm&var-diskdevices=%5Ba-z%5D%2B%7Cnvme%5B0-9%5D%2Bn%5B0-9%5D%2B%7Cmmcblk%5B0-9%5D%2B&from=now-1h&to=now&timezone=browser
4. Try to reach the deployment and check whether it stays online.

Expected behavior

The node and the deployment keep operating normally.

Screenshots

Loki logs from my last deployment: https://mon.grid.tf/explore?orgId=1&left=%7B%22datasource%22:%22Loki-ZOS%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bfarm%3D%5C%223997%5C%22,network%3D%5C%22production%5C%22,node%3D%5C%225C8DMBKpg88NM1XRS91ET9BYo26bo1YoVPyMQMeXRhijyFVm%5C%22%7D%22%7D%5D,%22range%22:%7B%22from%22:%221738754325781%22,%22to%22:%221738754987799%22%7D%7D

Node metrics:
https://metrics.grid.tf/d/rYdddlPWkfqwf/zos-host-metrics?orgId=2&refresh=30s&var-network=production&var-farm=3997&var-node=5C8DMBKpg88NM1XRS91ET9BYo26bo1YoVPyMQMeXRhijyFVm&var-diskdevices=%5Ba-z%5D%2B%7Cnvme%5B0-9%5D%2Bn%5B0-9%5D%2B%7Cmmcblk%5B0-9%5D%2B&from=2025-02-05T10:57:50.069Z&to=2025-02-05T11:34:24.989Z&timezone=browser

Terraform configuration:

```hcl
terraform {
  required_providers {
    grid = {
      source = "threefoldtech/grid"
    }
    # random_bytes below comes from the hashicorp/random provider
    random = {
      source = "hashicorp/random"
    }
  }
}

provider "grid" {
}

# Seed used to derive the VM's Mycelium IP
resource "random_bytes" "mycelium_ip_seed" {
  length = 6
}

# Mycelium key for the network on node 7403
resource "random_bytes" "mycelium_key" {
  length = 32
}

resource "grid_network" "net1" {
  nodes         = [7403]
  ip_range      = "10.212.0.0/16"
  name          = "myceiperf2"
  description   = "myceiperf2"
  add_wg_access = true
  mycelium_keys = {
    format("%s", 7403) = random_bytes.mycelium_key.hex
  }
}

resource "grid_deployment" "d1" {
  node         = 7403
  network_name = grid_network.net1.name
  disks {
    name = "root"
    size = 25 # GB
  }
  vms {
    name      = "myceiperf2"
    flist     = "https://hub.grid.tf/tf-official-vms/ubuntu-24.04-latest.flist"
    cpu       = 4
    planetary = true # Yggdrasil
    publicip  = false
    publicip6 = false
    memory    = 8192 # MB
    # entrypoint = "/sbin/zinit init"
    mounts {
      name        = "root"
      mount_point = "/data"
    }
    env_vars = {
      SSH_KEY = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDYNeJXJV2FNEwuQz6e0jkKeqRbKWwftBKq+sjSTqa2x"
    }
    mycelium_ip_seed = random_bytes.mycelium_ip_seed.hex
  }
}

output "wg_config" {
  value = grid_network.net1.access_wg_config
}
output "node1_vm1_ip" {
  value = grid_deployment.d1.vms[0].ip
}
output "public_ip" {
  value = grid_deployment.d1.vms[0].computedip
}
output "public_ip6" {
  value = grid_deployment.d1.vms[0].computedip6
}
output "planetary_ip" {
  value = grid_deployment.d1.vms[0].planetary_ip
}
output "vm1_mycelium_ip" {
  value = grid_deployment.d1.vms[0].mycelium_ip
}
```
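Because the same behavior appeared on two other nodes, it can help to retarget this reproduction without editing the node ID in three separate places. A minimal sketch, assuming a hypothetical `node_id` variable that is not part of the original configuration: declare it once, reference it wherever `7403` is hard-coded, and choose the node at apply time with `terraform apply -var node_id=<node>`.

```hcl
# Hypothetical variable (not in the original config) holding the target node ID.
variable "node_id" {
  type        = number
  description = "Grid node to deploy the reproduction on"
  default     = 7403
}

# Then replace each hard-coded 7403 with var.node_id:
#   grid_network.net1:  nodes         = [var.node_id]
#                       mycelium_keys = { format("%s", var.node_id) = random_bytes.mycelium_key.hex }
#   grid_deployment.d1: node          = var.node_id
```

With the default left at 7403 the configuration behaves exactly as above; overriding it only changes which node receives the network and the VM.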

Labels: type_bug (Something isn't working)
