Nomad fingerprinting thinks Termina doesn't support bridge, but it does #10902

Open
@insanitybit

Description

Nomad version

Output from nomad version
❯ nomad --version
Nomad v1.1.2 (60638a0)

Operating system and Environment details

❯ uname -a
Linux penguin 5.4.109-26094-g381754fbb430 #1 SMP PREEMPT Sat Jun 26 21:31:00 PDT 2021 x86_64 GNU/Linux

❯ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 10 (buster)
Release: 10
Codename: buster

Issue

ChromeOS provides a Linux VM that runs the Termina operating system, a stripped-down, hardened Linux environment. Termina is perfectly capable of running bridge networks, but Nomad's fingerprint heuristic fails to detect this.

Reproduction steps

sudo nomad agent -dev-connect &
sudo nomad job run testjob

Expected Result

Nomad schedules the job appropriately.

Actual Result

Nomad is unable to schedule the job because it believes that the agents are running on nodes that don't support bridge networks.

Job file (if appropriate)

job "test" {
    datacenters = ["dc1"]
    type = "service"
    group "foo" {
        network { 
            mode = "bridge"
        }
        task "test-task" {
            driver = "docker"
            config {
               image = "dgraph/dgraph:latest"
               args = ["dgraph", "zero", "--my=localhost:5080"]
            }
        }
    }
}

Nomad Server logs (if appropriate)

    2021-07-14T15:38:08.374-0700 [WARN]  client.fingerprint_mgr: failed to detect bridge kernel module, bridge network mode disabled: error="3 errors occurred:
	* failed to open /proc/modules: open /proc/modules: no such file or directory
	* failed to open /lib/modules/5.4.109-26094-g381754fbb430/modules.builtin: open /lib/modules/5.4.109-26094-g381754fbb430/modules.builtin: no such file or directory
	* failed to open /lib/modules/5.4.109-26094-g381754fbb430/modules.dep: open /lib/modules/5.4.109-26094-g381754fbb430/modules.dep: no such file or directory

Notes

There's a simple enough workaround:

sudo mkdir -p /lib/modules/$(uname -r)/
echo '_/bridge.ko' | sudo tee /lib/modules/$(uname -r)/modules.builtin

This "tricks" Nomad into thinking that a bridge.ko module is registered as built in, so it doesn't bail out of fingerprinting. After this, the repro steps above work: Nomad schedules the job, and the job becomes healthy shortly afterward.

func (f *BridgeFingerprint) Fingerprint(req *FingerprintRequest, resp *FingerprintResponse) error {

This function is the culprit. It assumes that if the kernel supports bridge networking, a bridge module must be listed somewhere, but that's not the case. Termina has neither /proc/modules nor /lib/modules/. In fact, Termina does not support loading kernel modules at all.

I don't know Nomad's internals intimately, but my suggestion is to not rely on fingerprinting for this sort of thing. Just try to create a bridge network: if it works, bridge mode is supported, and if it fails, it isn't. That's the most direct way to know whether support exists. Alternatively, give me some way to tell Nomad "no, for real, we support it, ignore your fingerprint".

Alternatively, and it's certainly a hack, you could check whether uname -n returns "penguin", which would solve this very specific instance.
