Description
Opened on Apr 12, 2024
Before creating an issue, make sure you've checked the following:
- You are running the latest released version of k0s
- Make sure you've searched for existing issues, both open and closed
- Make sure you've searched for PRs too, a fix might've been merged already
- You're looking at docs for the released version, "main" branch docs are usually ahead of released versions.
Platform
Linux 6.1.0-18-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
Version
v1.28.7+k0s.0
Sysinfo
`k0s sysinfo`
Machine ID: "afebc983fd329da739962030512903dcb8d95d75363811f488798f5677c802ff" (from machine) (pass)
Total memory: 15.6 GiB (pass)
Disk space available for /var/lib/k0s: 173.3 GiB (pass)
Name resolution: localhost: [::1 127.0.0.1] (pass)
Operating system: Linux (pass)
Linux kernel release: 6.1.0-18-cloud-amd64 (pass)
Max. file descriptors per process: current: 1048576 / max: 1048576 (pass)
AppArmor: active (pass)
Executable in PATH: modprobe: /usr/sbin/modprobe (pass)
Executable in PATH: mount: /usr/bin/mount (pass)
Executable in PATH: umount: /usr/bin/umount (pass)
/proc file system: mounted (0x9fa0) (pass)
Control Groups: version 2 (pass)
cgroup controller "cpu": available (pass)
cgroup controller "cpuacct": available (via cpu in version 2) (pass)
cgroup controller "cpuset": available (pass)
cgroup controller "memory": available (pass)
cgroup controller "devices": available (assumed) (pass)
cgroup controller "freezer": available (assumed) (pass)
cgroup controller "pids": available (pass)
cgroup controller "hugetlb": available (pass)
cgroup controller "blkio": available (via io in version 2) (pass)
CONFIG_CGROUPS: Control Group support: built-in (pass)
CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
CONFIG_CPUSETS: Cpuset support: built-in (pass)
CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
CONFIG_NAMESPACES: Namespaces support: built-in (pass)
CONFIG_UTS_NS: UTS namespace: built-in (pass)
CONFIG_IPC_NS: IPC namespace: built-in (pass)
CONFIG_PID_NS: PID namespace: built-in (pass)
CONFIG_NET_NS: Network namespace: built-in (pass)
CONFIG_NET: Networking support: built-in (pass)
CONFIG_INET: TCP/IP networking: built-in (pass)
CONFIG_IPV6: The IPv6 protocol: built-in (pass)
CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: module (pass)
CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
CONFIG_NETFILTER_NETLINK: module (pass)
CONFIG_NF_NAT: module (pass)
CONFIG_IP_SET: IP set support: module (pass)
CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
CONFIG_IP_VS: IP virtual server support: module (pass)
CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
CONFIG_IP_VS_SH: Source hashing scheduling: module (pass)
CONFIG_IP_VS_RR: Round-robin scheduling: module (pass)
CONFIG_IP_VS_WRR: Weighted round-robin scheduling: module (pass)
CONFIG_NF_CONNTRACK_IPV4: IPv4 connetion tracking support (required for NAT): unknown (warning)
CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
CONFIG_IP_NF_IPTABLES: IP tables support: module (pass)
CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
CONFIG_NF_DEFRAG_IPV4: module (pass)
CONFIG_NF_CONNTRACK_IPV6: IPv6 connetion tracking support (required for NAT): unknown (warning)
CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
CONFIG_NF_DEFRAG_IPV6: module (pass)
CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
CONFIG_LLC: module (pass)
CONFIG_STP: module (pass)
CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: built-in (pass)
CONFIG_PROC_FS: /proc file system support: built-in (pass)
What happened?
When using `AirgapUpdate` to pull a new images file for a host, the name of the new file was the same as the file currently on the host (`images-amd64.tar`). This causes the `AirgapUpdate` to fail, apparently with a content length error.
Steps to reproduce
- Install k0s in airgap mode with an images file named `X.tar`
- Run an AirgapUpdate plan that references another file named `X.tar`
- Observe the plan fail (though it would succeed if the file were named differently)
Expected behavior
The new image file is downloaded and replaces the current image file.
Actual behavior
No new image file is downloaded. Instead, the autopilot plan fails, with the proximate log line being:
Apr 12 14:39:26 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:26" level=error msg="Unable to download 'http://127.0.0.1:50000/images/images-amd64.tar': bad content length" component=autopilot controller=Node
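One possible explanation, stated here as an assumption and not a confirmed diagnosis: if the downloader treats an existing file with the same name as a partial download to resume, then a local file that is larger than the remote `Content-Length` cannot be a prefix of the new file and would be rejected. The Python sketch below only illustrates that kind of check. `resume_offset` and `BadContentLength` are hypothetical names and none of this is taken from the k0s source.

```python
import os


class BadContentLength(Exception):
    """Raised when a local file cannot be a partial copy of the remote file."""


def resume_offset(path: str, remote_length: int) -> int:
    """Return the byte offset to resume from (0 means a fresh download).

    Hypothetical resume logic, NOT k0s code: an existing file at `path` is
    assumed to be a partial copy of the remote file of the same name.
    """
    if not os.path.exists(path):
        return 0
    local_length = os.path.getsize(path)
    if local_length > remote_length:
        # A local file larger than the remote one cannot be a partial
        # download of it, so a resume-style downloader gives up here.
        raise BadContentLength("bad content length")
    return local_length
```

Under this assumption, an old `images-amd64.tar` that is larger than the new one would trigger the error, while a differently named file would not, which would be consistent with the behaviour reported here.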
Screenshots and logs
From `journalctl -u k0scontroller.service`:
Apr 12 14:39:26 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:26" level=info msg="Adding new status for plan 'AirgapUpdate' (index=0)" component=inithandler controller=plans leadermode=true
Apr 12 14:39:26 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:26" level=info msg=Processing command=airgapupdate component=autopilot controller=plans leadermode=true state=newplan
Apr 12 14:39:26 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:26" level=info msg="Adding new status for plan 'K0sUpdate' (index=1)" component=inithandler controller=plans leadermode=true
Apr 12 14:39:26 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:26" level=info msg=Processing command=k0supdate component=autopilot controller=plans leadermode=true state=newplan
Apr 12 14:39:26 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:26" level=info msg=Processing command=airgapupdate component=autopilot controller=plans leadermode=true state=schedulablewait
Apr 12 14:39:26 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:26" level=info msg="Reconciling controller/worker signal node statuses" command=airgapupdate component=autopilot controller=plans leadermode=true state=schedulablewait
Apr 12 14:39:26 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:26" level=info msg="Workers can be scheduled (controllers done)" command=airgapupdate component=autopilot controller=plans leadermode=true state=schedulablewait
Apr 12 14:39:26 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:26" level=info msg="Requesting plan command transition from 'SchedulableWait' --> 'Schedulable'" component=planstatehandler controller=plans leadermode=true
Apr 12 14:39:26 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:26" level=info msg=Processing command=airgapupdate component=autopilot controller=plans leadermode=true state=schedulable
Apr 12 14:39:26 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:26" level=info msg="Sending signalling to node='laverya-ec-airgap-update.c.replicated-qa.internal'" command=airgapupdate component=autopilot controller=plans leadermode=true state=schedulable
Apr 12 14:39:26 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:26" level=info msg="Requesting plan command transition from 'Schedulable' --> 'SchedulableWait'" component=planstatehandler controller=plans leadermode=true
Apr 12 14:39:26 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:26" level=info msg=Processing command=airgapupdate component=autopilot controller=plans leadermode=true state=schedulablewait
Apr 12 14:39:26 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:26" level=info msg="Reconciling controller/worker signal node statuses" command=airgapupdate component=autopilot controller=plans leadermode=true state=schedulablewait
Apr 12 14:39:26 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:26" level=info msg="Found available signaling update request" component=autopilot controller=signal object=Node signalnode=laverya-ec-airgap-update.c.replicated-qa.internal updatetype=airgap
Apr 12 14:39:26 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:26" level=info msg="Updating signaling response to 'Downloading'" component=autopilot controller=signal object=Node signalnode=laverya-ec-airgap-update.c.replicated-qa.internal updatetype=airgap
Apr 12 14:39:26 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:26" level=info msg="No applicable transitions available, requesting retry" command=airgapupdate component=autopilot controller=plans leadermode=true state=schedulablewait
Apr 12 14:39:26 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:26" level=info msg="Requeuing request due to explicit retry" component=autopilot controller=plans leadermode=true
Apr 12 14:39:26 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:26" level=info msg="Starting download of 'http://127.0.0.1:50000/images/images-amd64.tar'" component=autopilot controller=Node object=Node reconciler=downloading signalnode=laverya-ec-airgap-update.c.replicated-qa.internal
Apr 12 14:39:26 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:26" level=error msg="Unable to download 'http://127.0.0.1:50000/images/images-amd64.tar': bad content length" component=autopilot controller=Node object=Node reconciler=downloading signalnode=laverya-ec-airgap-update.c.replicated-qa.internal
Apr 12 14:39:26 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:26" level=info msg="Updating signaling response to 'FailedDownload'" component=autopilot controller=Node object=Node reconciler=downloading signalnode=laverya-ec-airgap-update.c.replicated-qa.internal
Apr 12 14:39:26 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:26" level=error msg="Reconciler error" Node="{\"name\":\"laverya-ec-airgap-update.c.replicated-qa.internal\"}" component=controller-runtime controller=node controllerGroup= controllerKind=Node error="failed to update signal node to status 'FailedDownload': Operation cannot be fulfilled on nodes \"laverya-ec-airgap-update.c.replicated-qa.internal\": the object has been modified; please apply your changes to the latest version and try again" name=laverya-ec-airgap-update.c.replicated-qa.internal namespace= reconcileID="\"f506532b-89cd-452e-bcb1-97e7640131e0\""
Apr 12 14:39:26 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:26" level=info msg="Starting download of 'http://127.0.0.1:50000/images/images-amd64.tar'" component=autopilot controller=Node object=Node reconciler=downloading signalnode=laverya-ec-airgap-update.c.replicated-qa.internal
Apr 12 14:39:26 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:26" level=error msg="Unable to download 'http://127.0.0.1:50000/images/images-amd64.tar': bad content length" component=autopilot controller=Node object=Node reconciler=downloading signalnode=laverya-ec-airgap-update.c.replicated-qa.internal
Apr 12 14:39:26 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:26" level=info msg="Updating signaling response to 'FailedDownload'" component=autopilot controller=Node object=Node reconciler=downloading signalnode=laverya-ec-airgap-update.c.replicated-qa.internal
Apr 12 14:39:26 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:26" level=info msg="current cfg matches existing, not gonna do anything" component=coredns
Apr 12 14:39:31 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:31" level=info msg=Processing command=airgapupdate component=autopilot controller=plans leadermode=true state=schedulablewait
Apr 12 14:39:31 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:31" level=info msg="Reconciling controller/worker signal node statuses" command=airgapupdate component=autopilot controller=plans leadermode=true state=schedulablewait
Apr 12 14:39:31 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:31" level=info msg="Signal node 'laverya-ec-airgap-update.c.replicated-qa.internal' status changed from 'SignalSent' to 'SignalApplyFailed' (reason: FailedDownload)" command=airgapupdate component=autopilot controller=plans leadermode=true
Apr 12 14:39:31 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:31" level=info msg="Plan is non-recoverable due to apply failure" command=airgapupdate component=autopilot controller=plans leadermode=true state=schedulablewait
Apr 12 14:39:31 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:31" level=info msg="Requesting plan command transition from 'SchedulableWait' --> 'ApplyFailed'" component=planstatehandler controller=plans leadermode=true
Apr 12 14:39:36 laverya-ec-airgap-update.c.replicated-qa.internal k0s[3928]: time="2024-04-12 14:39:36" level=info msg="current cfg matches existing, not gonna do anything" component=coredns
The plan yaml:
apiVersion: autopilot.k0sproject.io/v1beta2
kind: Plan
metadata:
  annotations:
    embedded-cluster.replicated.com/installation-name: "20240412114318"
  creationTimestamp: "2024-04-12T14:39:26Z"
  generation: 1
  name: autopilot
  resourceVersion: "47287"
  uid: 53fa20d7-d56f-49a0-a0f1-1229412c062f
spec:
  commands:
  - airgapupdate:
      platforms:
        linux-amd64:
          url: http://127.0.0.1:50000/images/images-amd64.tar
      version: v1.28.8+k0s.0
      workers:
        discovery:
          static:
            nodes:
            - laverya-ec-airgap-update.c.replicated-qa.internal
        limits:
          concurrent: 1
  - k0supdate:
      platforms:
        linux-amd64:
          sha256: 51c9482a558096d99028304fd56afd383e2d87a71963e8457e02210298f5be62
          url: http://127.0.0.1:50000/bin/k0s-upgrade
      targets:
        controllers:
          discovery:
            static:
              nodes:
              - laverya-ec-airgap-update.c.replicated-qa.internal
          limits:
            concurrent: 1
        workers:
          discovery:
            static: {}
          limits:
            concurrent: 1
      version: v1.28.8+k0s.0
  id: 34b10fbd-2973-4e38-87a4-2765cf454b92
  timestamp: now
status:
  commands:
  - airgapupdate:
      workers:
      - lastUpdatedTimestamp: "2024-04-12T14:39:26Z"
        name: laverya-ec-airgap-update.c.replicated-qa.internal
        state: SignalApplyFailed
    id: 0
    state: ApplyFailed
  - id: 1
    k0supdate:
      controllers:
      - lastUpdatedTimestamp: "2024-04-12T14:39:26Z"
        name: laverya-ec-airgap-update.c.replicated-qa.internal
        state: SignalPending
    state: SchedulableWait
  state: ApplyFailed
We run a server (outside of k0s) on localhost in order to serve these files.
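For anyone trying to reproduce this without our server, a stand-in can be sketched with the Python standard library. This is an assumption about what is sufficient to reproduce the issue, not a description of our actual server. `serve_directory` and `remote_content_length` are hypothetical helper names. The second helper reads the `Content-Length` that the server advertises for a file, which may be useful to compare against the size of the file already on the host.

```python
import functools
import threading
import urllib.request
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_directory(directory: str, port: int = 50000) -> ThreadingHTTPServer:
    """Serve `directory` over HTTP on 127.0.0.1 from a background thread."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def remote_content_length(url: str) -> int:
    """Return the Content-Length advertised for `url`, using a HEAD request."""
    # The target is local, so bypass any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, method="HEAD")
    with opener.open(request) as response:
        return int(response.headers["Content-Length"])
```

With a directory that contains `images/images-amd64.tar`, calling `serve_directory(directory)` and then `remote_content_length("http://127.0.0.1:50000/images/images-amd64.tar")` should return the size of the served file in bytes.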
Additional context
Our workaround here will be to just change the name of the images file each update, but then we need to handle cleanup too - is there something we should be doing instead? For the k0s binary we can just always name it `k0s-upgrade` and it will be renamed as part of the upgrade process, but that doesn't appear to be the case for this file.
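A rough sketch of that workaround in Python, under stated assumptions: the file name embeds the target k0s version, and older bundles are deleted once the new one is in place. `bundle_name` and `remove_stale_bundles` are hypothetical helpers, the `images-<arch>-<version>.tar` naming scheme is made up for illustration, and the directory to clean is passed in by the caller because it depends on the installation.

```python
from pathlib import Path


def bundle_name(version: str, arch: str = "amd64") -> str:
    """Build a per-release bundle name so each update uses a new file name.

    The '+' in k0s versions is replaced with '-' to keep the name URL-friendly.
    """
    return f"images-{arch}-{version.replace('+', '-')}.tar"


def remove_stale_bundles(directory: Path, keep: str) -> list[str]:
    """Delete every images-*.tar in `directory` except `keep`.

    Returns the names of the removed files, sorted by path.
    """
    removed = []
    for path in sorted(directory.glob("images-*.tar")):
        if path.name != keep:
            path.unlink()
            removed.append(path.name)
    return removed
```

For the plan above, `bundle_name("v1.28.8+k0s.0")` gives `images-amd64-v1.28.8-k0s.0.tar`. The cleanup step would have to run only after the update has succeeded, since deleting the previous bundle earlier could leave the node without its images.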
(this is also not the latest version of k0s, as that is a necessity for testing updates... I can trigger the AirgapUpdate plan component on its own if that would be desirable though)