Description
Opened on Apr 20, 2024
Before creating an issue, make sure you've checked the following:
- You are running the latest released version of k0s
- Make sure you've searched for existing issues, both open and closed
- Make sure you've searched for PRs too, a fix might've been merged already
- You're looking at docs for the released version; "main" branch docs are usually ahead of released versions.
Platform
Linux 6.1.0-18-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux
Version
v1.29.3+k0s.0
Sysinfo
`k0s sysinfo`:

```
Machine ID: "bf8222b2f95c64426d710b2c00cc6e5eb49618a69a861782356308a69f99328c" (from machine) (pass)
Total memory: 15.3 GiB (pass)
Disk space available for /var/lib/k0s: 13.5 GiB (pass)
Name resolution: localhost: [::1 127.0.0.1] (pass)
Operating system: Linux (pass)
Linux kernel release: 6.1.0-18-amd64 (pass)
Max. file descriptors per process: current: 1048576 / max: 1048576 (pass)
AppArmor: active (pass)
Executable in PATH: modprobe: exec: "modprobe": executable file not found in $PATH (warning)
Executable in PATH: mount: /usr/bin/mount (pass)
Executable in PATH: umount: /usr/bin/umount (pass)
/proc file system: mounted (0x9fa0) (pass)
Control Groups: version 2 (pass)
cgroup controller "cpu": available (is a listed root controller) (pass)
cgroup controller "cpuacct": available (via cpu in version 2) (pass)
cgroup controller "cpuset": available (is a listed root controller) (pass)
cgroup controller "memory": available (is a listed root controller) (pass)
cgroup controller "devices": unknown (warning: insufficient permissions, try with elevated permissions)
cgroup controller "freezer": available (cgroup.freeze exists) (pass)
cgroup controller "pids": available (is a listed root controller) (pass)
cgroup controller "hugetlb": available (is a listed root controller) (pass)
cgroup controller "blkio": available (via io in version 2) (pass)
CONFIG_CGROUPS: Control Group support: built-in (pass)
CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
CONFIG_CPUSETS: Cpuset support: built-in (pass)
CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
CONFIG_NAMESPACES: Namespaces support: built-in (pass)
CONFIG_UTS_NS: UTS namespace: built-in (pass)
CONFIG_IPC_NS: IPC namespace: built-in (pass)
CONFIG_PID_NS: PID namespace: built-in (pass)
CONFIG_NET_NS: Network namespace: built-in (pass)
CONFIG_NET: Networking support: built-in (pass)
CONFIG_INET: TCP/IP networking: built-in (pass)
CONFIG_IPV6: The IPv6 protocol: built-in (pass)
CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: module (pass)
CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
CONFIG_NETFILTER_NETLINK: module (pass)
CONFIG_NF_NAT: module (pass)
CONFIG_IP_SET: IP set support: module (pass)
CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
CONFIG_IP_VS: IP virtual server support: module (pass)
CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
CONFIG_IP_VS_SH: Source hashing scheduling: module (pass)
CONFIG_IP_VS_RR: Round-robin scheduling: module (pass)
CONFIG_IP_VS_WRR: Weighted round-robin scheduling: module (pass)
CONFIG_NF_CONNTRACK_IPV4: IPv4 connetion tracking support (required for NAT): unknown (warning)
CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
CONFIG_IP_NF_IPTABLES: IP tables support: module (pass)
CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
CONFIG_NF_DEFRAG_IPV4: module (pass)
CONFIG_NF_CONNTRACK_IPV6: IPv6 connetion tracking support (required for NAT): unknown (warning)
CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
CONFIG_NF_DEFRAG_IPV6: module (pass)
CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
CONFIG_LLC: module (pass)
CONFIG_STP: module (pass)
CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: module (pass)
CONFIG_PROC_FS: /proc file system support: built-in (pass)
```
What happened?
`k0s reset`, followed by a node reboot, deleted all files from all persistent volumes, irrespective of their `Retain` policies. Folders remained, but were completely empty.
Steps to reproduce
I have not managed to reproduce the error (thankfully!).
Expected behavior
Persistent volumes mounted with the `Retain` policy are untouched on a reset; only k0s' `/var/lib/k0s` directory gets cleaned.
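To make that concrete, this is the sequence I would expect to be safe, with the hostPath locations taken from the manifests further down (the check itself is mine, not from the docs):

```shell
sudo k0s stop
sudo k0s reset
sudo reboot
# after the reboot:
ls /mnt/sda1/media        # expected: media files untouched
ls /home/<user>/appdata   # expected: app data untouched
sudo ls /var/lib/k0s      # expected: cleaned out by the reset
```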
Actual behavior
See "What Happened?"
Screenshots and logs
I have the full set of logs from `sudo journalctl -u k0scontroller -r -U 2024-04-13 -S 2024-04-11`, but I imagine a more focused subset would be more useful!
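Something like this could pull out just the extension-controller entries, if that's the useful part (the grep pattern is just my guess at what's relevant):

```shell
sudo journalctl -u k0scontroller -S 2024-04-11 -U 2024-04-13 \
  | grep -E 'extensions_controller|reconciler' > k0scontroller-focused.log
```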
Additional context
Firstly, thanks for the great tool!
I have previously run `k0s reset` a fair few times without issue, with no changes to the way volumes were mounted or to the services running in the cluster. All I can think of that separates this reset from the previous ones is the context in which it was needed:
This specific reset was prompted by an issue with the Helm extensions: removing a chart from the k0s config YAML, instead of uninstalling the chart, put the cluster in an unstartable state. The config had been installed into the controller with:

```shell
$ sudo k0s install controller --single -c ~/hs-infra/k0s.yaml
```
`k0s stop` was run before making changes to the config. The only change to `k0s.yaml` was from
```yaml
extensions:
  helm:
    repositories:
      - name: tailscale
        url: https://pkgs.tailscale.com/helmcharts
    charts:
      - name: tailscale-operator
        namespace: tailscale
        chartname: tailscale/tailscale-operator
      - name: nginx-gateway
        namespace: nginx-gateway
        chartname: oci://ghcr.io/nginxinc/charts/nginx-gateway-fabric
```
to
```yaml
extensions:
  helm:
    repositories: null
    charts: null
```
`k0s start` would then fail with logs such as
time="2024-04-12 11:58:15" level=info msg="Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference" Chart="{k0s-addon-chart-nginx-gateway kube-system}" component=extensions_controller controller=chart controllerGroup=helm.k0sproject.io controllerKind=Chart name=k0s-addon-chart-nginx-gateway namespace=kube-system
or
time="2024-04-12 10:58:36" level=info msg="Warning: Reconciler returned both a non-zero result and a non-nil error. The result will always be ignored if the error is non-nil and the non-nil error causes reqeueuing with exponential backoff. For more details, see: https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/reconcile#Reconciler" Chart="{k0s-addon-chart-tailscale-operator kube-system}" component=extensions_controller controller=chart controllerGroup=helm.k0sproject.io controllerKind=Chart name=k0s-addon-chart-tailscale-operator namespace=kube-system
Before running `k0s reset`, to try to resolve the error and start the cluster successfully, I modified `/var/lib/k0s/helmhome/repositories.yaml` to remove the references to the charts, but this didn't work either. I therefore ran `k0s reset`, as I had a few times before, and performed the node reboot as requested. However, on restarting the server, all files in the mounted volumes were gone: definitely deleted, not just hidden or moved, as an inspection of the available disk space revealed. Strangely enough, the folder structure within the volumes remained, but every directory was completely empty.
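For what it's worth, a check along these lines is what distinguishes "deleted" from "moved": freed disk space plus an intact but empty directory tree (paths from the manifests below):

```shell
df -h /mnt/sda1                       # used space dropped, so the blocks were freed, not relocated
find /mnt/sda1/media -type f | wc -l  # 0: no files remain
find /mnt/sda1/media -type d | wc -l  # non-zero: the directory structure survived
```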
If it helps, here are some manifest snippets!
Persistent Volume manifest example
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: media-pv
  labels:
    type: local
    disk: hdd
spec:
  storageClassName: manual
  capacity:
    storage: 4T
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: "/mnt/sda1/media"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: appdata-pv
  labels:
    type: local
    disc: ssd
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: "/home/<user>/appdata"
```
Persistent Volume Claims manifest example
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: media-pvc
  namespace: default
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 4T
  volumeName: media-pv
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: appdata-pvc
  namespace: default
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  volumeName: appdata-pv
```
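Each claim pins its volume explicitly via `volumeName`, so the binding can be sanity-checked with:

```shell
kubectl get pvc media-pvc appdata-pvc -n default   # STATUS should show Bound in a healthy cluster
```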
Volume mount in deployment example
```yaml
apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 1
  revisionHistoryLimit: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: plex
  template:
    metadata:
      labels:
        app: plex
    spec:
      hostNetwork: true
      volumes:
        - name: plex-vol
          persistentVolumeClaim:
            claimName: appdata-pvc
        - name: media-vol
          persistentVolumeClaim:
            claimName: media-pvc
      ...
      volumeMounts:
        - name: plex-vol
          mountPath: /config
          subPath: plex/config
        - name: media-vol
          mountPath: /media
```
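Putting the snippets together: the pod's `/config` mount resolves to `/home/<user>/appdata/plex/config` on the host (via `appdata-pvc` and `appdata-pv` plus the `subPath`), and `/media` resolves to `/mnt/sda1/media`; those are exactly the host trees that came back empty. A spot check from inside the pod could look like this (assuming the Deployment is actually named `plex`, which the snippet doesn't show):

```shell
kubectl exec -n default deploy/plex -- ls -la /media
```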