Restore cluster with embedded etcd datastore snapshot failed #5334
Comments
Did you see the instructions at the end of the log?
How do I restart without the `--cluster-reset` flag? On the first master node, I executed `k3s server --cluster-reset --cluster-reset-restore-path=/var/lib/rancher/k3s/server/db/snapshots/etcd-snapshot-serverXXXX` and then ran `systemctl restart k3s`, but the k3s server is not running and is inactive. From `cat /etc/systemd/system/k3s.service`: `[Install]` `[Service]` (rest of the unit file truncated)
If you stopped the K3s service and ran the cluster-reset command from your shell, all you should need to do is start the k3s service again? Did you somehow run the cluster-reset command with the k3s service still running?
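Roughly, the sequence would be something like this (a minimal sketch; the snapshot filename is a placeholder for the actual file under `server/db/snapshots/`):

```sh
# Stop the service so the data dir is not in use
systemctl stop k3s

# One-shot foreground run from the shell; this is not a change to the
# systemd unit. Replace <snapshot> with your actual snapshot file.
k3s server --cluster-reset \
  --cluster-reset-restore-path=/var/lib/rancher/k3s/server/db/snapshots/<snapshot>

# After the command reports the restore is done and exits, start the
# service again normally, i.e. without the --cluster-reset flag
systemctl start k3s
```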
After running the cluster-reset command, I started the k3s service again, but the service cannot run and is always in "activating (start)". Logs:
● k3s.service - Lightweight Kubernetes
Mar 28 09:49:40 server-VirtualBox k3s[19468]: time="2022-03-28T09:49:40+08:00" level=info msg="Connecting to proxy" url="wss://127.0.0.1:6443/v1-k3s/connect"
Did you change the VM's hostname or IP address at some point? The log line is truncated here, but the logs indicate that the node's current name/IP cannot be found in the cluster. If you included the whole log line, it would be clearer what the node's current name/IP are.
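One way to compare what the cluster has registered against what the host currently reports (standard commands; on k3s, `kubectl` may be `k3s kubectl`):

```sh
# Node names and internal IPs as the cluster sees them
kubectl get nodes -o wide

# The host's current hostname and addresses
hostname
ip -brief addr
```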
Thanks, my VM has multiple network interfaces. I ran cluster-reset with `--node-ip` and the restore succeeded. In addition, I ran the `etcdctl snapshot restore` command, which also restored successfully.
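For anyone else hitting this on a multi-interface host, a minimal sketch of the reset pinned to a specific address (the IP and snapshot name are placeholders):

```sh
systemctl stop k3s

# Pin the node to the interface the other cluster members can reach;
# 192.168.56.10 is a placeholder for your own address
k3s server --cluster-reset \
  --cluster-reset-restore-path=/var/lib/rancher/k3s/server/db/snapshots/<snapshot> \
  --node-ip=192.168.56.10

systemctl start k3s
```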
TBH I think the documentation is misleading. After losing about 2 hours looking for a solution, I figured out that I had to remove not just the data_dir/server/db folder, but the whole data_dir/server directory ... because otherwise k3s will ignore the --server option and try to recreate a sqlite database. Is this normal? Note: I faced this problem, so I had to restore the cluster: etcd-io/etcd#13766
No, that's not right. All you should need to delete is the db dir. If there are existing etcd files on disk, it will ignore the …
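Concretely, on each peer server, something like this should be enough (a minimal sketch assuming the default data dir; `db.bak` is just an arbitrary backup name):

```sh
systemctl stop k3s

# Back up and remove only the etcd database directory; the rest of
# ${datadir}/server (certs, tokens, manifests) stays in place
mv /var/lib/rancher/k3s/server/db /var/lib/rancher/k3s/server/db.bak

systemctl start k3s
```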
Environmental info:
K3s Version:v1.22.7+k3s1
K3s Cluster:
NAME STATUS ROLES AGE VERSION
server-virtualbox Ready control-plane,etcd,master 21m v1.22.7+k3s1
server2-virtualbox Ready control-plane,etcd,master 17m v1.22.7+k3s1
server3-virtualbox Ready control-plane,etcd,master 13m v1.22.7+k3s1
Restore Snapshot:
systemctl stop k3s
k3s server --cluster-reset --cluster-reset-restore-path=/var/lib/rancher/k3s/server/db/snapshots/etcd-snapshot-serverXXXX
systemctl stop k3s
rm -rf /var/lib/rancher/k3s/db
systemctl start k3s
Expected behavior:
cluster is healthy
Describe the bug:
On the first master, the k3s server is not active/running. Logs:
INFO[0011] Failed to set etcd role label: failed to register CRDs: Get "https://127.0.0.1:6444/apis/apiextensions.k8s.io/v1/customresourcedefinitions": dial tcp 127.0.0.1:6444: connect: connection refused
INFO[0012] etcd data store connection OK
INFO[0012] ETCD server is now running
INFO[0012] k3s is up and running
WARN[0012] failed to unmarshal etcd key: unexpected end of JSON input
WARN[0012] bootstrap key already exists
INFO[0012] Reconciling etcd snapshot data in k3s-etcd-snapshots ConfigMap
INFO[0012] Reconciling bootstrap data between datastore and disk
INFO[0012] Cluster reset: backing up certificates directory to /var/lib/rancher/k3s/server/tls-1648174319
INFO[0012] Etcd is running, restart without --cluster-reset flag now. Backup and delete ${datadir}/server/db on each peer etcd server and rejoin the nodes
And the other master servers are not running. How can I correctly restore a healthy cluster from an embedded etcd datastore snapshot?
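For completeness, a rough sketch of what that last log line asks for on the remaining servers (assuming the default data dir; the backup name is arbitrary):

```sh
# On server2-virtualbox and server3-virtualbox:
systemctl stop k3s

# Back up and delete the etcd data dir, per the log message above
mv /var/lib/rancher/k3s/server/db /var/lib/rancher/k3s/server/db.bak

# The service configuration must still point the node at the restored
# server (e.g. a --server URL or K3S_URL) so it rejoins the new cluster
systemctl start k3s
```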