# With `node-drain-policy` set to `block-if-contains-last-replica`

**Note:**
- Starting from v1.5.x, it is not necessary to check for the presence of `longhorn-admission-webhook` and `longhorn-conversion-webhook`. Please refer to Longhorn issue #5590 for more details.
- Starting from v1.5.x, observe that `instance-manager-r` and `instance-manager-e` are combined into `instance-manager`. Ref #5208.
## 1. Basic unit tests

### 1.1 Single worker node cluster with separate master node

#### 1.1.1 RWO volumes

- Deploy Longhorn.
- Verify that there is no PDB for `csi-attacher`, `csi-provisioner`, `longhorn-admission-webhook`, and `longhorn-conversion-webhook` (see the PDB check sketched after this list).
- Manually create a PVC (to simulate a volume that has never been attached).
- Verify that there is still no PDB for `csi-attacher`, `csi-provisioner`, `longhorn-admission-webhook`, and `longhorn-conversion-webhook`, because there is no attached volume.
- Create a deployment that uses one RWO Longhorn volume.
- Verify that there is a PDB for `csi-attacher`, `csi-provisioner`, `longhorn-admission-webhook`, and `longhorn-conversion-webhook`.
- Create another deployment that uses one RWO Longhorn volume. Scale this deployment down so that the volume is detached.
- Drain the node with `kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data --force`.
- Observe that the workload pods are evicted first -> the PDBs of `csi-attacher`, `csi-provisioner`, `longhorn-admission-webhook`, and `longhorn-conversion-webhook` are removed -> the `csi-attacher`, `csi-provisioner`, `longhorn-admission-webhook`, `longhorn-conversion-webhook`, and `instance-manager-e` pods are evicted -> all volumes are successfully detached.
- Observe that `instance-manager-r` is NOT evicted.
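A minimal sketch of the PDB check and the manually created PVC, assuming Longhorn runs in the `longhorn-system` namespace with the default `longhorn` storage class; the PVC name is hypothetical:

```bash
# List the PDBs Longhorn manages; while no volume is attached there should be
# no entries for csi-attacher, csi-provisioner, longhorn-admission-webhook,
# or longhorn-conversion-webhook.
kubectl -n longhorn-system get poddisruptionbudgets

# Hypothetical PVC that is never attached to any workload; no pod references
# it, so the CSI component PDBs should still be absent afterwards.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: never-attached-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: longhorn
  resources:
    requests:
      storage: 1Gi
EOF

# Re-run the same check after creating the deployment that uses an RWO volume;
# the PDBs for the CSI components are expected to appear.
kubectl -n longhorn-system get poddisruptionbudgets
```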
#### 1.1.2 RWX volume

- Deploy Longhorn.
- Verify that there is no PDB for `csi-attacher`, `csi-provisioner`, `longhorn-admission-webhook`, and `longhorn-conversion-webhook`.
- Create a deployment of 2 pods that uses one RWX Longhorn volume (a sketch follows this list).
- Verify that there is a PDB for `csi-attacher`, `csi-provisioner`, `longhorn-admission-webhook`, and `longhorn-conversion-webhook`.
- Drain the node with `kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data --force`.
- Observe that the workload pods are evicted first -> the PDBs of `csi-attacher`, `csi-provisioner`, `longhorn-admission-webhook`, and `longhorn-conversion-webhook` are removed -> the `csi-attacher`, `csi-provisioner`, `longhorn-admission-webhook`, `longhorn-conversion-webhook`, and `instance-manager-e` pods are evicted -> all volumes are successfully detached.
- Observe that `instance-manager-r` is NOT evicted.
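A minimal sketch of the 2-pod RWX workload, assuming the `longhorn` storage class supports `ReadWriteMany`; all names are hypothetical:

```bash
# Hypothetical RWX PVC plus a 2-replica deployment that mounts it.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rwx-test-pvc
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: longhorn
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rwx-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: rwx-test
  template:
    metadata:
      labels:
        app: rwx-test
    spec:
      containers:
      - name: app
        image: busybox
        command: ["sh", "-c", "while true; do sleep 3600; done"]
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: rwx-test-pvc
EOF
```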
### 1.2 Multi-node cluster

#### 1.2.1 Multiple healthy replicas

- Deploy Longhorn.
- Verify that there is no PDB for `csi-attacher`, `csi-provisioner`, `longhorn-admission-webhook`, and `longhorn-conversion-webhook`.
- Manually create a PVC (to simulate a volume that has never been attached).
- Verify that there is still no PDB for `csi-attacher`, `csi-provisioner`, `longhorn-admission-webhook`, and `longhorn-conversion-webhook`, because there is no attached volume.
- Create a deployment that uses one RWO Longhorn volume.
- Verify that there is a PDB for `csi-attacher`, `csi-provisioner`, `longhorn-admission-webhook`, and `longhorn-conversion-webhook`.
- Create another deployment that uses one RWO Longhorn volume. Scale this deployment down so that the volume is detached.
- Create a deployment of 2 pods that uses one RWX Longhorn volume.
- Drain each node, one by one, with `kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data --force` (see the loop sketched after this list).
- Verify that the drain finishes successfully.
- Uncordon the node and move on to the next node.
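A minimal sketch of the drain/uncordon cycle, assuming hypothetical node names `node-1`, `node-2`, and `node-3`:

```bash
# Drain, inspect, and uncordon each node in turn; the node names are placeholders.
for node in node-1 node-2 node-3; do
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --force
  # Check that the Longhorn volumes are healthy again before moving on.
  kubectl -n longhorn-system get volumes.longhorn.io
  kubectl uncordon "$node"
done
```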
#### 1.2.2 Single healthy replica

- Given a Longhorn cluster with 2 nodes: node-1 and node-2.
- Create a 5Gi volume with 1 replica. Let's say the replica is on node-2 (see the setup sketched after this section).
- Attach the volume to node-1.
- Set `node-drain-policy` to `block-if-contains-last-replica`.
- Attempt to drain node-2, which contains the only replica.
- node-2 becomes cordoned.
- All pods on node-2 are evicted except the replica instance manager pod.
- A message like the one below keeps appearing:

  ```
  evicting pod longhorn-system/instance-manager-r-xxxxxxxx
  error when evicting pods/"instance-manager-r-xxxxxxxx" -n "longhorn-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
  ```
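A minimal sketch of the policy change and single-replica provisioning; the `kubectl patch` on `settings.longhorn.io` and the storage class below are assumptions, and both steps can equally be done through the Longhorn UI:

```bash
# Switch the drain policy (also configurable in the Longhorn UI under Settings);
# assumes the Setting CR exposes a top-level "value" field.
kubectl -n longhorn-system patch settings.longhorn.io node-drain-policy \
  --type=merge -p '{"value": "block-if-contains-last-replica"}'

# Hypothetical storage class that provisions single-replica volumes; a PVC
# created from it and mounted by a pod pinned to node-1 attaches the volume
# to node-1, while the lone replica may end up on node-2.
cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-single-replica
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "1"
EOF
```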
## 2. Upgrade Kubernetes for a k3s cluster with a standalone System Upgrade Controller deployment

- Deploy a 3-node cluster in which every node has all roles (master + worker).
- Install the System Upgrade Controller.
- Deploy Longhorn.
- Manually create a PVC (to simulate a volume that has never been attached).
- Create a deployment that uses one RWO Longhorn volume.
- Create another deployment that uses one RWO Longhorn volume. Scale this deployment down so that the volume is detached.
- Create another deployment of 2 pods that uses one RWX Longhorn volume.
- Deploy a `Plan` CR to upgrade Kubernetes, similar to the one below (a sketch for applying and watching it follows this list):

  ```yaml
  apiVersion: upgrade.cattle.io/v1
  kind: Plan
  metadata:
    name: k3s-server
    namespace: system-upgrade
  spec:
    concurrency: 1
    cordon: true
    nodeSelector:
      matchExpressions:
        - key: node-role.kubernetes.io/master
          operator: In
          values:
            - "true"
    serviceAccountName: system-upgrade
    drain:
      force: true
      skipWaitForDeleteTimeout: 60 # 1.18+ (honor pod disruption budgets up to 60 seconds per pod, then move on)
    upgrade:
      image: rancher/k3s-upgrade
    version: v1.21.11+k3s1
  ```

  Note that `concurrency` should be 1 so that the nodes are upgraded one by one, `version` should be a newer k3s version, and the plan should contain the `drain` stage.

- Verify that the upgrade went smoothly.
- Exec into a workload pod and make sure that the data is still there.
- Repeat the upgrade process above 5 times to make sure the result is consistent.
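A minimal sketch for applying the plan and watching the rollout, assuming the manifest above is saved as `k3s-server-plan.yaml` (a hypothetical filename):

```bash
# Apply the upgrade plan and watch the nodes get cordoned, drained, upgraded,
# and uncordoned one at a time (concurrency: 1).
kubectl apply -f k3s-server-plan.yaml
kubectl get nodes -w

# The System Upgrade Controller runs one upgrade job per node; inspect them here.
kubectl -n system-upgrade get plans.upgrade.cattle.io,jobs,pods
```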
## 3. Upgrade Kubernetes for an imported k3s cluster in Rancher

- Create a 3-node k3s cluster in which every node has both master and worker roles. k3s should be an old version such as `v1.21.9+k3s1` so that we can upgrade multiple times. Instructions for creating such a cluster are at https://docs.k3s.io/datastore/ha-embedded
- Import the cluster into Rancher: go to Cluster Management -> Create new cluster -> Generic cluster -> follow the instructions there.
- Update the upgrade strategy: in Cluster Management, click the three-dot menu on the imported cluster -> Edit Config -> K3s Options -> choose Drain for both control plane and worker nodes as below:
- Install Longhorn.
- Manually create a PVC (to simulate a volume that has never been attached).
- Create a deployment that uses one RWO Longhorn volume.
- Create another deployment that uses one RWO Longhorn volume. Scale this deployment down so that the volume is detached.
- Create another deployment of 2 pods that uses one RWX Longhorn volume.
- Use Rancher to upgrade the cluster to a newer Kubernetes version.
- Verify that the upgrade went smoothly.
- Exec into a workload pod and make sure that the data is still there (see the check sketched after this list).
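A minimal sketch of the data check, assuming a hypothetical workload deployment labeled `app=rwo-test` that mounts its Longhorn volume at `/data`:

```bash
# Before the upgrade: write a file with a known checksum into the volume.
POD=$(kubectl get pod -l app=rwo-test -o jsonpath='{.items[0].metadata.name}')
kubectl exec "$POD" -- sh -c 'dd if=/dev/urandom of=/data/test.bin bs=1M count=64 && md5sum /data/test.bin > /data/test.md5'

# After the upgrade: re-resolve the pod name and verify the checksum still matches.
POD=$(kubectl get pod -l app=rwo-test -o jsonpath='{.items[0].metadata.name}')
kubectl exec "$POD" -- md5sum -c /data/test.md5
```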
## 4. Upgrade Kubernetes for a provisioned k3s cluster in Rancher

- Use Rancher to provision a k3s cluster with an old version, for example `v1.22.11+k3s2`. The cluster has 3 nodes, each with both worker and master roles. Set the upgrade strategy as below:
- Install Longhorn.
- Manually create a PVC (to simulate a volume that has never been attached).
- Create a deployment that uses one RWO Longhorn volume.
- Create another deployment that uses one RWO Longhorn volume. Scale this deployment down so that the volume is detached.
- Create another deployment of 2 pods that uses one RWX Longhorn volume.
- Use Rancher to upgrade the cluster to a newer Kubernetes version.
- Verify that the upgrade went smoothly.
- Exec into a workload pod and make sure that the data is still there.
# With `node-drain-policy` set to `allow-if-replica-is-stopped`

- Repeat the test cases above.
- Verify that in tests `1.1.1`, `1.1.2`, `1.2.1`, `2`, `3`, and `4`, the drain succeeds.
- Verify that in test `1.2.2`, the drain still fails.
# With `node-drain-policy` set to `always-allow`

- Repeat the test cases above.
- Verify that in tests `1.1.1`, `1.1.2`, `1.2.1`, `1.2.2`, `2`, `3`, and `4`, the drain succeeds.