[ReleaseTesting] Four node upgrade with bonded NICs #1254

Closed
albinsun opened this issue May 6, 2024 · 7 comments

albinsun commented May 6, 2024

Description

Content refers to Qase HV-608; test the following upgrade paths:

Harvester

  1. 1.2.1 -> 1.2.2-rc2 -> 1.3-head

Harvester with Rancher guest cluster

  1. H1.2.1 + R2.7.10 -> H1.2.1 + R2.7.11 -> H1.2.2-rc2 + R2.7.12 -> H1.3-head + R2.8.3

Test Steps

Prerequisites

  1. VLAN 1 network on mgmt and 1 network on bonded NICs
  2. 2 virtual machines with data and md5sum computed (1 running, 1 stopped); see the checksum sketch after this list
  3. 2 VM backups and snapshots: 1 backup taken while the VM is running and 1 while the VM is stopped
  4. Create a new storage class apart from the default one and use it for some basic operations
  5. Import into Rancher 2.7.10 and create an RKE2 1.26 guest cluster provisioned on Harvester VMs before the upgrade
  6. Deploy the Harvester cloud provider to the RKE2 cluster (a version prior to the latest)
  7. Verify the DHCP load balancer service
  8. Install the Harvester CSI Driver (a version prior to the latest)
  9. Create a new Harvester PVC for the nginx deployment
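
A minimal sketch of how the test data and checksums in item 2 could be prepared inside each VM; the file paths and sizes are illustrative, not part of the plan:

```bash
# Inside each test VM: write some data and record its checksum,
# so the same value can be compared after every upgrade step.
dd if=/dev/urandom of=/home/ubuntu/test-data.bin bs=1M count=100
md5sum /home/ubuntu/test-data.bin | tee /home/ubuntu/test-data.md5

# After an upgrade (or a restore), re-verify the data.
md5sum -c /home/ubuntu/test-data.md5
```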

Upgrade Rancher to v2.7.11

  1. Guest cluster still works

Upgrade Harvester to v1.2.2-rc2

  1. Guest cluster still works
  2. Post-upgrade checks

Upgrade Rancher to v2.7.12

  1. Guest cluster still works

Upgrade Harvester to v1.3-head

  1. Guest cluster still works

Upgrade Rancher to v2.8.3

  1. Guest cluster still works

Post upgrade checks

  1. Dependencies Check
  2. Virtual machines are in the same state as before and accessible (see the verification sketch after this list)
  3. Restore the backups and check the data
  4. Image and volume status
  5. Monitoring chart status
  6. VM operations work as expected
  7. Add a node after the upgrade
  8. Upgrade the RKE2 guest cluster to 1.27 or above
  9. Upgrade cloud provider and CSI driver
  10. Verify DHCP load balancer service and create a new Harvester PVC
  11. Shutting off VM and then restarting VM
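
For the VM, image/volume, and monitoring checks above, a hedged sketch of kubectl spot checks that can back up the UI verification (run against the Harvester cluster; no resource names from this run are assumed):

```bash
# VMs should be in the same state (Running / Stopped) as before the upgrade.
kubectl get virtualmachines.kubevirt.io -A
kubectl get virtualmachineinstances.kubevirt.io -A

# Image and volume status.
kubectl get virtualmachineimages.harvesterhci.io -A
kubectl get pvc -A

# Monitoring chart status (rancher-monitoring runs in cattle-monitoring-system).
kubectl -n cattle-monitoring-system get pods
```
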
albinsun added this to the 1.2.2 milestone May 6, 2024
albinsun self-assigned this May 6, 2024
albinsun commented May 7, 2024

Prerequisites

  1. 🟢 VLAN 1 network on mgmt and 1 network on bonded NICs

    image
    image

  2. 🟢 2 virtual machines with data and md5sum computed (1 running, 1 stopped)

    image

  3. 🟢 2 VM backups and snapshots: 1 backup taken while the VM is running and 1 while the VM is stopped

    backup-target
    image
    Backup
    image
    Snapshot
    image

  4. 🟢 Create a new storage class apart from the default one and use it for some basic operations.

    Both VMs have an additional volume using the new customized storage class
    image
    image

  5. 🟢 Import into Rancher 2.7.10 and create an RKE2 (v1.26) guest cluster provisioned on Harvester VMs before the upgrade.

    Import to rancher-v2.7.10
    image
    Create RKE2 (v1.26) cluster
    image

  6. ⚪ Deploy the Harvester cloud provider to the RKE2 cluster (a version prior to the latest) ⚠️ RKE2 installs harvester-cloud-provider by default

    image

  7. ⚪ Install the Harvester CSI Driver (a version prior to the latest) ⚠️ RKE2 installs harvester-csi-driver by default

    image

  8. 🟢 Create a new Harvester PVC for nginx deployment

    Check default storage class
    image
    Create PVC
    image
    Nginx deployment
    image
    image

  9. 🟢 Verify the DHCP load balancer service (see the manifest sketch after this list)

    image
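
A hedged sketch of the manifests behind the last two checks, applied from inside the RKE2 guest cluster. The `cloudprovider.harvesterhci.io/ipam: dhcp` annotation and the `harvester` storage class name are assumptions based on the cloud provider / CSI driver defaults, not values copied from this run:

```bash
# LoadBalancer service for the nginx deployment, using DHCP IPAM
# (annotation assumed from the Harvester cloud provider defaults).
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb
  annotations:
    cloudprovider.harvesterhci.io/ipam: dhcp
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
    - port: 80
      targetPort: 80
EOF

# PVC backed by the Harvester CSI driver (storage class name assumed).
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: harvester
  resources:
    requests:
      storage: 1Gi
EOF
```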

albinsun commented May 7, 2024

Upgrade Rancher to v2.7.11

Ref. https://ranchermanager.docs.rancher.com/v2.7/getting-started/installation-and-upgrade/install-upgrade-on-a-kubernetes-cluster/upgrades#upgrade-outline
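
Per the outline linked above, the Rancher upgrade itself is a `helm upgrade` of the rancher chart. A minimal sketch, assuming Rancher was installed from the `rancher-latest` repo into `cattle-system`; in practice the values from the original install are passed again rather than relying on `--reuse-values`:

```bash
# Refresh the chart repo and upgrade Rancher in place.
helm repo update
helm upgrade rancher rancher-latest/rancher \
  --namespace cattle-system \
  --version 2.7.11 \
  --reuse-values

# Wait for the rollout before checking Harvester and the guest cluster.
kubectl -n cattle-system rollout status deploy/rancher
```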

  1. 🟢 Upgrade successfully

    image

  2. 🟢 Harvester is still Active

    image

  3. 🟢 VM is still there and no data loss

    image

  4. 🟡 RKE2 cluster is still running
  5. 🟢 Nginx deployment and LB still work (see the sketch after this list)

    Deployment
    image
    LB
    image
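
For check 5, a sketch of how the deployment and load balancer can be re-verified from the guest cluster; the service name and LB address are placeholders:

```bash
# The deployment should still have all replicas available.
kubectl get deploy nginx

# The LoadBalancer service should keep its external IP across the Rancher upgrade.
kubectl get svc nginx-lb -o wide

# Hit the LB address from a machine on the same network (placeholder address).
curl -I http://<LB-IP>
```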

albinsun commented May 7, 2024

Upgrade Harvester to v1.2.2-rc

v1.2.2-rc2

Trial#1

Ref. https://docs.harvesterhci.io/v1.2/upgrade/index/#prepare-an-air-gapped-upgrade
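
Per the air-gapped upgrade doc above, the release is made visible to the cluster by creating a `Version` resource that points at the ISO. A hedged sketch; the field names are assumed from the versions.harvesterhci.io CRD and the URL/checksum are placeholders:

```bash
# Register the v1.2.2-rc2 release so it shows up as an upgradable version.
kubectl apply -f - <<'EOF'
apiVersion: harvesterhci.io/v1beta1
kind: Version
metadata:
  name: v1.2.2-rc2
  namespace: harvester-system
spec:
  isoURL: http://<web-server>/harvester-v1.2.2-rc2-amd64.iso
  isoChecksum: '<sha-512 checksum of the ISO>'
EOF
```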

  1. 🔴 Upgrade successfully

    [BUG] Live migration fail when upgrade v1.2.1 to v1.2.2-rc2 due to virError harvester#5755

Trial#2

  1. 🔴 Upgrade successfully

    [BUG] upgrade stuck in waiting plan restart-rancher-system-agent to complete harvester#5690

v1.2.2-rc3

Trial#1

  1. 🟢 Upgrade successfully (see the status check sketch below)

    image

  2. 🟢 Harvester is still Active

    image
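
A sketch of how the upgrade state can be confirmed from the CLI, in addition to the dashboard screenshot above; the `server-version` setting name is an assumption:

```bash
# Upgrade objects record the state of each upgrade run.
kubectl -n harvester-system get upgrades.harvesterhci.io

# The server-version setting should now report the new release.
kubectl get settings.harvesterhci.io server-version

# Node OS images should all show the upgraded version.
kubectl get nodes -o wide
```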

Post upgrade checks

  1. Dependencies Version Check
    Chart: https://github.com/harvester/harvester/tree/master/deploy/charts/harvester/dependency_charts
    Values: https://github.com/harvester/harvester/blob/v1.2/deploy/charts/harvester/values.yaml

    • 🟢 csi-snapshotter

      image

    • 🟢 kubevirt-operator

      image

    • 🟢 kubevirt

      image

    • 🟢 snapshot-validation-webhook

      image

    • 🟢 whereabouts

      image

  2. Versions of all the CRDs (see the query sketch after this list)

    • dependency_charts

      • 🟢 snapshotter

        volumesnapshotclasses.snapshot.storage.k8s.io
        image
        volumesnapshotcontents.snapshot.storage.k8s.io
        image
        volumesnapshots.snapshot.storage.k8s.io
        image

      • 🟢 kubevirt-operator

        kubevirts.kubevirt.io
        image

      • 🟢 whereabouts

        ippools.whereabouts.cni.cncf.io
        image
        overlappingrangeipreservations.whereabouts.cni.cncf.io
        image

    • harvester-crd
      https://github.com/harvester/harvester/tree/v1.2/deploy/charts/harvester-crd/templates

      • 🟢 addons.harvesterhci.io

        image

      • 🟢 keypairs.harvesterhci.io

        image

      • 🟢 preferences.harvesterhci.io

        image

      • 🟢 settings.harvesterhci.io

        image

      • 🟢 supportbundles.harvesterhci.io

        image

      • 🟢 upgradelogs.harvesterhci.io

        image

      • 🟢 upgrades.harvesterhci.io

        image

      • 🟢 versions.harvesterhci.io

        image

      • 🟢 virtualmachinebackups.harvesterhci.io

        image

      • 🟢 virtualmachineimages.harvesterhci.io

        image

      • 🟢 virtualmachinerestores.harvesterhci.io

        image

      • 🟢 virtualmachinetemplates.harvesterhci.io

        image

      • 🟢 virtualmachinetemplateversions.harvesterhci.io

        image

  3. 🟢 Virtual machines are in the same state as before and accessible.

    VM is still there and no data loss
    image

    RKE2 cluster is still running
    image

    Nginx deployment and LB still work
    image
    image

  4. 🟡 Restore the backups, check the data

    Running VM

    1. Restore New
      image
      image
    2. Restore Replace
      image
      image
    3. Restore New again
      image
      image

    Stopped VM

    1. Restore New
      image
      image
    2. Restore Replace
      image
      image
    3. Restore New again
      image
      image

    Successive restores hit a known issue
    [BUG] VMs from successive "Restore New" backup/snapshot tend to get the same IP.  harvester#4474

  5. 🟢 Image and volume status

    image
    image

  6. 🟢 Monitoring chart status

    image
    image

  7. VM operations work as expected.

  8. 🟢 Add a node after the upgrade

    node-3 is added as a Compute node
    image
    Can access Harvester via Rancher
    image
    LB and deployment still work
    image

  9. 🟢 Upgrade the RKE2 guest cluster to 1.27 or above

    Before
    image
    image

    Upgrade
    image
    image

    After
    image
    deployments look good
    image
    Can access nginx deployment
    image
    LB still works
    image

  10. 🟡 Upgrade cloud provider and CSI driver

    cloud provider

    • Before
      image
    • Upgrade
      image
    • After
      image

    csi-driver

    • Before
      image
    • After
      image

    Known cloud-provider issue w/ workaround
    [BUG] A pending fail pod from different hub is generated after upgrade rancher harvester#5382

  11. 🟢 Verify DHCP load balancer service and create a new Harvester PVC

    DHCP LB
    image
    create a new Harvester PVC
    image
    image
    image

  12. 🟢 Shutting off VM and then restarting VM
    1. Before
      image
    2. Shutting off
      image
    3. Restart and check
      image
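
For the CRD version check in item 2, a sketch of how the served API versions can be read directly from the cluster; the list below is a subset of the CRDs verified above and can be extended as needed:

```bash
# Print the served API versions for a few of the dependency and Harvester CRDs.
for crd in \
  volumesnapshotclasses.snapshot.storage.k8s.io \
  kubevirts.kubevirt.io \
  ippools.whereabouts.cni.cncf.io \
  upgrades.harvesterhci.io \
  virtualmachinebackups.harvesterhci.io; do
  echo -n "$crd: "
  kubectl get crd "$crd" -o jsonpath='{.spec.versions[*].name}'
  echo
done
```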

albinsun commented

Upgrade Rancher to v2.7.12

Ref. https://ranchermanager.docs.rancher.com/v2.7/getting-started/installation-and-upgrade/install-upgrade-on-a-kubernetes-cluster/upgrades#upgrade-outline

  1. 🟢 Upgrade successfully

    image

  2. 🟢 Harvester is still Active

    image

  3. 🟢 VM is still there and no data loss

    image

  4. 🟢 RKE2 cluster is still running (see the status sketch after this list)

    image
    image

  5. 🟢 Nginx deployment and LB still work

    image
    image
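
For check 4, a sketch of how the guest cluster state can be read from the Rancher (local) cluster via the provisioning API; the guest cluster name is a placeholder:

```bash
# Provisioned RKE2 clusters live in the fleet-default namespace on the Rancher cluster.
kubectl -n fleet-default get clusters.provisioning.cattle.io

# The Ready condition should be True for the guest cluster (name is a placeholder).
kubectl -n fleet-default get clusters.provisioning.cattle.io <guest-cluster> \
  -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
```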

albinsun commented

Upgrade Harvester to v1.3-head (v1.3-bf5847ad-head)

  1. 🟢 Upgrade successfully

    image

  2. 🟢 Harvester is still Active

    image

  3. 🟢 Virtual machines are in the same state as before and accessible.

    VM is still there and no data loss
    image

    RKE2 cluster is still running
    image
    image

    Nginx deployment and LB still work
    image
    image

  4. 🟡 Restore the backups, check the data

    Running VM

    1. Restore New
      image
    2. Restore Replace
      image
      image

    Stopped VM

    1. Restore New
      image

    2. Restore Replace
      image
      image

    Successive restores hit a known issue
    [BUG] VMs from successive "Restore New" backup/snapshot tend to get the same IP.  harvester#4474

  5. 🟢 Image and volume status

    image
    image

  6. 🔴 Monitoring chart status

    No monitoring charts
    image
    Can NOT disable addon
    image

    image

    helm-install-rancher-monitoring-vhh29 + echo 'Installing helm_v3 chart'                                                                              
    helm-install-rancher-monitoring-vhh29 + helm_v3 install --version 103.0.3+up45.31.1 rancher-monitoring rancher-monitoring/rancher-monitoring --value
    helm-install-rancher-monitoring-vhh29 Error: INSTALLATION FAILED: cannot re-use a name that is still in use 
    

    No charts, and unable to disable the addon (see the diagnostic sketch after this list).

  7. VM operations work as expected.

  8. ⚪ Upgrade the RKE2 guest cluster to 1.27 or above

    Skipped because the RKE2 version did not change.

  9. 🟢 Verify DHCP load balancer service and create a new Harvester PVC

    DHCP LB
    image

    create a new Harvester PVC
    image
    image

  10. 🟢 Shutting off VM and then restarting VM

    image
    image
    image
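
For the monitoring failure in item 6, the "cannot re-use a name that is still in use" error usually means a Helm release record for rancher-monitoring is still present. A hedged diagnostic sketch (not the fix applied in this run); the namespace assumes the default rancher-monitoring install location:

```bash
# List all release records, including failed or pending ones.
helm list -a -n cattle-monitoring-system

# Inspect the leftover release that blocks re-installation.
helm status rancher-monitoring -n cattle-monitoring-system

# Helm v3 stores release history as secrets; these show the stuck records.
kubectl -n cattle-monitoring-system get secrets -l owner=helm,name=rancher-monitoring
```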

albinsun commented

Upgrade Rancher to v2.8.3

Ref. https://ranchermanager.docs.rancher.com/v2.7/getting-started/installation-and-upgrade/install-upgrade-on-a-kubernetes-cluster/upgrades#upgrade-outline

  1. 🟢 Upgrade successfully

    image

  2. 🟢 Harvester is still Active

    image

  3. 🟢 VM is still there and no data loss

    image

  4. 🟡 RKE2 cluster is still running

    image
    image
    After applying the workaround
    image

    Known cloud-provider issue w/ workaround
    [BUG] A pending fail pod from different hub is generated after upgrade rancher harvester#5382

  5. 🟢 Nginx deployment and LB still work

    image
    image

albinsun commented

Closing, as the symptoms were found in the v1.2.2-rc3 to v1.3-head phase and the v1.2.1 to v1.2.2-rc3 phase is basically fine.
