Maintenance mode: Add host to deployment planner avoid list to fix local storage migration #9892

Merged
merged 1 commit into from
Jan 20, 2025

BartJM
Contributor

@BartJM BartJM commented Nov 5, 2024

Description

This PR adds the host preparing for maintenance to the avoid list for the deployment planner.

Without this, the deployment planner can return the host that is preparing for maintenance as the migration destination when a VM has a root disk on local storage and that host is the VM's last host.
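
To illustrate the idea, here is a minimal, self-contained sketch. It does not use CloudStack's classes: `AvoidListSketch`, `AvoidList`, and `planDestination` are hypothetical names, and the "prefer the last host" rule is a simplifying assumption based on the description above. It only shows how excluding the source host changes the planner's answer.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical, simplified model; not CloudStack's actual planner or exclude list.
final class AvoidListSketch {

    /** Stand-in for a planner avoid list: host ids that must not be chosen. */
    static final class AvoidList {
        private final Set<Long> hostIds = new HashSet<>();

        void addHost(long hostId) {
            hostIds.add(hostId);
        }

        boolean shouldAvoid(long hostId) {
            return hostIds.contains(hostId);
        }
    }

    /**
     * Toy planner: prefers the VM's last host, otherwise takes the first
     * candidate that is not avoided. Returns empty when no host qualifies.
     */
    static Optional<Long> planDestination(long lastHostId, List<Long> candidates, AvoidList avoid) {
        if (candidates.contains(lastHostId) && !avoid.shouldAvoid(lastHostId)) {
            return Optional.of(lastHostId);
        }
        return candidates.stream()
                .filter(id -> !avoid.shouldAvoid(id))
                .findFirst();
    }

    public static void main(String[] args) {
        List<Long> hosts = List.of(1L, 2L);
        long hostInMaintenance = 1L; // also the VM's last host

        // No avoid list: the planner hands back the host being drained.
        System.out.println(planDestination(hostInMaintenance, hosts, new AvoidList())); // Optional[1]

        // Source host on the avoid list: another host is selected.
        AvoidList avoid = new AvoidList();
        avoid.addHost(hostInMaintenance);
        System.out.println(planDestination(hostInMaintenance, hosts, avoid)); // Optional[2]
    }
}
```

In this toy model, an avoided host with no alternative yields an empty result, which loosely corresponds to the "dest being null" case mentioned in the testing notes.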

Steps to reproduce

  • Create a VM with local storage
  • Put the host running the VM into maintenance

Actual behaviour

(qa6-4.19-upstream-54bd5b60-kvm-host1) Failed to prepare host for maintenance due to: Migration of the vm VM instance {"id":13,"instanceName":"i-2-13-VM","type":"User","uuid":"209d346b-b2e7-485f-a889-8462dd0078cc"}from host Host {"id":1,"name":"qa6-4.19-upstream-54bd5b60-kvm-host1","type":"Routing","uuid":"224ca458-90e6-4efb-8532-e5efc83df9f4"} to destination host Host {"id":1,"name":"qa6-4.19-upstream-54bd5b60-kvm-host1","type":"Routing","uuid":"224ca458-90e6-4efb-8532-e5efc83df9f4"} doesn't involve migrating the volumes.

Expected

The local storage VM is live migrated to another host.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI
  • test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

CentOS 8 mbx environment with host.maintenance.local.storage.strategy set to Migration.

Tested maintenance mode on a host with:

  1. Local storage VM
  2. HA VM
  3. Local storage VM with HA data disk
  4. HA VM with HA data disk
  5. Local storage VM with local data disk
  6. HA VM with local data disk

Cases 5 and 6 result in ErrorInMaintenance because dest is null, similar to what is described in #9887. Without this patch they would still fail, with the error shown under Actual behaviour.

Directly after creating an HA VM with an HA data disk, maintenance was not possible. After manually migrating the VM to another host and back, the migration did complete. Migrating without selecting a host also gave a "no destination found" error; this occurs without this patch as well, so it is unrelated.

How did you try to break this feature and the system with this change?

codecov bot commented Nov 6, 2024

Codecov Report

Attention: Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.

Project coverage is 15.80%. Comparing base (8af08dd) to head (c460198).
Report is 118 commits behind head on 4.20.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...n/java/com/cloud/resource/ResourceManagerImpl.java | 0.00% | 3 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##               4.20    #9892      +/-   ##
============================================
- Coverage     15.81%   15.80%   -0.01%     
  Complexity    12580    12580              
============================================
  Files          5627     5627              
  Lines        492260   492262       +2     
  Branches      63955    60734    -3221     
============================================
- Hits          77832    77825       -7     
- Misses       405905   405914       +9     
  Partials       8523     8523              

| Flag | Coverage Δ |
|---|---|
| uitests | 4.04% <ø> (ø) |
| unittests | 16.63% <0.00%> (-0.01%) ⬇️ |

@DaanHoogland
Contributor

@blueorangutan package

@blueorangutan

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 11525

Contributor

@DaanHoogland DaanHoogland left a comment

clgtm

@DaanHoogland
Contributor

Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 11525

this seems to have to do with github api limits. trying a build in the background.

@rohityadavcloud rohityadavcloud added this to the 4.20.0 milestone Nov 22, 2024
Contributor

@wido wido left a comment

Code looks good

@DaanHoogland
Contributor

@blueorangutan test keepEnv

@blueorangutan

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

Member

@weizhouapache weizhouapache left a comment

code lgtm

makes sense

@blueorangutan

[SF] Trillian test result (tid-11796)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 58809 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9892-t11796-kvm-ol8.zip
Smoke tests completed. 140 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_01_secure_vm_migration | Error | 403.96 | test_vm_life_cycle.py |

@DaanHoogland
Contributor

DaanHoogland commented Dec 27, 2024

@BartJM, I finally got around to testing this. There is no error, but
image
I am not sure if migration should happen automatically for this vm. I had to migrate it "manually".

also:
image

@DaanHoogland
Contributor

ping @BartJM , is this alright/as you expect as well?

@BartJM
Contributor Author

BartJM commented Jan 20, 2025

@DaanHoogland There are some scenarios in which VMs are still expected to fail:

  1. VMs with local storage data disks
  2. VMs with an HA root disk and an HA data disk, when the VM has not been moved

The ErrorInMaintenance is expected when a vm failed to migrate.
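
The relationship between failed migrations and the resulting host state can be sketched as follows. This is only an illustration under stated assumptions: `MaintenanceOutcomeSketch`, `HostState`, and `resolve` are hypothetical names, and the rule "any VM without a destination puts the host in ErrorInMaintenance" is a simplification of the behaviour described in this thread, not CloudStack's actual state machine.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified model of the outcome described in this thread.
final class MaintenanceOutcomeSketch {

    // State names modelled on those mentioned in the conversation (assumption).
    enum HostState { Maintenance, ErrorInMaintenance }

    /**
     * Each entry is the planned destination host id for one VM on the host;
     * an empty Optional means the planner found no destination for that VM.
     */
    static HostState resolve(List<Optional<Long>> plannedDestinations) {
        boolean allPlaced = plannedDestinations.stream().allMatch(Optional::isPresent);
        return allPlaced ? HostState.Maintenance : HostState.ErrorInMaintenance;
    }

    public static void main(String[] args) {
        List<Optional<Long>> allPlaced = List.of(Optional.of(2L), Optional.of(3L));
        List<Optional<Long>> oneUnplaced = List.of(Optional.of(2L), Optional.empty());

        System.out.println(resolve(allPlaced));   // Maintenance
        System.out.println(resolve(oneUnplaced)); // ErrorInMaintenance
    }
}
```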

@DaanHoogland DaanHoogland merged commit a163831 into apache:4.20 Jan 20, 2025
24 of 25 checks passed

boring-cyborg bot commented Jan 20, 2025

Awesome work, congrats on your first merged pull request!

DaanHoogland added a commit that referenced this pull request Jan 20, 2025
* 4.20:
  Maintenance mode: Add host to deployment planner avoid list to fix local storage vm migration (#9892)
  Add project-user association normalization script to 4.20.1 upgrade (#10116)
  fix slider component for global settings of the range type (#10187)
  Clean up network permissions on account deletion (#10176)
@Pearl1594 Pearl1594 moved this to Done in ACS 4.20.1 Mar 17, 2025