PowerFlex/ScaleIO - MDM and host SDC connection enhancements #11047

Merged
merged 21 commits into from
Jul 16, 2025

Contributor

@sureshanaparti sureshanaparti commented Jun 17, 2025

Description

This PR enhances PowerFlex/ScaleIO MDM and host SDC connection handling. It includes the following changes, along with some code improvements.

  • Introduced the timeout configuration 'powerflex.mdm.change.apply.wait' (default: 1000 ms) at zone scope. It sets how long to wait after an MDM addition, and before and after an MDM removal, on a host with the ScaleIO SDC. The wait is applied after MDM changes in the ScaleIO prepare and unprepare logic.

  • Introduced the configuration flag 'powerflex.block.sdc.unprepare' (default: false) at zone scope. It enables or disables blocking the unprepare of a ScaleIO SDC connection when an SDC client restart is required (that is, on PowerFlex MDM removal when the drv_cfg cmd does not support --remove_mdm) and volumes are attached to the host. Added validation to fail the host disconnect from a storage pool if volumes are attached and the SDC client MDM removal requires the scini service to be restarted.

  • Introduced the configuration flag 'powerflex.mdm.validate.on.connect' (default: false) at zone scope. It enables or disables validating, during storage pool registration in the agent, that the MDM addresses in the host's configuration file match those in the CLI cmd (drv_cfg --query_mdms) output.

  • Added detection of MDM removal support via the CLI. If the CLI supports MDM removal, it is used; otherwise the code falls back to editing drv_cfg.txt and restarting scini, as before. Tested with /opt/emc/scaleio/sdc/bin/drv_cfg --version: DellEMC PowerFlex Version: R3_6.4000.124, with cmd: /opt/emc/scaleio/sdc/bin/drv_cfg --remove_mdm.

  • Added the agent property 'powerflex.sdc.service.wait' for the time (in seconds) to wait after an SDC service start/restart/stop, and added retries to fetch the SDC id/guid.

  • Updated to allow unpreparing the SDC when no volumes are mapped on the host for other connected pools (with the same SDC id, i.e. pools of the same PowerFlex storage cluster).
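
The interaction between CLI removal support, 'powerflex.block.sdc.unprepare', and attached volumes described above can be sketched as a small decision function. This is only an illustrative sketch: the class, enum, and parameter names are hypothetical and are not taken from the PR's code, and the real logic may check more conditions.

```java
// Hypothetical sketch of the MDM removal decision described in the PR text.
// All names here are illustrative assumptions, not CloudStack identifiers.
final class MdmRemovalPlanner {

    enum Action { REMOVE_VIA_CLI, EDIT_CONFIG_AND_RESTART, BLOCKED }

    /**
     * @param cliRemoveSupported whether drv_cfg on the host supports --remove_mdm
     * @param blockSdcUnprepare  value of powerflex.block.sdc.unprepare
     * @param volumesAttached    whether volumes are attached to the host
     */
    static Action plan(boolean cliRemoveSupported, boolean blockSdcUnprepare, boolean volumesAttached) {
        if (cliRemoveSupported) {
            // CLI removal does not need a scini restart, so nothing to block.
            return Action.REMOVE_VIA_CLI;
        }
        if (blockSdcUnprepare && volumesAttached) {
            // A restart would be required while volumes are attached.
            return Action.BLOCKED;
        }
        // Fall back to editing drv_cfg.txt and restarting scini.
        return Action.EDIT_CONFIG_AND_RESTART;
    }

    public static void main(String[] args) {
        System.out.println(plan(true, true, true));
        System.out.println(plan(false, true, true));
        System.out.println(plan(false, false, true));
    }
}
```

Keeping the decision separate from the command execution makes each combination of settings easy to unit test without a host that runs the SDC.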

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI
  • test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

Tested the new settings, and PowerFlex SDC connections (MDM add/remove) with VM & Volume operations.

How did you try to break this feature and the system with this change?

@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.


codecov bot commented Jun 17, 2025

Codecov Report

Attention: Patch coverage is 26.66667% with 242 lines in your changes missing coverage. Please review.

Project coverage is 16.58%. Comparing base (be22bfe) to head (7433231).
Report is 35 commits behind head on main.

Files with missing lines | Patch % | Lines
...cloudstack/storage/datastore/util/ScaleIOUtil.java | 28.85% | 99 Missing and 7 partials ⚠️
.../hypervisor/kvm/storage/ScaleIOStorageAdaptor.java | 26.31% | 58 Missing and 12 partials ⚠️
...orage/datastore/manager/ScaleIOSDCManagerImpl.java | 0.00% | 32 Missing ⚠️
...re/lifecycle/ScaleIOPrimaryDataStoreLifeCycle.java | 20.00% | 14 Missing and 2 partials ⚠️
...s/src/main/java/com/cloud/utils/script/Script.java | 0.00% | 10 Missing ⚠️
...torage/datastore/provider/ScaleIOHostListener.java | 0.00% | 6 Missing ⚠️
...rapper/LibvirtModifyStoragePoolCommandWrapper.java | 77.77% | 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##               main   #11047    +/-   ##
==========================================
  Coverage     16.57%   16.58%            
- Complexity    13971    13988    +17     
==========================================
  Files          5743     5746     +3     
  Lines        510648   511103   +455     
  Branches      62105    62170    +65     
==========================================
+ Hits          84641    84754   +113     
- Misses       416534   416863   +329     
- Partials       9473     9486    +13     
Flag Coverage Δ
uitests 3.91% <ø> (+<0.01%) ⬆️
unittests 17.47% <26.66%> (+<0.01%) ⬆️


@DaanHoogland DaanHoogland added this to the 4.21.0 milestone Jun 17, 2025

@Copilot Copilot AI left a comment

Pull Request Overview

This PR enhances the handling of ScaleIO MDM and host SDC connections by introducing new configuration keys for timeouts, validation, and blocking behavior along with updating the underlying command execution and logging mechanisms.

  • Introduced new configuration keys, including MdmsChangeApplyTimeout, ValidateMdmsOnConnect, and BlockSdcUnprepareIfRestartNeededAndVolumesAreAttached.
  • Updated MDM add/remove logic to use varargs and improved command execution via Script.executeCommand.
  • Adjusted test cases and adapter logic to account for the new ScaleIO configurations.
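
As a rough illustration of what the ValidateMdmsOnConnect check amounts to, the sketch below compares two sets of MDM addresses, one read from the configuration file and one reported by the CLI. It assumes both have already been parsed into collections of strings; the class and method names are hypothetical, and treating the comparison as order-insensitive is an assumption rather than something stated in this PR.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: names are illustrative and not part of the PR.
final class MdmAddressCheck {

    /** Returns true when both sources list the same MDM addresses, ignoring order and blanks. */
    static boolean sameMdms(Collection<String> fromConfigFile, Collection<String> fromCli) {
        return normalize(fromConfigFile).equals(normalize(fromCli));
    }

    // Trim each address and drop empty entries so formatting differences do not matter.
    private static Set<String> normalize(Collection<String> addresses) {
        Set<String> result = new TreeSet<>();
        for (String address : addresses) {
            if (address != null && !address.trim().isEmpty()) {
                result.add(address.trim());
            }
        }
        return result;
    }
}
```

A caller would fail the storage pool registration when `sameMdms` returns false, which matches the behaviour the description attributes to 'powerflex.mdm.validate.on.connect'.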

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

File | Description
utils/src/main/java/com/cloud/utils/script/Script.java | Added a new executeCommand(String) method for command execution including stdout/stderr handling.
plugins/storage/volume/scaleio/src/main/java/org/apache/cloudstack/storage/datastore/util/ScaleIOUtil.java | Refactored MDM add/remove methods and updated command templates, patterns, and file read operations.
plugins/storage/volume/scaleio/src/main/java/org/apache/cloudstack/storage/datastore/manager/ScaleIOSDCManagerImpl.java | Updated configuration details sent to hosts by including new timeout settings.
plugins/storage/volume/scaleio/src/main/java/org/apache/cloudstack/storage/datastore/manager/ScaleIOSDCManager.java | Introduced new config key definitions and updated the getConfigKeys() return values.
plugins/storage/volume/scaleio/src/main/java/org/apache/cloudstack/storage/datastore/lifecycle/ScaleIOPrimaryDataStoreLifeCycle.java | Injected the new configuration details during maintain and cancellation procedures.
plugins/storage/volume/scaleio/src/main/java/org/apache/cloudstack/storage/datastore/client/ScaleIOGatewayClient.java | Added a short documentation comment for STORAGE_POOL_MDMS.
plugins/hypervisors/kvm/src/test/java/com/cloud/hypervisor/kvm/storage/ScaleIOStorageAdaptorTest.java | Updated test mocks to reflect changes in command executions for MDM removal.
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/storage/ScaleIOStorageAdaptor.java | Introduced validation of MDM state and timeout application after MDM changes.
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtModifyStoragePoolCommandWrapper.java | Wrapped storage pool creation in a try/catch block to better handle CloudRuntimeException.
Comments suppressed due to low confidence (1)

plugins/storage/volume/scaleio/src/main/java/org/apache/cloudstack/storage/datastore/util/ScaleIOUtil.java:315

  • [nitpick] The ordering of stdout and stderr here is reversed compared to other usages of Script.executeCommand; consider using a consistent ordering (stdout as first, stderr as second) to avoid confusion.
String stdErr = result.first(); String stdOut = result.second();
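
One way to avoid the positional first()/second() confusion raised in this nitpick is a small result holder with named accessors. The class below is a hypothetical sketch only: it is not the return type of Script.executeCommand and does not exist in the PR.

```java
// Hypothetical value class: named accessors make the stdout/stderr order explicit
// at every call site, unlike a positional pair.
final class CommandOutput {
    private final String stdOut;
    private final String stdErr;

    CommandOutput(String stdOut, String stdErr) {
        this.stdOut = stdOut;
        this.stdErr = stdErr;
    }

    String stdOut() {
        return stdOut;
    }

    String stdErr() {
        return stdErr;
    }

    /** True when the command wrote anything other than whitespace to stderr. */
    boolean hasErrorOutput() {
        return stdErr != null && !stdErr.trim().isEmpty();
    }
}
```

With a holder like this, a call site reads `output.stdErr()` rather than relying on the reader remembering which half of a pair holds which stream.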

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✖️ debian ✔️ suse15. SL-JID 13811

@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 13812

Contributor

@harikrishna-patnala harikrishna-patnala left a comment

code LGTM

@sureshanaparti
Contributor Author

@blueorangutan test

@blueorangutan

@sureshanaparti a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian Build Failed (tid-13562)

@harikrishna-patnala
Contributor

@blueorangutan test

@blueorangutan

@harikrishna-patnala a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-13586)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 69826 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr11047-t13586-kvm-ol8.zip
Smoke tests completed. 136 look OK, 5 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
ContextSuite context=TestAccounts>:setup Error 0.00 test_accounts.py
ContextSuite context=TestAddVmToSubDomain>:setup Error 0.00 test_accounts.py
test_DeleteDomain Error 15.92 test_accounts.py
test_forceDeleteDomain Failure 15.66 test_accounts.py
ContextSuite context=TestRemoveUserFromAccount>:setup Error 16.38 test_accounts.py
ContextSuite context=TestTemplateHierarchy>:setup Error 1537.55 test_accounts.py
ContextSuite context=TestDeployVmWithAffinityGroup>:setup Error 0.00 test_affinity_groups_projects.py
ContextSuite context=TestAnnotations>:setup Error 0.00 test_annotations.py
ContextSuite context=TestAsyncJob>:setup Error 0.00 test_async_job.py
ContextSuite context=TestClusterDRS>:setup Error 0.00 test_cluster_drs.py

@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 13907

@rohityadavcloud
Member

rohityadavcloud commented Jun 26, 2025

@sureshanaparti as these are general bugfixes/improvements for PowerFlex storage, should they also be applied to the 4.20 branch (especially as these changes are limited to the scaleio/powerflex storage plugin, with no DB changes)?

@sureshanaparti sureshanaparti changed the title from "PowerFlex/ScaleIO MDM and host SDC connection enhancements" to "PowerFlex/ScaleIO - MDM and host SDC connection enhancements" Jun 27, 2025
@sureshanaparti
Contributor Author

@sureshanaparti as these are general bugfixes/improvements for PowerFlex storage, should they also be applied to the 4.20 branch (especially as these changes are limited to the scaleio/powerflex storage plugin, with no DB changes)?

@rohityadavcloud There are no DB changes, but the host SDC connection behavior was changed earlier in PR #9903 (already part of main), where the SDC connection is controlled by adding/removing MDMs instead of starting/stopping the SDC service (scini). The changes in this PR build on top of that and introduce some configurations to validate the MDMs and apply a wait after adding/removing MDMs. I think it's better to keep this new behavior in main only.
A minor improvement (a wait time after SDC service start/restart/stop, and retries to fetch the SDC id/guid) also applies to the old behavior, so it can go into the 4.20 branch. I've raised a separate PR for it here: #11099
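
For readers unfamiliar with the wait-and-retry improvement mentioned here, a minimal sketch of the pattern follows. The class name, method name, and parameters are hypothetical; the actual lookup of the SDC id/guid and the way 'powerflex.sdc.service.wait' is applied in the agent are not shown in this conversation, so the lookup is passed in as a supplier.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch of "retry the lookup, waiting between attempts".
final class SdcIdFetcher {

    /**
     * Calls the lookup up to maxAttempts times, sleeping waitMillis between
     * attempts, and returns the first value found (or empty if none was found).
     */
    static Optional<String> fetchWithRetries(Supplier<Optional<String>> lookup, int maxAttempts, long waitMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<String> id = lookup.get();
            if (id.isPresent()) {
                return id;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(waitMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt status and stop retrying.
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }
}
```

The wait gives the SDC service time to settle after a start or restart before the next lookup, which is the motivation stated for the new agent property.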

@sureshanaparti
Contributor Author

@blueorangutan test

@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 14072

@vladimirpetrov
Contributor

@blueorangutan test

@blueorangutan

@vladimirpetrov a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

[SF] Trillian test result (tid-13713)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 422 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr11047-t13713-kvm-ol8.zip
Smoke tests completed. 0 look OK, 0 have errors, 81 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
all_test_deploy_vm Skipped --- test_deploy_vm.py
all_test_escalations_templates Skipped --- test_escalations_templates.py
all_test_vm_ha Skipped --- test_vm_ha.py
all_test_vm_sync Skipped --- test_vm_sync.py
all_test_accounts Skipped --- test_accounts.py
all_test_affinity_groups_projects Skipped --- test_affinity_groups_projects.py
all_test_affinity_groups Skipped --- test_affinity_groups.py
all_test_async_job Skipped --- test_async_job.py
all_test_certauthority_root Skipped --- test_certauthority_root.py
all_test_create_list_domain_account_project Skipped --- test_create_list_domain_account_project.py
all_test_deploy_vgpu_enabled_vm Skipped --- test_deploy_vgpu_enabled_vm.py
all_test_deploy_virtio_scsi_vm Skipped --- test_deploy_virtio_scsi_vm.py
all_test_deploy_vm_iso Skipped --- test_deploy_vm_iso.py
all_test_deploy_vm_root_resize Skipped --- test_deploy_vm_root_resize.py
all_test_deploy_vms_with_varied_deploymentplanners Skipped --- test_deploy_vms_with_varied_deploymentplanners.py
all_test_deploy_vm_with_userdata Skipped --- test_deploy_vm_with_userdata.py
all_test_diagnostics Skipped --- test_diagnostics.py
all_test_direct_download Skipped --- test_direct_download.py
all_test_disk_offerings Skipped --- test_disk_offerings.py
all_test_domain_disk_offerings Skipped --- test_domain_disk_offerings.py
all_test_domain_network_offerings Skipped --- test_domain_network_offerings.py
all_test_domain_service_offerings Skipped --- test_domain_service_offerings.py
all_test_domain_vpc_offerings Skipped --- test_domain_vpc_offerings.py
all_test_dynamicroles Skipped --- test_dynamicroles.py
all_test_global_settings Skipped --- test_global_settings.py
all_test_guest_vlan_range Skipped --- test_guest_vlan_range.py
all_test_host_annotations Skipped --- test_host_annotations.py
all_test_hostha_simulator Skipped --- test_hostha_simulator.py
all_test_internal_lb Skipped --- test_internal_lb.py
all_test_iso Skipped --- test_iso.py
all_test_list_ids_parameter Skipped --- test_list_ids_parameter.py
all_test_loadbalance Skipped --- test_loadbalance.py
all_test_login Skipped --- test_login.py
all_test_metrics_api Skipped --- test_metrics_api.py
all_test_migration Skipped --- test_migration.py
all_test_multipleips_per_nic Skipped --- test_multipleips_per_nic.py
all_test_nested_virtualization Skipped --- test_nested_virtualization.py
all_test_network_acl Skipped --- test_network_acl.py
all_test_network Skipped --- test_network.py
all_test_nic_adapter_type Skipped --- test_nic_adapter_type.py
all_test_nic Skipped --- test_nic.py
all_test_non_contigiousvlan Skipped --- test_non_contigiousvlan.py
all_test_outofbandmanagement_nestedplugin Skipped --- test_outofbandmanagement_nestedplugin.py
all_test_outofbandmanagement Skipped --- test_outofbandmanagement.py
all_test_over_provisioning Skipped --- test_over_provisioning.py
all_test_password_server Skipped --- test_password_server.py
all_test_portable_publicip Skipped --- test_portable_publicip.py
all_test_portforwardingrules Skipped --- test_portforwardingrules.py
all_test_primary_storage Skipped --- test_primary_storage.py
all_test_privategw_acl Skipped --- test_privategw_acl.py
all_test_projects Skipped --- test_projects.py
all_test_public_ip_range Skipped --- test_public_ip_range.py
all_test_pvlan Skipped --- test_pvlan.py
all_test_regions Skipped --- test_regions.py
all_test_reset_vm_on_reboot Skipped --- test_reset_vm_on_reboot.py
all_test_resource_accounting Skipped --- test_resource_accounting.py
all_test_resource_detail Skipped --- test_resource_detail.py
all_test_router_dhcphosts Skipped --- test_router_dhcphosts.py
all_test_router_dns Skipped --- test_router_dns.py
all_test_router_dnsservice Skipped --- test_router_dnsservice.py
all_test_routers_iptables_default_policy Skipped --- test_routers_iptables_default_policy.py
all_test_routers_network_ops Skipped --- test_routers_network_ops.py
all_test_routers Skipped --- test_routers.py
all_test_scale_vm Skipped --- test_scale_vm.py
all_test_secondary_storage Skipped --- test_secondary_storage.py
all_test_service_offerings Skipped --- test_service_offerings.py
all_test_snapshots Skipped --- test_snapshots.py
all_test_ssvm Skipped --- test_ssvm.py
all_test_staticroles Skipped --- test_staticroles.py
all_test_templates Skipped --- test_templates.py
all_test_usage_events Skipped --- test_usage_events.py
all_test_usage Skipped --- test_usage.py
all_test_vm_deployment_planner Skipped --- test_vm_deployment_planner.py
all_test_vm_life_cycle Skipped --- test_vm_life_cycle.py
all_test_vm_snapshots Skipped --- test_vm_snapshots.py
all_test_volumes Skipped --- test_volumes.py
all_test_vpc_redundant Skipped --- test_vpc_redundant.py
all_test_vpc_router_nics Skipped --- test_vpc_router_nics.py
all_test_vpc_vpn Skipped --- test_vpc_vpn.py
all_test_host_maintenance Skipped --- test_host_maintenance.py
all_test_hostha_kvm Skipped --- test_hostha_kvm.py

@blueorangutan

Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 14085

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 14093

@sureshanaparti
Contributor Author

@blueorangutan test

@blueorangutan

@sureshanaparti a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-13730)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 54534 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr11047-t13730-kvm-ol8.zip
Smoke tests completed. 141 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

@sureshanaparti sureshanaparti marked this pull request as ready for review July 14, 2025 06:53
@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 14160

@vladimirpetrov
Contributor

@blueorangutan test

@blueorangutan

@vladimirpetrov a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

Contributor

@vladimirpetrov vladimirpetrov left a comment

LGTM based on manual testing, tested various scenarios with old/new ScaleIO SDC clients, 2 ScaleIO pools from the same/different cluster, with powerflex.block.sdc.unprepare=true/false. Tested global settings:
powerflex.connect.on.demand
powerflex.block.sdc.unprepare
powerflex.mdm.change.apply.wait
powerflex.sdc.service.wait
powerflex.mdm.validate.on.connect

@rohityadavcloud
Member

Let's wait for final smoketest run before merging this.

@blueorangutan

[SF] Trillian test result (tid-13779)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 57482 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr11047-t13779-kvm-ol8.zip
Smoke tests completed. 141 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

@DaanHoogland DaanHoogland merged commit 3220eb4 into apache:main Jul 16, 2025
26 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in Apache CloudStack 4.21.0 Jul 16, 2025
@DaanHoogland DaanHoogland deleted the scaleio_mdm_enhancements branch July 16, 2025 06:25
dhslove pushed a commit to ablecloud-team/ablestack-cloud that referenced this pull request Aug 1, 2025
…11047)

* Cumulative enhancements fix for ScaleIO: MDM add/remove, Host prepare/unprepare, validate Storage Pool can be created in Agent.

- Implemented validation to fail Host disconnect from Storage Pool if there are Volumes attached and SDC client MDM removal requires scini service to be restarted
- Implemented Storage Pool validation by checking whether the MDM addresses from the configuration file and from memory (using CLI) match; otherwise fail the ModifyStoragePool command.
- Introduced configuration key to apply timeout after making MDM changes for ScaleIO: powerflex.mdm.change.apply.timeout.ms (default 1000ms)
- Implemented logic to apply timeout after making MDM changes for ScaleIO in prepare and unprepare logic
- Added detection of MDM removal support via CLI
- If MDM removal via CLI is supported, use the CLI; otherwise fall back to editing drv_cfg.txt and restarting scini

Co-authored-by: Suresh Kumar Anaparti <suresh.anaparti@shapeblue.com>
Co-authored-by: mprokopchuk <mprokopchuk@apple.com>