
Commit 2c8a873

bors[bot] and sara hartse authored
Merge #162
162: DLPX-67251 Device removal fails due to inconsistent device names r=shartse a=shartse

**Problem**

On ESX, we use by-id links to create pools and to import them on migrated systems. There are two links per device, each with the same base number but with a different prefix (wwn vs scsi). When creating a pool for the first time, we explicitly use the wwn link, if it's available. However, it is not possible to specify this when importing a pool, and since device links are created by udev asynchronously, a migrated pool can sometimes end up with a combination of wwn and scsi linked devices. This causes issues when we try to manage the devices with zpool commands passed through the delphix application. The DE thinks the correct name of a device is the wwn version (since that link is now present), but that name doesn't actually exist in the pool, so the operation fails.

**Possible Solutions**

1. Expand zpool import to be able to specify precise files (and prefixes) to always pick the wwn links.
2. Wait until all links are created before importing (most likely, by calling `udevadm settle`).
3. Modify the udev rules so that only one by-id link per device is created.
4. Change DE so that it can identify devices within pools with greater flexibility.

I decided against 1 and 4 since they'd require somewhat larger scope changes in ZFS and the app-stack respectively. 2 would add extra latency to migration, especially since `settle` waits for all udev events, not just the ones pertinent to the devices we care about. I found that option 3 can be achieved by modifying the existing udev rules, so that's what I've gone with here. I've also opened an app-gate review here: http://reviews.delphix.com/r/54112 to codify the use of the scsi prefix ids by default.

In the future, I'd like to move towards a solution where we write our own udev rules that cover devices across all platforms we support to get a single "delphix-id" and hopefully reduce the complexity here and in the app-stack.
**Testing**

I manually tested that a migration to a VM with this change was successful and saw that the pool was created using all scsi links.

```
domain0                                  22.5G  44.7M  22.5G  -  -  0%  0%     1.00x  ONLINE  -
  scsi-36000c299e8f8f8e643410a9a5aaf3595  7.50G  23.8M  7.48G  -  -  0%  0.30%  -      ONLINE
  scsi-36000c29ca33c1d0b3f1ee232d02652fd  7.50G  16.7M  7.48G  -  -  0%  0.21%  -      ONLINE
  scsi-36000c29359be70663180556fbd93d05a  7.50G  4.22M  7.50G  -  -  0%  0.05%  -      ONLINE
```

I also checked that configuring, adding and removing the devices all worked as expected on a migrated as well as a clean installed VM. I also tested the change on a GCP VM (the other platform we use by-id links on) and found no change in behavior.

Automated tests: `git-ab-pre-push --test-upgrade-from 5.3.6.0 -p esx` http://selfservice.jenkins.delphix.com/job/devops-gate/job/master/job/appliance-build-orchestrator-pre-push/2500/

Co-authored-by: sara hartse <sara.hartse@delphix.com>
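The manual check described under Testing (confirming that every device in the pool uses the same link prefix) could be scripted roughly as follows. This is only a sketch: the helper name `link_prefixes` is hypothetical, the sample text is the pool listing quoted above, and the second sample, which mixes in a made-up `wwn-` name, is an invented illustration of the failure mode rather than output from a real system.

```python
import re

def link_prefixes(pool_listing):
    """Return the set of by-id link prefixes ('scsi', 'wwn') seen in a pool listing.

    Hypothetical helper: it only inspects the first column of each line and
    ignores anything that does not start with a known prefix (e.g. the pool name).
    """
    prefixes = set()
    for line in pool_listing.splitlines():
        fields = line.split()
        if not fields:
            continue
        match = re.match(r"(scsi|wwn)-", fields[0])
        if match:
            prefixes.add(match.group(1))
    return prefixes

# Listing taken from the testing notes above.
MIGRATED_POOL = """\
domain0 22.5G 44.7M 22.5G - - 0% 0% 1.00x ONLINE -
scsi-36000c299e8f8f8e643410a9a5aaf3595 7.50G 23.8M 7.48G - - 0% 0.30% - ONLINE
scsi-36000c29ca33c1d0b3f1ee232d02652fd 7.50G 16.7M 7.48G - - 0% 0.21% - ONLINE
scsi-36000c29359be70663180556fbd93d05a 7.50G 4.22M 7.50G - - 0% 0.05% - ONLINE
"""

# Invented example of the problem case: one device imported through a wwn link.
MIXED_POOL = MIGRATED_POOL + "wwn-0xEXAMPLE 7.50G 1.00M 7.50G - - 0% 0.01% - ONLINE\n"

consistent = len(link_prefixes(MIGRATED_POOL)) == 1
mixed = len(link_prefixes(MIXED_POOL)) > 1
print("migrated pool consistent:", consistent)
print("mixed pool detected:", mixed)
```

A pool whose devices report more than one prefix is the situation this change is meant to prevent.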
2 parents 5d23548 + d1e7a7c commit 2c8a873

1 file changed: +25 -0 lines changed

files/common/var/lib/delphix-platform/ansible/10-delphix-platform/roles/delphix-platform/tasks/main.yml

Lines changed: 25 additions & 0 deletions
@@ -439,6 +439,31 @@
     regexp: '(.*)\|xvd\*(.*)'
     line: '\1\2'
 
+#
+# The default udev rules create two different by-id links for each storage
+# device on ESX, based on the same serial number but with different prefixes.
+# The first is based on the bus type (scsi) and the second is a catch-all
+# "World Wide Name" (wwn). After migration, we import domain0 with the
+# "/dev/disk/by-id" path, but since udev runs asynchronously, we may end up
+# with a mix of wwn and scsi aliases. This causes problems when the DE tries to
+# match devices on the system to those in the pool, i.e. for removal.
+#
+# This moves the wwn links to the /dev/disk/by-id/wwn sub-directory, keeping it
+# available as a backup but limiting the /dev/disk/by-id namespace to one type
+# of id. We override the original rules in /lib/ with our new version in /etc/.
+#
+- copy:
+    remote_src: yes
+    src: /lib/udev/rules.d/60-persistent-storage.rules
+    dest: /etc/udev/rules.d/60-persistent-storage.rules
+    owner: root
+    group: root
+    mode: 0644
+- replace:
+    path: /etc/udev/rules.d/60-persistent-storage.rules
+    regexp: 'disk\/by-id\/wwn-'
+    replace: 'disk/by-id/wwn/'
+
 #
 # Enable CRA for external variants
 #
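To see what the `replace` task does to an individual rule, the same pattern and replacement can be tried in Python, whose `re` syntax is what Ansible's `replace` module uses. This is a sketch under assumptions: the sample rule lines are illustrative, modelled on the stock `60-persistent-storage.rules`, and may not match the exact lines shipped on a given system; `move_wwn_links` is a hypothetical helper name.

```python
import re

# Pattern and replacement copied from the `replace` task in the diff.
PATTERN = r"disk\/by-id\/wwn-"
REPLACEMENT = "disk/by-id/wwn/"

# Illustrative rule lines (assumed, not copied from a specific system).
WWN_RULE = (
    'ENV{DEVTYPE}=="disk", ENV{ID_WWN_WITH_EXTENSION}=="?*", '
    'SYMLINK+="disk/by-id/wwn-$env{ID_WWN_WITH_EXTENSION}"'
)
SCSI_RULE = (
    'ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}=="?*", '
    'SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}"'
)

def move_wwn_links(rules_text):
    """Rewrite wwn symlink targets so they land in a by-id/wwn/ sub-directory."""
    return re.sub(PATTERN, REPLACEMENT, rules_text)

rewritten_wwn = move_wwn_links(WWN_RULE)
rewritten_scsi = move_wwn_links(SCSI_RULE)
print(rewritten_wwn)
print(rewritten_scsi)
```

Only the rule that creates the `wwn-` link is rewritten, so the wwn link moves under `/dev/disk/by-id/wwn/` while the bus-based (scsi) rule is left as it was.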
