You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
162: DLPX-67251 Device removal fails due to inconsistent device names r=shartse a=shartse
**Problem**
On ESX, we use by-link links to create pools and to import them on migrated systems. There are two links per device, each with the same base number but with a different prefix (wwn vs scsi). When creating a pool for the first time, we explicitly use the wwn link, if it's available.
However, it is not possible to specify this when importing a pool and since device links are created by udev asynchronously, sometimes a migrated pool can end up with a combination of wwn and scsi linked devices.
This causes issues when we try to manage the devices with zpool commands passed through the delphix application. The DE thinks the correct name of the pool is the wwn version (since it's now present), but that doesn’t actually exist in the pool and the operation fails.
**Possible Solutions**
1. Expand zpool import to be able to specify precise files (and prefixes) to always pick the wwn links
2. Wait until all links are created before importing (most likely, by calling `udevadm settle`)
3. Modify the udev rules so that only one by-id link per device is created.
4. Change DE so that it can identify devices within pools with greater flexibility.
I decided against 1 and 4 since they'd require somewhat larger scope changes in ZFS and the app-stack respectively. 2 would add extra latency to migration, especially since `settle` waits for all different udev events, not just the ones pertinent to the devices we care about.
I found that option 3 can be achieved by modifying the existing udev rules, so that's what I've gone with here. I've also opened an app-gate review here: http://reviews.delphix.com/r/54112 to codify the use of the scsi prefix ids by default.
In the future, I'd like to move towards a solution where we write our own udev rules that covers devices across all platforms we support to get a single "delphix-id" and hopefully reduce the complexity here and in the app-stack.
**Testing**
I manually tested that a migration to a VM with this change was successful and saw that the pool was created using all scsi links.
```
domain0 22.5G 44.7M 22.5G - - 0% 0% 1.00x ONLINE -
scsi-36000c299e8f8f8e643410a9a5aaf3595 7.50G 23.8M 7.48G - - 0% 0.30% - ONLINE
scsi-36000c29ca33c1d0b3f1ee232d02652fd 7.50G 16.7M 7.48G - - 0% 0.21% - ONLINE
scsi-36000c29359be70663180556fbd93d05a 7.50G 4.22M 7.50G - - 0% 0.05% - ONLINE
```
I also checked that configuring, adding and removing the devices all worked as expected on a migrated as well as a clean installed VM. I also tested the change on a GCP VM (the other platform we use by-id links on) and found no change in behavior.
Automated tests: `git-ab-pre-push --test-upgrade-from 5.3.6.0 -p esx` http://selfservice.jenkins.delphix.com/job/devops-gate/job/master/job/appliance-build-orchestrator-pre-push/2500/
Co-authored-by: sara hartse <sara.hartse@delphix.com>
0 commit comments