
Add support for creating and managing LVM cache volumes #235

Merged: 3 commits merged on Jan 10, 2022

Conversation

vojtechtrefny (Collaborator)

No description provided.


    def _attach_cache(self):
        """ Create a new cache pool and attach it to the volume """
        raise BlivetAnsibleError("adding cache to an existing volume is currently not supported")
vojtechtrefny (Collaborator, Author)

Blivet unfortunately doesn't support creating a new cache pool. We can attach an existing cache pool to the LV, but we can't create it. I plan to add support for creating cache pools to blivet and I'll update this when the support is available.
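
For readers following along, here is a minimal sketch of the creation-time path the PR relies on instead: requesting the cache when the LV is first created rather than attaching it later. The blivet calls shown (LVMCacheRequest, new_lv with a cache_request argument) reflect my understanding of blivet's interface and may differ from what the PR actually uses; the device and VG names are placeholders.

    # Hedged sketch: create a new LV with a cache attached at creation time,
    # assuming blivet provides LVMCacheRequest and accepts a cache_request
    # argument to new_lv(). Device and VG names below are placeholders.
    import blivet
    from blivet.size import Size
    from blivet.devices.lvm import LVMCacheRequest

    b = blivet.Blivet()
    b.reset()

    vg = b.devicetree.get_device_by_name("test_vg")        # existing VG
    fast_pvs = [b.devicetree.get_device_by_name("vdb1")]   # PV(s) on fast storage

    # Describe the cache (size, fast PVs, cache mode) and pass it along
    # when creating the logical volume.
    cache_request = LVMCacheRequest(Size("4 GiB"), fast_pvs, "writethrough")
    lv = b.new_lv(name="test_lv", parents=[vg], size=Size("20 GiB"),
                  cache_request=cache_request)
    b.create_device(lv)
    b.do_it()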

vojtechtrefny (Collaborator, Author)

Question: How should I handle a situation where the user specifies an incompatible volume type? This feature uses LVM cache, so it doesn't make sense to use cached: true for partitions. I checked the deduplication/compression support, which has a similar issue: when I specify deduplication: true and type: "partition", it's simply ignored, so I ignored it here too. But I think we should probably return an error in these situations. Should these "conflicts" be checked somewhere? It looks like there isn't a special "input sanity check" in library/blivet.py for these situations.

richm (Contributor) commented Nov 3, 2021

> Question: How should I handle a situation where the user specifies an incompatible volume type? This feature uses LVM cache, so it doesn't make sense to use cached: true for partitions. I checked the deduplication/compression support, which has a similar issue: when I specify deduplication: true and type: "partition", it's simply ignored, so I ignored it here too. But I think we should probably return an error in these situations. Should these "conflicts" be checked somewhere? It looks like there isn't a special "input sanity check" in library/blivet.py for these situations.

There generally isn't any input validation in system roles. But in specific cases, if it is important to prevent data loss, security breaches, and other critical issues, you can add custom input validation and parameter checking.

@japokorn is adding some input validation in #223, but this is mostly a syntax check, not the deeper checking you are proposing.

The more checking you can do at the Ansible level, the better, although in your case it might not be possible, and you'll have to do it in the Python module. There isn't really a code template or best practice for this - I suppose you could look at some of the other roles to see what they do, or at any external Ansible modules that do something like what you want to do.
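
For illustration, a hedged sketch of what such a conflict check inside library/blivet.py could look like. Only BlivetAnsibleError is taken from the existing module; the helper name, its call site, and the exact volume dict keys are assumptions, not the PR's actual code.

    # Hypothetical sketch of a conflict check for LVM-only volume options.
    # BlivetAnsibleError comes from the existing module; everything else
    # (helper name, volume dict layout) is assumed for illustration.
    def _check_volume_conflicts(volume):
        """Reject option combinations that only make sense for LVM volumes."""
        if volume.get("type") != "lvm":
            for opt in ("cached", "deduplication", "compression"):
                if volume.get(opt):
                    raise BlivetAnsibleError(
                        "volume '%s': '%s: true' is only supported for LVM "
                        "volumes, not for type '%s'"
                        % (volume.get("name"), opt, volume.get("type")))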

@vojtechtrefny vojtechtrefny force-pushed the master_lvm-cache branch 3 times, most recently from 46636e6 to 82a5f96 Compare November 4, 2021 11:43
richm (Contributor) commented Nov 4, 2021

[citest commit:82a5f96a1405a759fe9b9f420dbfa903f5b5813f]

richm (Contributor) commented Nov 4, 2021

[citest commit:150b1cddb23919095238379f5c1510aa1d17e556]

richm (Contributor) commented Nov 5, 2021

lgtm - can someone from the storage team, like @japokorn or @dwlehman, give it a quick look?
Don't worry about the CI failures - we're having some issues there.

richm (Contributor) commented Nov 8, 2021

@vojtechtrefny can you rebase this PR on top of the latest master branch? I'd like to make sure this PR is ready for ansible-core 2.12

dwlehman (Collaborator) left a review comment

Looks good other than the one question about cache device specification.

#### `cache_mode`
Mode for the cache. Supported values include `writethrough` (default) and `writeback`.

#### `cache_devices`
dwlehman (Collaborator)

PVs could be partitions or md as well as whole disks. Their names may be difficult to predict. Is it the case that, as of now, we cannot operate on non-existent PVs/VGs because blivet cannot yet create cache pools? If it is possible, is there a way to specify the cache devices in terms of which disks they are on in these cases?

vojtechtrefny (Collaborator, Author)

Good point. I think we can allow specifying a list of disks and look for the PVs among their children. I'll update the PR.

vojtechtrefny (Collaborator, Author)

I've added a new commit that adds support for using disks for cache_devices instead of PVs.
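
A hedged sketch of the disk-to-PV lookup described above. The helper name and error messages are mine, not the commit's; it only assumes standard blivet device-tree attributes (resolve_device, children, format.type).

    # Hedged sketch: map user-supplied cache_devices entries (disks or PVs)
    # to the LVM PVs that live on them, so users don't have to predict
    # partition names. Helper name and messages are illustrative only.
    def _get_cache_pvs(blivet_obj, cache_devices):
        pvs = []
        for spec in cache_devices:
            device = blivet_obj.devicetree.resolve_device(spec)
            if device is None:
                raise BlivetAnsibleError("cache device '%s' not found" % spec)
            if device.format.type == "lvmpv":
                # the user gave us a PV directly (partition, md or whole disk)
                pvs.append(device)
                continue
            # otherwise treat it as a disk and collect the PVs among its children
            children = [child for child in device.children
                        if child.format.type == "lvmpv"]
            if not children:
                raise BlivetAnsibleError("no PVs found on cache device '%s'" % spec)
            pvs.extend(children)
        return pvs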

lgtm-com bot commented Nov 10, 2021

This pull request introduces 1 alert when merging d7dbad5 into 6f6c598 - view on LGTM.com

new alerts:

  • 1 for Suspicious unused loop iteration variable

richm (Contributor) commented Dec 1, 2021

[citest commit:8385d92945de0c43748a57aab77968efe082b09a]

richm (Contributor) left a review comment

lgtm - @dwlehman or @japokorn ?

@vojtechtrefny vojtechtrefny force-pushed the master_lvm-cache branch 2 times, most recently from 19ef2b6 to 904911e Compare December 13, 2021 08:09
vojtechtrefny (Collaborator, Author)

Rebased to resolve conflict with #241

richm (Contributor) commented Dec 13, 2021

@vojtechtrefny any idea about the rhel-x (RHEL9) failure? Is there a problem with mdadm or raid on RHEL9?

vojtechtrefny (Collaborator, Author)

> @vojtechtrefny any idea about the rhel-x (RHEL9) failure? Is there a problem with mdadm or raid on RHEL9?

I was able to reproduce this locally (not with 100 % success; probably one out of ten runs fails) and it looks like another instance of "we can't trust data from udev".

The array is created correctly with 2 devices and 1 spare

2021-12-14 09:01:57,647 INFO program/MainThread: Running [21] mdadm --detail /dev/md/test1 ...
2021-12-14 09:01:57,660 INFO program/MainThread: stdout[21]: /dev/md/test1:
           Version : 1.0
     Creation Time : Tue Dec 14 09:01:57 2021
        Raid Level : raid1
        Array Size : 2096064 (2046.94 MiB 2146.37 MB)
     Used Dev Size : 2096064 (2046.94 MiB 2146.37 MB)
      Raid Devices : 2
     Total Devices : 3
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Tue Dec 14 09:01:57 2021
             State : clean, resyncing 
    Active Devices : 2
   Working Devices : 3
    Failed Devices : 0
     Spare Devices : 1

Consistency Policy : bitmap

     Resync Status : 6% complete

              Name : localhost.localdomain:test1  (local to host localhost.localdomain)
              UUID : 2d4201e0:7ac94b94:17a61671:87b66aa5
            Events : 0

    Number   Major   Minor   RaidDevice State
       0     252       17        0      active sync   /dev/vdb1
       1     252       33        1      active sync   /dev/vdc1

       2     252       49        -      spare   /dev/vdd1

but udev doesn't show the filesystem type for the spare. This is what lsblk shows about the array after we create it

vdb            2147483648 root  disk  brw-rw----                                                                                                                           
`-vdb1         2146435072 root  disk  brw-rw---- linux_raid_member localhost.localdomain:test1 2d4201e0-7ac9-4b94-17a6-167187b66aa5   72d22362-01                          
  `-md127      2146369536 root  disk  brw-rw---- xfs                                           87e08a14-3430-4ccd-9589-28d1a4c84e11                                        /opt/test1
vdc            2147483648 root  disk  brw-rw----                                                                                                                           
`-vdc1         2146435072 root  disk  brw-rw---- linux_raid_member localhost.localdomain:test1 2d4201e0-7ac9-4b94-17a6-167187b66aa5   2baff06e-01                          
  `-md127      2146369536 root  disk  brw-rw---- xfs                                           87e08a14-3430-4ccd-9589-28d1a4c84e11                                        /opt/test1
vdd           21474836480 root  disk  brw-rw----                                                                                                                           
`-vdd1        21473787904 root  disk  brw-rw----                                                                                      56f79f9a-01                          
  `-md127      2146369536 root  disk  brw-rw---- xfs                                           87e08a14-3430-4ccd-9589-28d1a4c84e11                                        /opt/test1

and because we (blivet) don't see the linux_raid_member signature on it, we don't examine it using mdadm -E /dev/vdd1 and don't know it is a spare.
The problem might also be related to the fact that the array is still resyncing at this point:

Dec 14 09:42:50 localhost.localdomain kernel: md: resync of RAID array md127
Dec 14 09:42:50 localhost.localdomain kernel: md127: detected capacity change from 0 to 4192128
Dec 14 09:42:50 localhost.localdomain kernel: md/raid1:md127: active with 2 out of 2 mirrors
Dec 14 09:42:50 localhost.localdomain kernel: md/raid1:md127: not clean -- starting background reconstruction

I'll try to think about a workaround in blivet for these situations. We might as well issue udevadm trigger for all devices we are going to scan in reset(); it's already an expensive operation so it shouldn't make it noticeably worse. Thoughts, @dwlehman @japokorn? (We should probably move this discussion to a blivet issue.)
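
As a rough illustration of the workaround being discussed, this is the kind of re-trigger/settle step that could run before a (re)scan. It calls the udevadm CLI directly; whether it would live in the role, the module, or blivet's reset() is exactly the open question above.

    # Hedged sketch: force uevents for block devices and wait for the udev
    # queue to drain before rescanning, so stale udev data (like the missing
    # linux_raid_member on the spare above) is less likely to be seen.
    import subprocess

    def retrigger_block_devices():
        subprocess.run(["udevadm", "trigger", "--subsystem-match=block",
                        "--action=change"], check=True)
        subprocess.run(["udevadm", "settle"], check=True)

    # e.g. before scanning:
    #     retrigger_block_devices()
    #     b = blivet.Blivet()
    #     b.reset()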

richm (Contributor) commented Dec 14, 2021

> I was able to reproduce this locally (not with 100 % success; probably one out of ten runs fails) and it looks like another instance of "we can't trust data from udev".

Does it work if you set storage_udevadm_trigger: true? Or is it that you need to do the udevadm trigger inside of the blivet library?

vojtechtrefny (Collaborator, Author)

> Does it work if you set storage_udevadm_trigger: true? Or is it that you need to do the udevadm trigger inside of the blivet library?

Oh, I forgot you added this. It looks like it fixes the issue: with storage_udevadm_trigger: true, none of 50 runs failed. I created #245, let's see what CI thinks about it. We definitely need to do something about this in blivet too, but this looks like a simple workaround for now.

vojtechtrefny (Collaborator, Author)

Rebased and added the LVM cache parameters to module_args to make this work with #223

It might be hard for users to know the names of the partitions we'll create as PVs, so we should let them specify disk names and find the PVs on them ourselves when creating the cache.
richm (Contributor) commented Dec 17, 2021

[citest bad]

richm (Contributor) commented Dec 17, 2021

[citest pending]

japokorn (Collaborator) left a review comment

Looks good to me.
