file-reflink, a storage driver optimized for CoW filesystems #188
Does CI policy require no new pylint warnings? There's one protected-access warning that I had planned on leaving in, because the accessed attribute isn't mine. (I've also fixed one warning and silenced one false positive error. They weren't printed by my local pylint version.)
On second thought, let me handle that more cleanly...
Generally yes. But in some specific cases, muting with
Codecov Report
@@            Coverage Diff             @@
##           master     #188      +/-   ##
==========================================
- Coverage   56.32%   54.65%   -1.67%
==========================================
  Files          55       56       +1
  Lines        8734     9000     +266
==========================================
  Hits         4919     4919
- Misses       3815     4081     +266
I've fixed the underlying cause of the warning (and added a previously missing
Huh, that's not actually true for the If so, I propose to make these
Old e.g. backed up VMs still using
This is intentionally left in place, to allow the VM to verify the original root image. This would allow a semi-untrusted storage domain. See the original architecture paper. This part is yet to be implemented... Since this is unused right now, I think it's OK to disable it, but not to remove it completely. And leave a clear comment explaining why it is this way. So:
Yes.
No. And not really needed, as currently if /dev/xvda is read-write, then /dev/mapper/dmroot is just a symlink (not dm-linear as it was in R3.2). BTW, are you waiting for me on this PR? I've skimmed through it and it looks fine, but I haven't done a careful review yet. Are you going to port the tests in this PR, or separately at some future time?
Ah, okay.
That's fantastic!
If possible, I'd implement tests in a separate PR, before submitting a third one to automatically use file-reflink (instead of 'file') for varlibqubes when a btrfs-without-LVM layout has been set up during R4.1 installation. As for this first PR, getting it merged would conveniently stop breaking my qubesd during qubes-core-admin package upgrades. 8) Hey, while I have your ear, should
Not sure if you know, but you can install additional storage pool drivers from outside of qubes-core-admin. Just provide an appropriate entry point (as you've done here in setup.py). That wouldn't work for default_pool, but you can set it using
That's ok.
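For readers unfamiliar with entry points, here is a minimal sketch of how an out-of-tree driver package might register itself. The package, module, and class names are hypothetical, and the group name `qubes.storage` is an assumption about how the in-tree drivers are registered; check the setup.py in this PR before relying on it.

```python
# Hypothetical entry point mapping for an out-of-tree storage driver.
# "qubes.storage" is an ASSUMED group name; "my-driver",
# "my_driver_package.pool" and "MyPool" are made up for illustration.
ENTRY_POINTS = {
    "qubes.storage": [
        "my-driver = my_driver_package.pool:MyPool",
    ],
}


def parse_entry_point(spec):
    """Split a 'name = module:attr' entry point string into its parts."""
    name, target = (part.strip() for part in spec.split("=", 1))
    module, attr = target.split(":", 1)
    return name, module, attr


# In a real setup.py, the mapping would be handed to setuptools:
#     setuptools.setup(..., entry_points=ENTRY_POINTS)
```

The driver name on the left of the `=` is what would then be passed as the pool's driver when creating it.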
qubes/storage/reflink.py
Outdated
# inefficient CoW-on-CoW setup. Avoid this by always
# overriding root to be read-write - which may become
# incompatible with a future untrusted storage domain!
volume_config['rw'] = True
Better patch it directly where volume_config is defined (and similar for other VM types).
And please use a separate commit for such changes.
Thanks, I'll give that a try.
This adds the file-reflink storage driver. It is never selected automatically for pool creation, especially not the creation of 'varlibqubes' (though it can be used if set up manually).

The code is quite small:

               reflink.py  lvm.py      file.py + block-snapshot
    sloccount  334 lines   447 (134%)  570 (171%)

Background: btrfs and XFS (but not yet ZFS) support instant copies of individual files through the 'FICLONE' ioctl behind 'cp --reflink'. Which file-reflink uses to snapshot VM image files without an extra device-mapper layer. All the snapshots are essentially freestanding; there's no functional origin vs. snapshot distinction.

In contrast to 'file'-on-btrfs, file-reflink inherently avoids CoW-on-CoW. Which is a bigger issue now on R4.0, where even AppVMs' private volumes are CoW. (And turning off the lower, filesystem-level CoW for 'file'-on-btrfs images would turn off data checksums too, i.e. protection against bit rot.)

Also in contrast to 'file', all storage features are supported, including

    - any number of revisions_to_keep
    - volume.revert()
    - volume.is_outdated
    - online fstrim/discard

Example tree of a file-reflink pool - *-dirty.img are connected to Xen:

    - /var/lib/testpool/appvms/foo/volatile-dirty.img
    - /var/lib/testpool/appvms/foo/root-dirty.img
    - /var/lib/testpool/appvms/foo/root.img
    - /var/lib/testpool/appvms/foo/private-dirty.img
    - /var/lib/testpool/appvms/foo/private.img
    - /var/lib/testpool/appvms/foo/private.img@2018-01-02T03:04:05Z
    - /var/lib/testpool/appvms/foo/private.img@2018-01-02T04:05:06Z
    - /var/lib/testpool/appvms/foo/private.img@2018-01-02T05:06:07Z
    - /var/lib/testpool/appvms/bar/...
    - /var/lib/testpool/appvms/...
    - /var/lib/testpool/template-vms/fedora-26/...
    - /var/lib/testpool/template-vms/...
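The revision files in the example tree carry a UTC timestamp after the '@'. As a rough illustration of how a revisions_to_keep limit could be applied to names of that shape, here is a small sketch. `revisions_to_prune` is a hypothetical helper written for this note, not code from reflink.py, and it assumes the timestamp format shown in the tree, so that sorting the names also sorts them by age.

```python
from collections import defaultdict


def revisions_to_prune(paths, revisions_to_keep):
    """Return the revision paths that exceed revisions_to_keep.

    Hypothetical helper: assumes revisions are named
    '<image>@<ISO 8601 UTC timestamp>' as in the example tree.
    """
    by_image = defaultdict(list)
    for path in paths:
        image, sep, _timestamp = path.rpartition("@")
        if sep:  # files without an '@' suffix are not revisions
            by_image[image].append(path)

    prune = []
    for revisions in by_image.values():
        revisions.sort()  # oldest first, thanks to the timestamp format
        excess = len(revisions) - revisions_to_keep
        if excess > 0:
            prune.extend(revisions[:excess])
    return sorted(prune)
```

With the three private.img revisions from the tree and revisions_to_keep set to 2, only the 03:04:05Z revision would be returned for removal.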
This layout looks similar to a 'file' pool tree, and in fact file-reflink is drop-in compatible:

    $ qvm-shutdown --all --wait
    $ systemctl stop qubesd
    $ sed 's/ driver="file"/ driver="file-reflink"/g' -i.bak /var/lib/qubes/qubes.xml
    $ systemctl start qubesd
    $ sudo rm -f /path/to/pool/*/*/*-cow.img*

If the user tries to create a fresh file-reflink pool on a filesystem that doesn't support reflinks, qvm-pool will abort and mention the 'setup_check=no' option. Which can be passed to force a fallback on regular sparse copies, with of course lots of time/space overhead. The same fallback code is also used when initially cloning a VM from a foreign pool, or from another file-reflink pool on a different mountpoint.

'journalctl -fu qubesd' will show all file-reflink copy/rename/remove operations on VM creation/startup/shutdown/etc.
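To make the reflink-or-fallback behaviour described above concrete, here is a minimal sketch of a copy helper built on the FICLONE ioctl. It is an illustration only, not the code in reflink.py: `copy_file` and its `allow_fallback` parameter are hypothetical names, and a plain byte-for-byte copy stands in for the sparse copy that the driver falls back to.

```python
import fcntl
import os
import shutil
import tempfile

FICLONE = 0x40049409  # from <linux/fs.h>: _IOW(0x94, 9, int)


def copy_file(src, dst, allow_fallback=True):
    """Copy src to dst; return 'reflink' or 'copy' depending on the method.

    Hypothetical helper. The fallback is a plain copy, a simplified
    stand-in for a sparse copy.
    """
    # Write to a temporary file in the destination directory and rename it
    # into place, so dst never exists in a half-written state.
    dst_dir = os.path.dirname(os.path.abspath(dst))
    fd, tmp = tempfile.mkstemp(dir=dst_dir)
    try:
        with os.fdopen(fd, "wb") as tmp_file, open(src, "rb") as src_file:
            try:
                # Ask the filesystem to share src's extents with tmp_file.
                fcntl.ioctl(tmp_file.fileno(), FICLONE, src_file.fileno())
                method = "reflink"
            except OSError:
                # No reflink support here, or src is on another mountpoint.
                if not allow_fallback:
                    raise
                shutil.copyfileobj(src_file, tmp_file)
                method = "copy"
        os.replace(tmp, dst)
    except BaseException:
        os.unlink(tmp)
        raise
    return method
```

Passing `allow_fallback=False` roughly corresponds to the strict behaviour where an unsupported filesystem is reported as an error instead of being silently copied the slow way.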
I don't know if any template currently hits this code path; even the fedora-26-minimal root.img is large enough to be split into multiple parts. Maybe Arch Linux?
Related to QubesOS/qubes-core-admin#188
TODO: port unit tests