dom0 root filesystem not mounted with discard on thin provisioning #3226

Closed
qubesuser opened this issue Oct 27, 2017 · 11 comments
Labels
C: installer, r4.0-dom0-stable, T: bug
Milestone

Comments

@qubesuser

Qubes OS version:

R4.0-rc2

Steps to reproduce the behavior:

  1. Install Qubes using default settings, selecting LVM Thin Provisioning
  2. Create large file in dom0 from /dev/zero and delete it

Expected behavior:

The space consumed by dom0 root does not reflect the deleted file, and / filesystem has discard option turned on

Actual behavior:

The space consumed by dom0 root reflects the deleted file, / filesystem does not have discard option turned on
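
A minimal sketch for checking the second half of the expected/actual behavior, i.e. whether / is mounted with the discard option. The helper name `has_discard` is made up for illustration; it only inspects an option string, so in dom0 it can be fed from `findmnt` (part of util-linux).

```shell
#!/bin/sh
# has_discard OPTIONS
# Succeeds if the comma-separated mount option list contains "discard".
# (Hypothetical helper for illustration, not part of Qubes.)
has_discard() {
    case ",$1," in
        *,discard,*) return 0 ;;
        *)           return 1 ;;
    esac
}
```

In dom0 this could be used as `has_discard "$(findmnt -no OPTIONS /)" && echo "discard is on" || echo "discard is off"`. Comparing the Data% column of `sudo lvs` before and after step 2 should show whether the deleted file's space went back to the pool.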

General notes:

If discard is not working on dom0 root, disk space may be inexplicably exhausted (due to backup restores, etc. temporarily using space in dom0 root) since dom0 root filesystem is as large as the thin pool by default.

Workaround

Add the discard option to the / entry in /etc/fstab, then run fstrim /
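
One way the fstab part of this workaround could be scripted, as a sketch only. The function name `add_discard_to_root` is hypothetical, and it assumes the standard six-field fstab layout with the options in the fourth field.

```shell
#!/bin/sh
# add_discard_to_root: read fstab text on stdin and write it to stdout with
# "discard" appended to the options of the entry mounted at "/".
# Comments, blank lines, and entries that already list discard pass through.
# Note: awk rebuilds the modified line with single spaces between fields.
# (Hypothetical helper for illustration.)
add_discard_to_root() {
    awk '
        /^[ \t]*(#|$)/ { print; next }
        $2 == "/" && ("," $4 ",") !~ /,discard,/ { $4 = $4 ",discard" }
        { print }
    '
}
```

A cautious way to apply it would be `add_discard_to_root < /etc/fstab > /tmp/fstab.new`, review the result, copy it over /etc/fstab with sudo, and then run `sudo fstrim -v /` to release the space that is already free. The new mount option should take effect after a reboot or a remount such as `sudo mount -o remount,discard /`.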

@andrewdavidwong added the T: bug and C: installer labels Oct 28, 2017
@andrewdavidwong added this to the Release 4.0 milestone Oct 28, 2017
@na--

na-- commented Oct 29, 2017

Maybe this deserves a separate issue, but right now I'm not sure that the dom0 volume should be in the LVM thin pool at all - it's too fragile. I've somehow managed to turn the whole pool (including the dom0 root volume in it) read-only several times in the last few days. I think most of the time that was due to inadvertently filling up the free space, since LVM restricts writes when that happens (see "Data space exhaustion" in man lvmthin). The issue is that it's very difficult (if not impossible) to fix this condition if the dom0 root is read-only, and it usually results in a hard reset of the system.
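
Since the failure described here starts with the pool running out of data space, a small check on pool usage may help catch it early. This is a sketch under assumptions: `pool_nearly_full` is a made-up helper, the 90% default is arbitrary, and the pool name `qubes_dom0/pool00` in the usage note is assumed to match a default install.

```shell
#!/bin/sh
# pool_nearly_full PERCENT [THRESHOLD]
# Succeeds if PERCENT (e.g. "87.45", as printed by lvs) is at or above
# THRESHOLD (default 90). awk is used because sh cannot compare decimals.
# (Hypothetical helper; the threshold is an arbitrary choice.)
pool_nearly_full() {
    awk -v used="$1" -v limit="${2:-90}" 'BEGIN { exit !(used + 0 >= limit + 0) }'
}
```

For example: `pct=$(sudo lvs --noheadings -o data_percent qubes_dom0/pool00 | tr -d ' ')` followed by `pool_nearly_full "$pct" && echo "thin pool is ${pct}% full"`.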

I know this will make restoring large backups harder, but I would really prefer the default drive partitioning to be:

physical drive
`-LUKS
  `-LVM PV/VG
    `-dom0 root & home (10 GB?)
    `-swap (1-2GB)
    `-LVM thin pool (remaining space)
      `-vm-sys-usb
      `-vm-sys-net
      ...
`-boot (1GB)

I think that this should be a much more stable configuration - if something happens to the thin pool, dom0 is unaffected and can be used for repairs. Please tell me if this is stupid or if I should make it a separate issue.

@marmarek
Member

Generally I agree with @na-- . 10GB for root would be too small (you need to fit whole root.img of the template during installation), but 20GB should be ok. The problem is anaconda code for handling partition layout is quite complex. I've already tried something simpler: have root fs not using the whole pool (but still have pool filling all the disk), but failed after initial tries.

If anyone knows anaconda and/or blivet and wants to help, that would be awesome.

@qubesuser
Author

The issue with that is that there seems to be no way of shrinking thin pools, which means that if dom0 root is outside the thin pool it cannot be grown beyond the space assigned to it, which can be problematic if one wants to install lots of software in dom0 (e.g. they want to try out GNOME or KDE in dom0).

I think dm-thin may still allow writes that don't cause metadata changes (i.e. those that don't break CoW or increase size), so it may be possible to just zero out a smaller dom0 partition before formatting so it's preallocated and should continue working.

A possible solution for the Anaconda issue is to simply create a new thin LV on first Qubes boot, copy the whole root filesystem to it, and then replace the original dom0 root with it. This has the advantage that the size of the LV can be computed automatically depending on the used space on the original dom0 root (this is going to be more useful once the GUI domain is split, since then there will be much less reason to install lots of software in dom0).
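
The "size computed from used space" part of this idea could look roughly like the sketch below. Everything here is an assumption for illustration: the helper name `new_root_size_mib`, the 50% headroom default, and rounding up to a whole GiB are not taken from any Qubes code.

```shell
#!/bin/sh
# new_root_size_mib USED_MIB [HEADROOM_PERCENT]
# Prints a size in MiB: used space plus headroom (default 50%), rounded up
# to a whole GiB. (Hypothetical helper; the headroom figure is arbitrary.)
new_root_size_mib() {
    used=$1
    headroom=${2:-50}
    want=$(( used + used * headroom / 100 ))
    echo $(( (want + 1023) / 1024 * 1024 ))
}
```

With GNU df, the input could come from `used=$(df --output=used -BM / | tail -n 1 | tr -dc '0-9')`, and the result could be passed to something like `sudo lvcreate -V "$(new_root_size_mib "$used")M" -T qubes_dom0/pool00 -n root-new` (pool and volume names assumed). Creating the filesystem, copying the data, and switching the root volume are not shown.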

@tasket

tasket commented Mar 19, 2018

Although I've had my pool almost fill up (due largely to dom0 not discarding on /), I haven't yet experienced any problems like / going read-only since I started using 4.0rc in October. In fact, I'd say that using discard would have avoided those problems in the first place.

I think the best way forward is to very simply enable discard on dom0 root. Otherwise, a half-solution like external fixed partition opens up a huge can of worms for system management tasks like restoring large VMs, handling templates and disk images. A 20GB root will create more boot-into-read-only fs incidents (and many more complaints) than adding discard to the current config... not less.

Not to mention that having the unused space unavailable to domUs when those admin tasks aren't being performed -- added to the ridiculously large swap space that anaconda already allocates -- will irritate people and waste their resources.

@marmarek
Member

Starting with qubes-core-dom0-linux 4.0.13, fstrim.timer is enabled in dom0 and runs fstrim -a once a week. Enabling discard on / may have negative performance consequences, especially when one also enables discard on the LUKS layer, and especially on cheap or older SSDs...

I agree that choosing the right size for a statically allocated dom0 root is a tricky task, but the current situation is also problematic - VMs can easily (depending on used disk space) DoS dom0. And while filling just the VMs' storage isn't that big a problem (you might need to remove some VM, or just a file inside one, and reboot the system afterwards in the worst case), filling the space for the dom0 filesystem is much more problematic, because your VM management tools will stop working. Note that filling up free space inside a filesystem on a static LV will just result in "no space left on device" errors for applications trying to write something - freeing up space will immediately fix the problem. Filling up space in a thin pool results in I/O errors, possibly forcing a read-only remount and in the worst case filesystem corruption (unlikely with current filesystems, but still).

Maybe something in the middle could be used - static LV for root filesystem + thin allocated LV mounted somewhere in dom0 (/var/tmp?). On my system right now, / uses 19GB, which includes 3.7GB in /var/lib/qubes/vm-kernels.

@tasket

tasket commented Mar 19, 2018

This really should be a PEBCAK issue; users generally know they shouldn't fill up their disk.

On Qubes 3.2 just use a DE widget to monitor disk space, same as regular Linux. So the dom0 filesystem manages free space effectively, and the user is empowered to be responsible about disk usage without having to dodge curve balls.

Now on 4.0 we have a deallocation problem and people are flying blind all at once.

A simple 2-color meter + discard in fstab would restore the feedback, balance and relative simplicity users have on R3.2 so the system both manages and communicates disk space effectively.

OTOH, adding "large inflexible admin volume" that is still too small for certain tasks demands substantially more understanding and effort from users. Then the issue becomes less PEBCAK and more of a design, maintenance, documentation and cultural problem. You get lots of howtos and discussion about the care and feeding of anti-feature-X under various use cases, another banal "tend to this!" techie meme that contributes to users wanting something else.

@tasket

tasket commented Mar 19, 2018

FWIW, I didn't see your last response before submitting the prior post.

Adding /tmp to the thin pool may reintroduce similar space/performance problems to the admin tasks that are at issue.

Performance: Is this really an issue for dom0? I think it's much more critical for domUs, which are already using discard for everything.

With the older 2012 vintage SSD in my primary system, disk performance has been fine (with discard) for all my VMs including dom0. But this is about logical deallocation of extents.... right? We're not talking about (small) blocks, and not about hardware TRIM. I don't see the issue here.

DoS: Default domU size is only 2GB, is user-controlled and the user should be seeing per-VM disk allocation anyway, as they would with Qubes Manager on 3.2. Even so, this does raise a question about leaving a minimal amount of free space. For one, the normal DE warnings about low space should be enabled in dom0.

@jharveyb

jharveyb commented Mar 19, 2018

Independent of the choice to enable discards by default, a DE widget to monitor disk usage for 4.0 is useful.

This script works with an Xfce Generic Monitor to present a single-color bar showing space used, and a tooltip showing free space. Since it just uses qvm-pool it won't include unallocated space that vgs & pvs show, but that seems acceptable.

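#!/bin/sh
# Query the pool once; size and usage are treated as byte counts.
INFO=$(qvm-pool -i lvm)
SIZE=$(echo "$INFO" | awk '/^size/ {print $2}')
USAGE=$(echo "$INFO" | awk '/^usage/ {print $2}')
FREE=$((SIZE - USAGE))
# Percentage used, rounded to the nearest integer.
USEDCENT=$((100*USAGE/SIZE + 200*USAGE/SIZE % 2))
# Free space in decimal GB with two decimal places, for any pool size.
FREEGB=$((FREE / 1000000000))
FREEFRAC=$((FREE % 1000000000 / 10000000))
echo "<tool>$FREEGB.$(printf '%02d' "$FREEFRAC") GB FREE</tool>"
echo "<bar>$USEDCENT</bar>"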

@marmarek
Member

> Performance: Is this really an issue for dom0? I think it's much more critical for domUs, which are already using discard for everything.

Ok, I feel convinced - since domUs are outside of dom0 filesystem, this change shouldn't affect them. And indeed, we already use discard in domUs and no one complained so far (although 4.0 is still in rc phase). Anyway, if it would be problematic on some disks: a) one may not enable TRIM/DISCARD on LUKS layer, b) one may disable it in dom0 fstab (and/or templates).
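
Whether a discard issued on / actually reaches the disk depends on every layer underneath it (thin pool, LUKS, physical drive). A rough way to see which layers accept discards is the DISC-MAX column of `lsblk`, where a value of 0 generally indicates that the device does not accept discards. The helper below is a sketch; `blocked_layers` is a made-up name.

```shell
#!/bin/sh
# blocked_layers: read "NAME DISC-MAX" pairs (DISC-MAX in bytes) on stdin
# and print the names whose DISC-MAX is 0, i.e. block devices that do not
# accept discard requests. (Hypothetical helper for illustration.)
blocked_layers() {
    awk '$2 == 0 { print $1 }'
}
```

Typical use would be `lsblk -bnlo NAME,DISC-MAX | blocked_layers`. If the LUKS mapping was opened without discards, it and the volumes stacked on top of it would be expected to appear in the output, which corresponds to option a) above.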

marmarek added a commit to marmarek/qubes-installer-qubes-os that referenced this issue Mar 20, 2018
This may have performance impact on some older SSD, but on the other
hand, without this option it's pretty easy to fill the whole LVM thin
pool even if there is plenty free space in dom0.
Note that this doesn't enable it on LUKS layer, this is still disabled
by default.

Fixes QubesOS/qubes-issues#3226
@qubesos-bot

Automated announcement from builder-github

The package pykickstart-2.32-4.fc25 has been pushed to the r4.0 testing repository for dom0.
To test this update, please install it with the following command:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing

Changes included in this update

@qubesos-bot

Automated announcement from builder-github

The package pykickstart-2.32-4.fc25 has been pushed to the r4.0 stable repository for dom0.
To install this update, please use the standard update command:

sudo qubes-dom0-update

Or update dom0 via Qubes Manager.

Changes included in this update

fepitre pushed a commit to fepitre/anaconda that referenced this issue Oct 18, 2018
This may have performance impact on some older SSD, but on the other
hand, without this option it's pretty easy to fill the whole LVM thin
pool even if there is plenty free space in dom0.
Note that this doesn't enable it on LUKS layer, this is still disabled
by default.

Fixes QubesOS/qubes-issues#3226
fepitre pushed a commit to fepitre/anaconda that referenced this issue Oct 18, 2018
fepitre added a commit to fepitre/anaconda that referenced this issue Oct 18, 2018
fepitre added a commit to fepitre/anaconda that referenced this issue Oct 18, 2018
fepitre pushed a commit to fepitre/anaconda that referenced this issue Oct 20, 2018
fepitre pushed a commit to fepitre/anaconda that referenced this issue Nov 20, 2018
fepitre pushed a commit to fepitre/anaconda that referenced this issue Nov 20, 2018
fepitre pushed a commit to fepitre/anaconda that referenced this issue Nov 20, 2018
fepitre pushed a commit to fepitre/anaconda that referenced this issue Dec 19, 2018
fepitre pushed a commit to fepitre/anaconda that referenced this issue Dec 24, 2018
fepitre pushed a commit to fepitre/anaconda that referenced this issue Dec 25, 2018
fepitre pushed a commit to fepitre/anaconda that referenced this issue Dec 26, 2018
fepitre pushed a commit to fepitre/anaconda that referenced this issue Dec 26, 2018
fepitre pushed a commit to fepitre/anaconda that referenced this issue Dec 26, 2018
fepitre pushed a commit to fepitre/anaconda that referenced this issue Dec 27, 2018
fepitre pushed a commit to fepitre/anaconda that referenced this issue Dec 28, 2018
fepitre pushed a commit to fepitre/anaconda that referenced this issue Jan 3, 2019
fepitre pushed a commit to fepitre/anaconda that referenced this issue Jan 3, 2019
fepitre pushed a commit to fepitre/anaconda that referenced this issue Feb 3, 2019
fepitre pushed a commit to fepitre/anaconda that referenced this issue Mar 16, 2019
fepitre pushed a commit to fepitre/anaconda that referenced this issue Mar 16, 2019
marmarek added a commit to marmarek/qubes-anaconda that referenced this issue Dec 25, 2019
fepitre pushed a commit to fepitre/anaconda that referenced this issue Mar 26, 2020
fepitre pushed a commit to fepitre/anaconda that referenced this issue Apr 6, 2020
fepitre pushed a commit to fepitre/anaconda that referenced this issue Apr 19, 2020
fepitre pushed a commit to fepitre/anaconda that referenced this issue Oct 16, 2021
fepitre pushed a commit to fepitre/anaconda that referenced this issue Dec 27, 2022
fepitre pushed a commit to fepitre/anaconda that referenced this issue Jan 4, 2023
fepitre pushed a commit to fepitre/anaconda that referenced this issue Jan 4, 2023
fepitre pushed a commit to fepitre/anaconda that referenced this issue Jan 10, 2023
7 participants