
Backup restore and template installation should write directly to LVM volumes #3230

Closed
qubesuser opened this issue Oct 27, 2017 · 17 comments
Labels
C: core P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists.

Comments

@qubesuser

Qubes OS version:

R4.0-rc2

Steps to reproduce the behavior:

  1. Try to restore a backup or try to install a template package

Expected behavior:

dom0 disk space usage does not change significantly.
Backups with VMs larger than half the size of the disk can be restored.

Actual behavior:

dom0 disk space usage changes significantly because the data is first written to a file in the dom0 root and then copied over.

Backups with VMs larger than half the size of the disk cannot be restored, since there is not enough disk space to hold the data both on the dom0 root and on the LVM volume.

General notes:

This is a big issue for restoring large VMs. Fixing it would also allow using a smaller dom0 root, rather than sizing it to be as large as the thin pool, saving gigabytes otherwise wasted on filesystem structures for an unnecessarily large filesystem (that would also require making sure log files don't grow out of control).

@andrewdavidwong andrewdavidwong added T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists. C: core labels Oct 28, 2017
@andrewdavidwong andrewdavidwong added this to the Release 4.0 milestone Oct 28, 2017
@marmarek
Member

Template installation is much less of an issue, as templates are limited by the template builder to 10 GB. It is also much trickier to solve, as RPM does not like writing directly to a block device (or rather, to a socket/pipe: remember that it now uses the Admin API, so you can also install a template from a management VM).

As for backup restore: only parts are stored as files (100 MB each), and in parallel they are uploaded to the actual VM volume (using the Admin API). But currently there is no limit on how many such parts are queued. This is because (currently) you can't control the speed of archive extraction (either tar or qfile-unpacker). That would require either adding an additional layer (a cat-like process, used to pause data input when needed) or somehow instructing the extractor process to pause (SIGSTOP/SIGCONT? that could be fragile...).
Not storing those fragments as files at all would be very tricky, because you need to verify a fragment before doing anything with it, and you can do that only when the full fragment is extracted. You should not start parsing its content in any way before verification.
An alternative could be using tmpfs, or using memory directly (a Python object). But that could easily lead to OOM, especially when restoring using a VM (aka "paranoid mode").

@qubesuser
Author

qubesuser commented Oct 29, 2017

I think one could write the fragments directly to an LVM volume (for instance using tar --to-stdout and piping to dd), verify either by reading from the VM volume or by teeing the data to the verification, and then rename the VM volume to $vm-private if verification passes.
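A minimal sketch of that tee-and-verify idea (the `restore_fragment_to_volume` helper, the member name, and the digest parameter are hypothetical, and it writes to an ordinary path standing in for the LVM device node):

```python
import hashlib
import tarfile

def restore_fragment_to_volume(archive, member, device_path, expected_digest,
                               chunk=1 << 20):
    """Stream one archive member straight to the target device while
    hashing it on the fly (the "tee" to verification), so no temporary
    copy ever lands in dom0's root filesystem.  The caller would commit
    (rename the volume to $vm-private) only if this returns True."""
    digest = hashlib.sha256()
    with tarfile.open(archive) as tar, open(device_path, "wb") as dev:
        src = tar.extractfile(member)
        for block in iter(lambda: src.read(chunk), b""):
            digest.update(block)   # verification branch of the tee
            dev.write(block)       # data branch: straight to the volume
    return digest.hexdigest() == expected_digest
```

A real implementation would open the LVM device node instead of a file and would discard (not rename) the volume on a digest mismatch.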

For templates, probably the best solution is not to ship them in the RPMs, but rather to ship them like the installation ISOs: provide only a download link and hash in the RPM, have the RPM install script download the image and pipe it via qrexec to dom0 while the checksum is verified in parallel, and finalize the install only if the checksum verification succeeds.

@marmarek
Member

This is too late if you want to keep clean task separation. The principle is to do nothing with the data until it is verified. While the current implementation may indeed allow that (not using the volume until it is renamed to $vm-private), we should not rely on such an assumption. Also keep in mind that the backup restore tool should not assume direct access to LVM; it uses the Admin API to upload volume content. So such a mechanism would require introducing an additional action to rename a volume, or separate "upload" and "commit" actions. Reading the volume back for verification is intentionally not supported through the Admin API, but that isn't a problem here, because you can calculate the data hash on the fly (and in fact the scrypt tool we use there does that already).
There is also one technical detail: you need to somehow pass the individual fragments to scrypt for decryption and verification. While its output could be redirected somewhere, for input you need to separate the individual VMs' volumes (and their fragments), so plain tar --to-stdout isn't feasible, because you'd get all of them concatenated.

The backup archive is split into fragments exactly to limit the temporary space needed to create a backup and to restore it. The latter is not implemented yet, but the current architecture should allow it.

@na--

na-- commented Oct 29, 2017

@qubesuser: I think that if the other issue you reported is fixed, this one would not be that big of a deal.

@marmarek: If this is up-to-date, that means there's a tar extraction of the huge backup file at the beginning. The tar options --checkpoint= and --checkpoint-action=exec=... can be used to limit the speed of the archive extraction with some artificial sleep. It's an ugly hack, but I use it for a task that needs to pipe the tar extraction of huge files into /tmp and process them as they are being extracted.

Here's the code I use:

```shell
tar --checkpoint=20000 \
    --checkpoint-action=exec='sleep "$(stat -f --format="(((%b-%a)/%b)^5)*30" /tmp | bc -l)"' \
    --extract --verbose __other_tar_args__ | program_to_process_extracted_files
```

Ugly as sin, but it causes tar to sleep progressively longer as /tmp fills up, so that program_to_process_extracted_files can catch up with processing and deleting the already-extracted files. For more complex flow-control logic, tar can call an external script that implements it, for example "pause extraction of file n until file n-2 is processed and removed" or something of the sort, which should be much less fragile than signalling tar externally.

Edit: link to the tar checkpoint documentation: https://www.gnu.org/software/tar/manual/html_section/tar_26.html and https://www.gnu.org/software/tar/manual/html_section/tar_29.html

@marmarek
Member

The tar options --checkpoint= and --checkpoint-action=exec=...... can be used to limit the speed of the archive extraction with some artificial sleep. It's an ugly hack but I use it for a task that needs piping tar extraction of huge files in /tmp and processing them as they are being extracted.

Tar is used there only if the backup file is exposed directly to dom0. If it is loaded from some VM (like sys-usb), then qfile-unpacker is used. But in that case we could add such an option ourselves.

@qubesuser
Author

qubesuser commented Oct 30, 2017

Yeah, it would need some sort of upload+commit interface (with hashes computed on the fly): ideally one where a qrexec connection is kept open until a commit command is sent, and the VM/volume is deleted automatically when the connection is broken or upon booting the system (to handle the system being hard rebooted during restore).

Not totally sure how to set up the input with tar. Maybe it would be possible to create the private.img.XXX files as UNIX sockets or fifos and convince tar to write into them instead of recreating them (perhaps tar --overwrite does that, not sure). Or use tar --to-stdout with a single-file filelist, if tar can seek efficiently (but this requires the input to be a file and not a pipe from another VM, unless tar is run in the other VM). Alternatively, one could just use tar --to-stdout with all the files, let them be concatenated, and split them afterwards, since the size of each fragment is known (or can be determined by separately running tar -t).
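A hedged illustration of the FIFO variant (all names here are hypothetical, and a plain file stands in for the LVM volume): the fragment path is pre-created as a FIFO where tar would otherwise create a regular file, and a reader drains it into the target device.

```python
import os
import tempfile
import threading

def drain_fifo_to_device(fifo_path, device_path, chunk=1 << 20):
    """Read whatever the archiver writes into the FIFO and stream it to
    the target device, so no regular private.img.XXX file is ever
    created in dom0's root filesystem."""
    with open(fifo_path, "rb") as fifo, open(device_path, "wb") as dev:
        for block in iter(lambda: fifo.read(chunk), b""):
            dev.write(block)

# Pre-create the fragment path as a FIFO at the spot where tar would
# otherwise create a regular file.
workdir = tempfile.mkdtemp()
fifo_path = os.path.join(workdir, "private.img.000")
os.mkfifo(fifo_path)
```

A writer standing in for tar would open the FIFO and write the fragment while drain_fifo_to_device runs concurrently; whether tar itself can be convinced to write into an existing FIFO is exactly the open question above.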

@marmarek
Member

Generally it is too late for major changes to the backup (or other) architecture for Qubes 4.0. Upload+commit may be a good idea for Qubes 4.1. Splitting concatenated files, or placing fifos for tar to write to, is IMO too fragile to consider at all. The backup mechanism is complex enough already.

One thing we may consider at this stage is slowing down tar/qfile-unpacker enough that it doesn't require too much space in /tmp. --checkpoint-action is interesting, but the exact command there needs to be adjusted. I'd put there something controlled from the Python script, and from there make sure no more than X files/size units are waiting to be handled. For example: reading 1 byte from a pipe, with the Python side writing 1 byte after each file is handled, and X bytes placed there at the beginning. A classic token solution.
What is the "checkpoint" ("record") unit? I thought it might be one tar block (512 bytes), but according to a simple test with --checkpoint=1 it is closer to "a file" (though sometimes two small files fit between checkpoints). Do you know of any documentation about this? @na--
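The token scheme described above can be sketched with a plain pipe (MAX_PENDING, take_token, and return_token are hypothetical names; in practice take_token would be invoked from tar's --checkpoint-action or from a patched qfile-unpacker):

```python
import os

MAX_PENDING = 4                    # at most 4 fragments extracted but unhandled
rfd, wfd = os.pipe()
os.write(wfd, b"t" * MAX_PENDING)  # seed the pipe with the initial tokens

def take_token():
    """Extractor side: blocks as soon as MAX_PENDING fragments have been
    extracted but not yet handled, pausing further extraction."""
    os.read(rfd, 1)

def return_token():
    """Restore-script side: called after a fragment has been uploaded to
    the VM volume and its temporary file deleted, resuming extraction."""
    os.write(wfd, b"t")
```

Because the pipe starts with X tokens and each handled fragment returns one, the number of fragments sitting in /tmp can never exceed X.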

@na--

na-- commented Oct 31, 2017

@marmarek: sorry, I'm not sure. I've read only what's in the tar manual and it's not very specific. I remember fiddling with the options until it was good enough and leaving it at that, since in my case it was not for something very important. I thought that a record is one tar block, but apparently not.

@jpouellet
Contributor

@qubesuser can you elaborate on what exactly you see an upload+commit interface performing and looking like?

Just the ability to write a stream directly to pool storage with some temporary name guaranteed to never be used by any VM, returning perhaps some token to be used by admin.vm.volume.CloneTo or such?

@marmarek
Member

@jpouellet take a look at the --checkpoint option discussed above.

@andrewdavidwong andrewdavidwong added the P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. label Aug 21, 2019
@heinrich-ulbricht

heinrich-ulbricht commented Sep 7, 2019

Currently I'm in a position where I need to find a solution for restoring huge backups while having nearly no space left for the restore in dom0.
I started a thread over in Google Groups, and the community has been tremendously helpful so far.
Unfortunately, I now seem to need a fix/hack applied to restore.py that prevents the restore operation from generating hundreds of GB of temporary data, and the "sleep fix" is a hot candidate.
I made a (naive?) suggestion for a restore.py modification here. Maybe somebody could have a look at whether this could work?

@github-actions

github-actions bot commented Aug 5, 2023

This issue is being closed because:

If anyone believes that this issue should be reopened and reassigned to an active milestone, please leave a brief comment.
(For example, if a bug still affects Qubes OS 4.1, then the comment "Affects 4.1" will suffice.)

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Aug 5, 2023
@DemiMarie

@marmarek Has this actually been fixed?

@marmarek
Member

I knew we had an issue for this! Yes, #8876

@DemiMarie

Did QubesOS/qubes-core-admin-client#278 fix backup restore too?

@marmarek
Member

No, that one is independent, and it doesn't suffer from the same issue as templates. The restore issue was fixed differently: QubesOS/qubes-core-admin-client@9360865

@DemiMarie

Closing as “completed”.

@DemiMarie DemiMarie removed the eol-4.0 Closed because Qubes 4.0 has reached end-of-life (EOL) label Apr 14, 2024