Description
I'd like to use a helper script to create atomic point-in-time snapshot backups of virtual machines whose block devices are backed by ceph (rbd). The existing backup-vm script can't easily do this, since it backs up bind-mounted real files or block devices using --read-special. It might be possible to adapt backup-vm to work in this situation, but it already has to work in a somewhat hacky way (e.g. bind mounts feel heavy-handed for backing up block devices), and I wondered whether another way to invoke borg create might be better for this?
This seems to be more or less a general problem shared by various application-aware helpers, e.g. database backup tools, where the data may be generated on the fly or otherwise not readily presentable in either of the two ways that borg currently allows: either a single 'file' fed to stdin (with --stdin and --stdin-name), or a directory containing special files and/or plain files (with --read-special).
A backup may need to include one or more block devices (and other data such as nvram storage), and also VM metadata, and optionally the contents of RAM and other related state too.
It seems better to be able to store a single archive containing multiple "virtual" files, perhaps also including some metadata (e.g. name and version of the external tool used, timestamp of the atomic snapshot and perhaps something like directions for the user on how to restore the data as additional files).
A couple of possible solutions:
- The backup helper passes file metadata to borg (including a file descriptor which can be read to obtain the content) for each "virtual file" in the archive.
- A single data stream passed to borg via stdin e.g. a created-on-the-fly archive (tar or similar), which borg reads and gets the data it needs from. Might be tricky for borg to skip data that it's not interested in (e.g. because the particular file's modification time hasn't changed).
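To make the second option more concrete, here is a rough sketch of a helper that builds a tar stream on the fly from in-memory "virtual files". The function name `write_virtual_archive` and the member names are made up for illustration, and nothing here assumes borg can currently consume such a stream in the way described above. One limitation it shows: the tar header needs each member's size before its data is written, which is fine for block devices of known size but awkward for content generated on the fly.

```python
import io
import tarfile
import time


def write_virtual_archive(out, members):
    """Stream (name, bytes) pairs to `out` as a tar archive.

    Mode "w|" writes a pure forward stream, so `out` only needs write();
    it can be a pipe such as sys.stdout.buffer.
    """
    with tarfile.open(fileobj=out, mode="w|") as tar:
        for name, data in members:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)          # must be known up front
            info.mtime = int(time.time())  # e.g. the snapshot timestamp
            tar.addfile(info, io.BytesIO(data))
```

A helper would call something like `write_virtual_archive(sys.stdout.buffer, members)` and pipe the result to the consumer.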
For the first option, this could perhaps be JSON passed to borg (e.g. on the command line or via stdin) containing the metadata, and also the number of an open fd which borg can use to read the data (e.g. the helper invokes borg, which inherits the necessary file descriptors, analogous to how the gpg utility is used with gpg --passphrase-fd 8). Alternative mechanisms would also be possible.
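A minimal sketch of what the fd-passing idea could look like from the helper's side, assuming a POSIX system. Everything about the interface is hypothetical: borg has no such mode today, so a small Python child process (`CONSUMER`) stands in for it, and the manifest keys (`tool`, `files`, `path`, `fd`) and the helper name are invented for illustration. The helper opens one pipe per virtual file, lets the child inherit the read ends, and sends a JSON manifest naming those fds on stdin.

```python
import json
import os
import subprocess
import sys

# Stand-in for the consumer (hypothetical; borg does not accept this today).
# It reads the JSON manifest from stdin, then drains each inherited fd.
CONSUMER = r"""
import json, os, sys
manifest = json.load(sys.stdin)
for entry in manifest["files"]:
    with os.fdopen(entry["fd"], "rb") as src:
        data = src.read()
    print(entry["path"], len(data))
"""


def backup_virtual_files(files):
    """files: dict mapping archive path -> bytes content."""
    pipes = {path: os.pipe() for path in files}  # path -> (read_fd, write_fd)
    manifest = {
        "tool": "vm-snapshot-helper",  # hypothetical helper name
        "files": [{"path": p, "fd": r} for p, (r, _w) in pipes.items()],
    }
    proc = subprocess.Popen(
        [sys.executable, "-c", CONSUMER],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        pass_fds=[r for r, _w in pipes.values()],  # child keeps same fd numbers
    )
    for r, _w in pipes.values():
        os.close(r)  # parent no longer needs the read ends
    proc.stdin.write(json.dumps(manifest).encode())
    proc.stdin.close()  # EOF tells the child the manifest is complete
    # Feed the data in manifest order; closing each write end signals EOF.
    for path, (_r, w) in pipes.items():
        with os.fdopen(w, "wb") as sink:
            sink.write(files[path])
    output = proc.stdout.read().decode()
    proc.wait()
    return output
```

For example, `backup_virtual_files({"disk0.img": disk_bytes, "domain.xml": xml_bytes})` returns the child's report of each path and the number of bytes it read. In a real helper the write side would stream from the rbd snapshot rather than from bytes held in memory.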
In some cases, it might be possible to use FUSE to expose the data to be backed up as a "virtual filesystem", but that also feels a bit hacky and may not be possible for all data sources. It also wouldn't be possible to pass advanced metadata to borg via FUSE (e.g. byte ranges which have changed since the last backup operation, if the source is able to track these), and it would make writing the helper applications quite difficult too.
Any thoughts?