We use a lot of Virtual Machines (VMs) to run our infrastructure. They are spread across multiple host machines running Logical Volume Management (LVM), with each VM on its own Logical Volume (LV). After faffing around with multiple backup strategies and programs, I settled on BorgBackup (Borg for short) because of its special handling of block devices along with its deduplication and compression capabilities.
Borg’s documentation is extensive and can be found here: https://borgbackup.readthedocs.io/en/stable/
BlockFuse is quite an old program, originally written by Eric Wheeler for use with rdiff-backup, but it works just as well in this case for mounting block devices. More information can be found here: https://www.linuxglobal.com/blockfuse-to-the-rescue-rdiff-backup-of-lvm-snapshots-and-block-devices/
Backup of online disks
The system doesn’t need to be taken offline if a snapshot of the VM’s disk can be created, guaranteeing that the disk image won’t change for the duration of the backup. This works on the premise that a disk snapshot gives you the same disk state you’d get if you just pulled the power from the computer. In either case the operating system should be able to boot and bring its filesystems back to a consistent state, thanks to journalled filesystems such as ext4 and XFS.
Hypervisors can snapshot running VMs using a copy-on-write mechanism, in which changes made after the snapshot are catalogued in a separate image. Backing these up can be problematic, however, because both the original image and the snapshot need storing, and the hypervisor is needed for the restore.
LVM offers an alternative: the VM is given a block device provided by LVM, and an LVM snapshot appears as a second block device whose contents remain unchanged for as long as the snapshot exists, which covers the duration of the backup. LVM has two additional benefits: disk images can be resized without needing contiguous free space, and because the VM accesses the block layer directly, it bypasses the host’s filesystem layer, which improves performance.
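A toy model makes the storage problem concrete. The sketch below is a deliberate simplification, not any hypervisor’s real image format: after the snapshot the original image is frozen and every write is catalogued in a separate overlay, so reconstructing the VM’s current disk needs both pieces. The class name OverlaySnapshot and the three-block "disk" are hypothetical.

```python
class OverlaySnapshot:
    """Toy model of a hypervisor-style snapshot (hypothetical, simplified):
    the base image is frozen and every write after the snapshot lands in
    a separate overlay."""

    def __init__(self, base: list[bytes]):
        self.base = base                      # frozen original image (list of blocks)
        self.overlay: dict[int, bytes] = {}   # block index -> changed block

    def write(self, index: int, data: bytes) -> None:
        # Changes never touch the base; they are catalogued in the overlay.
        self.overlay[index] = data

    def read(self, index: int) -> bytes:
        # Current state: the overlay wins, otherwise fall back to the base.
        return self.overlay.get(index, self.base[index])

    def current_image(self) -> list[bytes]:
        # Reconstructing the live disk needs BOTH the base and the overlay.
        return [self.read(i) for i in range(len(self.base))]


base = [b"boot", b"data", b"free"]
snap = OverlaySnapshot(base)
snap.write(1, b"DATA")
print(snap.current_image())  # [b'boot', b'DATA', b'free']
print(base)                  # [b'boot', b'data', b'free'] (base untouched)
```

Backing up only the base loses the changes, and backing up only the overlay is meaningless without the base, which is why both have to be stored and why the hypervisor’s own tooling is needed to put them back together.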
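The LVM side of this can be sketched as two small helpers that build the commands for creating and removing a classic (non-thin) snapshot. The volume group vg0, the LV vm1, the snapshot name and the 10G snapshot size are all hypothetical values; the commands would need to run as root, for example via subprocess.run(cmd, check=True).

```python
import shlex


def snapshot_cmd(vg: str, lv: str, snap_name: str, cow_size: str = "10G") -> list[str]:
    """Build an lvcreate command that snapshots /dev/<vg>/<lv>.

    cow_size is the space reserved for copies of origin blocks that change
    while the snapshot exists. If it fills up, a classic LVM snapshot becomes
    invalid, so it should be sized for the expected write churn during a backup.
    """
    return ["lvcreate", "--snapshot", "--size", cow_size,
            "--name", snap_name, f"/dev/{vg}/{lv}"]


def remove_snapshot_cmd(vg: str, snap_name: str) -> list[str]:
    # -f removes the (active) snapshot without an interactive confirmation.
    return ["lvremove", "-f", f"/dev/{vg}/{snap_name}"]


# Hypothetical names: volume group vg0, VM disk vm1.
print(shlex.join(snapshot_cmd("vg0", "vm1", "vm1-snap")))
print(shlex.join(remove_snapshot_cmd("vg0", "vm1-snap")))
```

While the snapshot exists, /dev/vg0/vm1-snap can be read like any other block device and its contents stay frozen, even though the VM keeps writing to /dev/vg0/vm1.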
Backup efficiency of the disk image vs. just the data
There are two efficiency concerns when backing up whole disk images rather than just the data contained in the filesystem:
- First, you’re backing up all the unused parts of the disk, which are of no importance
- Second, does the whole disk need backing up again the moment a single change is made?
One solution to both problems is deduplication combined with compression. Deduplication reads the disk image in blocks and computes a hash of each block, so that only unique blocks need storing and repeated blocks just increment a reference count; compression is then applied to the unique blocks before they are stored. This reduces the size of the full disk image backup that has to be taken the first time, and subsequent backups only need to store the blocks that have changed.
The effectiveness of deduplication and compression varies with the data being backed up, but as an example, the file repository “repos” reports:
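The idea can be sketched in a few lines of Python. This is a simplified illustration using fixed 4 KiB blocks, SHA-256 hashes and zlib (the deflate algorithm that gzip uses); it is not Borg’s implementation, which by default splits data into variable-size, content-defined chunks. The function names and the 1 MiB sample disk are hypothetical.

```python
import hashlib
import zlib


def dedup_backup(image: bytes, block_size: int = 4096):
    """Fixed-size-block deduplication followed by compression (simplified).

    Returns (store, recipe):
      store  - hash -> [compressed block, reference count], unique blocks only
      recipe - ordered list of hashes needed to rebuild the image
    """
    store: dict[str, list] = {}
    recipe: list[str] = []
    for offset in range(0, len(image), block_size):
        block = image[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if digest in store:
            store[digest][1] += 1                      # repeat: bump reference count
        else:
            store[digest] = [zlib.compress(block), 1]  # unique: compress and keep
        recipe.append(digest)
    return store, recipe


def restore(store: dict, recipe: list[str]) -> bytes:
    # Rebuild the image by decompressing each referenced block in order.
    return b"".join(zlib.decompress(store[d][0]) for d in recipe)


# Hypothetical 1 MiB "disk": mostly empty (zeroed) blocks plus a little data.
disk = bytearray(1024 * 1024)
disk[0:11] = b"boot sector"
disk[8192:8196] = b"data"
first = bytes(disk)
store, recipe = dedup_backup(first)
print(len(recipe), "blocks,", len(store), "unique")  # 256 blocks, 3 unique
assert restore(store, recipe) == first

# Next backup: change one block; only that block is new to the store.
disk[40960:40964] = b"edit"
_, recipe2 = dedup_backup(bytes(disk))
new_blocks = set(recipe2) - set(store)
print(len(new_blocks))  # 1
```

The mostly empty sample disk collapses to three unique blocks, and the second run shows that a single changed block is the only new data a subsequent backup has to store.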
- Filesystem usage: 146GB
- Compressed backup of filesystem content: 85GB
- Deduped compressed image: 77GB
Both compressed figures use gzip compression.
Hypervisors can also offer another way to back up only the disk changes, by publishing the changed blocks of a VM (changed block tracking). This option is available at considerable cost with VMware and is experimental in other hypervisors, so we haven’t pursued it. In any case, once an initial full backup has been taken with deduplication and compression, the incremental backups take care of themselves in much the same way.
From the above, we’re starting to get a picture of the backup system’s requirements and the kind of backup software needed:
- Automated backups from which we can trigger LVM snapshots
- Simple restores, easy to kick off for complete disaster recovery without detailed configuration
- Direct backup of block devices; much backup software requires the inefficient step of first copying the device content to a file
- Deduplication and compression
- Complete off-site backup for disaster recovery
- Local backup, desirable for quick testing and for quick recovery from the failure of a single VM host
- Not an obvious requirement, but restoring hundreds of gigabytes of data requires careful management to avoid exhausting the available disk space
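As a rough illustration of the "direct backup of block devices" requirement, the helper below builds a Borg 1.x borg create command that reads a snapshot device directly. It is a minimal sketch under stated assumptions, not necessarily how our setup does it: the repository path, the VM and snapshot names, the archive naming scheme and the zlib,6 compression level are all hypothetical choices.

```python
import shlex


def borg_create_cmd(repo: str, vm: str, device: str, stamp: str,
                    compression: str = "zlib,6") -> list[str]:
    """Build a Borg 1.x 'borg create' command that backs up a block device.

    --read-special makes Borg open the device node and read its contents
    as if it were a regular file, instead of archiving just the device entry.
    --stats prints the deduplication/compression figures when it finishes.
    The REPO::ARCHIVE form is Borg 1.x syntax.
    """
    archive = f"{repo}::{vm}-{stamp}"  # hypothetical naming: <vm>-<date>
    return ["borg", "create", "--read-special", "--stats",
            "--compression", compression, archive, device]


cmd = borg_create_cmd("/srv/borg/vms", "vm1", "/dev/vg0/vm1-snap", "2024-01-31")
print(shlex.join(cmd))
# borg create --read-special --stats --compression zlib,6 /srv/borg/vms::vm1-2024-01-31 /dev/vg0/vm1-snap
```

Because Borg reads the snapshot device itself, there is no intermediate image file, and deduplication and compression apply to the device contents as they stream in.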
In the next post we shall look at how exactly we achieved this…
Follow this link to read part two >> Backing up of KVMs with BorgBackup and BlockFuse Pt. 2