Surprise: rdiff-backup (given our particular constraints).

As our data grows (and some filesystems balloon to over 800 GB, with many small files), we have started seeing our nighttime backups run on into the morning, causing serious disk i/o problems as our users wake up and regular usage rises.

For years we have followed a conservative backup policy: each server runs two backups. The first uses rdiff-backup to copy to the onsite backup server, keeping 10 days of increments. The second is an rsync to our offsite backup servers for disaster recovery.

Simple, I thought. I'll replace the rdiff-backup run to the onsite server with the ultra-fast and simple rsync. Then I'll use borgbackup to create incremental backups from the onsite backup server to our offsite backup servers. Piece of cake. And with each server running only one backup instead of two, they should complete in record time.

Except that, somehow, the rsync backup to the onsite backup server was taking almost as long as the original rdiff-backup to the onsite server and the rsync backup to the offsite server combined. What? I thought nothing was faster than the awesome simplicity of rsync, especially compared to the ancient Python-based rdiff-backup, which hasn't had an upstream release since 2009.

It turns out that rsync is not faster if disk i/o on the target server is your bottleneck.

By default, rsync decides whether a file needs to be updated by comparing the timestamp and size of the file on the source and on the target server. That means rsync has to read the metadata of every single file on the source and every single file on the target. At first glance, this would seem faster than rdiff-backup, which compares sha1 checksums (so it has to read each entire file, not just its metadata). And that is definitely the case the first time rdiff-backup runs. However, rdiff-backup has a trick up its sleeve: the rdiff-backup-data/mirror_metadata file.
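To make that access pattern concrete, here is a minimal Python sketch of a size-plus-modification-time "quick check" in the spirit of rsync's default behaviour. It is a simplified model, not rsync's actual code: the function name `quick_check_changed` is hypothetical, it only looks at regular files, and comparing modification times at whole-second precision is a simplifying assumption. The point it illustrates is that every file gets a `stat()` on both sides, including the target.

```python
from __future__ import annotations

from pathlib import Path


def quick_check_changed(source_root: str, target_root: str) -> list[str]:
    """Return relative paths of regular files a size+mtime check would transfer.

    Hypothetical, simplified model of rsync's default quick check: a file is
    treated as unchanged only if it exists on the target with the same size
    and the same modification time (whole seconds in this sketch).
    """
    src = Path(source_root)
    dst = Path(target_root)
    changed = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(src)
        s = path.stat()                 # metadata read on the source
        try:
            t = (dst / rel).stat()      # metadata read on the target, for every file
        except FileNotFoundError:
            changed.append(rel.as_posix())
            continue
        if s.st_size != t.st_size or int(s.st_mtime) != int(t.st_mtime):
            changed.append(rel.as_posix())
    return changed
```

With many small files, that second `stat()` per file is exactly the kind of target-side work that hurts when the backup server's disks are already the bottleneck.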

As rdiff-backup runs, it records the sha1 checksum of every file it backs up in the mirror_metadata file on the target. It seems that on the next run it simply compares the sha1 computed on the source with the sha1 stored in this file, so it doesn't have to read each file on the target. The result: significantly less disk i/o on the target and faster backups (though there is more disk i/o on the source, since rdiff-backup has to calculate the sha1 checksum instead of just collecting the size and last-modified timestamp).
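The idea of comparing against a stored checksum list instead of the mirrored files can be sketched in Python as follows. This is a simplified model under stated assumptions, not rdiff-backup's implementation: the manifest here is a hypothetical plain mapping of relative path to sha1 hex digest (the real mirror_metadata file has its own format and stores more fields), and `manifest_check` and `sha1_of` are illustrative names. Note where the cost lands: every source file is read in full to hash it, while on the target only the single manifest needs to be read.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks; this reads the whole file, so the cost is on the source."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def manifest_check(
    source_root: str, manifest: dict[str, str]
) -> tuple[list[str], dict[str, str]]:
    """Compare source checksums against a manifest saved by the previous run.

    `manifest` is a hypothetical stand-in for rdiff-backup's mirror_metadata:
    relative path -> sha1 hex digest. The mirrored files on the target are
    never opened or stat'ed. Returns (changed paths, manifest for next run).
    """
    src = Path(source_root)
    changed = []
    new_manifest = {}
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(src).as_posix()
        digest = sha1_of(path)
        new_manifest[rel] = digest
        if manifest.get(rel) != digest:
            changed.append(rel)
    return changed, new_manifest
```

In practice the returned manifest would be written back to the backup server (for example with `json.dump`) so the following run can compare against it.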

rdiff-backup also wins by saving all metadata (file ownership and permissions). Since we back up to a non-privileged user on the backup server, this data is lost with rsync. And, for the sake of simplicity, I appreciate having the backup files available as a plain filesystem (unlike borgbackup, which requires special commands just to get a listing of the files).
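The reason ownership can survive an unprivileged backup user is that it can be recorded as data rather than applied to the copies. A minimal Python sketch of that general idea follows; the function names and the JSON layout are hypothetical illustrations, not rdiff-backup's actual format or code.

```python
from __future__ import annotations

import json
import stat
from pathlib import Path


def collect_ownership(source_root: str) -> dict[str, dict[str, int]]:
    """Record uid, gid and permission bits for every regular file under source_root.

    Stored as an ordinary file on the backup server, this survives even when
    the non-privileged backup user cannot chown the copied files.
    """
    src = Path(source_root)
    records = {}
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        st = path.stat()
        records[path.relative_to(src).as_posix()] = {
            "uid": st.st_uid,
            "gid": st.st_gid,
            "mode": stat.S_IMODE(st.st_mode),  # permission bits only
        }
    return records


def save_ownership(records: dict[str, dict[str, int]], dest_file: str) -> None:
    """Write the records as JSON so a restore run can re-apply them as root."""
    with open(dest_file, "w") as f:
        json.dump(records, f, indent=2, sort_keys=True)
```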

In the long term, filesystem-based backup tools seem like a losing proposition compared with block-based backups (like drbd). However, until we can reorganize our data to take advantage of drbd, we will be sticking with rdiff-backup.