pig-monkey.com


Cryptshot: Automated, Encrypted Backups with rsnapshot

Earlier this year I switched from Duplicity to rsnapshot for my local backups. Duplicity uses a full + incremental backup schema: the first time a backup is executed, all files are copied to the backup medium. Successive backups copy only the deltas of changed objects. Over time this results in a chain of deltas that need to be replayed when restoring from a backup. If a single delta is somehow corrupted, the whole chain is broken. To minimize the chances of this happening, the common practice is to complete a new full backup every so often. I usually do a full backup every 3 or 4 weeks. Completing a full backup takes time when you’re backing up hundreds of gigabytes, even over USB 3.0. It also takes up disk space. I keep around two full backups when using Duplicity, which means I’m using a little over twice as much space on the backup medium as what I’m backing up.

The backup schema that rsnapshot uses is different. The first time it runs, it completes a full backup. Each time after that, it completes what could be considered a “full” backup, but unchanged files are not copied over. Instead, rsnapshot simply hard links to the previously copied file. If you modify very large files regularly, this model may be inefficient, but for me — and I think for most users — it’s great. Backups are speedy, disk space usage on the backup medium isn’t too much more than the data being backed up, and I have multiple full backups that I can restore from.
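
The hard-link idea can be illustrated with ordinary shell tools. What follows is only a minimal sketch under stated assumptions, not how rsnapshot itself is implemented: the function name snapshot and its three arguments are hypothetical, and file names containing newlines are not handled.

```shell
# Sketch of hard-link snapshots; NOT rsnapshot's actual implementation.
# snapshot SRC PREV NEW: populate NEW from SRC, hard linking any file that is
# byte-identical to its counterpart in PREV instead of copying it again.
snapshot() {
    local src="$1" prev="$2" new="$3" rel
    mkdir -p "$new"
    # List regular files in SRC as paths relative to SRC.
    (cd "$src" && find . -type f) | while IFS= read -r rel; do
        mkdir -p "$new/$(dirname "$rel")"
        if [ -f "$prev/$rel" ] && cmp -s "$src/$rel" "$prev/$rel"; then
            ln "$prev/$rel" "$new/$rel"     # unchanged: second name, same data
        else
            cp -p "$src/$rel" "$new/$rel"   # new or changed: a real copy
        fi
    done
}
```

Every snapshot directory produced this way looks like a complete copy, yet an unchanged file occupies disk space only once, because each snapshot merely holds another name for the same data.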

The great strength of Duplicity — and the great weakness of rsnapshot — is encryption. Duplicity uses GnuPG to encrypt backups, which makes it one of the few solutions appropriate for remote backups. In contrast, rsnapshot does no encryption. That makes it completely inappropriate for remote backups, but the shortcoming can be worked around when backing up locally.

My local backups are done to an external, USB hard drive. Encrypting the drive is simple with LUKS and dm-crypt. For example, to encrypt /dev/sdb:

$ cryptsetup --cipher aes-xts-plain --key-size 512 --verify-passphrase luksFormat /dev/sdb

The device can then be opened, formatted, and mounted.

$ cryptsetup luksOpen /dev/sdb backup_drive
$ mkfs.ext4 -L backup /dev/mapper/backup_drive
$ mount /dev/mapper/backup_drive /mnt/backup/

At this point, the drive will be encrypted with a passphrase. To make it easier to mount programmatically, I also add a key file full of some random data generated from /dev/urandom.

$ dd if=/dev/urandom of=/root/supersecretkey bs=1024 count=8
$ chmod 0400 /root/supersecretkey
$ cryptsetup luksAddKey /dev/sdb /root/supersecretkey

There are still a few considerations to address before backups to this encrypted drive can be completed automatically with no user interaction. Since the target is a USB drive and the source is a laptop, there’s a good chance that the drive won’t be plugged in when the scheduler kicks off the backup program. If it is plugged in, the drive needs to be decrypted before calling rsnapshot to do its thing. I wrote a wrapper script called cryptshot to address these issues.

Cryptshot is configured with the UUID of the target drive and the key file used to decrypt the drive. When it is executed, the first thing it does is look to see if the UUID exists. If it does, that means the drive is plugged in and accessible. The script then decrypts the drive with the specified key file and mounts it. Finally, rsnapshot is called to execute the backup as usual. Any argument passed to cryptshot is passed along to rsnapshot. What that means is that cryptshot becomes a drop-in replacement for encrypted, rsnapshot backups. Where I previously called rsnapshot daily, I now call cryptshot daily. Everything after that point just works, with no interaction needed from me.
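
The flow described above might be sketched roughly as follows. This is not the actual cryptshot script: the UUID is a placeholder, the variable and function names (DEV_DIR, drive_present, run_backup) are hypothetical, and unmounting and closing the drive after the backup is an assumption that the description above does not state.

```shell
# Sketch of the wrapper logic; NOT the actual cryptshot script.
# Every value below is a hypothetical placeholder.
UUID="00000000-0000-0000-0000-000000000000"   # UUID of the LUKS device
KEYFILE="/root/supersecretkey"
NAME="backup_drive"                           # device-mapper name
MOUNTPOINT="/mnt/backup"
DEV_DIR="${DEV_DIR:-/dev/disk/by-uuid}"       # udev lists devices by UUID here

# Succeeds only if a device with the configured UUID is currently attached.
drive_present() {
    [ -e "$DEV_DIR/$UUID" ]
}

# Unlock, mount, run rsnapshot, then unmount and lock again (an assumption).
run_backup() {
    local status=1
    if ! drive_present; then
        echo "Backup drive not attached; skipping." >&2
        return 1
    fi
    cryptsetup luksOpen --key-file "$KEYFILE" "$DEV_DIR/$UUID" "$NAME" || return 1
    if mount "/dev/mapper/$NAME" "$MOUNTPOINT"; then
        rsnapshot "$@"          # pass every argument straight through
        status=$?
        umount "$MOUNTPOINT"
    fi
    cryptsetup luksClose "$NAME"
    return "$status"
}
```

A script built from this sketch would end with a call such as run_backup "$@" and be run as root, so that an argument like daily reaches rsnapshot unchanged.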

If you’re interested in cryptshot, you can download it directly from GitHub. The script could easily be modified to execute a backup program other than rsnapshot. You can clone my entire backups repository if you’re also interested in the other scripts I’ve written to manage different aspects of backing up data.

Tarsnapper: Managing Tarsnap Backups

Tarsnap bills itself as “online backups for the truly paranoid”. I began using the service last January. It fast became my preferred way to back up to the cloud. It stores data on Amazon S3 and costs $0.30 per GB per month for storage and $0.30 per GB for bandwidth. Those prices are higher than just using Amazon S3 directly, but Tarsnap implements some impressive data de-duplication and compression that results in the service costing very little. For example, I currently have 67 different archives stored in Tarsnap from my laptop. They total 46GB in size. De-duplicated, that comes out to 1.9GB. After compression, I only pay to store 1.4GB. Peanuts.
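
To put numbers on that, the monthly storage charge can be worked out from the figures quoted above. This is only a back-of-the-envelope sketch: the function name monthly_cost is hypothetical, and bandwidth charges are left out.

```shell
# Monthly storage cost: gigabytes stored times dollars per GB per month.
monthly_cost() {
    LC_ALL=C awk -v gb="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", gb * rate }'
}

monthly_cost 46  0.30   # raw size of all 67 archives: prints 13.80
monthly_cost 1.4 0.30   # after de-duplication and compression: prints 0.42
```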

Of course, the primary requirement for any online backup service is encryption. Tarsnap delivers. And, most importantly, the Tarsnap client is open-source, so the claims of encryption can actually be verified by the user. The majority of for-profit, online backup services out there fail on this critical point.

So Tarsnap is amazing and you should use it. The client follows the Unix philosophy: “do one thing and do it well”. It’s basically like tar. It can create archives, read the contents of an archive, extract archives, and delete archives. For someone coming from an application like Duplicity, the disadvantage to the Tarsnap client is that it doesn’t include any way to automatically manage backups. You can’t tell Tarsnap how many copies of a backup you wish to keep, or how long backups should be allowed to age before deletion.

Thanks to the de-duplication and compression, there’s little economic incentive to delete old backups. Keeping them likely won’t cost you that much extra. But I like to keep things clean and minimal. If I haven’t used an online backup in 4 weeks, I generally consider it stale and have no further use for it.

To manage my Tarsnap backups, I wrote a Python script called Tarsnapper. The primary intent was to create a script that would automatically delete old archives. It does this by accepting a maximum age from the user. Whenever Tarsnapper runs, it gets a list of all Tarsnap archives. The timestamp of each archive is parsed out of the list, and any archive older than the maximum allowed age is deleted. This is seamless, and means I never need to manually intervene to clean my archives.
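
The age check itself is simple enough to sketch in shell. This is not Tarsnapper's code (Tarsnapper is written in Python). The function name stale_archives is hypothetical, the input is assumed to be one archive per line in the form NAME, a tab, then a timestamp such as 2013-01-20 00:00:00, and a date command that accepts -d (as GNU date does) is assumed.

```shell
# Sketch of age-based pruning; NOT Tarsnapper's actual code.
# stale_archives MAX_DAYS [NOW]: read "NAME<TAB>TIMESTAMP" lines on stdin and
# print the names created more than MAX_DAYS days before NOW (epoch seconds).
stale_archives() {
    local max_days="$1" now="${2:-$(date +%s)}" tab name stamp created
    tab=$(printf '\t')
    while IFS="$tab" read -r name stamp; do
        [ -n "$stamp" ] || continue                     # no timestamp: skip
        created=$(date -d "$stamp" +%s 2>/dev/null) || continue
        if [ $(( now - created )) -gt $(( max_days * 86400 )) ]; then
            printf '%s\n' "$name"
        fi
    done
}

# Hypothetical use: delete everything older than 28 days.
# tarsnap --list-archives -v | stale_archives 28 | while IFS= read -r name; do
#     tarsnap -d -f "$name"
# done
```

Printing the stale names rather than deleting them directly makes it possible to review the list before anything is removed.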

Tarsnapper also provides some help for creating Tarsnap archives. It allows the user to define any number of named archives and the directories that those archives should contain. On my laptop I have four different directories that I back up with Tarsnap, three of them in one archive and the last in another archive. Tarsnapper knows about this, so whenever I want to back up to Tarsnap I just call a single command.

Tarsnapper can also automatically add a suffix to the end of each archive name. This makes it easier to know which archive is which when you are looking at a list. By default, the suffix is the current date and time.
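
A date-and-time suffix of this kind could be produced as shown below. The function name archive_name, the underscore separator and the exact timestamp format are all assumptions made for illustration; Tarsnapper's real default format may differ.

```shell
# Build an archive name by appending the current date and time to a base name.
# The separator and timestamp format are assumptions, not Tarsnapper's own.
archive_name() {
    printf '%s_%s\n' "$1" "$(date +%Y-%m-%d_%H-%M-%S)"
}

# Hypothetical use with tarsnap's create mode (-c) and archive name flag (-f):
# tarsnap -c -f "$(archive_name nous-config)" /home/pigmonkey/.config
```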

Configuring Tarsnapper can be done either directly by changing the variables at the top of the script, or by creating a configuration file named tarsnapper.conf in your home directory. The config file on my laptop looks like this:

[Settings]
tarsnap: /usr/bin/tarsnap

[Archives]
nous-cloud: /home/pigmonkey/work /home/pigmonkey/documents /home/pigmonkey/vault/
nous-config: /home/pigmonkey/.config

There is also support for command-line arguments to specify the location of the configuration file to use, to delete old archives and exit without creating new archives, and to execute only a single named archive rather than all of those that you may have defined.

$ tarsnapper.py --help
usage: tarsnapper.py [-h] [-c CONFIG] [-a ARCHIVE] [-r]

A Python script to manage Tarsnap archives.

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        Specify the configuration file to use.
  -a ARCHIVE, --archive ARCHIVE
                        Specify a named archive to execute.
  -r, --remove          Remove archives old archives and exit.

It makes using a great service very simple. My backups can all be executed simply by a single call to Tarsnapper. Stale archives are deleted, saving me precious picodollars. I use this system on my laptop, as well as multiple servers. If you’re interested in it, Tarsnapper can be downloaded directly from GitHub. You can clone my entire backups repository if you’re also interested in the other scripts I’ve written to manage different aspects of backing up data.

Simple MySQL Backup/Restore

To back up:

$ mysqldump -u username -p -h hostname databasename > filename

And restore:

$ mysql -u username -p -h hostname databasename < filename

This post was tagged with backups, linux.