I’ve carried the same YubiKey NEO on my keychain for five years. On average it gets used dozens of times per day, via USB, as an OpenPGP card. The YubiKey looks a little worse for wear, but it almost always works flawlessly.
Occasionally, it requires a few insertions to be read. When this happens I clean the contacts by rubbing them gently with a Pentel Clic Eraser, wiping off the dust, spraying them with isopropyl alcohol, and then wiping them dry. Afterwards, the YubiKey is registered immediately on the first insert. I perform this procedure about once or twice per year.
Using the eraser is potentially dangerous, but I’ve had good luck with it over the years. The white vinyl in the Pentel Clic feels very smooth compared to the abrasiveness of the rubber found on the tops of most pencils.
Every year I burn an optical archive of my financial documents, similar to how (and why) I create optical backups of photos. I schedule this financial archive for the spring, after the previous year’s taxes have been submitted and accepted. Taskwarrior solves the problem of remembering to complete the archive.
The archive contains two repositories. The first is my ledger repository. Ledger is the double-entry accounting system I began using in 2012 to record the movement of every penny that crosses one of my bank accounts (small cash transactions, less than about $20, are usually-but-not-always exempt from being recorded). In addition to the plain-text ledger files, this repository also holds PDF or JPG images of receipts.
The second repository holds my tax information. Each tax year gets a ctmg container which contains any documents used to complete my tax returns, the returns themselves, and any notifications of those returns being accepted.
The yearly optical archive that I create holds the entirety of these two repositories – not just the information from the previous year – so really each disc only needs to have a shelf life of 12 months. Keeping the older discs around just provides redundancy for prior years.
Creating the Archive
The process of creating the archive is very similar to the process I outlined six years ago for the photo archives.
The two repositories, combined, are about 2GB (most of that is the directory of receipts from the ledger repository). I burn these to a 25GB BD-R disc, so file size is not a concern. I’ll tar them, but skip any compression, which would just add extra complexity for no gain.
$ mkdir ~/tmp/archive
$ cd ~/library
$ tar cvf ~/tmp/archive/ledger.tar ledger
$ tar cvf ~/tmp/archive/tax.tar tax
The ledger archive will get signed and encrypted with my PGP key. The contents of the tax repository are already encrypted, so I’ll skip encryption and just sign the archive. I like using detached signatures for this.
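Roughly, that looks something like this (with 0xKEYID standing in for my actual key):
$ gpg --recipient 0xKEYID --encrypt ledger.tar
$ gpg --detach-sign ledger.tar.gpg
$ gpg --detach-sign tax.tar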
Previously, when creating optical photo archives, I used DVDisaster to create the disc image with parity. DVDisaster no longer exists. The code can still be found, and the program still works, but nobody is developing it and it doesn’t even have an official web presence. This makes me uncomfortable for a tool that is part of my long-term archiving plans. As a result, I’ve moved back to using Parchive for parity. Parchive also does not have much in the way of active development around it, but it is still maintained, has been around for a long time, is still used by a wide community, and will probably continue to exist as long as people share files on less-than-perfectly-reliable mediums.
As previously mentioned, I’m not worried about the storage space for these files, so I tell par2create to create PAR2 files with 30% redundancy. I suppose I could go even higher, but 30% seems like a good number. By default this process will be allowed to use 16MB of memory, which is cute, but RAM is cheap and I usually have enough to spare so I’ll give it permission to use up to 8GB.
$ par2create -r30 -m8000 recovery.par2 *
Next I’ll use hashdeep to generate message digests for all the files in the archive.
$ hashdeep * > hashes
At this point all the file processing is completed. I’ll put a blank disc in my burner (a Pioneer BDR-XD05B) and burn the directory using growisofs.
$ growisofs -Z /dev/sr0 -V "Finances 2019" -r *
The final step is to verify the disc. I have a few options on this front. These are the same steps I’d take years down the road if I actually needed to recover data from the archive.
I can use the previous hashes to find any files that do not match, which is a quick way to identify bit rot.
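One way to do that – hashdeep’s audit options have a few variations, so consider this a sketch – is to ask hashdeep for anything that no longer matches the stored digests:
$ hashdeep -x -k hashes *
# prints only the files whose current digest differs from what is recorded in 'hashes'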
Last year I demonstrated setting up the USB Armory for PGP key management. The two management operations I perform on the Armory are key signing and key renewal. I set my keys to expire each year, so that each year I need to confirm that I am not dead, still control the keys, and still consider them trustworthy.
After booting up the Armory, I first verify that NTP is disabled and set the current UTC date and time. Time is critical for any cryptography operations, and the Armory has no battery to maintain a clock.
I’d neglected backup LUKS headers until Gwern’s data loss postmortem last year. After reading his post I dumped the headers of the drives I had accessible, but I never got around to performing the task on my less frequently accessed drives. Last month I had trouble mounting one of those drives. It turned out I was simply using the wrong passphrase, but the experience prompted me to make sure I had completed the header backup procedure for all drives.
I dump the header to memory using the procedure from the Arch wiki. This is probably unnecessary, but only takes a few extra steps. The header is stored in my password store, which is obsessively backed up.
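The core of that procedure is cryptsetup’s built-in header backup, pointed at a ramfs mount so the dump never touches disk. Something like this, with /dev/sdX as a placeholder for the device:
$ mkdir /root/hdr
$ mount -t ramfs ramfs /root/hdr
$ cryptsetup luksHeaderBackup /dev/sdX --header-backup-file /root/hdr/header.img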
For years the core of my backup strategy has been rsnapshot via cryptshot to various external drives for local backups, and Tarsnap for remote backups.
Tarsnap, however, can be slow. It tends to take somewhere between 15 and 20 minutes to create my dozen or so archives, even if little has changed since the last run. My impression is that this is simply due to the number of archives I have stored and the number of files I ask it to archive. Once it has decided what to do, the time spent transferring data is negligible. I run Tarsnap hourly. Twenty minutes out of every hour seems like a lot of time spent Tarsnapping.
I’ve eyed Borg for a while (and before that, Attic), but avoided using it due to the rapid development of its earlier days. While activity is nice, too many changes too close together do not create a reassuring image of a backup project. Borg seems to have stabilized now and has a large enough user base that I feel comfortable with it. About a month ago, I began using it to back up my laptop to rsync.net.
Initially I played with borgmatic to perform and maintain the backups. Unfortunately it seems to have issues with signal handling, which caused me to end up with annoying lock files left over from interrupted backups. Borg itself has good documentation and is easy to use, and I think it is useful to build familiarity with the program itself instead of only interacting with it through something else. So I did away with borgmatic and wrote a small bash script to handle my use case.
Creating the backups is simple enough. Borg disables compression by default, but after a little experimentation I found that LZ4 seemed to be a decent compromise between compression and performance.
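The create step looks roughly like this – the repository location and the list of directories here are illustrative placeholders, not my exact invocation:
$ borg create --stats --compression lz4 \
    user@rsync.net:borg-backups::'{hostname}-{now:%Y-%m-%dT%H:%M}' \
    ~/documents ~/library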
Pruning backups is equally easy. I knew I wanted to match roughly what I had with Tarsnap: hourly backups for a day or so, daily backups for a week or so, then a month or two of weekly backups, and finally a year or so of monthly backups.
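The prune policy to match is something along these lines (the exact retention counts are illustrative):
$ borg prune --keep-hourly 24 --keep-daily 7 --keep-weekly 8 --keep-monthly 12 \
    user@rsync.net:borg-backups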
My only hesitation was in how to maintain the health of the backups. Borg provides the convenient borg check command, which is able to verify the consistency of both a repository and the archives themselves. Unsurprisingly, this is a slow process. I didn’t want to run it with my hourly backups. Daily, or perhaps even weekly, seemed more reasonable, but I did want to make sure that both checks were completed successfully with some frequency. Luckily this is just the problem that I wrote backitup to solve.
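The two checks can be run independently, which is what makes splitting their schedules possible (again, the repository location is a placeholder):
$ borg check --repository-only user@rsync.net:borg-backups
$ borg check --archives-only user@rsync.net:borg-backups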
Because the consistency checks take a while and consume some resources, I thought it would also be a good idea to avoid performing them when I’m running on battery. Giving backitup the ability to detect if the machine is on battery or AC power was a simple hack. The script now features the -a switch to specify that the program should only be executed when on AC power.
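The detection itself amounts to reading the kernel’s power supply status – the exact path under /sys/class/power_supply varies from machine to machine:
# exits 0 when the machine is on AC power
$ test "$(cat /sys/class/power_supply/AC/online)" -eq 1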
I’ve been running this for about a month now and have been pleased with the results. It averages about 30 seconds to create the backups every hour, and another 30 seconds or so to prune the old ones. As with Tarsnap, deduplication is great.
The most recent repository consistency check took about 30 minutes, but only runs every 172800 seconds, or once every other day. The most recent archive consistency check took about 40 minutes, but only runs every 259200 seconds, or once per 3 days. I’m not sure that those schedules are the best option for the consistency checks. I may tweak their frequencies, but because I know they will only be executed when I am on a trusted network and AC power, I’m less concerned about the length of time.
With Borg running hourly, I’ve reduced Tarsnap to run only once per day. Time will tell if Borg will slow as the number of stored archives increases, but for now running Borg hourly and Tarsnap daily seems like a great setup. Tarsnap and Borg both target the same files (with a few exceptions). Tarsnap runs in the AWS us-east-1 region. I’ve always kept my rsync.net account in their Zurich datacenter. This provides the kind of redundancy that lets me rest easy.
Contrary to what you might expect given the number of blog posts on the subject, I actually spend close to no time worrying about data loss in my day to day life, thanks to stuff like this. An ounce of prevention, and all that. (Maybe a few kilograms of prevention in my case.)
I use a YubiKey NEO for day-to-day PGP operations. For managing the secret key itself, such as during renewal or key signing, I use a USB Armory with a host adapter. In host mode, the Armory provides a trusted, open source platform that is compact and easily secured, making it ideal for key management.
Setting up the Armory is fairly straightforward. The Arch Linux ARM project provides prebuilt images. From my laptop, I follow their instructions to prepare the micro SD card, where /dev/sdX is the SD card.
$ dd if=/dev/zero of=/dev/sdX bs=1M count=8
$ fdisk /dev/sdX
# `o` to clear any partitions
# `n`, `p`, `1`, `2048`, `enter` to create a new primary partition in the first position with a first sector of 2048 and the default last sector
# `w` to write
$ mkfs.ext4 /dev/sdX1
$ mkdir /mnt/sdcard
$ mount /dev/sdX1 /mnt/sdcard
And then extract the image, doing whatever verification is necessary after downloading.
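Following the Arch Linux ARM instructions, the extraction is roughly this, where the tarball is whichever image was downloaded and verified:
$ bsdtar -xpf ArchLinuxARM-usbarmory-latest.tar.gz -C /mnt/sdcard
$ sync
$ umount /mnt/sdcard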
The SD card can then be inserted into the Armory. At no time during this process – or at any point in the future – is the Armory connected to a network. It is entirely air-gapped. As long as the image was not compromised and the Armory is stored securely, the platform should remain trusted.
Note that because the Armory is never on a network, and it has no internal battery, it does not keep time. Upon first boot, NTP should be disabled and the time and date set.
$ timedatectl set-ntp false
$ timedatectl set-time "yyyy-mm-dd hh:mm:ss"  # UTC
On subsequent boots, the time and date should be set with timedatectl set-time before performing any cryptographic operations.
I’ve been happy with the Seagate ST2000LM003 drives for this application. Unfortunately the enclosures I first purchased did not work out so well. I had two die within a few weeks. They’ve been replaced with the SIG JU-SA0Q12-S1. These claim to be compatible with drives up to 8TB (someday I’ll be able to buy 8TB 2.5” drives) and support USB 3.1. They’re also a bit thinner than the previous enclosures, so I can easily fit five in my box. The Seagate drives offer about 1.7 terabytes of usable space, giving this setup a total capacity of 8.5 terabytes.
Setting up git-annex to support this type of cold storage is fairly straightforward, but does necessitate some familiarity with how the program works. Personally, I prefer to do all my setup manually. I’m happy to let the assistant watch my repositories and manage them after the setup, and I’ll occasionally fire up the web app to see what the assistant daemon is doing, but I like the control and understanding provided by a manual setup. The power and flexibility of git-annex is deceptive. Using it solely through the simplified interface of the web app greatly limits what can be accomplished with it.
Before even getting into git-annex, the drive should be encrypted with LUKS/dm-crypt. The need for this could be avoided by using something like gcrypt, but LUKS/dm-crypt is an ingrained habit and part of my workflow for all external drives. Assuming the drive is /dev/sdc, pass cryptsetup some sane defaults:
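# the cipher parameters here are illustrative defaults, not a prescription
$ cryptsetup --cipher aes-xts-plain64 --key-size 512 --hash sha512 luksFormat /dev/sdc
$ cryptsetup luksOpen /dev/sdc themisto_crypt
$ mkfs.ext4 -L themisto /dev/mapper/themisto_crypt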
At this point the drive is ready. I close it and then mount it with udiskie to make sure everything is working. How the drive is mounted doesn’t matter, but I like udiskie because it can integrate with my password manager to get the drive passphrase.
With the encryption handled, the drive should now be mounted at /media/themisto. For the first few steps, we’ll basically follow the git-annex walkthrough. Let’s assume that we are setting up this drive to be a repository of the annex ~/video. The first step is to go to the drive, clone the repository, and initialize the annex. When initializing the annex I prepend the name of the remote with satellite :. My cold storage drives are all named after satellites, and doing this allows me to easily identify them when looking at a list of remotes.
$ cd /media/themisto
$ git clone ~/video
$ cd video
$ git annex init "satellite : themisto"
Whenever dealing with a repository that is bigger (or may become bigger) than the drive it is being stored on, it is important to set a disk reserve. This tells git-annex to always keep some free space around. I generally like to set this to 1 GB, which is way larger than it needs to be.
$ git config annex.diskreserve "1 gb"
I’ll then tell this new repository where the original repository is located. In this case I’ll refer to the original using the name of my computer, nous.
$ git remote add nous ~/video
If other remotes already exist, now is a good time to add them. These could be special remotes or normal ones. For this example, let’s say that we have already completed this whole process for another cold storage drive called sinope, and that we have an s3 remote creatively named s3.
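Continuing the example – the paths and remote names here are only illustrative – that might look like:
$ git remote add sinope /media/sinope/video
$ git annex enableremote s3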
Trust is a critical component of how git-annex works. Any new annex will default to being semi-trusted, which means that when running operations within the annex on the main computer – say, dropping a file – git-annex will want to confirm that themisto has the files that it is supposed to have. In the case of themisto being a USB drive that is rarely connected, this is not very useful. I tell git-annex to trust my cold storage drives, which means that if git-annex has a record of a certain file being on the drive, it will be satisfied with that. This increases the risk of data loss, but for this application I feel it is appropriate.
$ git annex trust .
The final step that needs to be taken on the new repository is to tell it what files it should want. This is done using preferred content. The standard groups that git-annex ships with cover most of the bases. Of interest for this application is the archive group, which wants all content except that which has already found its way to another archive. This is the behaviour I want, but I will duplicate it into a custom group called satellite. This keeps my cold storage drives as standalone things that do not influence any other remotes where I may want to use the default archive.
For other repositories, I may want to store the data on multiple cold storage drives. In that case I would create a redundantsatellite group that wants all content which is not already present in two other members of the group.
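A sketch of both group configurations, using preferred content expressions that mirror the standard archive group (they may not be my exact expressions):
$ git annex groupwanted satellite "not copies=satellite:1"
$ git annex group . satellite
$ git annex wanted . groupwanted
$ git annex groupwanted redundantsatellite "not copies=redundantsatellite:2"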
Despite its shortcomings, I think PGP is still one of the better ways to verify a person’s identity. Because of this – and because I use my PGP key daily1 – I make an effort to properly secure my private key. Verifying a PGP key is a fairly straightforward process for fellow PGP users, and my hope is that anyone who does verify my key can maintain a high confidence in its signature.
However, I also use other cryptographic channels to communicate – XMPP/OTR and Signal chief among them. I consider these keys more transient than PGP. The OTR keys on my computer are backed up because it takes no effort to do so, but I have no qualms about creating new ones if I feel like it. I don’t bother to port the same keys to other devices, like my phone. My Signal key is guaranteed to change anytime I rebuild or replace my phone. Given the nature of these keys and how I handle them, I don’t expect others to put the same amount of effort into verifying their fingerprints.
The solution to this is to maintain a simple text file, signed via PGP, containing the fingerprints of my other keys. With a copy of the file and a trusted copy of my public PGP key, anyone can verify my identity on other networks or communication channels. If a key is replaced, I simply add the new fingerprint to the file, sign it and distribute. Contacts download the file, check its signature, and thus easily trust the new fingerprint without additional rigmarole.
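Maintaining the file is then just a matter of re-signing after each change. A minimal sketch, assuming a clearsigned copy rather than a detached signature:
$ gpg --clearsign id.txt
$ gpg --verify id.txt.asc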
The first examples of this that I saw were from Yan and Tom Lowenthal. I thought it seemed like a great idea and began to maintain a file with a list of examples whenever I stumbled across them, with a note that I should do that someday2.
Today I decided to stop procrastinating on this and create my own identity file. It is located at pig-monkey.com/id.txt. The file, along with the rest of this website, is in git so that changes to it may be tracked over time.
Inspired by some of the examples I had collected, I added a couple pieces of related information to the file. The section on PGP key signing should provide others some context for what it means when they see my signature on a different key. Even if no one cares, I found it useful to enunciate the policy simply to clear up my own thinking about what the different certification levels should mean. Finally, the section on key management gives others a rough idea about how I manage my key, which should help them to maintain their confidence in it. If I verify that someone’s identity and fingerprint match their key, I will have high confidence in its signature initially. But if I know that the person keeps their secret key on their daily driver machine without any additional effort to protect it, my confidence in it will degrade over time. Less so if I know that they take great care in protecting their key.
A file like this should also provide a good mechanism for creating a transition and revocation statement for my PGP key, should the need arise. One hopes that it does not.
1. Realistically, I use PGP multiple times per hour when I'm on my computer.
2. Since I began my list, Keybase has become a thing. It addresses a similar problem, although it seems to promote using services like Twitter as the root of trust. Assuming that you want to stubbornly stick with a PGP key as the root of trust, I don't see the advantage of using Keybase for this problem, except that it offers a centralized lookup repository.