dictd is a dictionary database server and client. It can be used to lookup word definitions over a network. I don’t use it for that. I use the program to provide an offline dictionary. Depending on a network connection, web browser and third-party websites just to define a word strikes me as dumb.
Every year I burn an optical archive of my financial documents, similar to how (and why) I create optical backups of photos. I schedule this financial archive for the spring, after the previous year’s taxes have been submitted and accepted. Taskwarrior solves the problem of remembering to complete the archive.
The first is my ledger repository. Ledger is the double-entry accounting system I began using in 2012 to record the movement of every penny that crosses one of my bank accounts (small cash transactions, less than about $20, are usually-but-not-always except from being recorded). In addition to the plain-text ledger files, this repository also holds PDF or JPG images of receipts.
The second repository holds my tax information. Each tax year gets a ctmg container which contains any documents used to complete my tax returns, the returns themselves, and any notifications of those returns being accepted.
The yearly optical archive that I create holds the entirety of these two repositories – not just the information from the previous year – so really each disc only needs to have a shelf life of 12 months. Keeping the older discs around just provides redundancy for prior years.
Creating the Archive
The process of creating the archive is very similar to the process I outlined six years ago for the photo archives.
The two repositories, combined, are about 2GB (most of that is the directory of receipts from the ledger repository). I burn these to a 25GB BD-R disc, so file size is not a concern. I’ll tar them, but skip any compression, which would just add extra complexity for no gain.
$ mkdir ~/tmp/archive
$ cd ~/library
$ tar cvf ~/tmp/archive/ledger.tar ledger
$ tar cvf ~/tmp/archive/tax.tar tax
The ledger archive will get signed and encrypted with my PGP key. The contents of the tax repository are already encrypted, so I’ll skip encryption and just sign the archive. I like using detached signatures for this.
Previously, when creating optical photo archives, I used DVDisaster to create the disc image with parity. DVDisaster no longer exists. The code can still be found, and the program still works, but nobody is developing it and it doesn’t even an official web presence. This makes me uncomfortable for a tool that is part of my long-term archiving plans. As a result, I’ve moved back to using Parchive for parity. Parchive also does not have much in the way of active development around it, but it is still maintained, has been around for a long period of time, is still used by a wide community, and will probably continue to exist as long as people share files on less-than-perfectly-reliable mediums.
As previously mentioned, I’m not worried about the storage space for these files, so I tell par2create to create PAR2 files with 30% redundancy. I suppose I could go even higher, but 30% seems like a good number. By default this process will be allowed to use 16MB of memory, which is cute, but RAM is cheap and I usually have enough to spare so I’ll give it permission to use up to 8GB.
$ par2create -r30 -m8000 recovery.par2 *
Next I’ll use hashdeep to generate message digests for all the files in the archive.
$ hashdeep * > hashes
At this point all the file processing is completed. I’ll put a blank disc in my burner (a Pioneer BDR-XD05B) and burn the directory using growisofs.
$ growisofs -Z /dev/sr0 -V "Finances 2019" -r *
Verification
The final step is to verify the disc. I have a few options on this front. These are the same steps I’d take years down the road if I actually needed to recover data from the archive.
I can use the previous hashes to find any files that do not match, which is a quick way to identify bit rot.
GOESImage is a bash script which downloads the latest imagery from the NOAA Geostationary Operational Environment Satellites and sets it as the desktop background via feh. If you don’t use feh, it should be easy to plug GOESImage into any desktop background control program.
I wrote GOESImage after using himawaripy for a few years, which is a program that provides imagery of the Asia-Pacific region from the Himawari 8 Japanese weather satellite. I like seeing the Earth, and I’ve found that real time imagery of my location is actually useful for identifying the approach of large-scale weather systems. NOAA’s nighttime multispectral infrared coloring is pretty neat, too.
I signed-up for Pinboard in 2014. It provides everything I need from a bookmarking service, which is mostly, you know, bookmarking. I pay for the archival account, meaning that Pinboard downloads a copy of everything I bookmark and provides me with full-text search. I find this useful and well worth the $25 yearly fee, but Pinboard’s archive is only part of the solution. I also need an offline copy of my bookmarks.
Pinboard provides an API that makes it easy to acquire a list of bookmarks. I have a small shell script which pulls down a JSON-formatted list of my bookmarks and adds the file to git-annex. This is controlled via a systemd service and timer, which wraps the script in backitup to ensure daily dumps. The systemd timer itself is controlled by nmtrust, so that it only runs when I am connected to a trusted network.
This provides data portability, ensuring that I could import my tagged URLs to another bookmarking service if I ever found something better than Pinboard (unlikely, competing with Pinboard is futile). But I also want a locally archived copy of the pages themselves, which Pinboard does not offer through the API. I carry very much about being able to work offline. The usefulness of a computer is directly propertional to the amount of data that is accessible without a network connection.
To address this I use bookmark-archiver, a Python script which reads URLs from a variety of input files, including Pinboard’s JSON dumps. It archives each URL via wget, generates a screenshot and PDF via headless Chromium, and submits the URL to the Internet Archive (with WARC hopefully on the way). It will then generate an HTML index page, allowing the archives to be easily browsed. When I want to browse the archive, I simply change into the directory and use python -m http.server to serve the bookmarks at localhost:8000. Once downloaded locally, the archives are of course backed up, via the usual suspects like borg and cryptshot.
The archiver is configured via environment variables. I configure my preferences and point the program at the Pinboard JSON dump in my annex via a shell script (creatively also named bookmark-archiver). This wrapper script is called by the previous script which dumps the JSON from Pinboard.
The result of all of this is that every day I get a fresh dump of all my bookmarks, each URL is archived locally in multiple formats, and the archive enters into my normal backup queue. Link rot may defeat the Supreme Court, but between this and my automated repository tracking I have a pretty good system for backing up useful pieces of other people’s data.
Last year I demonstrated setting up the USB Armory for PGP key management. The two management operations I perform on the Armory are key signing and key renewal. I set my keys to expire each year, so that each year I need to confirm that I am not dead, still control the keys, and still consider them trustworthy.
After booting up the Armory, I first verify that NTP is disabled and set the current UTC date and time. Time is critical for any cryptography operations, and the Armory has no battery to maintain a clock.
I perform each renewal in a directory specific to the current year, but the GNUPGHOME directory I set for this year’s renewal doesn’t exist yet. Better create it.
$ mkdir -p $GNUPGHOME
$ chmod 700$GNUPGHOME
I keep a copy of my gpg.conf on the microSD card. That needs to be copied in to the new directory, and I’ll need to tell GnuPG what pinentry program to use.
When performing the actual renewal, I’ll set the expiration to 13 months. This needs to be done for the master key, the signing subkey, the encryption subkey, and the authentication subkey.
$ gpg --edit-key $KEYID
trust
5
expire
13m
y
key 1
key 2
key 3
expire
y
13m
y
save
That’s the renewal. I’ll list the keys to make sure they look as expected.
$ gpg --list-keys
Before moving the subkeys to my Yubikey, I back everything up. This will be what I import the following year.
When I list the secret keys, I expect them to all be stubs (showing as ssb>).
$ gpg --list-secret-keys
Of course, for this to be useful I need to export my renewed public key and copy it to some place where it can be brought to a networked machine for dissemination.
I’d neglected backup LUKS headers until Gwern’s data loss postmortem last year. After reading his post I dumped the headers of the drives I had accessible, but I never got around to performing the task on my less frequently accessed drives. Last month I had trouble mounting one of those drives. It turned out I was simply using the wrong passphrase, but the experience prompted me to make sure I had completed the header backup procedure for all drives.
I dump the header to memory using the procedure from the Arch wiki. This is probably unnecessary, but only takes a few extra steps. The header is stored in my password store, which is obsessively backed up.
My experience with all task managements systems – whether software or otherwise – is that the more you put into them, the more useful they become. Not only adding as many tasks as possible, however small they may be, but also enriching the tasks with as much metadata as possible. When I began using taskwarrior, one of the problems I encountered was how to address this effectively.
Throughout the day I’ll be working on something when I receive an unrelated request. I want to log those requests so that I remember them and eventually complete them, but I don’t want to break from whatever I’m currently doing and take the time to mark these tasks up with the full metadata they eventually need. Context switching is expensive.
To address this, I’ve introduced the idea of a task inbox. I have an alias to add a task to taskwarrior with a due date of tomorrow and a tag of inbox.
alias ti='task add due:tomorrow tag:inbox'
This allows me to very quickly add a task without needing to think about it.
$ ti do something important
Each morning I run task ls. The tasks which were previously added to my inbox are at the top, overdue with a high priority. At this point I’ll modify each of them, removing the inbox tag, setting a real due date, and assigning them to a project. If they are more complex, I may also add annotations or notes, or build the task out with dependencies. If the task is simple – something that may only take a minute or two – I’ll just complete it immediately and mark it as done without bothering to remove the inbox tag.
This alias lowers the barrier of entry enough that I am likely to log even the smallest of tasks, while the inbox concept provides a framework for me to later make the tasks rich in a way that allows me to take advantage of the power that taskwarrior provides.
For years the core of my backup strategy has been rsnapshot via cryptshot to various external drives for local backups, and Tarsnap for remote backups.
Tarsnap, however, can be slow. It tends to take somewhere between 15 to 20 minutes to create my dozen or so archives, even if little has changed since the last run. My impression is that this is simply due to the number of archives I have stored and the number of files I ask it to archive. Once it has decided what to do, the time spent transferring data is negligible. I run Tarsnap hourly. Twenty minutes out of every hour seems like a lot of time spent Tarsnapping.
I’ve eyed Borg for a while (and before that, Attic), but avoided using it due to the rapid development of its earlier days. While activity is nice, too many changes too close together do not create a reassuring image of a backup project. Borg seems to have stabilized now and has a large enough user base that I feel comfortable with it. About a month ago, I began using it to backup my laptop to rsync.net.
Initially I played with borgmatic to perform and maintain the backups. Unfortunately it seems to have issues with signal handling, which caused me to end up with annoying lock files left over from interrupted backups. Borg itself has good documentation and is easy to use, and I think it is useful to build familiarity with the program itself instead of only interacting with it through something else. So I did away with borgmatic and wrote a small bash script to handle my use case.
Creating the backups is simple enough. Borg disables compression by default, but after a little experimentation I found that LZ4 seemed to be a decent compromise between compression and performance.
Pruning backups is equally easy. I knew I wanted to match roughly what I had with Tarsnap: hourly backups for a day or so, daily backups for a week or so, then a month or two of weekly backups, and finally a year or so of monthly backups.
My only hesitation was in how to maintain the health of the backups. Borg provides the convenient borg check command, which is able to verify the consistency of both a repository and the archives themselves. Unsurprisingly, this is a slow process. I didn’t want to run it with my hourly backups. Daily, or perhaps even weekly, seemed more reasonable, but I did want to make sure that both checks were completed successfully with some frequency. Luckily this is just the problem that I wrote backitup to solve.
Because the consistency checks take a while and consume some resources, I thought it would also be a good idea to avoid performing them when I’m running on battery. Giving backitup the ability to detect if the machine is on battery or AC power was a simple hack. The script now features the -a switch to specify that the program should only be executed when on AC power.
I’ve been running this for about a month now and have been pleased with the results. It averages about 30 seconds to create the backups every hour, and another 30 seconds or so to prune the old ones. As with Tarsnap, deduplication is great.
The most recent repository consistency check took about 30 minutes, but only runs every 172800 seconds, or once every other day. The most recent archive consistency check took about 40 minutes, but only runs every 259200 seconds, or once per 3 days. I’m not sure that those schedules are the best option for the consistency checks. I may tweak their frequencies, but because I know they will only be executed when I am on a trusted network and AC power, I’m less concerned about the length of time.
With Borg running hourly, I’ve reduced Tarsnap to run only once per day. Time will tell if Borg will slow as the number of stored archives increase, but for now running Borg hourly and Tarsnap daily seems like a great setup. Tarsnap and Borg both target the same files (with a few exceptions). Tarsnap runs in the AWS us-east-1 region. I’ve always kept my rsync.net account in their Zurich datacenter. This provides the kind of redundancy that lets me rest easy.
Contrary to what you might expect given the number of blog posts on the subject, I actually spend close to no time worrying about data loss in my day to day life, thanks to stuff like this. An ounce of prevention, and all that. (Maybe a few kilograms of prevention in my case.)