
I celebrated World Backup Day by increasing the resiliency of data in my life.

Four encrypted 2TB hard drives, stored in a Pelican 1200, with Abloy Protec2 PL 321 padlocks as tamper-evident seals. Having everything that matters stored in git-annex makes projects like this simple: just clone the repositories, define the preferred content expressions, and watch the magic happen.

Cold Storage
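
For those curious what that looks like in practice, here is a minimal sketch of bringing up one of the drives; the paths, the repository description, and the preferred content expression are illustrative rather than my actual configuration.

$ git clone ~/library /mnt/coldstorage/library
$ cd /mnt/coldstorage/library
$ git annex init "cold storage 1"
$ git annex wanted . "include=*"
$ git annex sync --content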

Antisocial Activity Tracking

A GPS track provides a useful log of physical activities. Beyond simply recording a route, the series of coordinate and time mappings allows statistics like distance, speed, elevation, and time to be calculated. I recently decided that I wanted to start recording this information, but I was not interested in any of the plethora of social, cloud-based services that are hip these days. A simple GPX track gives me all the information I care about, and I don’t have a strong desire to share them with a third party provider or a social network.

Recording Tracks

The discovery of GPSLogger is what made me excited to start this project. A simple but powerful Android application, GPSLogger will log to a number of different formats and, when a track is complete, automatically distribute it. This can be done by uploading the file to a storage provider, emailing it, or posting it to a custom URL. It always logs in metric units but optionally displays in Imperial.

What makes GPSLogger really stand out are its performance features. It allows very fine-grained control over GPS use, which allows tracks to be recorded for extended periods of time (such as days) with a negligible impact on battery usage.

For activities like running, shorter hikes, and bicycle rides, I tend to err on the side of accuracy. I set GPSLogger to log a coordinate every 10 seconds, with a minimum distance of 5 meters between points and a minimum accuracy of 10 meters. It will try to get a fix for 120 seconds before timing out, and attempt to meet the accuracy requirement for 60 seconds before giving up.

For a longer day-hike, the time between points could be increased to something in the neighborhood of 60 seconds. For a multi-day backpacking trip, a setting of 10 minutes or more would still provide enough accuracy to make for a useful record of the route. I’ve found that being able to control these settings really opens up a lot of tracking possibilities that I would otherwise not consider for fear of battery drain.

GPSLogger

Storing Tracks

After a track has been recorded, I transfer it to my computer and store it with git-annex.

Everything in my home directory that is not a temporary file is stored either in git or git-annex. By keeping my tracks in an annex rather than directly in git, I can take advantage of git-annex’s powerful metadata support. GPSLogger automatically names tracks with a time stamp, but the annex for my tracks is also configured to automatically set the year and month when adding files.

$ cd ~/tracks
$ git config annex.genmetadata true

After moving a track into the annex, I’ll tag it with a custom activity field, with values like run, hike, or bike.

$ git annex metadata --set activity=bike 20150725110839.gpx

I also find it useful to tag tracks with a gross location value so that I can get an idea of where they were recorded without loading them on a map. Counties tend to work well for this.

$ git annex metadata --set county=sanfrancisco 20150725110839.gpx

Of course, a track may span multiple counties. This is easily handled by git-annex.

$ git annex metadata --set county+=marin 20150725110839.gpx

One could also use fields to store location values such as National Park, National Forest or Wilderness Area.
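
For example, a hypothetical park field (the field name and value here are mine for illustration):

$ git annex metadata --set park=yosemite 20150725110839.gpx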

Metadata Views

The reason for storing metadata is the ability to use metadata-driven views. This allows me to alter the directory structure of the annex based on the metadata. For instance, I can tell git-annex to show me all tracks grouped by year followed by activity.

$ git annex view "year=*" "activity=*"
$ tree -d
.
└── 2015
    ├── bike
    ├── hike
    └── run

Or, I could ask to see all the runs I went on this July.

$ git annex view year=2015 month=07 activity=run

I’ve found this to be a super powerful tool. It gives me the simplicity and flexibility of storing the tracks as plain-text on the filesystem, with some of the querying possibilities of a database. Its usefulness is only limited by the metadata stored.
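
Views can also be refined in place rather than rebuilt from scratch: vfilter narrows the current view, and vpop steps back to the previous one. A brief sketch:

$ git annex vfilter activity=run
$ git annex vpop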

Viewing Tracks

For simple statistics, I’ll use the gpxinfo command provided by gpxpy. This gives me the basics of time, distance and speed, which is generally all I care about for something like a weekly run.

$ gpxinfo 20150725110839.gpx
File: 20150725110839.gpx
    Length 2D: 6.081km
    Length 3D: 6.123km
    Moving time: 00:35:05
    Stopped time: n/a
    Max speed: 3.54m/s = 12.74km/h
    Total uphill: 96.50m
    Total downhill: 130.50m
    Started: 2015-07-25 18:08:45
    Ended: 2015-07-25 18:43:50
    Points: 188
    Avg distance between points: 32.35m

    Track #0, Segment #0
        Length 2D: 6.081km
        Length 3D: 6.123km
        Moving time: 00:35:05
        Stopped time: n/a
        Max speed: 3.54m/s = 12.74km/h
        Total uphill: 96.50m
        Total downhill: 130.50m
        Started: 2015-07-25 18:08:45
        Ended: 2015-07-25 18:43:50
        Points: 188
        Avg distance between points: 32.35m

For a more detailed inspection of the tracks, I opt for Viking. This allows me to load the tracks and view the route on an OpenStreetMap map (or any number of other map layers, such as USGS quads or Bing aerial photography). It includes all the detailed statistics you could care about extracting from a GPX track, including pretty charts of elevation, distance, time, and speed.

If I want to view the track on my phone before I’ve transferred it to my computer, I’ll load it in either BackCountry Navigator or OsmAnd, depending on what kind of map layers I am interested in seeing. For simply viewing the statistics of a track on the phone, I go with GPS Visualizer (by the same author as GPSLogger).

Optical Backups of Photo Archives

I store my photos in git-annex. A full copy of the annex exists on my laptop and on an external drive. Encrypted copies of all of my photos are stored on Amazon S3 (which I pay for) and box.com (which provides 50GB for free) via git-annex special remotes. The photos are backed up to an external drive daily with the rest of my laptop hard drive via backitup.sh and cryptshot. My entire laptop hard drive is also mirrored monthly to an external drive stored off-site.
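
Setting up one of those encrypted special remotes is simple. A sketch of the S3 case, assuming AWS credentials are already in the environment; the remote name is arbitrary and the key ID is a placeholder for your own GnuPG key:

$ git annex initremote cloud-s3 type=S3 encryption=hybrid keyid=YOURKEYID
$ git annex copy --to cloud-s3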

(The majority of my photos are also on Flickr, but I don’t consider that a backup or even reliable storage.)

All of this is what I consider to be the bare minimum for any redundant data storage. Photos have special value, above the value that I assign to most other data. This value only increases with age. As such, they require an additional backup method, but due to the size of my collection I want to avoid methods that involve paying for more online storage, such as Tarsnap.

I choose optical discs as the medium for my photo backups. They have the advantage of being read-only, which makes it more difficult for accidental deletions or corruption to propagate through the backup system. DVD-Rs have a capacity of 4.7 GB and cost around $0.25 per disc. Their life expectancy varies, but 10 years seems to be a reasonable low estimate.

Preparation

I keep all of my photos in year-based directories. At the beginning of every year, the previous year’s directory is burned to a DVD.

Certain years contain few enough photos that the entire year can fit on a single DVD. More recent years have enough photos of a high enough resolution that they require multiple DVDs.

Archive

My first step is to build a compressed archive of each year. I choose tar and bzip2 compression for this because they’re simple and reliable.

$ cd ~/pictures
$ tar cjhf ~/tmp/pictures/2012.tar.bz 2012

If the archive is larger than 3.7 GB, it needs to be split into multiple files. The resulting files will be burned to different discs. The capacity of a DVD is 4.7 GB, but I place the upper file limit at 3.7 GB so that the DVD has a minimum of 20% of its capacity available. This will be filled with parity information later on for redundancy.

$ split -d -b 3700M 2012.tar.bz 2012.tar.bz.
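
Restoring later is just a matter of concatenating the pieces back together. Assuming each piece has already been decrypted (see the next section):

$ cat 2012.tar.bz.?? > 2012.tar.bz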

Encrypt

Leaving unencrypted data around is bad form. The archive (or each of the files resulting from splitting the large archive) is next encrypted and signed with GnuPG.

$ gpg -eo 2012.tar.bz.gpg 2012.tar.bz
$ gpg -bo 2012.tar.bz.gpg.sig 2012.tar.bz.gpg
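
The restore path is the mirror image of this. Verifying, decrypting, and unpacking the archive looks something like:

$ gpg --verify 2012.tar.bz.gpg.sig 2012.tar.bz.gpg
$ gpg -o 2012.tar.bz -d 2012.tar.bz.gpg
$ tar xjf 2012.tar.bz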

Imaging

The encrypted archive and the detached signature of the encrypted archive are what will be burned to the disc. (Or, in the case of a large archive, the encrypted splits of the full archive and the associated signatures will be burned to one disc per split/signature combination.) Rather than burning them directly, an image is created first.

$ mkisofs -V "Photos: 2012 1/1" -r -o 2012.iso 2012.tar.bz.gpg 2012.tar.bz.gpg.sig

If the year has a split archive requiring multiple discs, I modify the sequence number in the volume label. For example, a year requiring 3 discs will have the label Photos: 2012 1/3.
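
Following the naming from the split step above, the image for the first of those three discs might be created like so (the split filenames and output name here are my own placeholders):

$ mkisofs -V "Photos: 2012 1/3" -r -o 2012-1.iso 2012.tar.bz.00.gpg 2012.tar.bz.00.gpg.sig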

Parity

When I began this project I knew that I wanted some sort of parity information for each disc so that I could potentially recover data from slightly damaged media. My initial idea was to use parchive via par2cmdline. Further research led me to dvdisaster, which, despite being primarily a GUI program, seemed more appropriate for this use case.

Both dvdisaster and parchive use the same Reed–Solomon error correction codes. Dvdisaster is aimed at optical media and has the ability to place the error correction data on the disc by augmenting the disc image, as well as to store the data separately. It can also scan media for errors and assist in judging when the media is in danger of becoming defective. This makes it an attractive option for long-term storage.

I use dvdisaster with the RS02 error correction method, which augments the image before burning. Depending on the size of the original image, this will result in the disc having anywhere from 20% to 200% redundancy.
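
Recent builds of dvdisaster can also be driven from the command line. Assuming such a build, augmenting the image with RS02 data might look like this, with -n setting the target medium size:

$ dvdisaster -i 2012.iso -mRS02 -n dvd -c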

Verify

After the image has been augmented, I mount it and verify the signature of the encrypted file on the disc against the local copy of the signature. I’ve never had the signatures not match, but performing this step makes me feel better.

$ sudo mount -o loop 2012.iso /mnt/disc
$ gpg --verify 2012.tar.bz.gpg.sig /mnt/disc/2012.tar.bz.gpg
$ sudo umount /mnt/disc

Burn

The final step is to burn the augmented image. I always burn discs at low speeds to diminish the chance of errors during the process.

$ cdrecord -v speed=4 dev=/dev/sr0 2012.iso

Similar to the optical backups of my password database, I burn two copies of each disc. One copy is stored off-site. This provides a reasonable level of assurance against any loss of my photos.

Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.

Terence Eden points out that censorship becomes more difficult as flash memory devices become smaller and gain greater capacity. Case in point: director Jafar Panahi smuggled This Is Not a Film out of Iran on a flash drive hidden in a cake. For me, the practicality of the sneakernet was revitalized after I began using git-annex earlier this year.