I store my photos in git-annex. A full copy of the annex exists on my laptop and on an external drive. Encrypted copies of all of my photos are stored on Amazon S3 (which I pay for) and box.com (which provides 50GB for free) via git-annex special remotes. The photos are backed up to an external drive daily with the rest of my laptop hard drive via backitup.sh and cryptshot. My entire laptop hard drive is also mirrored monthly to an external drive stored off-site.
(The majority of my photos are also on Flickr, but I don’t consider that a backup or even reliable storage.)
All of this is what I consider to be the bare minimum for any redundant data storage. Photos have special value, above the value that I assign to most other data. This value only increases with age. As such they require an additional backup method, but due to the size of my collection I want to avoid backup methods that involve paying for more online storage, such as Tarsnap.
I chose optical discs as the medium for my photo backups. This has the advantage of being read-only, which makes it more difficult for accidental deletions or corruption to propagate through the backup system. DVD-Rs have a capacity of 4.7 GB and a cost of around $0.25 per disc. Their life expectancy varies, but 10 years seems to be a reasonable low estimate.
I keep all of my photos in year-based directories. At the beginning of every year, the previous year’s directory is burned to a DVD.
Certain years contain few enough photos that the entire year can fit on a single DVD. More recent years have enough photos of a high enough resolution that they require multiple DVDs.
If the archive is larger than 3.7 GB, it needs to be split into multiple files. The resulting files will be burned to different discs. The capacity of a DVD is 4.7 GB, but I place the upper file limit at 3.7 GB so that the DVD has a minimum of 20% of its capacity available. This will be filled with parity information later on for redundancy.
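The arithmetic behind the 3.7 GB limit can be sketched in Python. This is only an illustration: the function names are hypothetical, and it assumes decimal gigabytes (1 GB = 10^9 bytes), which the text does not specify.

```python
# Assumption: sizes are in decimal gigabytes (1 GB = 10**9 bytes).
DVD_CAPACITY = 4_700_000_000   # nominal DVD-R capacity in bytes
MAX_PART_SIZE = 3_700_000_000  # upper limit for each archive part in bytes


def parts_needed(archive_size: int, max_part_size: int = MAX_PART_SIZE) -> int:
    """Return how many parts (and therefore discs) an archive requires."""
    if archive_size <= 0:
        raise ValueError("archive_size must be positive")
    # Integer ceiling division avoids floating point rounding.
    return (archive_size + max_part_size - 1) // max_part_size


def free_fraction(part_size: int, capacity: int = DVD_CAPACITY) -> float:
    """Return the fraction of the disc left over for parity data."""
    return (capacity - part_size) / capacity


if __name__ == "__main__":
    print(parts_needed(9_000_000_000))             # 3
    print(round(free_fraction(MAX_PART_SIZE), 3))  # 0.213
```

A full 3.7 GB part leaves roughly 21% of the disc free, which is where the "minimum of 20%" figure comes from.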
The encrypted archive and the detached signature of the encrypted archive are what will be burned to the disc. (Or, in the case of a large archive, the encrypted splits of the full archive and the associated signatures will be burned to one disc per split/signature combination.) Rather than burning them directly, an image is created first.
If the year has a split archive requiring multiple discs, I modify the sequence number in the volume label. For example, the first disc of a year requiring 3 discs will have the label "Photos: 2012 1/3".
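The labelling scheme is simple enough to generate programmatically. The helper below is a hypothetical sketch, not part of my actual workflow. It assumes that a single-disc year is labelled 1/1, and it checks each label against the 32-character length of the ISO 9660 volume identifier field.

```python
from typing import List

MAX_VOLUME_ID_LENGTH = 32  # length of the ISO 9660 volume identifier field


def volume_labels(year: int, disc_count: int) -> List[str]:
    """Build one 'Photos: YEAR N/TOTAL' label per disc (hypothetical helper)."""
    if disc_count < 1:
        raise ValueError("disc_count must be at least 1")
    labels = [f"Photos: {year} {n}/{disc_count}" for n in range(1, disc_count + 1)]
    for label in labels:
        if len(label) > MAX_VOLUME_ID_LENGTH:
            raise ValueError(f"label too long for a volume identifier: {label!r}")
    return labels


if __name__ == "__main__":
    for label in volume_labels(2012, 3):
        print(label)
```

For a three-disc year this prints "Photos: 2012 1/3", "Photos: 2012 2/3" and "Photos: 2012 3/3", one per line.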
When I began this project I knew that I wanted some sort of parity information for each disc so that I could potentially recover data from slightly damaged media. My initial idea was to use parchive via par2cmdline. Further research led me to dvdisaster which, despite being a GUI-only program, seemed more appropriate for this use case.
Both dvdisaster and parchive use the same Reed–Solomon error correction codes. dvdisaster is aimed at optical media and has the ability to place the error correction data on the disc by augmenting the disc image, as well as storing the data separately. It can also scan media for errors and assist in judging when the media is in danger of becoming defective. This makes it an attractive option for long-term storage.
I use dvdisaster with the RS02 error correction method, which augments the image before burning. Depending on the size of the original image, this will result in the disc having anywhere from 20% to 200% redundancy.
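To get a feel for where those numbers come from, here is a back-of-the-envelope estimate in Python. It is not how dvdisaster computes redundancy internally. It simply assumes that the leftover space on a 4.7 GB (decimal) disc is filled with parity data, expresses that space relative to the image size, and caps the result at the 200% upper bound mentioned above. The function name is hypothetical, and the figure dvdisaster reports will differ because of its own headers and rounding.

```python
DVD_CAPACITY = 4_700_000_000  # assumed nominal DVD-R capacity in bytes
MAX_REDUNDANCY = 2.0          # 200% upper bound, taken from the text above


def estimated_redundancy(image_size: int, capacity: int = DVD_CAPACITY) -> float:
    """Roughly estimate parity space relative to the image size.

    This is an approximation under the stated assumptions, not
    dvdisaster's actual calculation.
    """
    if not 0 < image_size <= capacity:
        raise ValueError("image_size must be positive and fit on the disc")
    return min(MAX_REDUNDANCY, (capacity - image_size) / image_size)


if __name__ == "__main__":
    for size in (3_700_000_000, 2_350_000_000, 1_000_000_000):
        print(size, f"{estimated_redundancy(size):.0%}")
```

By this estimate a full 3.7 GB image gets about 27%, an image that fills half the disc gets 100%, and anything under roughly 1.5 GB hits the 200% cap.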
After the image has been augmented, I mount it and verify the signature of the encrypted file on the disc against the local copy of the signature. I’ve never had the signatures not match, but performing this step makes me feel better.
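The check itself is a standard GnuPG detached-signature verification. As a minimal sketch, it could be wrapped in Python like this. The function names and the paths in the usage comment are hypothetical, and the code assumes that `gpg` is on the PATH and that the signing key is in the local keyring.

```python
import subprocess
from typing import List


def gpg_verify_command(signature_path: str, data_path: str) -> List[str]:
    """Build the gpg invocation that checks a detached signature."""
    return ["gpg", "--verify", signature_path, data_path]


def signature_matches(signature_path: str, data_path: str) -> bool:
    """Return True if gpg accepts the signature for the given file.

    gpg exits with status 0 for a good signature and non-zero otherwise.
    Raises FileNotFoundError if gpg is not installed.
    """
    result = subprocess.run(
        gpg_verify_command(signature_path, data_path),
        capture_output=True,
    )
    return result.returncode == 0


# Example usage (hypothetical paths: a local signature checked against
# the encrypted archive on the mounted image):
#   signature_matches("2012.tar.gpg.sig", "/mnt/iso/2012.tar.gpg")
```

Passing the local signature together with the archive on the mounted image confirms that the file in the augmented image is still the one that was originally signed.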
The final step is to burn the augmented image. I always burn discs at low speeds to diminish the chance of errors during the process.
Similar to the optical backups of my password database, I burn two copies of each disc. One copy is stored off-site. This provides a reasonable level of assurance against any loss of my photos.