pig-monkey.com - backupshttps://pig-monkey.com/2024-07-05T08:05:13-07:00Wherein the Author Learns to Compact Borg Archives2024-07-05T00:00:00-07:002024-07-05T08:05:13-07:00Pig Monkeytag:pig-monkey.com,2024-07-05:/2024/07/borg-compact/<p>I noticed that my <a href="https://www.borgbackup.org/">Borg</a> directory on <a href="https://www.rsync.net/products/borg.html">The Cloud</a> was 239 GB. This struck me as problematic, as I could see in my local logs that Borg itself reported the deduplicated size of all archives to be 86 GB.</p>
<p>A web search revealed <a href="https://borgbackup.readthedocs.io/en/stable/usage/compact.html"><code>borg compact</code></a>, which apparently I have been meant to run manually <a href="https://borgbackup.readthedocs.io/en/stable/changes.html#version-1-2-0a2-and-earlier-2019-02-24">since 2019</a>. Oops. After compacting, the directory dropped from 239 GB to 81 GB.</p>
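<p>The symptom described above, an on-disk size far larger than the deduplicated size Borg reports, can be turned into a rough check. The following is only a sketch: the function name <code>needs_compact</code> and the 150% default threshold are assumptions made for illustration, not anything Borg provides.</p>

```shell
#!/bin/sh
# needs_compact ON_DISK DEDUPLICATED [THRESHOLD_PCT]
# Succeeds when the on-disk size exceeds the deduplicated size by more than
# THRESHOLD_PCT percent of it. Both sizes are whole numbers in the same unit.
# The name and the default threshold (150) are hypothetical.
needs_compact() {
    on_disk=$1
    deduplicated=$2
    threshold_pct=${3:-150}
    [ $(( on_disk * 100 / deduplicated )) -gt "$threshold_pct" ]
}

# The figures from this post: 239 GB on disk versus 86 GB deduplicated.
if needs_compact 239 86; then
    echo "repository looks bloated; consider running borg compact"
fi
```

<p>With the post-compaction figures (81 GB on disk) the same check stays quiet.</p>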
<p>My borg wrapper script now looks like this:</p>
<div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><code><span class="ch">#!/bin/sh</span>
<span class="nb">.</span> ~/.keys/borg.sh
<span class="nb">export</span> <span class="nv">BORG_REPO</span><span class="o">=</span><span class="s1">'borg-rsync:borg/nous'</span>
<span class="nb">export</span> <span class="nv">BORG_REMOTE_PATH</span><span class="o">=</span><span class="s1">'borg1'</span>
<span class="c1"># Create backups</span>
<span class="nb">echo</span> <span class="s2">"Creating backups..."</span>
borg create --verbose --stats --compression<span class="o">=</span>lz4 <span class="se">\</span>
--exclude ~/projects/foo/bar/baz <span class="se">\</span>
--exclude ~/projects/xyz/bigfatbinaries <span class="se">\</span>
::<span class="s1">'{hostname}-{user}-{utcnow:%Y-%m-%dT%H:%M:%S}'</span> <span class="se">\</span>
~/documents <span class="se">\</span>
~/projects <span class="se">\</span>
~/mail <span class="se">\</span>
<span class="c1"># ...etc</span>
<span class="c1"># Prune</span>
<span class="nb">echo</span> <span class="s2">"Pruning backups..."</span>
borg prune --verbose --list --glob-archives <span class="s1">'{hostname}-{user}-*'</span> <span class="se">\</span>
--keep-within<span class="o">=</span>1d <span class="se">\</span>
--keep-daily<span class="o">=</span><span class="m">14</span> <span class="se">\</span>
--keep-weekly<span class="o">=</span><span class="m">8</span> <span class="se">\</span>
--keep-monthly<span class="o">=</span><span class="m">12</span>
<span class="c1"># Compact</span>
<span class="nb">echo</span> <span class="s2">"Compacting repository..."</span>
backitup <span class="se">\</span>
-p <span class="m">604800</span> <span class="se">\</span>
-l ~/.borg_compact-repo.lastrun <span class="se">\</span>
-b <span class="s2">"borg compact --verbose"</span>
<span class="c1"># Check</span>
<span class="nb">echo</span> <span class="s2">"Checking repository..."</span>
backitup -a <span class="se">\</span>
-p <span class="m">172800</span> <span class="se">\</span>
-l ~/.borg_check-repo.lastrun <span class="se">\</span>
-b <span class="s2">"borg check --verbose --repository-only --max-duration=1200"</span>
<span class="nb">echo</span> <span class="s2">"Checking archives..."</span>
backitup -a <span class="se">\</span>
-p <span class="m">259200</span> <span class="se">\</span>
-l ~/.borg_check-arch.lastrun <span class="se">\</span>
-b <span class="s2">"borg check --verbose --archives-only --last 18"</span>
</code></pre></div></td></tr></table></div>
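<p>The periods passed to <code>backitup</code> are in seconds: 604800 is seven days, 172800 is two days, and 259200 is three days. The general idea of running a command only once a period has elapsed can be sketched as follows. This is not backitup's actual implementation: the function names <code>is_due</code> and <code>run_if_due</code> are hypothetical, and the sketch assumes the last-run file simply holds a Unix timestamp.</p>

```shell
#!/bin/sh
# is_due PERIOD LASTRUN_FILE [NOW]
# Succeeds when at least PERIOD seconds have passed since the timestamp
# stored in LASTRUN_FILE, or when that file does not exist yet.
# Assumption: the file holds a single Unix timestamp (seconds since epoch).
is_due() {
    period=$1
    lastrun_file=$2
    now=${3:-$(date +%s)}
    # No record of a previous run means the task is due.
    [ -r "$lastrun_file" ] || return 0
    last=$(cat "$lastrun_file")
    [ $(( now - last )) -ge "$period" ]
}

# run_if_due PERIOD LASTRUN_FILE COMMAND [ARGS...]
# Runs COMMAND only when it is due, and records the time on success.
run_if_due() {
    period=$1
    lastrun_file=$2
    shift 2
    if is_due "$period" "$lastrun_file"; then
        "$@" && date +%s > "$lastrun_file"
    fi
}
```

<p>Under those assumptions, a weekly compaction would look like <code>run_if_due 604800 ~/.borg_compact-repo.lastrun borg compact --verbose</code>. A failed command leaves the timestamp untouched, so it is retried on the next invocation.</p>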
<p>Other than the addition of a weekly <code>compact</code>, my setup is the <a href="/2017/07/borg/">same as it ever was</a>.</p>I published my script for creating optical backups.2020-04-23T00:00:00-07:002020-04-23T17:45:01-07:00Pig Monkeytag:pig-monkey.com,2020-04-23:/2020/04/optician/<p><a href="https://github.com/pigmonkey/optician">Optician</a> archives a directory, optionally encrypts it, records the integrity of all the things, and burns it to disc. I created it last year after writing about the steps I took to create <a href="/2019/06/optical-financal-backups/">optical backups of financial archives</a>. Since then I’ve used it to create my monthly password database backups, yearly e-book library backups, and this year’s annual financial backup.</p>New Year, New Drive2020-01-19T00:00:00-08:002020-01-19T16:50:17-08:00Pig Monkeytag:pig-monkey.com,2020-01-19:/2020/01/new-year-new-drive/<p>My first solid state drive was a <a href="https://www.samsung.com/us/computing/memory-storage/solid-state-drives/ssd-850-pro-2-5-sata-iii-1tb-mz-7ke1t0bw/">Samsung 850 Pro 1TB</a> purchased in 2015. Originally I installed it in my T430s. 
The following year it migrated to my new <a href="/2016/04/x260/">X260</a>, where it has served admirably ever since. It still seems healthy, as best as I can tell. Some time ago I found <a href="http://www.jdgleaver.co.uk/blog/2014/05/23/samsung_ssds_reading_total_bytes_written_under_linux.html">a script for measuring the health of Samsung SSDs</a>. It reports:</p>
<div class="highlight"><pre><span></span><code><span class="nb">------------------------------</span><span class="c"></span>
<span class="c"> SSD Status: /dev/sda</span>
<span class="nb">------------------------------</span><span class="c"></span>
<span class="c"> On time: 17</span><span class="nt">,</span><span class="c">277 hr</span>
<span class="nb">------------------------------</span><span class="c"></span>
<span class="c"> Data written:</span>
<span class="c"> MB: 47</span><span class="nt">,</span><span class="c">420</span><span class="nt">,</span><span class="c">539</span><span class="nt">.</span><span class="c">560</span>
<span class="c"> GB: 46</span><span class="nt">,</span><span class="c">309</span><span class="nt">.</span><span class="c">120</span>
<span class="c"> TB: 45</span><span class="nt">.</span><span class="c">223</span>
<span class="nb">------------------------------</span><span class="c"></span>
<span class="c"> Mean write rate:</span>
<span class="c"> MB/hr: 2</span><span class="nt">,</span><span class="c">744</span><span class="nt">.</span><span class="c">720</span>
<span class="nb">------------------------------</span><span class="c"></span>
<span class="c"> Drive health: 98 %</span>
<span class="nb">------------------------------</span><span class="c"></span>
</code></pre></div>
<p>The 1 terabyte of storage has begun to feel tight over the past couple of years. I’m not sure where it all goes, but I regularly only have about 100GB free, which is not much of a buffer. I’ve had my eye on a <a href="https://www.samsung.com/us/computing/memory-storage/solid-state-drives/ssd-860-evo-2-5--sata-iii-2tb-mz-76e2t0b-am/">Samsung 860 Evo 2TB</a> as a replacement. Last November <a href="https://camelcamelcamel.com/product/B0786QNSBD">my price monitoring tool</a> notified me of a significant price drop for this new drive, so I snatched one up. This weekend I finally got around to installing it.</p>
<p>The health script reports that my new drive is, in fact, both new and healthy:</p>
<div class="highlight"><pre><span></span><code><span class="nb">------------------------------</span><span class="c"></span>
<span class="c"> SSD Status: /dev/sda</span>
<span class="nb">------------------------------</span><span class="c"></span>
<span class="c"> On time: 17 hr</span>
<span class="nb">------------------------------</span><span class="c"></span>
<span class="c"> Data written:</span>
<span class="c"> MB: 872</span><span class="nt">,</span><span class="c">835</span><span class="nt">.</span><span class="c">635</span>
<span class="c"> GB: 852</span><span class="nt">.</span><span class="c">378</span>
<span class="c"> TB: </span><span class="nt">.</span><span class="c">832</span>
<span class="nb">------------------------------</span><span class="c"></span>
<span class="c"> Mean write rate:</span>
<span class="c"> MB/hr: 51</span><span class="nt">,</span><span class="c">343</span><span class="nt">.</span><span class="c">272</span>
<span class="nb">------------------------------</span><span class="c"></span>
<span class="c"> Drive health: 100 %</span>
<span class="nb">------------------------------</span><span class="c"></span>
</code></pre></div>
<p>When migrating to a new drive, the simple solution is to just copy the complete contents of the old drive. I usually do not take this approach. Instead I prefer to imagine that the old drive is lost, and use the migration as an exercise to ensure that my <a href="/tag/backups/">excessive backup strategies</a> and <a href="https://github.com/pigmonkey/spark">OS provisioning system</a> are both fully operational. Successfully rebuilding my laptop like this, with a minimum expenditure of time and effort – and no data loss – makes me feel good about my backup and recovery tooling.</p>Optical Backups of Financial Archives2019-06-29T00:00:00-07:002019-06-29T14:49:50-07:00Pig Monkeytag:pig-monkey.com,2019-06-29:/2019/06/optical-financal-backups/<p>Every year I burn an optical archive of my financial documents, similar to how (and why) I <a href="/2013/05/optical-photo-backups/">create optical backups of photos</a>. I schedule this financial archive for the spring, after the previous year’s taxes have been submitted and accepted. <a href="https://taskwarrior.org/">Taskwarrior</a> solves the problem of remembering to complete the archive.</p>
<div class="highlight"><pre><span></span><code>$ task add project:finance due:2019-04-30 recur:yearly wait:due-4weeks <span class="s2">"burn optical financial archive with parity"</span>
</code></pre></div>
<p>The archive includes two <a href="https://git-annex.branchable.com/">git-annex</a> repositories.</p>
<p>The first is my <a href="https://www.ledger-cli.org/">ledger</a> repository. Ledger is the double-entry accounting system I began using in 2012 to record the movement of every penny that crosses one of my bank accounts (small cash transactions, less than about $20, are usually-but-not-always exempt from being recorded). In addition to the plain-text ledger files, this repository also holds PDF or JPG images of receipts.</p>
<p>The second repository holds my tax information. Each tax year gets a <a href="https://git.zx2c4.com/ctmg/about/">ctmg</a> container which contains any documents used to complete my tax returns, the returns themselves, and any notifications of those returns being accepted.</p>
<p>The yearly optical archive that I create holds the entirety of these two repositories – not just the information from the previous year – so really each disc only needs to have a shelf life of 12 months. Keeping the older discs around just provides redundancy for prior years.</p>
<h2>Creating the Archive</h2>
<p>The process of creating the archive is very similar to the process I outlined six years ago for the photo archives.</p>
<p>The two repositories, combined, are about 2GB (most of that is the directory of receipts from the ledger repository). I burn these to a 25GB BD-R disc, so file size is not a concern. I’ll <code>tar</code> them, but skip any compression, which would just add extra complexity for no gain.</p>
<div class="highlight"><pre><span></span><code>$ mkdir ~/tmp/archive
$ <span class="nb">cd</span> ~/library
$ tar cvf ~/tmp/archive/ledger.tar ledger
$ tar cvf ~/tmp/archive/tax.tar tax
</code></pre></div>
<p>The ledger archive will get signed and encrypted with my PGP key. The contents of the tax repository are already encrypted, so I’ll skip encryption and just sign the archive. I like using detached signatures for this.</p>
<div class="highlight"><pre><span></span><code>$ <span class="nb">cd</span> ~/tmp/archive
$ gpg -e -r peter@havenaut.net -o ledger.tar.gpg ledger.tar
$ gpg -bo ledger.tar.gpg.sig ledger.tar.gpg
$ gpg -bo tax.tar.sig tax.tar
$ rm ledger.tar
</code></pre></div>
<p>Previously, when creating optical photo archives, I used <a href="https://web.archive.org/web/20160427222800/http://dvdisaster.net/en/index.html">DVDisaster</a> to create the disc image with parity. DVDisaster no longer exists. The code can still be found, and the program still works, but nobody is developing it and it doesn’t even have an official web presence. This makes me uncomfortable for a tool that is part of my long-term archiving plans. As a result, I’ve moved back to using <a href="https://parchive.github.io/">Parchive</a> for parity. Parchive also does not have much in the way of active development around it, but it <a href="https://github.com/Parchive/par2cmdline/commits/master">is still maintained</a>, has been around for a long period of time, is still used by a wide community, and will probably continue to exist as long as people share files on less-than-perfectly-reliable mediums.</p>
<p>As previously mentioned, I’m not worried about the storage space for these files, so I tell <code>par2create</code> to create PAR2 files with 30% redundancy. I suppose I could go even higher, but 30% seems like a good number. By default this process will be allowed to use 16MB of memory, which is cute, but RAM is cheap and I usually have enough to spare so I’ll give it permission to use up to 8GB.</p>
<div class="highlight"><pre><span></span><code>$ par2create -r30 -m8000 recovery.par2 *
</code></pre></div>
<p>Next I’ll use <a href="http://md5deep.sourceforge.net/">hashdeep</a> to generate message digests for all the files in the archive.</p>
<div class="highlight"><pre><span></span><code>$ hashdeep * > hashes
</code></pre></div>
<p>At this point all the file processing is completed. I’ll put a blank disc in my burner (a <a href="https://pioneerelectronics.com/PUSA/Computer/Computer+Drives/BDR-XD05B">Pioneer BDR-XD05B</a>) and burn the directory using <a href="http://fy.chalmers.se/~appro/linux/DVD+RW/">growisofs</a>.</p>
<div class="highlight"><pre><span></span><code>$ growisofs -Z /dev/sr0 -V <span class="s2">"Finances 2019"</span> -r *
</code></pre></div>
<h2>Verification</h2>
<p>The final step is to verify the disc. I have a few options on this front. These are the same steps I’d take years down the road if I actually needed to recover data from the archive.</p>
<p>I can use the previous hashes to find any files that do not match, which is a quick way to identify bit rot.</p>
<div class="highlight"><pre><span></span><code>$ hashdeep -x -k hashes *.<span class="o">{</span>gpg,tar,sig,par2<span class="o">}</span>
</code></pre></div>
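<p>Where hashdeep is not installed, the same record-then-compare idea can be approximated with <code>sha256sum</code> from GNU coreutils. This is a swapped-in tool rather than the one used above, and the helper names <code>record_digests</code> and <code>verify_digests</code> are made up for this sketch.</p>

```shell
#!/bin/sh
# record_digests MANIFEST FILE...
# Writes a SHA-256 manifest covering the given files.
# (Hypothetical helper; stands in for `hashdeep * > hashes`.)
record_digests() {
    manifest=$1
    shift
    sha256sum -- "$@" > "$manifest"
}

# verify_digests MANIFEST
# Re-hashes every file listed in MANIFEST. Only mismatching files are
# reported, and the exit status is nonzero if any file fails.
verify_digests() {
    sha256sum --check --quiet -- "$1"
}
```

<p>Run from the archive directory, <code>record_digests hashes *.gpg *.tar *.sig *.par2</code> followed later by <code>verify_digests hashes</code> would flag any file whose contents have changed since the manifest was written.</p>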
<p>I can check the integrity of the PGP signatures.</p>
<div class="highlight"><pre><span></span><code>$ gpg --verify ledger.tar.gpg<span class="o">{</span>.sig,<span class="o">}</span>
$ gpg --verify tax.tar<span class="o">{</span>.sig,<span class="o">}</span>
</code></pre></div>
<p>I can use the PAR2 files to verify the original data files.</p>
<div class="highlight"><pre><span></span><code>$ par2 verify recovery.par2
</code></pre></div>Archiving Bookmarks2018-11-23T00:00:00-08:002018-12-01T19:30:07-08:00Pig Monkeytag:pig-monkey.com,2018-11-23:/2018/11/archiving-bookmarks/<p>I signed up for <a href="https://pinboard.in/">Pinboard</a> in 2014. It provides everything I need from a bookmarking service, which is mostly, you know, bookmarking. I pay for the <a href="https://pinboard.in/upgrade/">archival account</a>, meaning that Pinboard downloads a copy of everything I bookmark and provides me with full-text search. I find this useful and well worth the $25 yearly fee, but Pinboard’s archive is only part of the solution. I also need an offline copy of my bookmarks.</p>
<p>Pinboard provides an <a href="https://pinboard.in/api/">API</a> that makes it easy to acquire a list of bookmarks. I have a <a href="https://github.com/pigmonkey/systools/blob/master/pinboard-backup.sh">small shell script</a> which pulls down a JSON-formatted list of my bookmarks and adds the file to <a href="https://git-annex.branchable.com/">git-annex</a>. This is controlled via a systemd <a href="https://github.com/pigmonkey/dotfiles/blob/master/config/systemd/user/pinboard-backup.service">service</a> and <a href="https://github.com/pigmonkey/dotfiles/blob/master/config/systemd/user/pinboard-backup.timer">timer</a>, which wraps the script in <a href="https://github.com/pigmonkey/backitup/">backitup</a> to ensure daily dumps. The systemd timer itself is controlled by <a href="https://github.com/pigmonkey/nmtrust">nmtrust</a>, so that it only runs when I am connected to a trusted network.</p>
<p>This provides data portability, ensuring that I could import my tagged URLs to another bookmarking service if I ever found something better than Pinboard (unlikely, <a href="https://blog.pinboard.in/2017/06/pinboard_acquires_delicious/">competing with Pinboard is futile</a>). But I also want a locally archived copy of the pages themselves, which Pinboard does not offer through the API. I care very much about being able to <a href="/2012/10/working-offline/">work offline</a>. The usefulness of a computer is directly proportional to the amount of data that is accessible without a network connection.</p>
<p>To address this I use <a href="https://github.com/pirate/bookmark-archiver">bookmark-archiver</a>, a Python script which reads URLs from a variety of input files, including Pinboard’s JSON dumps. It archives each URL via wget, generates a screenshot and PDF via headless Chromium, and submits the URL to the Internet Archive (<a href="https://github.com/pirate/bookmark-archiver/issues/6">with WARC hopefully on the way</a>). It will then generate an HTML index page, allowing the archives to be easily browsed. When I want to browse the archive, I simply change into the directory and use <code>python -m http.server</code> to serve the bookmarks at <code>localhost:8000</code>. Once downloaded locally, the archives are of course backed up, via the usual suspects like <a href="/2017/07/borg/">borg</a> and <a href="https://github.com/pigmonkey/cryptshot">cryptshot</a>.</p>
<p>The archiver is configured via environment variables. I configure my preferences and point the program at the Pinboard JSON dump in my annex via <a href="https://github.com/pigmonkey/systools/blob/master/bookmark-archiver">a shell script</a> (creatively also named <code>bookmark-archiver</code>). This wrapper script <a href="https://github.com/pigmonkey/systools/blob/master/pinboard-backup.sh#L14">is called by the previous script</a> which dumps the JSON from Pinboard.</p>
<p>The result of all of this is that every day I get a fresh dump of all my bookmarks, each URL is archived locally in multiple formats, and the archive enters into my normal backup queue. <a href="https://www.gwern.net/Archiving-URLs#link-rot">Link rot</a> may <a href="https://www.theatlantic.com/technology/archive/2013/09/49-of-the-links-cited-in-supreme-court-decisions-are-broken/279901/">defeat the Supreme Court</a>, but between this and my <a href="/2017/06/repos/">automated repository tracking</a> I have a pretty good system for backing up useful pieces of other people’s data.</p>On E-Books2018-11-17T00:00:00-08:002018-11-17T16:07:18-08:00Pig Monkeytag:pig-monkey.com,2018-11-17:/2018/11/ebooks/<p>The <a href="https://en.wikipedia.org/wiki/Amazon_Kindle#Kindle_Paperwhite_(2nd_generation)">Kindle Paperwhite</a> has been my primary medium for consuming books since the beginning of 2014. <a href="https://en.wikipedia.org/wiki/E_Ink">E Ink</a> is a great display technology that I wish was more widespread, but beyond the fact that the Kindle (and I assume other e-readers) makes for a pleasant reading experience, the real value in electronic books is storage.</p>
<p>At its peak my physical collection was somewhere north of 200 books. As <a href="/2010/08/on-books/">I mentioned years ago</a> I took inspiration from Gary Snyder’s character in The Dharma Bums and stored my books in milk crates, which stacked like a bookcase for normal use and kept the collection pre-boxed for moving. But that many books still take up space, and are still annoying to move. And in some regards they are fragile – redundant data storage is expensive in meatspace.</p>
<p>My digital library currently sits at 572 books and 13 gigabytes (the size skyrocketed after I began to archive a few comics). I could not justify that many physical books in my life. I still have a collection of dead trees, but I’m down to 3 milk crates. I store my digital library in <a href="https://git-annex.branchable.com/">git-annex</a>, allowing me to <a href="/2016/08/rclone/">redundantly replicate</a> my collection across the globe, as well as keep copies in <a href="/2016/08/storage/">cold storage</a>. I also burn yearly <a href="/2013/05/optical-photo-backups/">optical backups</a> of the library to <a href="https://en.wikipedia.org/wiki/M-DISC">M-DISC</a>. The library is managed with <a href="https://calibre-ebook.com/">Calibre</a>.</p>
<p>When I first bought the Kindle it required internet access to associate with my Amazon account. Ever since then, it has been in airplane mode. I spun up a temporary wireless network for the setup that I then deleted after the process was complete, ensuring that even if Amazon’s airplane mode was untrustworthy, the device would not be able to phone home. The advantages of giving the Kindle internet access seem minute, and are far outweighed by the disadvantage of having to trust Amazon.</p>
<p>If I purchase a book from Amazon, I select the “Download & Transfer via USB” option. This results in a crippled <a href="https://en.wikipedia.org/wiki/Kindle_File_Format">AZW</a> file. I am under the radical delusion that I should own what I purchase, so I import that file into Calibre using the <a href="https://github.com/apprenticeharper/DeDRM_tools">DeDRM_tools</a> plugin. This strips any DRM, making the book ready to be consumed and archived. Books are transferred between my computer and the Kindle via USB, which Calibre makes simple.</p>
<p>When I acquire books through other channels, my preferred format is always <a href="https://en.wikipedia.org/wiki/EPUB">EPUB</a>: an open format that is simply a zip archive of HTML files. Calibre’s built-in conversion tools are quite good, giving me confidence that any e-book format I import into the library will be readable at any point in the future, but my preference is to store data in formats that are open, accessible, and understandable. The closer one gets to well-formatted plain text, the closer one gets to god.</p>
<p>While the Kindle excels at the linear reading of novels, I’ve also come to appreciate digital copies of reference books and technical manuals. Often the first reading of these types of books involves lots of flipping back and forth, which is easier in the dead tree variant, but after that first reading the searchability of the digital copy is far more useful for reference. The physical size of these types of books also makes them even more difficult to carry and store than other books, all but guaranteeing you won’t have access to them when you need to reference them. Digital books solve that problem.</p>
<p>I’m confident in my ability to securely store digital data. Whenever I import a book into my library, I know that I now have permanent access to that knowledge for the rest of my life, regardless of environmental disaster, the whims of publishing houses, or the size of my living quarters.</p>LUKS Header Backup2017-07-16T00:00:00-07:002017-07-16T11:03:56-07:00Pig Monkeytag:pig-monkey.com,2017-07-16:/2017/07/luks/<p>I’d neglected to back up <a href="https://en.wikipedia.org/wiki/Linux_Unified_Key_Setup">LUKS</a> headers until <a href="https://www.gwern.net/Notes#november-2016-data-loss-postmortem">Gwern’s data loss postmortem</a> last year. After reading his post I dumped the headers of the drives I had accessible, but I never got around to performing the task on my less frequently accessed drives. Last month I had trouble mounting one of those drives. It turned out I was simply using the wrong passphrase, but the experience prompted me to make sure I had completed the header backup procedure for all drives.</p>
<p>I dump the header to memory using <a href="https://wiki.archlinux.org/index.php/Dm-crypt/Device_encryption#Backup_using_cryptsetup">the procedure from the Arch wiki</a>. This is probably unnecessary, but only takes a few extra steps. The header is stored in my password store, which is obsessively backed up.</p>
<div class="highlight"><pre><span></span><code>$ sudo mkdir /mnt/tmp
$ sudo mount ramfs /mnt/tmp -t ramfs
$ sudo cryptsetup luksHeaderBackup /dev/sdc --header-backup-file /mnt/tmp/dump
$ sudo chown pigmonkey:pigmonkey /mnt/tmp/dump
$ pass insert -m crypt/luksheader/themisto < /mnt/tmp/dump
$ sudo umount /mnt/tmp
$ sudo rmdir /mnt/tmp
</code></pre></div>Borg Assimilation2017-07-05T00:00:00-07:002017-11-15T10:10:07-08:00Pig Monkeytag:pig-monkey.com,2017-07-05:/2017/07/borg/<p>For years the core of my backup strategy has been <a href="http://rsnapshot.org/">rsnapshot</a> via <a href="https://github.com/pigmonkey/cryptshot">cryptshot</a> to various external drives for local backups, and <a href="https://www.tarsnap.com/">Tarsnap</a> for remote backups.</p>
<p>Tarsnap, however, can be slow. It tends to take somewhere between 15 and 20 minutes to create my dozen or so archives, even if little has changed since the last run. My impression is that this is simply due to the number of archives I have stored and the number of files I ask it to archive. Once it has decided what to do, the time spent transferring data is negligible. I run Tarsnap hourly. Twenty minutes out of every hour seems like a lot of time spent Tarsnapping.</p>
<p>I’ve eyed <a href="https://github.com/borgbackup/borg">Borg</a> for a while (and before that, <a href="https://attic-backup.org/">Attic</a>), but avoided using it due to the rapid development of its earlier days. While activity is nice, too many changes too close together do not create a reassuring image of a backup project. Borg seems to have stabilized now and has a large enough user base that I feel comfortable with it. About a month ago, I began using it to back up my laptop to <a href="http://www.rsync.net/products/attic.html">rsync.net</a>.</p>
<p>Initially I played with <a href="https://torsion.org/borgmatic/">borgmatic</a> to perform and maintain the backups. Unfortunately it seems to have issues with signal handling, which caused me to end up with annoying lock files left over from interrupted backups. Borg itself has <a href="https://borgbackup.readthedocs.io/en/stable/">good documentation</a> and is <a href="https://borgbackup.readthedocs.io/en/stable/usage.html">easy to use</a>, and I think it is useful to build familiarity with the program itself instead of only interacting with it through something else. So I did away with borgmatic and wrote a small bash script to handle my use case.</p>
<p><a href="https://borgbackup.readthedocs.io/en/stable/usage.html#borg-create">Creating the backups</a> is simple enough. Borg disables compression by default, but after a little experimentation I found that LZ4 seemed to be a decent compromise between compression and performance.</p>
<p><a href="https://borgbackup.readthedocs.io/en/stable/usage.html#borg-prune">Pruning backups</a> is equally easy. I knew I wanted to match roughly what I had with Tarsnap: hourly backups for a day or so, daily backups for a week or so, then a month or two of weekly backups, and finally a year or so of monthly backups.</p>
<p>My only hesitation was in how to maintain the health of the backups. Borg provides the convenient <a href="https://borgbackup.readthedocs.io/en/stable/usage.html#borg-check">borg check</a> command, which is able to verify the consistency of both a repository and the archives themselves. Unsurprisingly, this is a slow process. I didn’t want to run it with my hourly backups. Daily, or perhaps even weekly, seemed more reasonable, but I did want to make sure that both checks were completed successfully with some frequency. Luckily this is just the problem that I wrote <a href="https://github.com/pigmonkey/backitup">backitup</a> to solve.</p>
<p>Because the consistency checks take a while and consume some resources, I thought it would also be a good idea to avoid performing them when I’m running on battery. Giving backitup the ability to detect if the machine is on battery or AC power was <a href="https://github.com/pigmonkey/backitup/commit/0cd4d3a45df02a5f592617f8a4ad3811a02c9a38">a simple hack</a>. The script now features the <code>-a</code> switch to specify that the program should only be executed when on AC power.</p>
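<p>For illustration, here is a minimal sketch of how a shell script might decide whether the machine is on AC power. It is not backitup’s actual implementation (see the linked commit for that). The <code>on_ac_power</code> function name is hypothetical, and the sketch assumes a Linux system that exposes power supplies under <code>/sys/class/power_supply</code> with a <code>type</code> of <code>Mains</code>. Some laptops charging over USB-C report a different type, so treat this as a starting point.</p>
<div class="highlight"><pre><span></span><code>#!/bin/sh
# Hypothetical helper: succeed only if a mains adapter reports online=1.
# The sysfs root is a parameter so the function can be tried against a
# scratch directory; it defaults to the usual Linux location.
on_ac_power() {
    root="${1:-/sys/class/power_supply}"
    for supply in "$root"/*; do
        # Skip anything that does not look like a power supply entry.
        [ -r "$supply/type" ] || continue
        [ -r "$supply/online" ] || continue
        # Only mains adapters count; batteries are ignored.
        [ "$(cat "$supply/type")" = "Mains" ] || continue
        if [ "$(cat "$supply/online")" = "1" ]; then
            return 0
        fi
    done
    return 1
}
</code></pre></div>
<p>A wrapper could then guard a slow step with something like <code>if on_ac_power; then borg check --verbose --repository-only; fi</code>.</p>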
<p>My completed Borg wrapper is thus:</p>
<div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><code><span class="ch">#!/bin/sh</span>
<span class="nb">export</span> <span class="nv">BORG_PASSPHRASE</span><span class="o">=</span><span class="s1">'supers3cr3t'</span>
<span class="nb">export</span> <span class="nv">BORG_REPO</span><span class="o">=</span><span class="s1">'borg-rsync:borg/nous'</span>
<span class="nb">export</span> <span class="nv">BORG_REMOTE_PATH</span><span class="o">=</span><span class="s1">'borg1'</span>
<span class="c1"># Create backups</span>
<span class="nb">echo</span> <span class="s2">"Creating backups..."</span>
borg create --verbose --stats --compression<span class="o">=</span>lz4 <span class="se">\</span>
--exclude ~/projects/foo/bar/baz <span class="se">\</span>
--exclude ~/projects/xyz/bigfatbinaries <span class="se">\</span>
::<span class="s1">'{hostname}-{user}-{utcnow:%Y-%m-%dT%H:%M:%S}'</span> <span class="se">\</span>
~/documents <span class="se">\</span>
~/projects <span class="se">\</span>
~/mail <span class="se">\</span>
<span class="c1"># ...etc</span>
<span class="c1"># Prune backups</span>
<span class="nb">echo</span> <span class="s2">"Pruning backups..."</span>
borg prune --verbose --list --prefix <span class="s1">'{hostname}-{user}-'</span> <span class="se">\</span>
--keep-within<span class="o">=</span>2d <span class="se">\</span>
--keep-daily<span class="o">=</span><span class="m">14</span> <span class="se">\</span>
--keep-weekly<span class="o">=</span><span class="m">8</span> <span class="se">\</span>
--keep-monthly<span class="o">=</span><span class="m">12</span> <span class="se">\</span>
<span class="c1"># Check backups</span>
<span class="nb">echo</span> <span class="s2">"Checking repository..."</span>
backitup -a <span class="se">\</span>
-p <span class="m">172800</span> <span class="se">\</span>
-l ~/.borg_check-repo.lastrun <span class="se">\</span>
-b <span class="s2">"borg check --verbose --repository-only"</span>
<span class="nb">echo</span> <span class="s2">"Checking archives..."</span>
backitup -a <span class="se">\</span>
-p <span class="m">259200</span> <span class="se">\</span>
-l ~/.borg_check-arch.lastrun <span class="se">\</span>
-b <span class="s2">"borg check --verbose --archives-only --last 24"</span>
</code></pre></div></td></tr></table></div>
<p>This is executed by a <a href="https://github.com/pigmonkey/dotfiles/blob/master/config/systemd/user/borg.service">systemd service</a>.</p>
<div class="highlight"><pre><span></span><code><span class="k">[Unit]</span><span class="w"></span>
<span class="na">Description</span><span class="o">=</span><span class="s">Borg Backup</span><span class="w"></span>
<span class="k">[Service]</span><span class="w"></span>
<span class="na">Type</span><span class="o">=</span><span class="s">oneshot</span><span class="w"></span>
<span class="na">ExecStart</span><span class="o">=</span><span class="s">/home/pigmonkey/bin/borgwrapper.sh</span><span class="w"></span>
<span class="k">[Install]</span><span class="w"></span>
<span class="na">WantedBy</span><span class="o">=</span><span class="s">multi-user.target</span><span class="w"></span>
</code></pre></div>
<p>The service is called hourly by a <a href="https://github.com/pigmonkey/dotfiles/blob/master/config/systemd/user/borg.timer">systemd timer</a>.</p>
<div class="highlight"><pre><span></span><code><span class="k">[Unit]</span><span class="w"></span>
<span class="na">Description</span><span class="o">=</span><span class="s">Borg Backup Timer</span><span class="w"></span>
<span class="k">[Timer]</span><span class="w"></span>
<span class="na">Unit</span><span class="o">=</span><span class="s">borg.service</span><span class="w"></span>
<span class="na">OnCalendar</span><span class="o">=</span><span class="s">hourly</span><span class="w"></span>
<span class="na">Persistent</span><span class="o">=</span><span class="s">True</span><span class="w"></span>
<span class="k">[Install]</span><span class="w"></span>
<span class="na">WantedBy</span><span class="o">=</span><span class="s">timers.target</span><span class="w"></span>
</code></pre></div>
<p>I don’t enable the timer directly, but add it to <code>/usr/local/etc/trusted_units</code> so that <a href="https://github.com/pigmonkey/nmtrust">nmtrust</a> activates it when I’m connected to trusted networks.</p>
<div class="highlight"><pre><span></span><code>$ <span class="nb">echo</span> <span class="s2">"borg.timer,user:pigmonkey"</span> >> /usr/local/etc/trusted_units
</code></pre></div>
<p>I’ve been running this for about a month now and have been pleased with the results. It averages about 30 seconds to create the backups every hour, and another 30 seconds or so to prune the old ones. As with Tarsnap, deduplication is great.</p>
<div class="highlight"><pre><span></span><code><span class="nb">------------------------------------------------------------------------------</span><span class="c"></span>
<span class="c"> Original size Compressed size Deduplicated size</span>
<span class="c">This archive: 19</span><span class="nt">.</span><span class="c">87 GB 18</span><span class="nt">.</span><span class="c">41 GB 10</span><span class="nt">.</span><span class="c">21 MB</span>
<span class="c">All archives: 836</span><span class="nt">.</span><span class="c">02 GB 773</span><span class="nt">.</span><span class="c">35 GB 19</span><span class="nt">.</span><span class="c">32 GB</span>
<span class="c"> Unique chunks Total chunks</span>
<span class="c">Chunk index: 371527 14704634</span>
<span class="nb">------------------------------------------------------------------------------</span><span class="c"></span>
</code></pre></div>
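<p>As a rough way to read those numbers, the original size of all archives can be divided by the deduplicated size. The sketch below does that arithmetic with <code>awk</code>. The <code>dedup_ratio</code> function name is hypothetical, and the figures are copied from the “All archives” row above.</p>
<div class="highlight"><pre><span></span><code>#!/bin/sh
# Hypothetical helper: ratio of original size to deduplicated size.
# Both arguments must be in the same unit (GB here).
dedup_ratio() {
    awk -v o="$1" -v d="$2" 'BEGIN { printf "%.1f\n", o / d }'
}

# Figures from the "All archives" row, in GB.
dedup_ratio 836.02 19.32   # prints 43.3
</code></pre></div>
<p>In other words, the repository holds roughly 43 times less data than the archives would occupy if each were stored in full.</p>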
<p>The most recent repository consistency check took about 30 minutes, but only runs every 172800 seconds, or once every other day. The most recent archive consistency check took about 40 minutes, but only runs every 259200 seconds, or once every three days. I’m not sure that those schedules are the best option for the consistency checks. I may tweak their frequencies, but because I know they will only be executed when I am on a trusted network and AC power, I’m less concerned about the length of time.</p>
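<p>The periods passed to <code>backitup -p</code> are plain seconds, so tweaking them is a matter of simple arithmetic. The sketch below converts days to seconds and shows one way a “has the period elapsed?” test could work. Both function names are hypothetical, and <code>period_elapsed</code> assumes the last-run file holds a Unix timestamp, which may not match how backitup actually records its runs.</p>
<div class="highlight"><pre><span></span><code>#!/bin/sh
# Convert whole days to seconds: 2 days is 172800, 3 days is 259200.
days_to_seconds() {
    echo $(( $1 * 24 * 60 * 60 ))
}

# Hypothetical stand-in for a period check. Succeeds when at least
# "period" seconds have passed since the timestamp stored in the file,
# or when the file does not exist yet. The third argument overrides
# the current time, which is handy for experimenting.
period_elapsed() {
    lastrun_file="$1"
    period="$2"
    now="${3:-$(date +%s)}"
    [ -r "$lastrun_file" ] || return 0
    last="$(cat "$lastrun_file")"
    [ $(( now - last )) -ge "$period" ]
}
</code></pre></div>
<p>For example, <code>days_to_seconds 7</code> gives the value to use for a weekly check.</p>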
<p>With Borg running hourly, I’ve reduced Tarsnap to run only once per day. Time will tell if Borg will slow as the number of stored archives increases, but for now running Borg hourly and Tarsnap daily seems like a great setup. Tarsnap and Borg both target the same files (with a few exceptions). Tarsnap runs in the AWS us-east-1 region. I’ve always kept my rsync.net account in their Zurich datacenter. This provides the kind of redundancy that lets me rest easy.</p>
<p>Contrary to what you might expect given the <a href="/tag/backups/">number of blog posts on the subject</a>, I actually spend close to no time worrying about data loss in my day to day life, thanks to stuff like this. An ounce of prevention, and all that. (Maybe a few kilograms of prevention in my case.)</p>Automated Repository Tracking2017-06-29T00:00:00-07:002017-06-29T20:45:55-07:00Pig Monkeytag:pig-monkey.com,2017-06-29:/2017/06/repos/<p>I have confidence in my backup strategies for my own data, but until recently I had not considered backing up other people’s data.</p>
<p>Recently, the author of a repository that I tracked on GitHub deleted his account and disappeared from the information super highway. I had a local copy of the repository, but I had not pulled it for a month. A number of recent changes were lost to me. This inspired me to set up the system I now use to automatically update local copies of any code repositories that are useful or interesting to me.</p>
<p>I clone the repositories into <code>~/library/src</code> and use <a href="https://myrepos.branchable.com/">myrepos</a> to interact with them. I use myrepos for work and personal repositories as well, so to keep this stuff segregated I set up a separate config file and a <a href="https://github.com/pigmonkey/dotfiles/blob/master/aliases#L63">shell alias</a> to refer to it.</p>
<div class="highlight"><pre><span></span><code>alias lmr='mr --config $HOME/library/src/myrepos.conf --directory=$HOME/library/src'
</code></pre></div>
<p>Now when I want to add a new repository, I clone it normally and register it with myrepos.</p>
<div class="highlight"><pre><span></span><code>$ <span class="nb">cd</span> ~/library/src
$ git clone https://github.com/warner/magic-wormhole
$ <span class="nb">cd</span> magic-wormhole <span class="o">&&</span> lmr register
</code></pre></div>
<p>The <code>~/library/src/myrepos.conf</code> file has a default section which states that no repository should be updated more than once every 24 hours.</p>
<div class="highlight"><pre><span></span><code><span class="k">[DEFAULT]</span><span class="w"></span>
<span class="na">skip</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">[ "$1" = update ] && ! hours_since "$1" 24</span><span class="w"></span>
</code></pre></div>
<p>Now I can ask myrepos to update all of my tracked repositories. If it sees that it has already updated a repository within 24 hours, myrepos will skip the repository.</p>
<div class="highlight"><pre><span></span><code>$ lmr update
</code></pre></div>
<p>To automate this I create a <a href="https://github.com/pigmonkey/dotfiles/blob/master/config/systemd/user/library-repos.service">systemd service</a>.</p>
<div class="highlight"><pre><span></span><code><span class="k">[Unit]</span><span class="w"></span>
<span class="na">Description</span><span class="o">=</span><span class="s">Update library repositories</span><span class="w"></span>
<span class="k">[Service]</span><span class="w"></span>
<span class="na">Type</span><span class="o">=</span><span class="s">oneshot</span><span class="w"></span>
<span class="na">ExecStart</span><span class="o">=</span><span class="s">/usr/bin/mr --config %h/library/src/myrepos.conf -j5 update</span><span class="w"></span>
<span class="k">[Install]</span><span class="w"></span>
<span class="na">WantedBy</span><span class="o">=</span><span class="s">multi-user.target</span><span class="w"></span>
</code></pre></div>
<p>And a <a href="https://github.com/pigmonkey/dotfiles/blob/master/config/systemd/user/library-repos.timer">systemd timer to run the service every hour</a>.</p>
<div class="highlight"><pre><span></span><code><span class="k">[Unit]</span><span class="w"></span>
<span class="na">Description</span><span class="o">=</span><span class="s">Update library repositories timer</span><span class="w"></span>
<span class="k">[Timer]</span><span class="w"></span>
<span class="na">Unit</span><span class="o">=</span><span class="s">library-repos.service</span><span class="w"></span>
<span class="na">OnCalendar</span><span class="o">=</span><span class="s">hourly</span><span class="w"></span>
<span class="na">Persistent</span><span class="o">=</span><span class="s">True</span><span class="w"></span>
<span class="k">[Install]</span><span class="w"></span>
<span class="na">WantedBy</span><span class="o">=</span><span class="s">timers.target</span><span class="w"></span>
</code></pre></div>
<p>I don’t enable this timer directly, but instead add it to my <code>trusted_units</code> file so that <a href="https://github.com/pigmonkey/nmtrust">nmtrust</a> will enable it only when I am on a trusted network.</p>
<div class="highlight"><pre><span></span><code>$ <span class="nb">echo</span> <span class="s2">"library-repos.timer,user:pigmonkey"</span> >> /usr/local/etc/trusted_units
</code></pre></div>
<p>If I’m curious to see what has been recently active, I can <code>ls -ltr ~/library/src</code>. I find this more useful than <a href="https://help.github.com/articles/about-stars/">GitHub stars</a> or similar bookmarking.</p>
<p>I currently track 120 repositories. This is only 3.3 GB, which means I can incorporate it into my normal backup strategies without being concerned about the extra space.</p>
<p>The internet can be fickle, but it will be difficult for me to lose a repository again.</p>Cold Storage2016-08-26T00:00:00-07:002016-08-27T18:19:34-07:00Pig Monkeytag:pig-monkey.com,2016-08-26:/2016/08/storage/<p>This past spring I mentioned my <a href="/2016/03/backup/">cold storage setup</a>: a number of encrypted 2.5” drives in external enclosures, stored inside a <a href="http://www.pelican.com/us/en/product/watertight-protector-hard-cases/small-case/standard/1200/">Pelican 1200</a> case, secured with <a href="https://securitysnobs.com/Abloy-Protec2-PL-321-Padlock.html">Abloy Protec2 321</a> locks. Offline, secure, and infrequently accessed storage is an important component of any strategy for resilient data. The ease with which this can be managed with <a href="https://git-annex.branchable.com/">git-annex</a> only increases <a href="/tag/annex/">my infatuation with the software</a>.</p>
<p><a href="https://www.flickr.com/photos/pigmonkey/29168947362/in/dateposted/" title="Data Data Data Data Data"><img src="https://c3.staticflickr.com/9/8405/29168947362_2c7ecc9a97_c.jpg" width="800" height="450" alt="Data Data Data Data Data"></a></p>
<p>I’ve been happy with the <a href="https://www.amazon.com/gp/product/B00MPWYLHO/">Seagate ST2000LM003</a> drives for this application. Unfortunately the enclosures I first purchased did not work out so well. I had two die within a few weeks. They’ve been replaced with the <a href="https://www.amazon.com/gp/product/B00YT6TOJO/">SIG JU-SA0Q12-S1</a>. These claim to be compatible with drives up to 8TB (someday I’ll be able to buy 8TB 2.5” drives) and support USB 3.1. They’re also a bit thinner than the previous enclosures, so I can easily fit five in my box. The Seagate drives offer about 1.7 terabytes of usable space, giving this setup a total capacity of 8.5 terabytes.</p>
<p>Setting up git-annex to support this type of cold storage is fairly straightforward, but does necessitate some familiarity with how the program works. Personally, I prefer to do all my setup manually. I’m happy to let the <a href="http://git-annex.branchable.com/assistant/">assistant</a> watch my repositories and manage them after the setup, and I’ll occasionally fire up the <a href="https://git-annex.branchable.com/design/assistant/webapp/">web app</a> to see what the assistant daemon is doing, but I like the control and understanding provided by a manual setup. The power and flexibility of git-annex is deceptive. Using it solely through the simplified interface of the web app greatly limits what can be accomplished with it.</p>
<h2>Encryption</h2>
<p>Before even getting into git-annex, the drive should be encrypted with <a href="https://en.wikipedia.org/wiki/Linux_Unified_Key_Setup">LUKS</a>/<a href="https://en.wikipedia.org/wiki/Dm-crypt">dm-crypt</a>. The need for this could be avoided by using something like <a href="https://git-annex.branchable.com/special_remotes/gcrypt/">gcrypt</a>, but LUKS/dm-crypt is an ingrained habit and part of my workflow for all external drives. Assuming the drive is <code>/dev/sdc</code>, pass <code>cryptsetup</code> some sane defaults:</p>
<div class="highlight"><pre><span></span><code>$ sudo cryptsetup --cipher aes-xts-plain64 --key-size <span class="m">512</span> --hash sha512 luksFormat /dev/sdc
</code></pre></div>
<p>With the drive encrypted, it can then be opened and formatted. I’ll give the drive a human-friendly label of <code>themisto</code>.</p>
<div class="highlight"><pre><span></span><code>$ sudo cryptsetup luksOpen /dev/sdc themisto_crypt
$ sudo mkfs.ext4 -L themisto /dev/mapper/themisto_crypt
</code></pre></div>
<p>At this point the drive is ready. I close it and then mount it with <a href="https://github.com/coldfix/udiskie">udiskie</a> to make sure everything is working. How the drive is mounted doesn’t matter, but I like udiskie because it can <a href="https://github.com/pigmonkey/dotfiles/blob/master/config/udiskie/config.yml#L5">integrate with my password manager</a> to get the drive passphrase.</p>
<div class="highlight"><pre><span></span><code>$ sudo cryptsetup luksClose /dev/mapper/themisto_crypt
$ udiskie-mount -r /dev/sdc
</code></pre></div>
<h2>Git-Annex</h2>
<p>With the encryption handled, the drive should now be mounted at <code>/media/themisto</code>. For the first few steps, we’ll basically follow the <a href="https://git-annex.branchable.com/walkthrough/">git-annex walkthrough</a>. Let’s assume that we are setting up this drive to be a repository of the annex <code>~/video</code>. The first step is to go to the drive, clone the repository, and initialize the annex. When initializing the annex I prepend the name of the remote with <code>satellite :</code>. My cold storage drives are all named after satellites, and doing this allows me to easily identify them when looking at a list of remotes.</p>
<div class="highlight"><pre><span></span><code>$ <span class="nb">cd</span> /media/themisto
$ git clone ~/video
$ <span class="nb">cd</span> video
$ git annex init <span class="s2">"satellite : themisto"</span>
</code></pre></div>
<h3>Disk Reserve</h3>
<p>Whenever dealing with a repository that is bigger (or may become bigger) than the drive it is being stored on, it is important to set a disk reserve. This tells git-annex to always keep some free space around. I generally like to set this to 1 GB, which is way larger than it needs to be.</p>
<div class="highlight"><pre><span></span><code>$ git config annex.diskreserve <span class="s2">"1 gb"</span>
</code></pre></div>
<h3>Adding Remotes</h3>
<p>I’ll then tell this new repository where the original repository is located. In this case I’ll refer to the original using the name of my computer, <code>nous</code>.</p>
<div class="highlight"><pre><span></span><code>$ git remote add nous ~/video
</code></pre></div>
<p>If other remotes already exist, now is a good time to add them. These could be <a href="https://git-annex.branchable.com/special_remotes/">special remotes</a> or normal ones. For this example, let’s say that we have already completed this whole process for another cold storage drive called <code>sinope</code>, and that we have an <a href="https://git-annex.branchable.com/special_remotes/S3/">s3</a> remote creatively named <code>s3</code>.</p>
<div class="highlight"><pre><span></span><code>$ git remote add sinope /media/sinope/video
$ <span class="nb">export</span> <span class="nv">AWS_ACCESS_KEY_ID</span><span class="o">=</span><span class="s2">"..."</span>
$ <span class="nb">export</span> <span class="nv">AWS_SECRET_ACCESS_KEY</span><span class="o">=</span><span class="s2">"..."</span>
$ git annex enableremote s3
</code></pre></div>
<h3>Trust</h3>
<p><a href="https://git-annex.branchable.com/trust/">Trust</a> is a critical component of how git-annex works. Any new annex will default to being semi-trusted, which means that when running operations within the annex on the main computer – say, dropping a file – git-annex will want to confirm that <code>themisto</code> has the files that it is supposed to have. In the case of <code>themisto</code> being a USB drive that is rarely connected, this is not very useful. I tell git-annex to trust my cold storage drives, which means that if git-annex has a record of a certain file being on the drive, it will be satisfied with that. This increases the risk of data loss, but for this application I feel it is appropriate.</p>
<div class="highlight"><pre><span></span><code>$ git annex trust .
</code></pre></div>
<h3>Preferred Content</h3>
<p>The final step that needs to be taken on the new repository is to tell it what files it should want. This is done using <a href="https://git-annex.branchable.com/preferred_content/">preferred content</a>. The <a href="https://git-annex.branchable.com/preferred_content/standard_groups/">standard groups</a> that git-annex ships with cover most of the bases. Of interest for this application is the <code>archive</code> group, which wants all content except that which has already found its way to another archive. This is the behaviour I want, but I will duplicate it into a custom group called <code>satellite</code>. This keeps my cold storage drives as standalone things that do not influence any other remotes where I may want to use the default <code>archive</code>.</p>
<div class="highlight"><pre><span></span><code>$ git annex groupwanted satellite <span class="s2">"(not copies=satellite:1) or approxlackingcopies=1"</span>
$ git annex group . satellite
$ git annex wanted . groupwanted
</code></pre></div>
<p>For other repositories, I may want to store the data on multiple cold storage drives. In that case I would create a <code>redundantsatellite</code> group that wants all content which is not already present in two other members of the group.</p>
<div class="highlight"><pre><span></span><code>$ git annex groupwanted redundantsatellite <span class="s2">"(not copies=redundantsatellite:2) or approxlackingcopies=1"</span>
$ git annex group . redundantsatellite
$ git annex wanted . groupwanted
</code></pre></div>
<h3>Syncing</h3>
<p>With everything set up, the new repository is ready to sync and to start to ingest content from the remotes it knows about!</p>
<div class="highlight"><pre><span></span><code>$ git annex sync --content
</code></pre></div>
<p>However, the original repository also needs to know about the new remote.</p>
<div class="highlight"><pre><span></span><code>$ <span class="nb">cd</span> ~/video
$ git remote add themisto /media/themisto/video
$ git annex sync
</code></pre></div>
<p>The same is the case for any other previously existing repository, such as <code>sinope</code>.</p>Redundant File Storage2016-08-19T00:00:00-07:002016-08-19T20:27:23-07:00Pig Monkeytag:pig-monkey.com,2016-08-19:/2016/08/rclone/<p>As I’ve <a href="/tag/annex/">mentioned previously</a>, I store just about everything that matters in <a href="https://git-annex.branchable.com/">git-annex</a> (the only exception is code, which is stored directly in regular git). One of git-annex’s many killer features is <a href="https://git-annex.branchable.com/special_remotes/">special remotes</a>. They make tenable this whole “cloud storage” thing that we do now.</p>
<p>A special remote allows me to store my files with a large number of service providers. It makes this easy to do by abstracting away the particulars of the provider, allowing me to interact with all of them in the same way. It makes this safe to do by providing <a href="https://git-annex.branchable.com/encryption/">encryption</a>. These factors encourage redundancy, reducing my reliance on any one provider.</p>
<p>Recently I began playing with <a href="http://rclone.org/">rclone</a>. Rclone is a program that supports file syncing for a handful of cloud storage providers. That’s semi-interesting by itself but, more significantly, there is <a href="https://github.com/DanielDent/git-annex-remote-rclone">a git-annex special remote wrapper</a>. That means any of the providers supported by rclone can be used as a special remote. I looked through all of rclone’s supported providers and decided there were a few that I had no reason not to use.</p>
<h2>Hubic</h2>
<p><a href="https://hubic.com/en/">Hubic</a> is a storage provider from <a href="https://www.ovh.com/us/">OVH</a> with a data center in France. Their <a href="https://hubic.com/en/offers/">pricing</a> is attractive. I’d happily pay €50 per year for 10TB of storage. Unfortunately they limit connections to 10 Mbit/s. In my experience they ended up being even slower than this. Slow enough that I don’t want to give them money, but there’s still no reason not to take advantage of their free 25 GB plan.</p>
<p>After signing up, I <a href="http://rclone.org/hubic/">set up a new remote in rclone</a>.</p>
<div class="highlight"><pre><span></span><code><span class="err">$</span><span class="w"> </span><span class="n">rclone</span><span class="w"> </span><span class="n">config</span><span class="w"></span>
<span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="k">New</span><span class="w"> </span><span class="n">remote</span><span class="w"></span>
<span class="n">s</span><span class="p">)</span><span class="w"> </span><span class="k">Set</span><span class="w"> </span><span class="n">configuration</span><span class="w"> </span><span class="n">password</span><span class="w"></span>
<span class="n">q</span><span class="p">)</span><span class="w"> </span><span class="n">Quit</span><span class="w"> </span><span class="n">config</span><span class="w"></span>
<span class="n">n</span><span class="o">/</span><span class="n">s</span><span class="o">/</span><span class="n">q</span><span class="o">></span><span class="w"> </span><span class="n">n</span><span class="w"></span>
<span class="n">name</span><span class="o">></span><span class="w"> </span><span class="n">hubic</span><span class="o">-</span><span class="n">annex</span><span class="w"></span>
<span class="n">Type</span><span class="w"> </span><span class="k">of</span><span class="w"> </span><span class="n">storage</span><span class="w"> </span><span class="k">to</span><span class="w"> </span><span class="n">configure</span><span class="p">.</span><span class="w"></span>
<span class="nf">Choose</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">number</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">below</span><span class="p">,</span><span class="w"> </span><span class="ow">or</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="n">your</span><span class="w"> </span><span class="n">own</span><span class="w"> </span><span class="k">value</span><span class="w"></span>
<span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Amazon</span><span class="w"> </span><span class="n">Drive</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"amazon cloud drive"</span><span class="w"></span>
<span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Amazon</span><span class="w"> </span><span class="n">S3</span><span class="w"> </span><span class="p">(</span><span class="n">also</span><span class="w"> </span><span class="n">Dreamhost</span><span class="p">,</span><span class="w"> </span><span class="n">Ceph</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"s3"</span><span class="w"></span>
<span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Backblaze</span><span class="w"> </span><span class="n">B2</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"b2"</span><span class="w"></span>
<span class="w"> </span><span class="mi">4</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Dropbox</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"dropbox"</span><span class="w"></span>
<span class="w"> </span><span class="mi">5</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Google</span><span class="w"> </span><span class="n">Cloud</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="p">(</span><span class="n">this</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="ow">not</span><span class="w"> </span><span class="n">Google</span><span class="w"> </span><span class="n">Drive</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"google cloud storage"</span><span class="w"></span>
<span class="w"> </span><span class="mi">6</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Google</span><span class="w"> </span><span class="n">Drive</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"drive"</span><span class="w"></span>
<span class="w"> </span><span class="mi">7</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Hubic</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"hubic"</span><span class="w"></span>
<span class="w"> </span><span class="mi">8</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="k">Local</span><span class="w"> </span><span class="k">Disk</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"local"</span><span class="w"></span>
<span class="w"> </span><span class="mi">9</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Microsoft</span><span class="w"> </span><span class="n">OneDrive</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"onedrive"</span><span class="w"></span>
<span class="mi">10</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Openstack</span><span class="w"> </span><span class="n">Swift</span><span class="w"> </span><span class="p">(</span><span class="n">Rackspace</span><span class="w"> </span><span class="n">Cloud</span><span class="w"> </span><span class="n">Files</span><span class="p">,</span><span class="w"> </span><span class="n">Memset</span><span class="w"> </span><span class="n">Memstore</span><span class="p">,</span><span class="w"> </span><span class="n">OVH</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"swift"</span><span class="w"></span>
<span class="mi">11</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Yandex</span><span class="w"> </span><span class="k">Disk</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"yandex"</span><span class="w"></span>
<span class="n">Storage</span><span class="o">></span><span class="w"> </span><span class="mi">7</span><span class="w"></span>
<span class="n">Hubic</span><span class="w"> </span><span class="n">Client</span><span class="w"> </span><span class="n">Id</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">leave</span><span class="w"> </span><span class="n">blank</span><span class="w"> </span><span class="n">normally</span><span class="p">.</span><span class="w"></span>
<span class="n">client_id</span><span class="o">></span><span class="w"> </span>
<span class="n">Hubic</span><span class="w"> </span><span class="n">Client</span><span class="w"> </span><span class="n">Secret</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">leave</span><span class="w"> </span><span class="n">blank</span><span class="w"> </span><span class="n">normally</span><span class="p">.</span><span class="w"></span>
<span class="n">client_secret</span><span class="o">></span><span class="w"> </span>
<span class="n">Remote</span><span class="w"> </span><span class="n">config</span><span class="w"></span>
<span class="k">Use</span><span class="w"> </span><span class="n">auto</span><span class="w"> </span><span class="n">config</span><span class="vm">?</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">Say</span><span class="w"> </span><span class="n">Y</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="ow">not</span><span class="w"> </span><span class="n">sure</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">Say</span><span class="w"> </span><span class="n">N</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">you</span><span class="w"> </span><span class="k">are</span><span class="w"> </span><span class="n">working</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">remote</span><span class="w"> </span><span class="ow">or</span><span class="w"> </span><span class="n">headless</span><span class="w"> </span><span class="n">machine</span><span class="w"></span>
<span class="n">y</span><span class="p">)</span><span class="w"> </span><span class="n">Yes</span><span class="w"></span>
<span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="k">No</span><span class="w"></span>
<span class="n">y</span><span class="o">/</span><span class="n">n</span><span class="o">></span><span class="w"> </span><span class="n">y</span><span class="w"></span>
<span class="k">If</span><span class="w"> </span><span class="n">your</span><span class="w"> </span><span class="n">browser</span><span class="w"> </span><span class="n">doesn</span><span class="err">'</span><span class="n">t</span><span class="w"> </span><span class="k">open</span><span class="w"> </span><span class="n">automatically</span><span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="k">to</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">following</span><span class="w"> </span><span class="nl">link</span><span class="p">:</span><span class="w"> </span><span class="nl">http</span><span class="p">:</span><span class="o">//</span><span class="mf">127.0.0.1</span><span class="err">:</span><span class="mi">53682</span><span class="o">/</span><span class="n">auth</span><span class="w"></span>
<span class="nf">Log</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="ow">and</span><span class="w"> </span><span class="n">authorize</span><span class="w"> </span><span class="n">rclone</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">access</span><span class="w"></span>
<span class="n">Waiting</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">code</span><span class="p">...</span><span class="w"></span>
<span class="n">Got</span><span class="w"> </span><span class="n">code</span><span class="w"></span>
<span class="o">--------------------</span><span class="w"></span>
<span class="o">[</span><span class="n">remote</span><span class="o">]</span><span class="w"></span>
<span class="n">client_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>
<span class="n">client_secret</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>
<span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">{</span><span class="ss">"access_token"</span><span class="err">:</span><span class="ss">"XXXXXX"</span><span class="err">}</span><span class="w"></span>
<span class="o">--------------------</span><span class="w"></span>
<span class="n">y</span><span class="p">)</span><span class="w"> </span><span class="n">Yes</span><span class="w"> </span><span class="n">this</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="n">OK</span><span class="w"></span>
<span class="n">e</span><span class="p">)</span><span class="w"> </span><span class="n">Edit</span><span class="w"> </span><span class="n">this</span><span class="w"> </span><span class="n">remote</span><span class="w"></span>
<span class="n">d</span><span class="p">)</span><span class="w"> </span><span class="k">Delete</span><span class="w"> </span><span class="n">this</span><span class="w"> </span><span class="n">remote</span><span class="w"></span>
<span class="n">y</span><span class="o">/</span><span class="n">e</span><span class="o">/</span><span class="n">d</span><span class="o">></span><span class="w"> </span><span class="n">y</span><span class="w"></span>
</code></pre></div>
<p>With that setup, I went into my <code>~/documents</code> annex and added the remote.</p>
<div class="highlight"><pre><span></span><code>$ git annex initremote hubic <span class="nv">type</span><span class="o">=</span>external <span class="nv">externaltype</span><span class="o">=</span>rclone <span class="nv">target</span><span class="o">=</span>hubic-annex <span class="nv">prefix</span><span class="o">=</span>annex-documents <span class="nv">chunk</span><span class="o">=</span>50MiB <span class="nv">encryption</span><span class="o">=</span>shared <span class="nv">rclone_layout</span><span class="o">=</span>lower <span class="nv">mac</span><span class="o">=</span>HMACSHA512
</code></pre></div>
<p>I want git-annex to automatically send everything to Hubic, so I took advantage of <a href="https://git-annex.branchable.com/preferred_content/standard_groups/">standard groups</a> and put the repository in the <code>backup</code> group.</p>
<div class="highlight"><pre><span></span><code>$ git annex wanted hubic standard
$ git annex group hubic backup
</code></pre></div>
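<p>To double-check what was just configured, both commands can be run again without a value, in which case git-annex should print the current setting rather than change it. The last line is one way to push content by hand instead of waiting for the next sync. This is only a sketch, and it assumes the <code>hubic</code> remote name used above.</p>
<div class="highlight"><pre><span></span><code>$ git annex wanted hubic
$ git annex group hubic
$ git annex copy --auto --to hubic
</code></pre></div>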
<p>Given Hubic’s slow speed, I don’t really want to download files from it unless I need to. This can be configured in git-annex by setting the cost of the remote: when several remotes hold the same file, git-annex tries the one with the lowest cost first. Local repositories default to 100 and remote repositories default to 200. I gave the Hubic remote a high cost so that it will only be used if no other remotes are available.</p>
<div class="highlight"><pre><span></span><code>$ git config remote.hubic.annex-cost <span class="m">500</span>
</code></pre></div>
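<p>As a rough illustration of how those numbers play out, the sketch below sorts a hypothetical set of remotes by cost, cheapest first, which is the order in which git-annex should try them. The <code>usb-drive</code> entry and the pairing of names to costs are made up for the example; only the values 100, 200 and 500 come from the text above.</p>
<div class="highlight"><pre><span></span><code># Hypothetical remotes, one "cost name" pair per line.
remotes='200 rsync.net
500 hubic
100 usb-drive'

# Sort numerically by cost and keep only the names.
order=$(printf '%s\n' "$remotes" | sort -n | awk '{print $2}')
echo "$order"
</code></pre></div>
<p>To see the value actually stored for a remote, <code>git config --get remote.hubic.annex-cost</code> reads it back.</p>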
<p>If you would like to try Hubic, I have a <a href="https://hubic.com/home/new/?referral=FATDIA">referral code</a> which gives us both an extra 5GB for free.</p>
<h2>Backblaze B2</h2>
<p><a href="https://www.backblaze.com/b2/cloud-storage.html">B2</a> is the cloud storage offering from backup company <a href="https://www.backblaze.com/">Backblaze</a>. I don’t know anything about them, but at $0.005 per GB per month I like their <a href="https://www.backblaze.com/b2/cloud-storage-providers.html">pricing</a>. A quick search of reviews shows that the main complaint about the service is that they offer no geographic redundancy, which is entirely irrelevant to me since I build my own redundancy with my half-dozen or so remotes per repository.</p>
<p>Signing up with Backblaze took a bit longer. They wanted a phone number for 2-factor authentication, I wanted to give them a credit card so that I could use more than the 10GB they offer for free, and I had to generate an application key to use with rclone. After that, the <a href="http://rclone.org/b2/">rclone setup</a> was simple.</p>
<div class="highlight"><pre><span></span><code><span class="err">$</span><span class="w"> </span><span class="n">rclone</span><span class="w"> </span><span class="n">config</span><span class="w"></span>
<span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="k">New</span><span class="w"> </span><span class="n">remote</span><span class="w"></span>
<span class="n">s</span><span class="p">)</span><span class="w"> </span><span class="k">Set</span><span class="w"> </span><span class="n">configuration</span><span class="w"> </span><span class="n">password</span><span class="w"></span>
<span class="n">q</span><span class="p">)</span><span class="w"> </span><span class="n">Quit</span><span class="w"> </span><span class="n">config</span><span class="w"></span>
<span class="n">n</span><span class="o">/</span><span class="n">s</span><span class="o">/</span><span class="n">q</span><span class="o">></span><span class="w"> </span><span class="n">n</span><span class="w"></span>
<span class="n">name</span><span class="o">></span><span class="w"> </span><span class="n">b2</span><span class="o">-</span><span class="n">annex</span><span class="w"></span>
<span class="n">Type</span><span class="w"> </span><span class="k">of</span><span class="w"> </span><span class="n">storage</span><span class="w"> </span><span class="k">to</span><span class="w"> </span><span class="n">configure</span><span class="p">.</span><span class="w"></span>
<span class="nf">Choose</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">number</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">below</span><span class="p">,</span><span class="w"> </span><span class="ow">or</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="n">your</span><span class="w"> </span><span class="n">own</span><span class="w"> </span><span class="k">value</span><span class="w"></span>
<span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Amazon</span><span class="w"> </span><span class="n">Drive</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"amazon cloud drive"</span><span class="w"></span>
<span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Amazon</span><span class="w"> </span><span class="n">S3</span><span class="w"> </span><span class="p">(</span><span class="n">also</span><span class="w"> </span><span class="n">Dreamhost</span><span class="p">,</span><span class="w"> </span><span class="n">Ceph</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"s3"</span><span class="w"></span>
<span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Backblaze</span><span class="w"> </span><span class="n">B2</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"b2"</span><span class="w"></span>
<span class="w"> </span><span class="mi">4</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Dropbox</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"dropbox"</span><span class="w"></span>
<span class="w"> </span><span class="mi">5</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Google</span><span class="w"> </span><span class="n">Cloud</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="p">(</span><span class="n">this</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="ow">not</span><span class="w"> </span><span class="n">Google</span><span class="w"> </span><span class="n">Drive</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"google cloud storage"</span><span class="w"></span>
<span class="w"> </span><span class="mi">6</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Google</span><span class="w"> </span><span class="n">Drive</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"drive"</span><span class="w"></span>
<span class="w"> </span><span class="mi">7</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Hubic</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"hubic"</span><span class="w"></span>
<span class="w"> </span><span class="mi">8</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="k">Local</span><span class="w"> </span><span class="k">Disk</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"local"</span><span class="w"></span>
<span class="w"> </span><span class="mi">9</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Microsoft</span><span class="w"> </span><span class="n">OneDrive</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"onedrive"</span><span class="w"></span>
<span class="mi">10</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Openstack</span><span class="w"> </span><span class="n">Swift</span><span class="w"> </span><span class="p">(</span><span class="n">Rackspace</span><span class="w"> </span><span class="n">Cloud</span><span class="w"> </span><span class="n">Files</span><span class="p">,</span><span class="w"> </span><span class="n">Memset</span><span class="w"> </span><span class="n">Memstore</span><span class="p">,</span><span class="w"> </span><span class="n">OVH</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"swift"</span><span class="w"></span>
<span class="mi">11</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Yandex</span><span class="w"> </span><span class="k">Disk</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"yandex"</span><span class="w"></span>
<span class="n">Storage</span><span class="o">></span><span class="w"> </span><span class="mi">3</span><span class="w"></span>
<span class="n">Account</span><span class="w"> </span><span class="n">ID</span><span class="w"></span>
<span class="n">account</span><span class="o">></span><span class="w"> </span><span class="mi">123456789</span><span class="n">abc</span><span class="w"></span>
<span class="n">Application</span><span class="w"> </span><span class="k">Key</span><span class="w"></span>
<span class="k">key</span><span class="o">></span><span class="w"> </span><span class="mi">0123456789</span><span class="n">abcdef0123456789abcdef0123456789</span><span class="w"></span>
<span class="n">Endpoint</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">service</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">leave</span><span class="w"> </span><span class="n">blank</span><span class="w"> </span><span class="n">normally</span><span class="p">.</span><span class="w"></span>
<span class="n">endpoint</span><span class="o">></span><span class="w"> </span>
<span class="n">Remote</span><span class="w"> </span><span class="n">config</span><span class="w"></span>
<span class="o">--------------------</span><span class="w"></span>
<span class="o">[</span><span class="n">remote</span><span class="o">]</span><span class="w"></span>
<span class="n">account</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">123456789</span><span class="n">abc</span><span class="w"></span>
<span class="k">key</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0123456789</span><span class="n">abcdef0123456789abcdef0123456789</span><span class="w"></span>
<span class="n">endpoint</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>
<span class="o">--------------------</span><span class="w"></span>
<span class="n">y</span><span class="p">)</span><span class="w"> </span><span class="n">Yes</span><span class="w"> </span><span class="n">this</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="n">OK</span><span class="w"></span>
<span class="n">e</span><span class="p">)</span><span class="w"> </span><span class="n">Edit</span><span class="w"> </span><span class="n">this</span><span class="w"> </span><span class="n">remote</span><span class="w"></span>
<span class="n">d</span><span class="p">)</span><span class="w"> </span><span class="k">Delete</span><span class="w"> </span><span class="n">this</span><span class="w"> </span><span class="n">remote</span><span class="w"></span>
<span class="n">y</span><span class="o">/</span><span class="n">e</span><span class="o">/</span><span class="n">d</span><span class="o">></span><span class="w"> </span><span class="n">y</span><span class="w"></span>
</code></pre></div>
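<p>Before handing the new remote to git-annex, it may be worth confirming that rclone can actually reach B2 with the saved credentials. A minimal check, assuming the <code>b2-annex</code> name chosen above:</p>
<div class="highlight"><pre><span></span><code>$ rclone listremotes
$ rclone lsd b2-annex:
</code></pre></div>
<p>The first command lists the configured remotes. The second lists the buckets visible to the account, and should fail if the account ID or application key is wrong.</p>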
<p>With that, it was back to <code>~/documents</code> to initialize the remote and send it all the things.</p>
<div class="highlight"><pre><span></span><code>$ git annex initremote b2 <span class="nv">type</span><span class="o">=</span>external <span class="nv">externaltype</span><span class="o">=</span>rclone <span class="nv">target</span><span class="o">=</span>b2-annex <span class="nv">prefix</span><span class="o">=</span>annex-documents <span class="nv">chunk</span><span class="o">=</span>50MiB <span class="nv">encryption</span><span class="o">=</span>shared <span class="nv">rclone_layout</span><span class="o">=</span>lower <span class="nv">mac</span><span class="o">=</span>HMACSHA512
$ git annex wanted b2 standard
$ git annex group b2 backup
</code></pre></div>
<p>While I did not measure the speed with B2, it feels as fast as my <a href="https://aws.amazon.com/s3/">S3</a> or <a href="http://www.rsync.net/products/git-annex-pricing.html">rsync.net</a> remotes, so I didn’t bother setting the cost.</p>
<h2>Google Drive</h2>
<p>While I do not regularly use Google services for personal things, I do have a Google account for Android stuff. Google Drive offers <a href="https://support.google.com/drive/answer/2375123?hl=en">15 GB of storage for free</a> and <a href="http://rclone.org/drive/">rclone supports it</a>, so why not take advantage?</p>
<div class="highlight"><pre><span></span><code><span class="err">$</span><span class="w"> </span><span class="n">rclone</span><span class="w"> </span><span class="n">config</span><span class="w"></span>
<span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="k">New</span><span class="w"> </span><span class="n">remote</span><span class="w"></span>
<span class="n">s</span><span class="p">)</span><span class="w"> </span><span class="k">Set</span><span class="w"> </span><span class="n">configuration</span><span class="w"> </span><span class="n">password</span><span class="w"></span>
<span class="n">q</span><span class="p">)</span><span class="w"> </span><span class="n">Quit</span><span class="w"> </span><span class="n">config</span><span class="w"></span>
<span class="n">n</span><span class="o">/</span><span class="n">s</span><span class="o">/</span><span class="n">q</span><span class="o">></span><span class="w"> </span><span class="n">n</span><span class="w"></span>
<span class="n">name</span><span class="o">></span><span class="w"> </span><span class="n">gdrive</span><span class="o">-</span><span class="n">annex</span><span class="w"></span>
<span class="n">Type</span><span class="w"> </span><span class="k">of</span><span class="w"> </span><span class="n">storage</span><span class="w"> </span><span class="k">to</span><span class="w"> </span><span class="n">configure</span><span class="p">.</span><span class="w"></span>
<span class="nf">Choose</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">number</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">below</span><span class="p">,</span><span class="w"> </span><span class="ow">or</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="n">your</span><span class="w"> </span><span class="n">own</span><span class="w"> </span><span class="k">value</span><span class="w"></span>
<span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Amazon</span><span class="w"> </span><span class="n">Drive</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"amazon cloud drive"</span><span class="w"></span>
<span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Amazon</span><span class="w"> </span><span class="n">S3</span><span class="w"> </span><span class="p">(</span><span class="n">also</span><span class="w"> </span><span class="n">Dreamhost</span><span class="p">,</span><span class="w"> </span><span class="n">Ceph</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"s3"</span><span class="w"></span>
<span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Backblaze</span><span class="w"> </span><span class="n">B2</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"b2"</span><span class="w"></span>
<span class="w"> </span><span class="mi">4</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Dropbox</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"dropbox"</span><span class="w"></span>
<span class="w"> </span><span class="mi">5</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Google</span><span class="w"> </span><span class="n">Cloud</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="p">(</span><span class="n">this</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="ow">not</span><span class="w"> </span><span class="n">Google</span><span class="w"> </span><span class="n">Drive</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"google cloud storage"</span><span class="w"></span>
<span class="w"> </span><span class="mi">6</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Google</span><span class="w"> </span><span class="n">Drive</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"drive"</span><span class="w"></span>
<span class="w"> </span><span class="mi">7</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Hubic</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"hubic"</span><span class="w"></span>
<span class="w"> </span><span class="mi">8</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="k">Local</span><span class="w"> </span><span class="k">Disk</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"local"</span><span class="w"></span>
<span class="w"> </span><span class="mi">9</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Microsoft</span><span class="w"> </span><span class="n">OneDrive</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"onedrive"</span><span class="w"></span>
<span class="mi">10</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Openstack</span><span class="w"> </span><span class="n">Swift</span><span class="w"> </span><span class="p">(</span><span class="n">Rackspace</span><span class="w"> </span><span class="n">Cloud</span><span class="w"> </span><span class="n">Files</span><span class="p">,</span><span class="w"> </span><span class="n">Memset</span><span class="w"> </span><span class="n">Memstore</span><span class="p">,</span><span class="w"> </span><span class="n">OVH</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"swift"</span><span class="w"></span>
<span class="mi">11</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Yandex</span><span class="w"> </span><span class="k">Disk</span><span class="w"></span>
<span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="ss">"yandex"</span><span class="w"></span>
<span class="n">Storage</span><span class="o">></span><span class="w"> </span><span class="mi">6</span><span class="w"></span>
<span class="n">Google</span><span class="w"> </span><span class="n">Application</span><span class="w"> </span><span class="n">Client</span><span class="w"> </span><span class="n">Id</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">leave</span><span class="w"> </span><span class="n">blank</span><span class="w"> </span><span class="n">normally</span><span class="p">.</span><span class="w"></span>
<span class="n">client_id</span><span class="o">></span><span class="w"> </span>
<span class="n">Google</span><span class="w"> </span><span class="n">Application</span><span class="w"> </span><span class="n">Client</span><span class="w"> </span><span class="n">Secret</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">leave</span><span class="w"> </span><span class="n">blank</span><span class="w"> </span><span class="n">normally</span><span class="p">.</span><span class="w"></span>
<span class="n">client_secret</span><span class="o">></span><span class="w"> </span>
<span class="n">Remote</span><span class="w"> </span><span class="n">config</span><span class="w"></span>
<span class="k">Use</span><span class="w"> </span><span class="n">auto</span><span class="w"> </span><span class="n">config</span><span class="vm">?</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">Say</span><span class="w"> </span><span class="n">Y</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="ow">not</span><span class="w"> </span><span class="n">sure</span><span class="w"></span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">Say</span><span class="w"> </span><span class="n">N</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">you</span><span class="w"> </span><span class="k">are</span><span class="w"> </span><span class="n">working</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">remote</span><span class="w"> </span><span class="ow">or</span><span class="w"> </span><span class="n">headless</span><span class="w"> </span><span class="n">machine</span><span class="w"> </span><span class="ow">or</span><span class="w"> </span><span class="n">Y</span><span class="w"> </span><span class="n">didn</span><span class="err">'</span><span class="n">t</span><span class="w"> </span><span class="n">work</span><span class="w"></span>
<span class="n">y</span><span class="p">)</span><span class="w"> </span><span class="n">Yes</span><span class="w"></span>
<span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="k">No</span><span class="w"></span>
<span class="n">y</span><span class="o">/</span><span class="n">n</span><span class="o">></span><span class="w"> </span><span class="n">y</span><span class="w"></span>
<span class="k">If</span><span class="w"> </span><span class="n">your</span><span class="w"> </span><span class="n">browser</span><span class="w"> </span><span class="n">doesn</span><span class="err">'</span><span class="n">t</span><span class="w"> </span><span class="k">open</span><span class="w"> </span><span class="n">automatically</span><span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="k">to</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">following</span><span class="w"> </span><span class="nl">link</span><span class="p">:</span><span class="w"> </span><span class="nl">http</span><span class="p">:</span><span class="o">//</span><span class="mf">127.0.0.1</span><span class="err">:</span><span class="mi">53682</span><span class="o">/</span><span class="n">auth</span><span class="w"></span>
<span class="nf">Log</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="ow">and</span><span class="w"> </span><span class="n">authorize</span><span class="w"> </span><span class="n">rclone</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">access</span><span class="w"></span>
<span class="n">Waiting</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">code</span><span class="p">...</span><span class="w"></span>
<span class="n">Got</span><span class="w"> </span><span class="n">code</span><span class="w"></span>
<span class="o">--------------------</span><span class="w"></span>
<span class="o">[</span><span class="n">remote</span><span class="o">]</span><span class="w"></span>
<span class="n">client_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>
<span class="n">client_secret</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>
<span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">{</span><span class="ss">"AccessToken"</span><span class="err">:</span><span class="ss">"xxxx.x.xxxxx_xxxxxxxxxxx_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"</span><span class="p">,</span><span class="ss">"RefreshToken"</span><span class="err">:</span><span class="ss">"1/xxxxxxxxxxxxxxxx_xxxxxxxxxxxxxxxxxxxxxxxxxx"</span><span class="p">,</span><span class="ss">"Expiry"</span><span class="err">:</span><span class="ss">"2014-03-16T13:57:58.955387075Z"</span><span class="p">,</span><span class="ss">"Extra"</span><span class="err">:</span><span class="k">null</span><span class="err">}</span><span class="w"></span>
<span class="o">--------------------</span><span class="w"></span>
<span class="n">y</span><span class="p">)</span><span class="w"> </span><span class="n">Yes</span><span class="w"> </span><span class="n">this</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="n">OK</span><span class="w"></span>
<span class="n">e</span><span class="p">)</span><span class="w"> </span><span class="n">Edit</span><span class="w"> </span><span class="n">this</span><span class="w"> </span><span class="n">remote</span><span class="w"></span>
<span class="n">d</span><span class="p">)</span><span class="w"> </span><span class="k">Delete</span><span class="w"> </span><span class="n">this</span><span class="w"> </span><span class="n">remote</span><span class="w"></span>
<span class="n">y</span><span class="o">/</span><span class="n">e</span><span class="o">/</span><span class="n">d</span><span class="o">></span><span class="w"> </span><span class="n">y</span><span class="w"></span>
</code></pre></div>
<p>And again, to <code>~/documents</code>.</p>
<div class="highlight"><pre><span></span><code>$ git annex initremote gdrive <span class="nv">type</span><span class="o">=</span>external <span class="nv">externaltype</span><span class="o">=</span>rclone <span class="nv">target</span><span class="o">=</span>gdrive-annex <span class="nv">prefix</span><span class="o">=</span>annex-documents <span class="nv">chunk</span><span class="o">=</span>50MiB <span class="nv">encryption</span><span class="o">=</span>shared <span class="nv">rclone_layout</span><span class="o">=</span>lower <span class="nv">mac</span><span class="o">=</span>HMACSHA512
$ git annex wanted gdrive standard
$ git annex group gdrive backup
</code></pre></div>
<p>Rinse and repeat the process for other annexes. Revel in having simple, secure, and redundant storage.</p>I celebrated World Backup Day by increasing the resiliency of data in my life.2016-03-31T00:00:00-07:002016-08-19T20:03:18-07:00Pig Monkeytag:pig-monkey.com,2016-03-31:/2016/03/backup/<p>Four <a href="https://wiki.archlinux.org/index.php/Dm-crypt">encrypted</a> 2TB hard drives, stored in a <a href="http://www.pelican.com/us/en/product/watertight-protector-hard-cases/small-case/standard/1200/">Pelican 1200</a>, with <a href="https://securitysnobs.com/Abloy-Protec2-PL-321-Padlock.html">Abloy Protec2 PL 321</a> padlocks as tamper-evident seals. Having everything that matters stored in <a href="https://git-annex.branchable.com/">git-annex</a> makes projects like this simple: just clone the repositories, define the <a href="https://git-annex.branchable.com/preferred_content/">preferred content expressions</a>, and watch the magic happen.</p>
<p><a href="https://www.flickr.com/photos/pigmonkey/25889491200/in/dateposted/" title="Cold Storage"><img src="https://farm2.staticflickr.com/1624/25889491200_7b962ddfd0_c.jpg" width="800" height="450" alt="Cold Storage"></a></p>Optical Backups of Photo Archives2013-05-29T00:00:00-07:002013-05-29T00:00:00-07:00Pig Monkeytag:pig-monkey.com,2013-05-29:/2013/05/optical-photo-backups/<p>I store my photos in <a href="http://git-annex.branchable.com/">git-annex</a>. A full copy of the annex exists on my laptop and on an external drive. Encrypted copies of all of my photos are stored on <a href="https://aws.amazon.com/s3/">Amazon S3</a> (which I pay for) and <a href="https://www.box.com/">box.com</a> (which provides 50GB for free) via git-annex <a href="http://git-annex.branchable.com/special_remotes/">special remotes</a>. The photos are backed up to an external drive daily with the rest of my laptop hard drive via <a href="/2012/10/back-it-up/">backitup.sh</a> and <a href="/2012/09/cryptshot-automated-encrypted-backups-rsnapshot/">cryptshot</a>. My entire laptop hard drive is also mirrored monthly to an external drive stored off-site.</p>
<p>(The majority of my photos are also <a href="http://www.flickr.com/photos/pigmonkey/">on Flickr</a>, but I don’t consider that a backup or even reliable storage.)</p>
<p>All of this is what I consider to be the bare minimum for any redundant data storage. Photos have special value, above the value that I assign to most other data. This value only increases with age. As such they require an additional backup method, but due to the size of my collection I want to avoid backup methods that involve paying for more online storage, such as <a href="/2012/09/tarsnapper-managing-tarsnap-backups/">Tarsnap</a>.</p>
<p>I choose optical discs as the medium for my photo backups. This has the advantage of being read-only, which makes it more difficult for accidental deletions or corruption to propagate through the backup system. DVD-Rs have a capacity of 4.7 GB and a cost of around $0.25 per disc. Their life expectancy varies, but 10 years seems to be a reasonable low estimate.</p>
<h2>Preparation</h2>
<p>I keep all of my photos in year-based directories. At the beginning of every year, the previous year’s directory is burned to a DVD.</p>
<p>Certain years contain few enough photos that the entire year can fit on a single DVD. More recent years have enough photos of a high enough resolution that they require multiple DVDs.</p>
<h3>Archive</h3>
<p>My first step is to build a compressed archive of each year. I choose <a href="http://www.gnu.org/software/tar/">tar</a> and <a href="http://en.wikipedia.org/wiki/Bzip2">bzip2</a> compression for this because they’re simple and reliable.</p>
<div class="highlight"><pre><span></span><code>$ <span class="nb">cd</span> ~/pictures
$ tar cjhf ~/tmp/pictures/2012.tar.bz <span class="m">2012</span>
</code></pre></div>
<p>If the archive is larger than 3.7 GB, it needs to be split into multiple files. The resulting files will be burned to different discs. The capacity of a DVD is 4.7 GB, but I place the upper file limit at 3.7 GB so that the DVD has a minimum of 20% of its capacity available. This will be filled with parity information later on for redundancy.</p>
<div class="highlight"><pre><span></span><code>$ split -d -b 3700M <span class="m">2012</span>.tar.bz <span class="m">2012</span>.tar.bz.
</code></pre></div>
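<p>Splitting is only useful if the parts can be put back together later. The following is a minimal sketch, not part of the original workflow, that splits a small stand-in file and confirms that the concatenated parts match the original byte for byte. The file names (<code>demo.tar.bz</code>, <code>restored.tar.bz</code>), the function name, and the tiny sizes are assumptions chosen for illustration.</p>

```shell
#!/bin/sh
# Round-trip check for split(1) using a small stand-in file.
# demo.tar.bz and restored.tar.bz are hypothetical names, not files from the post.

roundtrip_demo() {
    workdir=$(mktemp -d) || return 1
    (
        cd "$workdir" || exit 1

        # 10 KiB of random data stands in for a real archive.
        head -c 10240 /dev/urandom > demo.tar.bz

        # Split into 4 KiB parts with numeric suffixes: .00, .01, .02
        split -d -b 4096 demo.tar.bz demo.tar.bz.

        # The glob expands in sorted order, which matches the numeric suffixes.
        cat demo.tar.bz.[0-9][0-9] > restored.tar.bz

        # cmp exits non-zero if the two files differ.
        cmp demo.tar.bz restored.tar.bz
    )
    status=$?
    rm -rf "$workdir"
    return "$status"
}

roundtrip_demo && echo "parts reassemble cleanly"
```

<p>In the workflow described here each part is encrypted on its own before burning, so a real restore would presumably decrypt every part first and only then concatenate them in order.</p>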
<h3>Encrypt</h3>
<p>Leaving unencrypted data around is <a href="http://www.youtube.com/watch?v=OwHrlM4oVSI">bad form</a>. The archive (or each of the files resulting from splitting the large archive) is next encrypted and signed with <a href="http://www.gnupg.org/">GnuPG</a>.</p>
<div class="highlight"><pre><span></span><code>$ gpg -eo <span class="m">2012</span>.tar.bz.gpg <span class="m">2012</span>.tar.bz
$ gpg -bo <span class="m">2012</span>.tar.bz.gpg.sig <span class="m">2012</span>.tar.bz.gpg
</code></pre></div>
<h2>Imaging</h2>
<p>The encrypted archive and the detached signature of the encrypted archive are what will be burned to the disc. (Or, in the case of a large archive, the encrypted splits of the full archive and the associated signatures will be burned to one disc per split/signature combination.) Rather than burning them directly, an image is created first.</p>
<div class="highlight"><pre><span></span><code>$ mkisofs -V <span class="s2">"Photos: 2012 1/1"</span> -r -o <span class="m">2012</span>.iso <span class="m">2012</span>.tar.bz.gpg <span class="m">2012</span>.tar.bz.gpg.sig
</code></pre></div>
<p>If the year has a split archive requiring multiple discs, I modify the sequence number in the volume label. For example, a year requiring 3 discs will have the label <code>Photos: 2012 1/3</code>.</p>
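<p>For a multi-disc year, the labelling scheme can be sketched as a loop. This is a dry run that only prints the commands. The function name <code>print_iso_commands</code>, the per-disc image names (<code>2012-1.iso</code> and so on), and the assumption that each split part was encrypted to <code>PART.gpg</code> with a detached <code>PART.gpg.sig</code> are all mine, not taken from the original workflow.</p>

```shell
#!/bin/sh
# print_iso_commands YEAR TOTAL
# Dry run: print one mkisofs invocation per disc without executing anything.
# Image names and part names are hypothetical; adjust them to the real files.
print_iso_commands() {
    year=$1
    total=$2
    n=1
    while [ "$n" -le "$total" ]; do
        # split -d numbers its parts from 00, so disc n holds part n-1.
        part=$(printf '%s.tar.bz.%02d' "$year" $((n - 1)))
        echo "mkisofs -V \"Photos: $year $n/$total\" -r -o $year-$n.iso $part.gpg $part.gpg.sig"
        n=$((n + 1))
    done
}

print_iso_commands 2012 3
```

<p>Printing the commands first makes it easy to check the volume labels and file pairings before any image is actually written.</p>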
<h3>Parity</h3>
<p>When I began this project I knew that I wanted some sort of parity information for each disc so that I could potentially recover data from slightly damaged media. My initial idea was to use <a href="http://en.wikipedia.org/wiki/Parchive">parchive</a> via <a href="https://github.com/BlackIkeEagle/par2cmdline">par2cmdline</a>. Further research led me to <a href="http://dvdisaster.net/en/index.html">dvdisaster</a> which, despite being a GUI-only program, seemed more appropriate for this use case.</p>
<p>Both dvdisaster and parchive use the same <a href="http://en.wikipedia.org/wiki/Reed–Solomon_error_correction">Reed–Solomon error correction codes</a>. Dvdisaster is aimed at optical media and has the ability to place the error correction data on the disc by <a href="http://dvdisaster.net/en/howtos30.html">augmenting the disc image</a>, as well as <a href="http://dvdisaster.net/en/howtos20.html">storing the data separately</a>. It can also <a href="http://dvdisaster.net/en/howtos10.html">scan media for errors</a> and assist in judging when the media is in danger of becoming defective. This makes it an attractive option for long-term storage.</p>
<p>I use dvdisaster with the <a href="http://dvdisaster.net/en/howtos32.html">RS02</a> error correction method, which augments the image before burning. Depending on the size of the original image, this will result in the disc having anywhere from 20% to 200% redundancy.</p>
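<p>The relationship between image size and redundancy can be approximated with a little arithmetic. The sketch below is a rough estimate only: it assumes the error correction data simply fills whatever space the image leaves free on the medium, ignores any overhead dvdisaster adds, and caps the result at the 200% upper bound mentioned above. The function name <code>rs02_estimate</code> and the 4700 MB default capacity are illustrative assumptions.</p>

```shell
#!/bin/sh
# rs02_estimate IMAGE_MB [CAPACITY_MB]
# Rough redundancy estimate: free space on the medium divided by image size.
# This is an approximation, not dvdisaster's own calculation.
rs02_estimate() {
    awk -v img="$1" -v cap="${2:-4700}" 'BEGIN {
        pct = (cap - img) / img * 100
        if (pct > 200) pct = 200   # upper bound stated in the text
        if (pct < 0) pct = 0       # image does not fit on the medium
        printf "%.0f\n", pct
    }'
}

rs02_estimate 3700    # a full 3.7 GB image on a 4.7 GB disc
rs02_estimate 1000    # a small image, capped at the upper bound
```

<p>Under these assumptions a 3.7 GB image ends up with roughly a quarter of its own size in error correction data, while a small image reaches the cap.</p>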
<h3>Verify</h3>
<p>After the image has been augmented, I mount it and verify the signature of the encrypted file on the disc against the local copy of the signature. I’ve never had the signatures not match, but performing this step makes me feel better.</p>
<div class="highlight"><pre><span></span><code>$ sudo mount -o loop <span class="m">2012</span>.iso /mnt/disc
$ gpg --verify <span class="m">2012</span>.tar.bz.gpg.sig /mnt/disc/2012.tar.bz.gpg
$ sudo umount /mnt/disc
</code></pre></div>
<h3>Burn</h3>
<p>The final step is to burn the augmented image. I always burn discs at low speeds to diminish the chance of errors during the process.</p>
<div class="highlight"><pre><span></span><code>$ cdrecord -v <span class="nv">speed</span><span class="o">=</span><span class="m">4</span> <span class="nv">dev</span><span class="o">=</span>/dev/sr0 <span class="m">2012</span>.iso
</code></pre></div>
<p>Similar to the optical backups of my <a href="/2013/04/password-management-vim-gnupg/">password database</a>, I burn two copies of each disc. One copy is stored off-site. This provides a reasonable level of assurance against any loss of my photos.</p>Password Management with Vim and GnuPG2013-04-04T00:00:00-07:002013-06-30T00:00:00-07:00Pig Monkeytag:pig-monkey.com,2013-04-04:/2013/04/password-management-vim-gnupg/<p>The first password manager I ever used was a simple text file encrypted with <a href="http://www.gnupg.org/">GnuPG</a>. When I needed a password I would decrypt the file, read it in <a href="http://www.vim.org/">Vim</a>, and copy the required entry to the system clipboard. This system didn’t last. At the time I wasn’t using GnuPG for much else, and this was in the very beginning of my Vim days, when the program seemed cumbersome and daunting. I soon moved to other, purpose-built password managers.</p>
<p>After some experimentation I landed on <a href="http://www.keepassx.org/">KeePassX</a>, which I used for a number of years. Some time ago I decided that I wanted to move to a command-line solution. KeePassX and a web browser were the only graphical applications that I was using with any regularity. I could see no need for a password manager to have a graphical interface, and the GUI’s dependency on a mouse decreased my productivity. After a cursory look at the available choices I landed right back where I started all those years ago: Vim and GnuPG.</p>
<p>These days Vim is my most used program outside of a web browser and I use GnuPG daily for handling the majority of my encryption needs. My greater familiarity with both of these tools is one of the reasons I’ve been successful with the system this time around. I believe the other reason is my more systematic approach.</p>
<h2>Structure</h2>
<p>The power of this system comes from its simplicity: passwords are stored in plain text files that have been encrypted with GnuPG. Every platform out there has some implementation of the <a href="https://en.wikipedia.org/wiki/Pretty_Good_Privacy#OpenPGP">PGP protocol</a>, so the files can easily be decrypted anywhere. After they’ve been decrypted, there are no fancy file formats to deal with. It’s all just text, which can be manipulated with a <a href="https://en.wikipedia.org/wiki/GNU_Core_Utilities">plethora of powerful tools</a>. I favor reading the text in Vim, but any text editor will do the job.</p>
<p>All passwords are stored within a directory called <code>~/pw</code>. Within this directory are multiple files. Each of these files can be thought of as a separate password database. I store bank information in <code>financial.gpg</code>. Login information for various shopping websites is in <code>ecommerce.gpg</code>. My email credentials are in <code>email.gpg</code>. All of these entries could very well be stored in a single file, but breaking it out into multiple files allows me some measure of access control.</p>
<h3>Access</h3>
<p>I regularly use two computers: my laptop at home and a desktop machine at work. I trust my laptop. It has my GnuPG key on it and it should have access to all password database files. I do not place complete trust in my machine at work. I don’t trust it enough to give it access to my GnuPG key, and as such I have a different GnuPG key on that machine that I use for encryption at work.</p>
<p>Having passwords segregated into multiple database files allows me to encrypt the different files to different keys. Every file is encrypted to my primary GnuPG key, but only some are encrypted to my work key. Login credentials needed for work are encrypted to the work key. I have no need to log in to my bank accounts at work, and it wouldn’t be prudent to do so on a machine that I do not fully trust, so the <code>financial.gpg</code> file is not encrypted to my work key. If someone compromises my work computer, they will still be no closer to accessing my banking credentials.</p>
<h3>Git</h3>
<p>The <code>~/pw</code> directory is a <a href="http://git-scm.com/">git</a> repository. This gives me version control on all of my passwords. If I accidentally delete an entry I can always get it back. It also provides syncing and redundant storage without depending on a third party like Dropbox.</p>
<h3>Keys</h3>
<p>An advantage of using a directory full of encrypted files as my password manager is that I’m not limited to only storing usernames and passwords. Any file can be added to the repository. I keep keys for backups, SSH keys, and SSL keys (all of which have been encrypted with my GnuPG key) in the directory. This gives me one location for all of my authentication credentials, which simplifies the locating and backing up of these important files.</p>
<h2>Markup</h2>
<p>Each file is structured with <a href="http://vimdoc.sourceforge.net/htmldoc/fold.html">Vim folds</a> and indentation. There are various ways for Vim to fold text. I use markers, sticking with the default <code>{{{</code>/<code>}}}</code> characters. A typical password entry will look like this:</p>
<div class="highlight"><pre><span></span><code>Amazon{{{
    user: foo@bar.com
    pass: supers3cr3t
    url: https://amazon.com
}}}
</code></pre></div>
<p>Each file is full of entries like this. Certain entries are grouped together within other folds for organization. Certain entries may have comments so that I have a record of the false personally identifiable information the service requested when I registered.</p>
<div class="highlight"><pre><span></span><code>Super Ecommerce{{{
    user: foobar
    pass: g0d
    Comments{{{
        birthday: 1/1/1911
        first car: delorean
    }}}
}}}
</code></pre></div>
<p>Following a consistent structure like this makes the file easier to navigate and allows for the possibility of the file being parsed by a script. The fold markers come into play with my Vim configuration.</p>
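<p>As an illustration of the scripting possibility, here is a minimal sketch of a parser for this layout. It assumes that an entry name sits directly before the <code>{{{</code> marker, that fields are written as <code>key: value</code>, and that only fields at the top level of an entry should be returned (so nested folds such as <code>Comments</code> are skipped). The function name <code>pwfield</code> is hypothetical.</p>

```shell
#!/bin/sh
# pwfield ENTRY FIELD
# Read decrypted password text on stdin and print FIELD from the
# top-level fold named ENTRY. Hypothetical helper, not from the post.
pwfield() {
    awk -v entry="$1" -v field="$2" '
        {
            # Work on a copy of the line with surrounding whitespace removed.
            line = $0
            sub(/^[ \t]+/, "", line)
            sub(/[ \t]+$/, "", line)
        }
        # A closing marker ends the current fold.
        line == "}}}" {
            if (depth == 1) inside = 0
            if (depth > 0) depth--
            next
        }
        # An opening marker starts a fold; only top-level names are compared.
        length(line) >= 3 && substr(line, length(line) - 2) == "{{{" {
            depth++
            if (depth == 1) inside = (substr(line, 1, length(line) - 3) == entry)
            next
        }
        # Inside the wanted entry, print the first matching "field: value".
        inside && depth == 1 && index(line, field ":") == 1 {
            value = substr(line, length(field) + 2)
            sub(/^[ \t]+/, "", value)
            print value
            exit
        }
    '
}
```

<p>It could be fed from GnuPG, for example <code>gpg --decrypt ~/pw/ecommerce.gpg | pwfield Amazon pass</code>. Note that this prints the secret to standard output, so it gives up the shoulder-surfing protection that the Vim setup provides.</p>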
<h2>Vim</h2>
<p>I use Vim with the <a href="https://github.com/jamessan/vim-gnupg">vim-gnupg</a> plugin. This makes editing of encrypted files seamless. When opening existing files, the contents are decrypted. When opening new files, the plugin asks which recipients the file should be encrypted to. When a file is open, leaking the clear text is avoided by disabling <a href="http://vimdoc.sourceforge.net/htmldoc/starting.html#viminfo">viminfo</a>, <a href="http://vimdoc.sourceforge.net/htmldoc/options.html#%27swapfile%27">swapfile</a>, and <a href="http://vimdoc.sourceforge.net/htmldoc/options.html#%27undofile%27">undofile</a>. I run <code>gpg-agent</code> so that my passphrase is remembered for a short period of time after I use it. This makes it easy and secure to work with (and create) the encrypted files with Vim. I define a few extra options in my <a href="https://github.com/pigmonkey/dotfiles/blob/master/vimrc">vimrc</a> to facilitate working with passwords.</p>
<div class="highlight"><pre><span></span><code><span class="c">""""""""""""""""""""</span>
<span class="c">" GnuPG Extensions "</span>
<span class="c">""""""""""""""""""""</span>
<span class="c">" Tell the GnuPG plugin to armor new files.</span>
<span class="k">let</span> <span class="k">g</span>:GPGPreferArmor<span class="p">=</span><span class="m">1</span>
<span class="c">" Tell the GnuPG plugin to sign new files.</span>
<span class="k">let</span> <span class="k">g</span>:GPGPreferSign<span class="p">=</span><span class="m">1</span>
augroup GnuPGExtra
    <span class="c">" Set extra file options.</span>
    autocmd <span class="nb">BufReadCmd</span><span class="p">,</span><span class="nb">FileReadCmd</span> *.\<span class="p">(</span>gpg\<span class="p">|</span><span class="k">asc</span>\<span class="p">|</span>pgp\<span class="p">)</span> <span class="k">call</span> SetGPGOptions<span class="p">()</span>
    <span class="c">" Automatically close unmodified files after inactivity.</span>
    autocmd <span class="nb">CursorHold</span> *.\<span class="p">(</span>gpg\<span class="p">|</span><span class="k">asc</span>\<span class="p">|</span>pgp\<span class="p">)</span> quit
augroup END
<span class="c">" The bang lets the vimrc be sourced again without a redefinition error.</span>
<span class="k">function</span>! SetGPGOptions<span class="p">()</span>
    <span class="c">" Set updatetime to 1 minute.</span>
    <span class="k">set</span> <span class="nb">updatetime</span><span class="p">=</span><span class="m">60000</span>
    <span class="c">" Fold at markers.</span>
    <span class="k">set</span> <span class="nb">foldmethod</span><span class="p">=</span>marker
    <span class="c">" Automatically close all folds.</span>
    <span class="k">set</span> <span class="k">foldclose</span><span class="p">=</span><span class="k">all</span>
    <span class="c">" Only open folds with insert commands.</span>
    <span class="k">set</span> <span class="k">foldopen</span><span class="p">=</span>insert
<span class="k">endfunction</span>
</code></pre></div>
<p>The first two options simply tell vim-gnupg to always ASCII-armor and sign new files. These have nothing particular to do with password management, but are good practices for all encrypted files.</p>
<p>The first <code>autocmd</code> calls a function which holds the options that I wanted applied to my password files. I have these options apply to all encrypted files, although they’re intended primarily for use when Vim is acting as my password manager.</p>
<h3>Folding</h3>
<p>The primary shortcoming with using an encrypted text file as a password database is the lack of protection against shoulder-surfing. After the file has been decrypted and opened, anyone standing behind you can look over your shoulder and view all the entries. This is solved with <a href="http://vim.wikia.com/wiki/Folding">folds</a> and is what most of these extra options address.</p>
<p>I set <a href="http://vimdoc.sourceforge.net/htmldoc/options.html#%27foldmethod%27">foldmethod</a> to <code>marker</code> so that Vim knows to look for all the <code>{{{</code>/<code>}}}</code> characters and use them to build the folds. Then I set <a href="http://vimdoc.sourceforge.net/htmldoc/options.html#%27foldclose%27">foldclose</a> to <code>all</code>. This closes all folds unless the cursor is in them. This way only one fold can be open at a time – or, to put it another way, only one password entry is ever visible at once.</p>
<p>The final fold option instructs Vim when it is allowed to open folds. Folds can always be opened manually, but by default Vim will also open them for many other cases: if you navigate to a fold, jump to a mark within a fold or search for a pattern within a fold, they will open. By setting <a href="http://vimdoc.sourceforge.net/htmldoc/options.html#%27foldopen%27">foldopen</a> to <code>insert</code> I instruct Vim that the only time it should automatically open a fold is if my cursor is in a fold and I change to insert mode. The effect of this is that when I open a file, all folds are closed by default. I can navigate through the file, search and jump through matches, all without opening any of the folds and inadvertently exposing the passwords on my screen. The fold will open if I change to insert mode within it, but it is difficult to do that by mistake.</p>
<p>I have my <a href="https://github.com/pigmonkey/dotfiles/blob/master/vimrc#L116">spacebar set up to toggle folds</a> within Vim. After I have navigated to the desired entry, I can simply whack the spacebar to open it and copy the credential that I need to the system clipboard. At that point I can whack the spacebar again to close the fold, or I can quit Vim. Or I can simply wait.</p>
<h3>Locking</h3>
<p>The other special option I set is <a href="http://vimdoc.sourceforge.net/htmldoc/options.html#%27updatetime%27">updatetime</a>. Vim uses this option to determine when it should write swap files for crash recovery. Since vim-gnupg disables swap files for decrypted files, this has no effect. I use it for something else.</p>
<p>In the second <code>autocmd</code> I tell Vim to close itself on <a href="http://vimdoc.sourceforge.net/htmldoc/autocmd.html#CursorHold">CursorHold</a>. <code>CursorHold</code> is triggered whenever no key has been pressed for the time specified by <code>updatetime</code>. So the effect of this is that my password files are automatically closed after 1 minute of inactivity. This is similar to KeePassX’s behaviour of “locking the workspace” after a set period of inactivity.</p>
<h3>Clipboard</h3>
<p>To easily copy a credential to the system clipboard from Vim I have two <a href="https://github.com/pigmonkey/dotfiles/blob/master/vimrc#L175">shortcuts</a> mapped.</p>
<div class="highlight"><pre><span></span><code>" Yank WORD to system clipboard in normal mode
nmap &lt;leader&gt;y "+yE
" Yank selection to system clipboard in visual mode
vmap &lt;leader&gt;y "+y
</code></pre></div>
<p>Vim can access the system clipboard using both the <code>*</code> and <code>+</code> registers. I opt to use <code>+</code> because <a href="http://vimdoc.sourceforge.net/htmldoc/gui_x11.html#x11-selection">X treats it as a selection rather than a cut-buffer</a>. As the Vim documentation explains:</p>
<blockquote>
<p>Selections are “owned” by an application, and disappear when that application (e.g., Vim) exits, thus losing the data, whereas cut-buffers, are stored within the X-server itself and remain until written over or the X-server exits (e.g., upon logging out).</p>
</blockquote>
<p>The result is that I can copy a username or password by placing the cursor on its first character and hitting <code>&lt;leader&gt;y</code>. I can paste the credential wherever it is needed. After I close Vim, or after Vim closes itself after 1 minute of inactivity, the credential is removed from the clipboard. This replicates KeePassX’s behaviour of clearing the clipboard so many seconds after a username or password has been copied.</p>
<h2>Generation</h2>
<p>Passwords should be long and unique. To satisfy this, any password manager needs some sort of password generator. Vim provides this with its ability to <a href="http://vim.wikia.com/wiki/Append_output_of_an_external_command">call and read external commands</a>. I can tell Vim to call the standard-issue <a href="http://linux.die.net/man/1/pwgen">pwgen</a> program to generate a secure 24-character password utilizing special characters and insert the output at the cursor, like this:</p>
<div class="highlight"><pre><span></span><code><span class="p">:</span><span class="k">r</span><span class="p">!</span>pwgen <span class="p">-</span><span class="k">sy</span> <span class="m">24</span> <span class="m">1</span>
</code></pre></div>
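<p>On a machine without pwgen, something similar can be approximated with standard tools. The sketch below is a substitute technique, not what pwgen does internally: it filters bytes from <code>/dev/urandom</code> down to printable, non-space ASCII characters and keeps the first 24. The function name <code>genpw</code> is hypothetical.</p>

```shell
#!/bin/sh
# genpw [LENGTH]
# Print one random password of LENGTH characters (default 24) drawn from
# printable, non-space ASCII. A stand-in for pwgen, not a reimplementation.
genpw() {
    # LC_ALL=C makes tr treat the input as raw bytes rather than
    # trying to decode it as multibyte text.
    LC_ALL=C tr -dc '[:graph:]' < /dev/urandom | head -c "${1:-24}"
    echo
}

genpw 24
```

<p>Because the function only writes to standard output, its result can be read into a buffer with Vim’s <code>:r!</code> in the same way as the pwgen call, provided the function is available to the shell that Vim spawns.</p>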
<h2>Backups</h2>
<p>The <code>~/pw</code> directory is backed up in the same way as most other things on my hard drive: to <a href="http://www.tarsnap.com/">Tarsnap</a> via <a href="/2012/09/tarsnapper-managing-tarsnap-backups/">Tarsnapper</a>, to an external drive via <a href="http://www.rsnapshot.org/">rsnapshot</a> and <a href="/2012/09/cryptshot-automated-encrypted-backups-rsnapshot/">cryptshot</a>, and via <a href="https://wiki.archlinux.org/index.php/Full_System_Backup_with_rsync">rsync to a mirror drive</a>. The issue with these standard backups is that they’re all encrypted and the keys to decrypt them are stored in the password manager. If I lose <code>~/pw</code> I’ll have plenty of backups around, but none that I can actually access. I address this problem with regular backups to optical media.</p>
<p>At the beginning of every month I burn the password directory to two CDs. One copy is stored at home and the other at an off-site location. I began these optical media backups in December, so I currently have two sets consisting of five discs each. Any one of these discs will provide me with the keys I need to access a backup made with one of the more frequent methods.</p>
<p>Of course, all the files being burned to these discs are still encrypted with my GnuPG key. If I lose that key or passphrase I will have no way to decrypt any of these files. Protecting one’s GnuPG key is another problem entirely. I’ve taken steps that make me feel confident in my ability to always be able to recover a copy of my key, but none that I’m comfortable discussing publicly.</p>
<h2>Shell</h2>
<p>I’ve defined a <a href="https://github.com/pigmonkey/dotfiles/blob/master/shellrc#L70">shell function</a>, <code>pw()</code>, that operates exactly like the function I use for <a href="/2012/12/notes-unix/">notes on Unix</a>.</p>
<div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
<span class="normal"> 2</span>
<span class="normal"> 3</span>
<span class="normal"> 4</span>
<span class="normal"> 5</span>
<span class="normal"> 6</span>
<span class="normal"> 7</span>
<span class="normal"> 8</span>
<span class="normal"> 9</span>
<span class="normal">10</span>
<span class="normal">11</span></pre></div></td><td class="code"><div><pre><span></span><code><span class="c1"># Set the password database directory.</span>
<span class="nv">PASSDIR</span><span class="o">=</span>~/pw
<span class="c1"># Create or edit password databases.</span>
pw<span class="o">()</span> <span class="o">{</span>
<span class="nb">cd</span> <span class="s2">"</span><span class="nv">$PASSDIR</span><span class="s2">"</span>
<span class="k">if</span> <span class="o">[</span> ! -z <span class="s2">"</span><span class="nv">$1</span><span class="s2">"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then</span>
<span class="nv">$EDITOR</span> <span class="k">$(</span>buildfile <span class="s2">"</span><span class="nv">$1</span><span class="s2">"</span><span class="k">)</span>
<span class="nb">cd</span> <span class="s2">"</span><span class="nv">$OLDPWD</span><span class="s2">"</span>
<span class="k">fi</span>
<span class="o">}</span>
</code></pre></div></td></tr></table></div>
<p>This allows me to easily open any password file from wherever I am in the filesystem without specifying the full path. These two commands are equivalent, but the one utilizing <code>pw()</code> requires fewer keystrokes:</p>
<div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>$ vim ~/pw/financial.gpg
$ pw financial
</code></pre></div></td></tr></table></div>
<p>The function changes to the password directory before opening the file so that while I’m in Vim I can drop down to a shell with <code>:sh</code> and already be in the proper directory to manipulate the files. After I close Vim the function returns me to the previous working directory.</p>
<p>This still required a few more keystrokes than I like, so I configured my shell to <a href="https://github.com/pigmonkey/dotfiles/blob/master/zshrc#L44">perform autocompletion in the directory</a>. If <code>financial.gpg</code> is the only file in the directory beginning with an “f”, typing <code>pw f&lt;tab&gt;</code> is all that is required to open the file.</p>
<h2>Simplicity</h2>
<p>This setup provides <a href="https://wiki.archlinux.org/index.php/The_Arch_Way#Simplicity">simplicity</a>, power, and portability. It uses the same tools that I already employ in my daily life, and does not require the use of the mouse or any graphical windows. I’ve been happily utilizing it for about 6 months now.</p>
<p>Initially I had thought I would supplement the setup with a script that would search the databases for a desired entry, using some combination of <code>grep</code>, <code>awk</code> and <code>cut</code>, and then copy it to my clipboard via <code>xsel</code>. As it turns out, I haven’t felt the desire to do this. Simply opening the file in Vim, searching for the desired entry, opening the fold and copying the credential to the system clipboard is quick enough. The whole process, aside from typing in my passphrase, takes me only a couple of seconds.</p>
<h2>Resources</h2>
<p>I’m certainly not the first to come up with the idea of managing passwords with Vim. These resources were particularly useful to me when I was researching the possibilities:</p>
<ul>
<li><a href="http://connermcd.com/blog/2012/05/01/file-encryption-and-password-management/">File encryption and password management</a> by Conner McDaniel</li>
<li><a href="http://vim.wikia.com/wiki/Keep_passwords_in_encrypted_file">Keep passwords in encrypted file</a> on the Vim Wiki</li>
<li><a href="http://www.noah.org/wiki/Password_Safe_with_Vim_and_OpenSSL">Password Safe with Vim and OpenSSL</a> by Noah</li>
</ul>
<p>If you’re interested in other ideas for password management, <a href="http://zx2c4.com/projects/password-store/">password-store</a> and <a href="http://raymontag.github.com/keepassc/">KeePassC</a> are both neat projects that I follow.</p>
<div class="notice">
<p>2013 June 30: <a href="http://blog.oddbit.com/">larsks</a> has hacked together a <a href="https://gist.github.com/larsks/5868076">Python script</a> to convert KeepassX XML exports to the plain-text markup format that I use.</p>
</div>Back It Up: A Solution for Laptop Backups2012-10-03T00:00:00-07:002012-12-22T00:00:00-08:00Pig Monkeytag:pig-monkey.com,2012-10-03:/2012/10/back-it-up/<p>A laptop presents some problems for reliably backing up data. Unlike a server, the laptop may not always be turned on. When it is on, it may not be connected to the backup medium. If you’re doing online backups, the laptop may be offline. If you’re backing up to an external drive, the drive may not be plugged in. To address these issues I wrote a shell script called <a href="https://github.com/pigmonkey/backitup">backitup.sh</a>.</p>
<h2>The Problem</h2>
<p>Let’s say you want to back up a laptop to an external USB drive once per day with <a href="/2012/09/cryptshot-automated-encrypted-backups-rsnapshot/">cryptshot</a>.</p>
<p>You could add a cron entry to call <code>cryptshot.sh</code> at a certain time every day. What if the laptop isn’t turned on? What if the drive isn’t connected? In either case the backup will not be completed. The machine will then wait a full 24 hours before even attempting the backup again. This could easily result in weeks passing without a successful backup.</p>
<p>If you’re using <a href="http://en.wikipedia.org/wiki/Anacron">anacron</a>, or one of its derivatives, things get slightly better. Instead of specifying a time to call <code>cryptshot.sh</code>, you set the cron interval to <code>@daily</code>. If the machine is turned off at whatever time anacron is set up to execute <code>@daily</code> scripts, all of the commands will simply be executed the next time the machine boots. But that still doesn’t solve the problem of the drive not being plugged in.</p>
<h2>The Solution</h2>
<p><code>backitup.sh</code> attempts to perform a backup if a certain amount of time has passed. It monitors for a report of successful completion of the backup. Once configured, you no longer call the backup program directly. Instead, you call <code>backitup.sh</code>. It then decides whether or not to actually execute the backup.</p>
<h3>How it works</h3>
<p>The script is configured with the backup program that should be executed, the period for which you want to complete backups, and the location of a file that holds the timestamp of the last successful backup. It can be configured either by modifying the variables at the top of the script, or by passing in command-line arguments.</p>
<div class="highlight"><pre><span></span><code>$ backitup.sh -h
Usage: backitup.sh <span class="o">[</span>OPTION...<span class="o">]</span>
Note that any <span class="nb">command</span> line arguments overwrite variables defined <span class="k">in</span> the source.
Options:
-p the period <span class="k">for</span> which backups should attempt to be executed
<span class="o">(</span>integer seconds or <span class="s1">'DAILY'</span>, <span class="s1">'WEEKLY'</span> or <span class="s1">'MONTHLY'</span><span class="o">)</span>
-b the backup <span class="nb">command</span> to execute<span class="p">;</span> note that this should be quoted <span class="k">if</span> it contains a space
-l the location of the file that holds the timestamp of the last successful backup.
-n the <span class="nb">command</span> to be executed <span class="k">if</span> the above file does not exist
</code></pre></div>
<p>When the script executes, it reads the timestamp contained in the last-run file. This is then compared to the user-specified period. If the difference between the timestamp and the current time is greater than the period, <code>backitup.sh</code> calls the backup program. If the difference between the stored timestamp and the current time is less than the requested period, the script simply exits without running the backup program.</p>
<p>After the backup program completes, the script looks at the returned <a href="https://en.wikipedia.org/wiki/Exit_status">exit code</a>. If the exit code is 0, the backup was completed successfully, and the timestamp in the last-run file is replaced with the current time. If the backup program returns a non-zero exit code, no changes are made to the last-run file. In this case, the result is that the next time <code>backitup.sh</code> is called it will once again attempt to execute the backup program.</p>
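<p>The decision described above can be sketched in a few lines of shell. This is only an illustration and not the actual <code>backitup.sh</code> source: the function names <code>is_due</code> and <code>run_backup</code> are hypothetical, and treating a missing last-run file as "never run" is an assumption.</p>

```shell
#!/bin/sh
# Hypothetical sketch of the decision logic; not the real backitup.sh.

# is_due LAST NOW PERIOD
# Succeeds when more than PERIOD seconds separate LAST from NOW.
is_due() {
    [ $(( $2 - $1 )) -gt "$3" ]
}

# run_backup LASTRUN_FILE PERIOD COMMAND...
# Runs COMMAND only when a backup is due. The timestamp is rewritten
# only when COMMAND exits with status 0, so a failed backup is retried
# on the next call.
run_backup() {
    lastrun_file=$1
    period=$2
    shift 2
    now=$(date +%s)
    last=0   # assumption: a missing file means "never run"
    if [ -f "$lastrun_file" ]; then
        last=$(cat "$lastrun_file")
    fi
    if is_due "$last" "$now" "$period"; then
        if "$@"; then
            echo "$now" > "$lastrun_file"
        else
            return 1
        fi
    fi
}
```

<p>Calling <code>run_backup ~/.lastrun 86400 cryptshot.sh daily</code> every hour would then attempt the backup until one attempt succeeds, and stay quiet until the period has passed again.</p>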
<p>The period can either be specified in seconds or with the strings <code>DAILY</code>, <code>WEEKLY</code> or <code>MONTHLY</code>. The behaviour of <code>DAILY</code> differs from <code>86400</code> (24 hours in seconds). With the latter configuration, the backup program will only attempt to execute once per 24-hour period. If <code>DAILY</code> is specified, the backup may be completed successfully at, for example, 23:30 one day and again at 00:15 the following day.</p>
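<p>The difference between the two configurations comes down to comparing elapsed seconds versus comparing calendar days. The sketch below uses hypothetical helper names and, to stay portable, counts days in UTC by integer division; how the real script determines the calendar day is not shown here.</p>

```shell
#!/bin/sh
# Hypothetical comparison of a fixed period with a calendar-day period.

# fixed_due LAST NOW: due only after a full 86400 seconds have elapsed.
fixed_due() {
    [ $(( $2 - $1 )) -gt 86400 ]
}

# daily_due LAST NOW: due as soon as NOW falls on a later day than LAST.
# Days are counted in UTC here, which is a simplification.
daily_due() {
    [ $(( $2 / 86400 )) -gt $(( $1 / 86400 )) ]
}

last=84600   # 23:30 UTC on day 0
now=87300    # 00:15 UTC on day 1

if fixed_due "$last" "$now"; then echo "86400: due"; else echo "86400: not due"; fi
if daily_due "$last" "$now"; then echo "DAILY: due"; else echo "DAILY: not due"; fi
```

<p>Only 45 minutes separate the two timestamps, so the fixed period is not yet due, while the day-based check already is.</p>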
<h2>Use</h2>
<p>You still want to back up a laptop to an external USB drive once per day with cryptshot. Rather than calling <code>cryptshot.sh</code>, you call <code>backitup.sh</code>.</p>
<p>Tell the script that you wish to complete daily backups, and then use cron to call the script more frequently than the desired backup period. For my local backups, I call <code>backitup.sh</code> every hour.</p>
<div class="highlight"><pre><span></span><code><span class="nv">@hourly</span><span class="w"> </span><span class="n">backitup</span><span class="p">.</span><span class="n">sh</span><span class="w"> </span><span class="o">-</span><span class="n">l</span><span class="w"> </span><span class="o">~/</span><span class="p">.</span><span class="n">cryptshot</span><span class="o">-</span><span class="n">daily</span><span class="w"> </span><span class="o">-</span><span class="n">b</span><span class="w"> </span><span class="ss">"cryptshot.sh daily"</span><span class="w"></span>
</code></pre></div>
<p>The default period of <code>backitup.sh</code> is <code>DAILY</code>, so in this case I don’t have to provide a period of my own. But I also do weekly and monthly backups, so I need two more entries to execute cryptshot with those periods.</p>
<div class="highlight"><pre><span></span><code><span class="nv">@hourly</span><span class="w"> </span><span class="n">backitup</span><span class="p">.</span><span class="n">sh</span><span class="w"> </span><span class="o">-</span><span class="n">l</span><span class="w"> </span><span class="o">~/</span><span class="p">.</span><span class="n">cryptshot</span><span class="o">-</span><span class="n">monthly</span><span class="w"> </span><span class="o">-</span><span class="n">b</span><span class="w"> </span><span class="ss">"cryptshot.sh monthly"</span><span class="w"> </span><span class="o">-</span><span class="n">p</span><span class="w"> </span><span class="n">MONTHLY</span><span class="w"></span>
<span class="nv">@hourly</span><span class="w"> </span><span class="n">backitup</span><span class="p">.</span><span class="n">sh</span><span class="w"> </span><span class="o">-</span><span class="n">l</span><span class="w"> </span><span class="o">~/</span><span class="p">.</span><span class="n">cryptshot</span><span class="o">-</span><span class="n">weekly</span><span class="w"> </span><span class="o">-</span><span class="n">b</span><span class="w"> </span><span class="ss">"cryptshot.sh weekly"</span><span class="w"> </span><span class="o">-</span><span class="n">p</span><span class="w"> </span><span class="n">WEEKLY</span><span class="w"></span>
</code></pre></div>
<p>All three of these entries are executed hourly, which means that at the top of every hour, my laptop attempts to back itself up. As long as the USB drive is plugged in during one of those hours, the backup will complete. If cryptshot is executed, but fails, another attempt will be made the next hour. Daily backups will only be successfully completed, at most, once per day; weekly backups, once per week; and monthly backups, once per month. This setup works well for me, but if you want a higher assurance that your daily backups will be completed every day you could change the cron interval to <code>*/5 * * * *</code>, which will result in cron executing <code>backitup.sh</code> every 5 minutes.</p>
<p>What if you want to perform daily online backups with <a href="/2012/09/tarsnapper-managing-tarsnap-backups/">Tarsnapper</a>?</p>
<div class="highlight"><pre><span></span><code><span class="nv">@hourly</span><span class="w"> </span><span class="n">backitup</span><span class="p">.</span><span class="n">sh</span><span class="w"> </span><span class="o">-</span><span class="n">l</span><span class="w"> </span><span class="o">~/</span><span class="p">.</span><span class="n">tarsnapper</span><span class="o">-</span><span class="n">lastrun</span><span class="w"> </span><span class="o">-</span><span class="n">b</span><span class="w"> </span><span class="n">tarsnapper</span><span class="p">.</span><span class="n">py</span><span class="w"></span>
</code></pre></div>
<p>At the top of every hour your laptop will attempt to run <a href="http://www.tarsnap.com/">Tarsnap</a> via Tarsnapper. If the laptop is offline, it will try again the following hour. If Tarsnap begins but you go offline before it can complete, the backup will be resumed the following hour.</p>
<p>The script can of course be called with something other than cron. Put it in your <code>~/.profile</code> and have your backups attempt to execute every time you log in. Add it to your network manager and have your online backups attempt to execute every time you get online. If you’re using something like <a href="https://en.wikipedia.org/wiki/Udev">udev</a>, have your local backups attempt to execute every time your USB drive is plugged in.</p>
<h2>The Special Case</h2>
<p>The final configuration option of <code>backitup.sh</code> represents a special case. If the script runs and it can’t find the specified file, the default behaviour is to assume that this is the first time it has ever run: it creates the file and executes the backup. That is what most users will want, but this behaviour can be changed.</p>
<p>When I first wrote <code>backitup.sh</code> it was to help manage backups of my <a href="https://www.dropbox.com/">Dropbox</a> folder. Dropbox doesn’t support client-side encryption, which means users need to handle encryption themselves. The most common way to do this is to create an <a href="http://www.arg0.net/encfs">encfs</a> file-system or two and place those within the Dropbox directory. That’s the way I use Dropbox.</p>
<p>I wanted to back up all the data stored in Dropbox with Tarsnap. Unlike Dropbox, Tarsnap <em>does</em> do client-side encryption, so when I back up my Dropbox folder, I don’t want to actually back up the encrypted contents of the folder – I want to back up the decrypted contents. That allows me to take better advantage of Tarsnap’s deduplication and it makes restoring backups much simpler. Rather than comparing <a href="https://en.wikipedia.org/wiki/Inode">inodes</a> and restoring a file using an encrypted filename like <code>6,8xHZgiIGN0vbDTBGw6w3lf/1nvj1,SSuiYY0qoYh-of5YX8</code> I can just restore <code>documents/todo.txt</code>.</p>
<p>If my encfs filesystem mount point is <code>~/documents</code>, I can configure Tarsnapper to create an archive of that directory, but if for some reason the filesystem is not mounted when Tarsnapper is called, I would be making a backup of an empty directory. That’s a waste of time. The solution is to tell <code>backitup.sh</code> to put the last-run file <em>inside</em> the encfs filesystem. If it can’t find the file, that means that the filesystem isn’t mounted. If that’s the case, I tell it to call the script I use to automatically mount the encfs filesystem (which, the way I have it set up, requires no interaction from me).</p>
<div class="highlight"><pre><span></span><code><span class="nv">@hourly</span><span class="w"> </span><span class="n">backitup</span><span class="p">.</span><span class="n">sh</span><span class="w"> </span><span class="o">-</span><span class="n">l</span><span class="w"> </span><span class="o">~/</span><span class="n">documents</span><span class="o">/</span><span class="p">.</span><span class="n">lastrun</span><span class="w"> </span><span class="o">-</span><span class="n">b</span><span class="w"> </span><span class="n">tarsnapper</span><span class="p">.</span><span class="n">py</span><span class="w"> </span><span class="o">-</span><span class="n">n</span><span class="w"> </span><span class="o">~/</span><span class="n">bin</span><span class="o">/</span><span class="n">encfs_automount</span><span class="p">.</span><span class="n">sh</span><span class="w"></span>
</code></pre></div>
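<p>The idea behind the <code>-n</code> option can be sketched as follows. The function name <code>lastrun_or_fallback</code> is hypothetical, and the sketch assumes that after running the fallback the backup simply waits for the next invocation; whether the real script proceeds in the same run is not stated here.</p>

```shell
#!/bin/sh
# Hypothetical sketch of the "missing last-run file" special case.

# lastrun_or_fallback LASTRUN_FILE FALLBACK...
# Returns 0 when the last-run file exists, meaning the filesystem that
# holds it is mounted. Otherwise runs FALLBACK (for example a mount
# script) and returns 1 so the caller skips the backup this time.
lastrun_or_fallback() {
    lastrun_file=$1
    shift
    if [ -f "$lastrun_file" ]; then
        return 0
    fi
    "$@"
    return 1
}
```

<p>With the last-run file stored inside the mount point, a call such as <code>lastrun_or_fallback ~/documents/.lastrun ~/bin/encfs_automount.sh</code> only lets the backup go ahead when the decrypted files are actually visible.</p>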
<h2>Problem Solved</h2>
<p><code>backitup.sh</code> solves all of my backup scheduling problems. I only call backup programs directly if I want to make an on-demand backup. All of my automated backups go through <code>backitup.sh</code>. If you’re interested in the script, you can <a href="https://github.com/pigmonkey/backitup">download it directly from GitHub</a>. You can clone my entire <a href="https://github.com/pigmonkey/backups">backups repository</a> if you’re also interested in the other scripts I’ve written to manage different aspects of backing up data.</p>
<p><a href="https://www.youtube.com/watch?v=F22yKJRZoZc&t=2m51s">Hey yo but wait, back it up, hup, easy back it up</a></p>Cryptshot: Automated, Encrypted Backups with rsnapshot2012-09-24T00:00:00-07:002012-12-22T00:00:00-08:00Pig Monkeytag:pig-monkey.com,2012-09-24:/2012/09/cryptshot-automated-encrypted-backups-rsnapshot/<p>Earlier this year I switched from <a href="http://duplicity.nongnu.org/">Duplicity</a> to <a href="http://rsnapshot.org/">rsnapshot</a> for my local backups. Duplicity uses a full + incremental backup schema: the first time a backup is executed, all files are copied to the backup medium. Successive backups copy only the deltas of changed objects. Over time this results in a chain of deltas that need to be replayed when restoring from a backup. If a single delta is somehow corrupted, the whole chain is broken. To minimize the chances of this happening, the common practice is to complete a new full backup every so often – I usually do a full backup every 3 or 4 weeks. Completing a full backup takes time when you’re backing up hundreds of gigabytes, even over USB 3.0. It also takes up disk space. I keep around two full backups when using Duplicity, which means I’m using a little over twice as much space on the backup medium as what I’m backing up.</p>
<p>The backup schema that rsnapshot uses is different. The first time it runs, it completes a full backup. Each time after that, it completes what could be considered a “full” backup, but unchanged files are not copied over. Instead, rsnapshot simply <a href="http://en.wikipedia.org/wiki/Hard_link">hard links</a> to the previously copied file. If you modify very large files regularly, this model may be inefficient, but for me – and I think for most users – it’s great. Backups are speedy, disk space usage on the backup medium isn’t too much more than the data being backed up, and I have multiple full backups that I can restore from.</p>
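<p>The hard-link idea is easy to see on a throwaway directory. The sketch below is not how rsnapshot itself is invoked; it just links one unchanged file from an older snapshot directory into a newer one and shows that both names point at the same inode. The helper <code>inode_of</code> and the temporary paths are made up for the illustration.</p>

```shell
#!/bin/sh
# Hypothetical illustration of hard-linked snapshots.

# inode_of PATH: print the inode number of PATH.
inode_of() {
    ls -i "$1" | awk '{print $1}'
}

workdir=$(mktemp -d)
mkdir "$workdir/daily.1" "$workdir/daily.0"

# The older snapshot holds the only real copy of the data.
echo "unchanged data" > "$workdir/daily.1/file.txt"

# The newer snapshot links to it instead of copying it.
ln "$workdir/daily.1/file.txt" "$workdir/daily.0/file.txt"

echo "daily.1 inode: $(inode_of "$workdir/daily.1/file.txt")"
echo "daily.0 inode: $(inode_of "$workdir/daily.0/file.txt")"
```

<p>Both snapshots list the file, but the data occupies disk space only once, and deleting one snapshot leaves the other intact.</p>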
<p>The great strength of Duplicity – and the great weakness of rsnapshot – is encryption. Duplicity uses <a href="http://www.gnupg.org/">GnuPG</a> to encrypt backups, which makes it one of the few solutions appropriate for remote backups. In contrast, rsnapshot does no encryption. That makes it completely inappropriate for remote backups, but the shortcoming can be worked around when backing up locally.</p>
<p>My local backups are done to an external, USB hard drive. Encrypting the drive is simple with <a href="http://en.wikipedia.org/wiki/Linux_Unified_Key_Setup">LUKS</a> and <a href="http://en.wikipedia.org/wiki/Dm-crypt">dm-crypt</a>. For example, to encrypt <code>/dev/sdb</code>:</p>
<div class="highlight"><pre><span></span><code>$ cryptsetup --cipher aes-xts-plain --key-size <span class="m">512</span> --verify-passphrase luksFormat /dev/sdb
</code></pre></div>
<p>The device can then be opened, formatted, and mounted.</p>
<div class="highlight"><pre><span></span><code>$ cryptsetup luksOpen /dev/sdb backup_drive
$ mkfs.ext4 -L backup /dev/mapper/backup_drive
$ mount /dev/mapper/backup_drive /mnt/backup/
</code></pre></div>
<p>At this point, the drive will be encrypted with a passphrase. To make it easier to mount programmatically, I also add a key file full of some random data generated from <code>/dev/urandom</code>.</p>
<div class="highlight"><pre><span></span><code>$ dd <span class="k">if</span><span class="o">=</span>/dev/urandom <span class="nv">of</span><span class="o">=</span>/root/supersecretkey <span class="nv">bs</span><span class="o">=</span><span class="m">1024</span> <span class="nv">count</span><span class="o">=</span><span class="m">8</span>
$ chmod <span class="m">0400</span> /root/supersecretkey
$ cryptsetup luksAddKey /dev/sdb /root/supersecretkey
</code></pre></div>
<p>There are still a few considerations to address before backups to this encrypted drive can be completed automatically with no user interaction. Since the target is a USB drive and the source is a laptop, there’s a good chance that the drive won’t be plugged in when the scheduler kicks in the backup program. If it is plugged in, the drive needs to be decrypted before calling rsnapshot to do its thing. I wrote a wrapper script called <a href="https://github.com/pigmonkey/cryptshot">cryptshot</a> to address these issues.</p>
<p>Cryptshot is configured with the <a href="http://en.wikipedia.org/wiki/Universally_unique_identifier">UUID</a> of the target drive and the key file used to decrypt the drive. When it is executed, the first thing it does is look to see if the UUID exists. If it does, that means the drive is plugged in and accessible. The script then decrypts the drive with the specified key file and mounts it. Finally, rsnapshot is called to execute the backup as usual. Any argument passed to cryptshot is passed along to rsnapshot. What that means is that cryptshot becomes a drop-in replacement for encrypted, rsnapshot backups. Where I previously called <code>rsnapshot daily</code>, I now call <code>cryptshot daily</code>. Everything after that point just works, with no interaction needed from me.</p>
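<p>The flow described above might look roughly like the following. This is a sketch under stated assumptions, not the actual cryptshot source: the function name, the <code>backup_drive</code> mapper name, and the <code>RUN</code> and <code>BY_UUID_DIR</code> variables are all hypothetical, and it assumes a Linux system where udev populates <code>/dev/disk/by-uuid</code>.</p>

```shell
#!/bin/sh
# Hypothetical sketch of the cryptshot flow; not the real script.

# Directory in which udev exposes block devices by UUID.
BY_UUID_DIR=${BY_UUID_DIR:-/dev/disk/by-uuid}
# Set RUN=echo to print the commands instead of executing them.
RUN=${RUN:-}

# cryptshot_sketch UUID KEYFILE MOUNTPOINT [RSNAPSHOT_ARGS...]
cryptshot_sketch() {
    uuid=$1
    keyfile=$2
    mountpoint=$3
    shift 3
    device="$BY_UUID_DIR/$uuid"

    # If the UUID is absent, the drive is not plugged in.
    if [ ! -e "$device" ]; then
        echo "drive $uuid not found" >&2
        return 1
    fi

    # Decrypt the drive with the key file, then mount it.
    $RUN cryptsetup --key-file "$keyfile" luksOpen "$device" backup_drive || return 1
    $RUN mount /dev/mapper/backup_drive "$mountpoint" || return 1

    # Pass any remaining arguments straight through to rsnapshot.
    $RUN rsnapshot "$@"
}
```

<p>Running it with <code>RUN=echo</code> is a harmless way to check which commands would be issued for a given UUID before trying it as root.</p>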
<p>If you’re interested in cryptshot, you can <a href="https://github.com/pigmonkey/cryptshot">download it directly from GitHub</a>. The script could easily be modified to execute a backup program other than rsnapshot. You can clone my entire <a href="https://github.com/pigmonkey/backups">backups repository</a> if you’re also interested in the other scripts I’ve written to manage different aspects of backing up data.</p>