
Cold Storage

This past spring I mentioned my cold storage setup: a number of encrypted 2.5” drives in external enclosures, stored inside a Pelican 1200 case, secured with Abloy Protec2 321 locks. Offline, secure, and infrequently accessed storage is an important component of any strategy for resilient data. The ease with which this can be managed with git-annex only increases my infatuation with the software.

Data Data Data Data Data

I’ve been happy with the Seagate ST2000LM003 drives for this application. Unfortunately the enclosures I first purchased did not work out so well. I had two die within a few weeks. They’ve been replaced with the SIG JU-SA0Q12-S1. These claim to be compatible with drives up to 8TB (someday I’ll be able to buy 8TB 2.5” drives) and support USB 3.1. They’re also a bit thinner than the previous enclosures, so I can easily fit five in my box. The Seagate drives offer about 1.7 terabytes of usable space, giving this setup a total capacity of 8.5 terabytes.

Setting up git-annex to support this type of cold storage is fairly straightforward, but does necessitate some familiarity with how the program works. Personally, I prefer to do all my setup manually. I’m happy to let the assistant watch my repositories and manage them after the setup, and I’ll occasionally fire up the web app to see what the assistant daemon is doing, but I like the control and understanding provided by a manual setup. The power and flexibility of git-annex is deceptive. Using it solely through the simplified interface of the web app greatly limits what can be accomplished with it.

Encryption

Before even getting into git-annex, the drive should be encrypted with LUKS/dm-crypt. The need for this could be avoided by using something like gcrypt, but LUKS/dm-crypt is an ingrained habit and part of my workflow for all external drives. Assuming the drive is /dev/sdc, pass cryptsetup some sane defaults:

$ sudo cryptsetup --cipher aes-xts-plain64 --key-size 512 --hash sha512 luksFormat /dev/sdc

With the drive encrypted, it can then be opened and formatted. I’ll give the drive a human-friendly label of themisto.

$ sudo cryptsetup luksOpen /dev/sdc themisto_crypt
$ sudo mkfs.ext4 -L themisto /dev/mapper/themisto_crypt

At this point the drive is ready. I close it and then mount it with udiskie to make sure everything is working. How the drive is mounted doesn’t matter, but I like udiskie because it can integrate with my password manager to get the drive passphrase.

$ sudo cryptsetup luksClose /dev/mapper/themisto_crypt
$ udiskie-mount -r /dev/sdc

Git-Annex

With the encryption handled, the drive should now be mounted at /media/themisto. For the first few steps, we’ll basically follow the git-annex walkthrough. Let’s assume that we are setting up this drive to be a repository of the annex ~/video. The first step is to go to the drive, clone the repository, and initialize the annex. When initializing the annex I prepend the name of the remote with satellite :. My cold storage drives are all named after satellites, and doing this allows me to easily identify them when looking at a list of remotes.

$ cd /media/themisto
$ git clone ~/video
$ cd video
$ git annex init "satellite : themisto"

Disk Reserve

Whenever dealing with a repository that is bigger (or may become bigger) than the drive it is being stored on, it is important to set a disk reserve. This tells git-annex to always keep some free space around. I generally like to set this to 1 GB, which is way larger than it needs to be.

$ git config annex.diskreserve "1 gb"

Adding Remotes

I’ll then tell this new repository where the original repository is located. In this case I’ll refer to the original using the name of my computer, nous.

$ git remote add nous ~/video

If other remotes already exist, now is a good time to add them. These could be special remotes or normal ones. For this example, let’s say that we have already completed this whole process for another cold storage drive called sinope, and that we have an s3 remote creatively named s3.

$ git remote add sinope /media/sinope/video
$ export AWS_ACCESS_KEY_ID="..."
$ export AWS_SECRET_ACCESS_KEY="..."
$ git annex enableremote s3

Trust

Trust is a critical component of how git-annex works. Any new annex will default to being semi-trusted, which means that when running operations within the annex on the main computer – say, dropping a file – git-annex will want to confirm that themisto has the files that it is supposed to have. In the case of themisto being a USB drive that is rarely connected, this is not very useful. I tell git-annex to trust my cold storage drives, which means that if git-annex has a record of a certain file being on the drive, it will be satisfied with that. This increases the risk of potential data loss, but for this application I feel it is appropriate.

$ git annex trust .

Preferred Content

The final step that needs to be taken on the new repository is to tell it what files it should want. This is done using preferred content. The standard groups that git-annex ships with cover most of the bases. Of interest for this application is the archive group, which wants all content except that which has already found its way to another archive. This is the behaviour I want, but I will duplicate it into a custom group called satellite. This keeps my cold storage drives as standalone things that do not influence any other remotes where I may want to use the default archive.

$ git annex groupwanted satellite "(not copies=satellite:1) or approxlackingcopies=1"
$ git annex group . satellite
$ git annex wanted . groupwanted

For other repositories, I may want to store the data on multiple cold storage drives. In that case I would create a redundantsatellite group that wants all content which is not already present in two other members of the group.

$ git annex groupwanted redundantsatellite "(not copies=redundantsatellite:2) or approxlackingcopies=1"
$ git annex group . redundantsatellite
$ git annex wanted . groupwanted

Syncing

With everything set up, the new repository is ready to sync and begin ingesting content from the remotes it knows about!

$ git annex sync --content

However, the original repository also needs to know about the new remote.

$ cd ~/video
$ git remote add themisto /media/themisto/video
$ git annex sync

The same is the case for any other previously existing repository, such as sinope.
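
Assuming sinope is mounted, the same steps apply there. As a sketch (not something git-annex does automatically):

$ cd /media/sinope/video
$ git remote add themisto /media/themisto/video
$ git annex sync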

Redundant File Storage

As I’ve mentioned previously, I store just about everything that matters in git-annex (the only exception is code, which is stored directly in regular git). One of git-annex’s many killer features is special remotes. They make tenable this whole “cloud storage” thing that we do now.

A special remote allows me to store my files with a large number of service providers. It makes this easy to do by abstracting away the particulars of the provider, allowing me to interact with all of them in the same way. It makes this safe to do by providing encryption. These factors encourage redundancy, reducing my reliance on any one provider.

Recently I began playing with rclone. Rclone is a program that supports file syncing for a handful of cloud storage providers. That’s semi-interesting by itself but, more significantly, there is a git-annex special remote wrapper. That means any of the providers supported by rclone can be used as a special remote. I looked through all of rclone’s supported providers and decided there were a few that I had no reason not to use.

Hubic

Hubic is a storage provider from OVH with a data center in France. Their pricing is attractive. I’d happily pay €50 per year for 10TB of storage. Unfortunately they limit connections to 10 Mbit/s. In my experience they ended up being even slower than this. Slow enough that I don’t want to give them money, but there’s still no reason not to take advantage of their free 25 GB plan.

After signing up, I set up a new remote in rclone.

$ rclone config
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> hubic-annex
Type of storage to configure.
Choose a number from below, or type in your own value
 1 / Amazon Drive
   \ "amazon cloud drive"
 2 / Amazon S3 (also Dreamhost, Ceph)
   \ "s3"
 3 / Backblaze B2
   \ "b2"
 4 / Dropbox
   \ "dropbox"
 5 / Google Cloud Storage (this is not Google Drive)
   \ "google cloud storage"
 6 / Google Drive
   \ "drive"
 7 / Hubic
   \ "hubic"
 8 / Local Disk
   \ "local"
 9 / Microsoft OneDrive
   \ "onedrive"
10 / Openstack Swift (Rackspace Cloud Files, Memset Memstore, OVH)
   \ "swift"
11 / Yandex Disk
   \ "yandex"
Storage> 7
Hubic Client Id - leave blank normally.
client_id> 
Hubic Client Secret - leave blank normally.
client_secret> 
Remote config
Use auto config?
 * Say Y if not sure
 * Say N if you are working on a remote or headless machine
y) Yes
n) No
y/n> y
If your browser doesn't open automatically go to the following link: http://127.0.0.1:53682/auth
Log in and authorize rclone for access
Waiting for code...
Got code
--------------------
[remote]
client_id = 
client_secret = 
token = {"access_token":"XXXXXX"}
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y

With that setup, I went into my ~/documents annex and added the remote.

$ git annex initremote hubic type=external externaltype=rclone target=hubic-annex prefix=annex-documents chunk=50MiB encryption=shared rclone_layout=lower mac=HMACSHA512

I want git-annex to automatically send everything to Hubic, so I took advantage of standard groups and put the repository in the backup group.

$ git annex wanted hubic standard
$ git annex group hubic backup

Given Hubic’s slow speed, I don’t really want to download files from it unless I need to. This can be configured in git-annex by setting the cost of the remote. Local repositories default to 100 and remote repositories default to 200. I gave the Hubic remote a high cost so that it will only be used if no other remotes are available.

$ git config remote.hubic.annex-cost 500

If you would like to try Hubic, I have a referral code which gives us both an extra 5GB for free.

Backblaze B2

B2 is the cloud storage offering from backup company Backblaze. I don’t know anything about them, but at $0.005 per GB I like their pricing. A quick search of reviews shows that the main complaint about the service is that they offer no geographic redundancy, which is entirely irrelevant to me since I build my own redundancy with my half-dozen or so remotes per repository.

Signing up with Backblaze took a bit longer. They wanted a phone number for 2-factor authentication, I wanted to give them a credit card so that I could use more than the 10GB they offer for free, and I had to generate an application key to use with rclone. After that, the rclone setup was simple.

$ rclone config
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> b2-annex
Type of storage to configure.
Choose a number from below, or type in your own value
 1 / Amazon Drive
   \ "amazon cloud drive"
 2 / Amazon S3 (also Dreamhost, Ceph)
   \ "s3"
 3 / Backblaze B2
   \ "b2"
 4 / Dropbox
   \ "dropbox"
 5 / Google Cloud Storage (this is not Google Drive)
   \ "google cloud storage"
 6 / Google Drive
   \ "drive"
 7 / Hubic
   \ "hubic"
 8 / Local Disk
   \ "local"
 9 / Microsoft OneDrive
   \ "onedrive"
10 / Openstack Swift (Rackspace Cloud Files, Memset Memstore, OVH)
   \ "swift"
11 / Yandex Disk
   \ "yandex"
Storage> 3
Account ID
account> 123456789abc
Application Key
key> 0123456789abcdef0123456789abcdef0123456789
Endpoint for the service - leave blank normally.
endpoint> 
Remote config
--------------------
[remote]
account = 123456789abc
key = 0123456789abcdef0123456789abcdef0123456789
endpoint = 
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y

With that, it was back to ~/documents to initialize the remote and send it all the things.

$ git annex initremote b2 type=external externaltype=rclone target=b2-annex prefix=annex-documents chunk=50MiB encryption=shared rclone_layout=lower mac=HMACSHA512
$ git annex wanted b2 standard
$ git annex group b2 backup

While I did not measure the speed with B2, it feels as fast as my S3 or rsync.net remotes, so I didn’t bother setting the cost.

Google Drive

While I do not regularly use Google services for personal things, I do have a Google account for Android stuff. Google Drive offers 15 GB of storage for free and rclone supports it, so why not take advantage?

$ rclone config
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> gdrive-annex
Type of storage to configure.
Choose a number from below, or type in your own value
 1 / Amazon Drive
   \ "amazon cloud drive"
 2 / Amazon S3 (also Dreamhost, Ceph)
   \ "s3"
 3 / Backblaze B2
   \ "b2"
 4 / Dropbox
   \ "dropbox"
 5 / Google Cloud Storage (this is not Google Drive)
   \ "google cloud storage"
 6 / Google Drive
   \ "drive"
 7 / Hubic
   \ "hubic"
 8 / Local Disk
   \ "local"
 9 / Microsoft OneDrive
   \ "onedrive"
10 / Openstack Swift (Rackspace Cloud Files, Memset Memstore, OVH)
   \ "swift"
11 / Yandex Disk
   \ "yandex"
Storage> 6
Google Application Client Id - leave blank normally.
client_id> 
Google Application Client Secret - leave blank normally.
client_secret> 
Remote config
Use auto config?
 * Say Y if not sure
 * Say N if you are working on a remote or headless machine or Y didn't work
y) Yes
n) No
y/n> y
If your browser doesn't open automatically go to the following link: http://127.0.0.1:53682/auth
Log in and authorize rclone for access
Waiting for code...
Got code
--------------------
[remote]
client_id = 
client_secret = 
token = {"AccessToken":"xxxx.x.xxxxx_xxxxxxxxxxx_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx","RefreshToken":"1/xxxxxxxxxxxxxxxx_xxxxxxxxxxxxxxxxxxxxxxxxxx","Expiry":"2014-03-16T13:57:58.955387075Z","Extra":null}
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y

And again, to ~/documents.

$ git annex initremote gdrive type=external externaltype=rclone target=gdrive-annex prefix=annex-documents chunk=50MiB encryption=shared rclone_layout=lower mac=HMACSHA512
$ git annex wanted gdrive standard
$ git annex group gdrive backup

Rinse and repeat the process for other annexes. Revel in having simple, secure, and redundant storage.

I treat myself to a new laptop every three or four years.

A few weeks ago I bought a Lenovo Thinkpad X260, replacing the T430s that has been my daily driver since 2012. I’m a big fan of the simplicity, ruggedness and modularity of Thinkpads. It used to be that one of the only downsides to Thinkpads was the terrible screens, but that has been addressed by the X260’s FHD display. The high resolution let me move from the 14” display of the T430s to the 12.5” display of the X260 without feeling like I’ve lost anything, but with an obvious gain in portability. The X260 is a great machine to put Linux on, which Spark helps me to do with no effort and a minimum expenditure of time.

Thinkpad X260

I celebrated World Backup Day by increasing the resiliency of data in my life.

Four encrypted 2TB hard drives, stored in a Pelican 1200, with Abloy Protec2 PL 321 padlocks as tamper-evident seals. Having everything that matters stored in git-annex makes projects like this simple: just clone the repositories, define the preferred content expressions, and watch the magic happen.

Cold Storage

Isolating Chrome Apps with Firejail

Despite its terse man page, Chromium provides a large number of command-line options. One of these is app-id, which tells Chromium to directly launch a specific Chrome App. Combined with the isolation provided by Firejail, this makes using Chrome Apps a much more enjoyable experience.

For instance, I use the Signal Desktop app. When I received the beta invite, I created a new directory to act as the home directory for the sandbox that would run the app.

$ mkdir -p ~/.chromium-apps/signal

I then launched a sandboxed browser using that directory and installed the app.

$ firejail --private=~/.chromium-apps/signal /usr/bin/chromium

After the app was installed, I added an alias to my zsh configuration to launch the app directly.

alias signal="firejail --private=~/.chromium-apps/signal /usr/bin/chromium --app-id=bikioccmkafdpakkkcpdbppfkghcmihk"

To launch the application I can now simply run signal, just as if it was a normal desktop application. I don’t have to worry about it accessing private information, or even care that it is actually running on Chromium underneath. I use this method daily for a number of different Chrome Apps, all in different isolated directories in ~/.chromium-apps. As someone who is not a normal Chromium user, it makes the prospect of running a Chrome App much more attractive.
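
Each additional app gets the same treatment: its own directory under ~/.chromium-apps and an alias pointing at its app-id. As a sketch, with a made-up directory name and app-id:

$ mkdir -p ~/.chromium-apps/someapp
alias someapp="firejail --private=~/.chromium-apps/someapp /usr/bin/chromium --app-id=aaaabbbbccccddddeeeeffffgggghhhh"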

Firewarden

I’ve previously mentioned the Firejail sandbox program. It’s an incredibly useful tool. I use it to jail pretty much all the things. Over the past six months, I’ve found that one of my primary use cases for Firejail is to create private, temporary sandboxes which can be destroyed upon closure. I wrote Firewarden, a simple wrapper script around Firejail, to reduce the keystrokes needed for this type of use.

Disposable Browsers

Prepend any program with firewarden and it will launch the program inside a private Firejail sandbox. I use Firewarden to launch disposable Chromium instances dozens of times per day. When the program passed to Firewarden is chromium or google-chrome, Firewarden will add the appropriate options to the browser to prevent the first run greeting, disable the default browser check, and prevent the WebRTC IP leak. The following two commands are equivalent:

$ firejail --private chromium --no-first-run --no-default-browser-check --enforce-webrtc-ip-permission-check
$ firewarden chromium

Firewarden also provides a few options to request a more restricted Firejail sandbox. For instance, you may want to open a URL in Chromium, but also use an isolated network namespace and create a new /dev directory (which has the effect of disabling access to webcams, speakers and microphones). The following two commands are equivalent:

$ firejail --private --net=enp0s25 --netfilter --private-dev chromium --no-first-run --no-default-browser-check --enforce-webrtc-ip-permission-check https://example.org
$ firewarden -d -i chromium https://example.org

In this example, Firewarden used NetworkManager to discover that enp0s25 was the first connected device, so it used that for the network namespace.
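
I haven’t looked at exactly how Firewarden performs that lookup, but finding the first connected device with nmcli could be done along these lines:

$ nmcli --terse --fields DEVICE,STATE device | awk -F: '$2 == "connected" {print $1; exit}'
enp0s25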

Local Files

Firewarden isn’t just useful for browsers. It can be used with any program, but my other major use case is safely viewing local files. File types like PDF and JPG can include malicious code and are a primary vector for malware. I use zathura as my PDF reader, which is a simple and lightweight viewer that doesn’t include anywhere near as many potential vulnerabilities as something like Adobe Acrobat, but I still think it prudent to take extra precautions when viewing PDF files downloaded from the internet.

If Firewarden thinks the final argument is a local file, it will create a new directory in /tmp, copy the file into it, and launch the program in a sandbox using the new temporary directory as the user home directory¹. Firewarden will also default to creating a new /dev directory when viewing local files, as well as disabling network access (thus preventing a malicious file from phoning home). When the program has closed, Firewarden removes the temporary directory and its contents.

$ firewarden zathura notatrap.pdf

The above command is the equivalent of:

$ export now=`date --iso-8601=s`
$ mkdir -p /tmp/$USER/firewarden/$now
$ cp notatrap.pdf /tmp/$USER/firewarden/$now/
$ firejail --net=none --private-dev --private=/tmp/$USER/firewarden/$now zathura notatrap.pdf
$ rm -r /tmp/$USER/firewarden/$now

I use this functionality numerous times throughout the day. I also include Firewarden in my mailcap, which goes a long way to reducing the dangers of email attachments.
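
The mailcap entries themselves are nothing special. A minimal sketch for PDF attachments (the viewer and exact entries in my mailcap may differ):

$ cat ~/.mailcap
application/pdf; firewarden zathura %s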

Firewarden doesn’t add any new functionality to Firejail, but it does make it easier to take advantage of some of the great features that Firejail provides. Check it out if you’re interested in reducing the keystrokes required to Jail All The Things™.

Notes

  1. This is similar to using Firejail's old --private-home option, which was removed in 0.9.38. However, that option was limited to files in the user's home directory. It couldn't be easily used with a file from a USB drive mounted at /media/usb, for instance.

Using Network Trust

Work continues on Spark, my Arch Linux provisioning system. As the project has progressed, it has created some useful tools that I’ve spun off into their own projects. One of those is nmtrust.

The idea is simple. As laptop users, we frequently connect our machines to a variety of networks. Some of those networks we trust, others we don’t. I trust my home and work networks because I administer both of them. I don’t trust networks at cafes, hotels or airports, but sometimes I still want to use them. There are certain services I want to run when connected to trusted networks: mail syncing, file syncing, online backups, instant messaging and the like. I don’t want to run these on untrusted networks, either out of concern over the potential leak of private information or simply to keep my network footprint small.

The solution is equally simple. I use NetworkManager to manage networks. NetworkManager creates a profile for every network connection. Every profile is assigned a UUID. I can decide which networks I want to trust, look up their UUID with nmcli conn, and put those strings into a file somewhere. I keep them in /usr/local/etc/trusted_networks.
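
For example, trusting the home network might look something like this (the connection name and UUID here are invented):

$ nmcli conn
NAME   UUID                                  TYPE             DEVICE
home   915e9f65-0a2b-4c71-a11f-d1c3b8d461f7  802-11-wireless  wlp3s0
$ echo "915e9f65-0a2b-4c71-a11f-d1c3b8d461f7" | sudo tee -a /usr/local/etc/trusted_networks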

nmtrust is a small shell script which gets the UUIDs of all the active connections from NetworkManager and compares them to those in the trusted network file. It returns a different exit code depending on what it finds: 0 if all connections are trusted, 3 if one or more connections are untrusted, and 4 if there are no active connections.

This makes it extremely easy to write a script that executes nmtrust and takes certain action based on the exit code. For example, you may have a network backup script netbackup.sh that is executed every hour by cron. However, you only want the script to run when you are connected to a network that you trust.

#!/bin/sh

# Execute nmtrust
nmtrust

# Execute backups if the current connection(s) are trusted.
if [ $? -eq 0 ]; then
    netbackup.sh
fi

On machines running systemd, most of the things that you want to start and stop based on the network are probably described by units. ttoggle is another small shell script which uses nmtrust to start and stop these units. The units that should only be run on trusted networks are placed into another file. I keep them in /usr/local/etc/trusted_units. ttoggle executes nmtrust and starts or stops everything in the trusted unit file based on the result.

For example, I have a timer mailsync.timer that periodically sends and receives my mail. I only want to run this on trusted networks, so I place it in the trusted unit file. If ttoggle is executed when I’m connected to a trusted network, it will start the timer. If it is run when I’m on an untrusted network or offline, it will stop the timer, ensuring my machine makes no connection to my IMAP or SMTP servers.
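
The trusted unit file is just the units to manage. Assuming one unit name per line, with only the mail timer in it, it would look like this:

$ cat /usr/local/etc/trusted_units
mailsync.timer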

These scripts are easy to use, but they really should be automated so that nobody has to think about them. Fortunately, NetworkManager provides a dispatcher framework that we can hook into. When installed, the dispatcher will execute ttoggle whenever a connection is activated or deactivated.
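
The dispatcher itself is just a small script in /etc/NetworkManager/dispatcher.d/ that runs ttoggle when a connection comes up or goes down. A minimal sketch of the idea (the actual dispatcher may differ):

#!/bin/sh
# NetworkManager passes the interface as $1 and the action as $2.
case "$2" in
    up|down)
        ttoggle
        ;;
esac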

The result of all of this is that trusted units are automatically started whenever all active network connections are trusted. Any other time, the trusted units are stopped. I can connect to shady public wifi without worrying about network services that may compromise my privacy running in the background. I can connect to my normal networks without needing to remember to start mail syncing, backups, etc.

All of this is baked in to Spark, but it’s really just two short shell scripts and a NetworkManager dispatcher. It provides a flexible framework to help preserve privacy that is fairly easy to use. If you use NetworkManager, try it out.

Spark: Arch Linux Provisioning with Ansible

Arch has been my Linux distribution of choice for the past 5 years or so. It’s a fairly simple and versatile distribution that leaves most choices up to the user, and then gets out of your way. Although I think it makes for a better end experience, the Arch Way does mean that it takes a bit more time to get a working desktop environment up and running.

At work I use Ansible to automate the provisioning of FreeBSD servers. It makes life easier by not only automating the provisioning of machines, but also by serving as reference documentation for The One True Way™. After a short time using Ansible to build servers, the idea of creating an Ansible playbook to provision my Arch desktop became attractive: I could pop a new drive into a machine, perform a basic Arch install, run the Ansible playbook and, in a very short period of time, have a fresh working environment – all without needing to worry about recalling arcane system configuration or which obscure packages I want installed. I found a few projects out there that had this same goal, but none that did things in the way I wanted them done. So I built my own.

Spark is an Ansible playbook meant to provision a personal machine running Arch Linux. It is intended to run locally on a fresh Arch install (i.e., taking the place of any post-installation), but due to Ansible’s idempotent nature it may also be run on top of an already configured machine.

My machine is a Thinkpad, so Spark includes some tasks which are specific to laptops in general and others which only apply to Thinkpads. These tasks are tagged and isolated into their own roles, making it easy to use Spark to build desktops on other hardware. A community-contributed Macbook role exists to support Apple hardware. In fact, everything is tagged, and most of the user-specific stuff is accomplished with variables. The idea being that if you agree with my basic assumptions about what a desktop environment should be, you can use Spark to build your machine without editing much outside of the variables and perhaps the playbook.
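
Because everything is tagged, running only part of the playbook is standard Ansible tag selection. As a sketch (the playbook filename and tag names here are assumptions; check the repository for the real ones):

$ ansible-playbook playbook.yml --ask-become-pass --skip-tags thinkpad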

The roles gather tasks into logical groups, and the tasks themselves are fairly simple. A quick skim through the repository will provide an understanding of everything Spark will do in a matter of minutes. Basically: a simple i3 desktop environment, with GUI programs limited to web browsers and a few media and office applications (like GIMP and LibreOffice), everything else in the terminal, most network applications jailed with Firejail, and all the annoying laptop tasks like lid closure events and battery management automated away. If you’re familiar with my dotfiles, there won’t be any surprises.

Included in Spark is a file which describes how I install Arch. It is extremely brief, but provides everything needed to perform a basic installation – including full disk encryption with encrypted /boot – which can then be filled out with Ansible. I literally copy/paste from the doc when installing Arch. It takes about 15 minutes to complete the installation. Running Ansible after that takes about an hour, but requires no interaction after entering a passphrase for the SSH key used to clone the dotfiles. Combined with backups of the data in my home dir, this allows me to go from zero to hero in less than a couple hours without needing to really think about it.

If you use Arch, fork the repository and try it out.