Make VMs Great Again

I have trust issues.

When Claude Code was released last year I was interested in playing with it, but struggled to find a way to run it in a secure, isolated manner. Trusting its own sandbox was obviously not in the cards. I explored other people’s solutions – mostly using things like Bubblewrap and Docker – but none of them fully satisfied me. It also quickly became evident that most of the value from using Claude Code comes from the --dangerously-skip-permissions argument, which gives it the ability to pursue a targeted task without constant permission requests. This requires even stricter isolation.

Eventually I landed on a full virtual machine being the only viable option. I reinstalled Vagrant for the first time in almost a decade and was off to the races.

I was a heavy user of Vagrant in the twenty-teens. I still really like the interface. But when revisiting it today the software felt rather heavy and clunky. It defaults to using VirtualBox, which has its own set of issues. There is a community-provided libvirt plugin, but that seems largely abandoned. “Abandoned” seems to be the word for most of the Vagrant community – probably caused in part by the license change.

So for the past month I’ve been building Migrant, a lightweight VM management tool for running assumed-malicious AI agents in ephemeral environments. The heavy lifting is done by libvirt and QEMU. Migrant started out as just a way to get a more Vagrant-like interface with modern tooling. I use cloud-init to initialize the image, Ansible to configure it, and libvirt for the VM management. But because the whole raison d’être of the project is the fact that non-deterministic systems are inherently untrustworthy, Migrant expanded to have a suite of security features. It has network isolation, so the agent can’t compromise the rest of your LAN. It has shared folder isolation, so that the agent can’t exhaust the host disk or engage in any symlink traversal shenanigans. It has WireGuard tunnel support, implemented host-side such that the VM cannot bypass it (because why wouldn’t you want to run all your agents through Mullvad).

I think it’s pretty great. I use it regularly.

Migrant also serves as my testament as to how agentic coding should work. I’ve written it using Claude Code (initially running in a Vagrant-managed VM, but since the first public commit I’ve been building Migrant-in-Migrant), but it is the antithesis of “vibe coding”. I design the systems. I tell the agent how things should work. I review every line of code it produces. Most of the time I reject its first attempts. I take ownership of and responsibility for commits. The result, I think, is a pretty reasonable looking codebase.

My conclusion thus far is that coding agents are useful tools. They’re an accelerant. They’re great for exploring a problem space. There’s no going back to software development without them, but if they’re not being actively driven by an opinionated human with domain knowledge and expertise, what they produce is mostly crap. Maybe that will change in the future. For now, if you’re not challenging every line of output from the clankers, you’re doing it wrong. I suspect this applies equally to the application of LLMs in other areas, but personally I haven’t found LLMs to be useful for anything other than writing code.

Cloning Backup Drives

Continuing with the theme of replacing drives, recently I decided to preemptively replace one of the external drives that I back up to via rsnapshot – or, more specifically, via cryptshot. The drive was functioning nominally, but its date of manufacture was 2014. That’s way too long to trust spinning rust.

rsnapshot implements deduplication via hard links. Were I to just rsync the contents of the old drive to the new drive without any special consideration for the links, it would dereference the links, copying them over as separate files. This would cause the size of the backups to balloon past the capacity of the drive. Rsync provides the --hard-links flag to address this, but I’ve heard some stories about this failing to act as expected when the source directory has a large number of hard links (for some unknown definition of “large”). I’ve been rsnapshotting since 2012 (after a pause sometime after 2006, apparently) and feel safe assuming that my rsnapshot repository does have a “large” number of hard links.
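
To make the ballooning concrete, here is a minimal sketch that runs in a scratch directory. The directory and file names are invented for illustration, and the helper `count_inodes` is mine. Two names for one inode occupy disk space once; copying each name separately, which is effectively what a link-unaware transfer does, stores the data twice.

```sh
# Sketch: hard links share one inode; a link-unaware copy duplicates the data.
# Every path below is a throwaway name under a temporary directory.

# count_inodes DIR: print the number of distinct inodes among files below DIR.
count_inodes() {
    find "$1" -type f -exec ls -i {} + | awk '{print $1}' | sort -u | wc -l | tr -d ' '
}

demo=$(mktemp -d)
mkdir "$demo/daily.0" "$demo/daily.1"
echo "unchanged file" > "$demo/daily.1/notes.txt"
# An unchanged file appears in the newer snapshot as a hard link.
ln "$demo/daily.1/notes.txt" "$demo/daily.0/notes.txt"
linked=$(count_inodes "$demo")

# Copy each name on its own, so the link relationship cannot survive.
mkdir -p "$demo/copy/daily.0" "$demo/copy/daily.1"
cp "$demo/daily.0/notes.txt" "$demo/copy/daily.0/"
cp "$demo/daily.1/notes.txt" "$demo/copy/daily.1/"
copied=$(count_inodes "$demo/copy")

echo "original tree: $linked inode(s); naive copy: $copied inode(s)"
rm -rf "$demo"
```

Multiply that doubling by every unchanged file in every snapshot and the capacity problem becomes obvious.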

I also do not really care about syncing. The destination is completely empty. There’s no file comparison that needs to happen. I don’t need the ability to pause partway through the transfer and resume later. Rsync is my default solution for pushing files around, but in this case it is not really needed. I only want to mirror the contents of the old drive onto the new drive, exactly as they exist on the old drive. So I avoided the problem altogether and just copied the partition via dd.

Both drives are encrypted with LUKS, so first I decrypt them. Importantly, I do not mount either decrypted partition. I don’t want to risk any modifications being made to either while the copy is ongoing.

$ sudo cryptsetup luksOpen /dev/sda old
$ sudo cryptsetup luksOpen /dev/sdb new

Then I copy the old partition to the new one.

$ sudo dd if=/dev/mapper/old of=/dev/mapper/new bs=32M status=progress

My new drive is the same size as my old drive, so after dd finished I was done. If the sizes differed I would need to use resize2fs to resize the filesystem on the new drive.
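
Before retiring the old drive, it may be worth confirming that the clone is byte-identical. The following is a minimal sketch using cmp, assuming both mapped devices are the same size. The helper name `verify_clone` is hypothetical, and the demo runs against two scratch files standing in for the real devices.

```sh
# verify_clone SRC DST: report whether DST is a byte-for-byte copy of SRC.
verify_clone() {
    # cmp -s prints nothing and exits 0 only when both inputs are identical.
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "DIFFERENT"
        return 1
    fi
}

# Demo on scratch files rather than /dev/mapper/old and /dev/mapper/new.
scratch=$(mktemp -d)
dd if=/dev/urandom of="$scratch/old.img" bs=1024 count=64 2>/dev/null
dd if="$scratch/old.img" of="$scratch/new.img" bs=1024 2>/dev/null
verify_clone "$scratch/old.img" "$scratch/new.img"
rm -rf "$scratch"
```

Against the real drives this would be `sudo cmp /dev/mapper/old /dev/mapper/new`, which reads both devices in full and so takes roughly as long as the copy did.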

If I was replacing the old drive not just because it was old and I was ageist, but because I thought it may be corrupted, I would probably do this with GNU ddrescue rather than plain old dd. (Though, realistically, if that was the case I’d probably just copy the contents of my other rsnapshot target drive to the new drive, and replace the corrupt drive with that. Multiple backup mediums make life easier.)

Git Annex Recovery

Occasionally I’ll come across some sort of corruption on one of my cold storage drives. This can typically be repaired in-place via git-annex-repair, but I usually take it as a sign that the hard drive itself is beginning to fail. I prefer to replace the drive. At the end of the process, I want the new drive to be mounted at the same location as the old one was, and I want the repository on the new drive to have the same UUID as the old one. This way the migration is invisible to all other copies of the repository.

To do this, I first prepare the new drive using whatever sort of LUKS encryption and formatting I want, and then mount it at the same location as wherever the old drive was normally mounted to. Call this path $good. The old drive I’ll mount to some other location. Call this path $bad.

Next I create a new clone of the repository on the new drive. Most recently I did this for my video repo, which lives at ~/library/video.

$ git clone ~/library/video $good/video

The .git/config file from the old drive will have the UUID of the annex and other configuration options, as well as any knowledge about other remotes. I copy that into the new repo.

$ cp $bad/video/.git/config $good/video/.git/config

The actual file contents are stored in the .git/annex/objects/ directory. I copy those over to the new drive.

$ mkdir $good/video/.git/annex
$ rsync -avhP --no-compress --info=progress2 $bad/video/.git/annex/objects $good/video/.git/annex/

Next I initialize the new annex. It will recognize the old config and existing objects that were copied over.

$ cd $good/video
$ git annex init

At this point I could be done. But if I suspect that there was corruption in one of the files in the .git/annex/objects directory that I copied over, I will next tell the annex to run a check on all its files. I’ll usually start this with --incremental in case I want to kill it before it completes and resume it later. I’ll provide some integer to --jobs depending on how many cores I want to devote to hashing and what I think is appropriate for the disk read and transfer speeds.

$ git annex fsck --incremental --jobs=N
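
One way to pick N is to derive it from the core count. This is only a sketch: halving the cores is an arbitrary starting point, not a recommendation from git-annex, and on spinning disks the read speed may be the real limit. The variable name `fsck_jobs` is mine.

```sh
# Derive a --jobs value from the number of online cores, leaving half of them free.
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
fsck_jobs=$(( cores / 2 ))
# Never go below one job, even on a single-core machine.
[ "$fsck_jobs" -ge 1 ] || fsck_jobs=1
# Print the resulting command instead of running it.
echo "git annex fsck --incremental --jobs=$fsck_jobs"
```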

If any of the files did fail, I’ll make sure one of the other remotes is available and then tell the new annex to get whatever it wants.

$ git annex get --auto

Finally, I would want to get rid of any of those corrupt objects that are now just wasting space.

$ git annex unused
$ git annex dropunused all

Optimizing Local Munitions

As previously mentioned, I use myrepos to keep local copies of useful code repositories. While working with backups yesterday I noticed that this directory had gotten quite large. I realized that in the 8 years that I’ve been using this system, I’ve never once run git gc in any of the repos.

Fortunately this is the sort of thing that myrepos makes simple – even providing it as an example on its homepage. I added two new lines to the [DEFAULT] section of my ~/library/src/myrepos.conf file: one telling it that it can run 3 parallel jobs, and one teaching it how to run git gc.

[DEFAULT]
skip = [ "$1" = update ] && ! hours_since "$1" 24
jobs = 3
git_gc = git gc "$@"

That allowed me to use my existing lmr alias to clean up all the git repositories. The software knows which repositories are git, and only attempts to run the command in those.

$ lmr gc

After completing this process – which burned through a lot of CPU – my ~/library/src directory dropped from 70 GB to 15 GB.

So that helped.
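
For anyone without myrepos, roughly the same cleanup can be sketched with find and a loop. This is not how myrepos works internally. It is a stand-in that assumes every repository is a normal checkout with a .git directory, and the function names are hypothetical.

```sh
# list_git_repos DIR: print each directory under DIR that contains a .git directory.
list_git_repos() {
    # -prune stops find from descending into the .git directories themselves.
    find "$1" -type d -name .git -prune -exec dirname {} \;
}

# gc_all DIR: run git gc in every repository found under DIR, one at a time.
gc_all() {
    list_git_repos "$1" | while IFS= read -r repo; do
        echo "gc: $repo"
        git -C "$repo" gc --quiet
    done
}
```

Calling `gc_all ~/library/src` would walk the same tree serially. myrepos has the advantage of running several jobs in parallel and of already knowing where the repositories live.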

Wherein the Author Learns to Compact Borg Archives

I noticed that my Borg directory on The Cloud was 239 GB. This struck me as problematic, as I could see in my local logs that Borg itself reported the deduplicated size of all archives to be 86 GB.

A web search revealed borg compact, which apparently I have been meant to run manually since 2019. Oops. After compacting, the directory dropped from 239 GB to 81 GB.

My borg wrapper script now looks like this:

#!/bin/sh
. ~/.keys/borg.sh
export BORG_REPO='borg-rsync:borg/nous'
export BORG_REMOTE_PATH='borg1'

# Create backups
echo "Creating backups..."
borg create --verbose --stats --compression=lz4             \
    --exclude ~/projects/foo/bar/baz                        \
    --exclude ~/projects/xyz/bigfatbinaries                 \
    ::'{hostname}-{user}-{utcnow:%Y-%m-%dT%H:%M:%S}'        \
    ~/documents                                             \
    ~/projects                                              \
    ~/mail                                                  \
    # ...etc

# Prune
echo "Pruning backups..."
borg prune --verbose --list --glob-archives '{hostname}-{user}-*'   \
    --keep-within=1d                                                \
    --keep-daily=14                                                 \
    --keep-weekly=8                                                 \
    --keep-monthly=12                                               \

# Compact
echo "Compacting repository..."
backitup                                \
    -p 604800                           \
    -l ~/.borg_compact-repo.lastrun     \
    -b "borg compact --verbose"         \

# Check
echo "Checking repository..."
backitup -a                                                         \
    -p 172800                                                       \
    -l ~/.borg_check-repo.lastrun                                   \
    -b "borg check --verbose --repository-only --max-duration=1200" \

echo "Checking archives..."
backitup -a                                             \
    -p 259200                                           \
    -l ~/.borg_check-arch.lastrun                       \
    -b "borg check --verbose --archives-only --last 18" \

Other than the addition of a weekly compact, my setup is the same as it ever was.
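
The -p values in the script are easier to read as days. Assuming -p is a period in seconds, which the "weekly compact" remark suggests, the conversion is simple division. The helper name `to_days` is mine.

```sh
# to_days SECONDS: convert a period in seconds to whole days.
to_days() {
    echo $(( $1 / 86400 ))
}

# The three periods used by the wrapper script: compact, repository check, archive check.
for period in 604800 172800 259200; do
    echo "$period seconds = $(to_days "$period") days"
done
```

So the compact runs at most every 7 days, the repository check every 2, and the archive check every 3.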

Working with ACSM Files on Linux

I acquire books from various OverDrive instances. OverDrive provides an ACSM file, which is not a book, but instead an XML ticket meant to be exchanged for the actual book file – similar to requesting a book in meatspace by turning in a catalog card to a librarian. Adobe Digital Editions is used to perform this exchange. As one would expect from Adobe, this software does not support Linux.

Back in 2013 I set up a Windows 7 virtual machine with Adobe Digital Editions v2.0.1.78765, which I used exclusively for turning ACSM files into EPUB files. A few months ago I was finally able to retire that VM thanks to the discovery of libgourou, which is both a library and a suite of utilities that can be used to work with ACSM files.

To use it, I first register an anonymous account with Adobe.

$ adept_activate -a

Next I export the private key that the files will be encrypted to.

$ acsmdownloader --export-private-key

This key can then be imported into the DeDRM_tools plugin of Calibre.

Whenever I receive an ACSM file, I can just pass it to the acsmdownloader utility from libgourou.

$ acsmdownloader -f foobar.acsm

This spits out the EPUB, which may be imported into my standard Calibre library.
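
If several tickets pile up, the same command can be looped over a directory. A minimal sketch: the function name `fulfil_all` and the `ACSM_CMD` override are hypothetical conveniences, and only the `-f` flag shown above is assumed of acsmdownloader.

```sh
# fulfil_all DIR: run the downloader once for every .acsm file in DIR.
# ACSM_CMD is a hypothetical override for dry runs; it defaults to acsmdownloader.
fulfil_all() {
    for ticket in "$1"/*.acsm; do
        # If the glob matched nothing, the literal pattern is left behind; skip it.
        [ -e "$ticket" ] || continue
        "${ACSM_CMD:-acsmdownloader}" -f "$ticket" || echo "failed: $ticket" >&2
    done
}
```

Running `fulfil_all ~/downloads` would then process each ticket in turn and report any that fail without stopping the rest.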

The Things I Do for Time

I am a believer in the sacred word as defined in ISO 8601, and the later revelations such as RFC 3339. Numerical dates should be formatted as YYYY-MM-DD. Hours should be written in 24-hour time. I will die on this hill.

Since time immemorial, this has been accomplished on Linux systems by setting LC_TIME to the en_DK locale. More specifically, the git history for glibc shows that en_DK was added (with ISO 8601 date formatting) by Ulrich Drepper on 1997-03-05.

A few years ago, this stopped working in Firefox. Instead Firefox started to think that numerical dates were supposed to be formatted as DD/MM/YYYY, which is at least as asinine as the typical American MM-DD-YYYY format. I finally got fed up with this and decided to investigate.

The best discussion of the issue is in Thunderbird bug 1426907. Here I learned that the problem is caused by Thunderbird (and by extension Firefox) no longer respecting glibc locales. Mozilla software simply takes the name of the system locale, ignores its definition, and looks up formatting in the Unicode CLDR. The CLDR has redefined en_DK to use DD/MM/YYYY[1].

The hack to address the problem was also documented in the Thunderbird bug report. The CLDR includes a definition for en_SE which uses YYYY-MM-DD[2] and 24-hour time. (It also separates the time from the date with a comma, which is weird, but Sweden is weird, so I’ll allow it.) There is no en_SE locale in glibc. But it can be created by linking to the en_DK locale. This new locale can then be used for LC_TIME.

$ sudo ln -s /usr/share/i18n/locales/en_DK /usr/share/i18n/locales/en_SE
$ echo 'en_SE.UTF-8 UTF-8' | sudo tee -a /etc/locale.gen
$ sudo locale-gen
$ sudo sed -i 's/^LC_TIME=.*/LC_TIME=en_SE.UTF-8/' /etc/locale.conf

Now anything that respects glibc locales will effectively use en_DK, albeit under a different name. Anything that uses CLDR will just see that it is supposed to use a locale named en_SE, which still results in sane formatting. Thus one can use HTML date input fields without going crazy.
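
A quick way to confirm that the new locale was generated is to look for it in the output of `locale -a` and then format today's date with it. This is a sketch; the helper `has_locale` is mine, and it normalizes case and hyphens because glibc typically lists the name as en_SE.utf8.

```sh
# has_locale NAME: succeed if `locale -a` lists NAME, ignoring case and hyphens.
has_locale() {
    command -v locale >/dev/null 2>&1 || return 1
    want=$(printf '%s\n' "$1" | tr 'A-Z' 'a-z' | sed 's/-//g')
    locale -a 2>/dev/null | tr 'A-Z' 'a-z' | sed 's/-//g' | grep -Fqx "$want"
}

if has_locale en_SE.UTF-8; then
    # An empty LC_ALL keeps it from overriding LC_TIME for this one command.
    LC_ALL= LC_TIME=en_SE.UTF-8 date +%x
else
    echo "en_SE.UTF-8 has not been generated on this system"
fi
```

If the link and locale-gen steps worked, the date should come out in the ISO 8601 form inherited from en_DK.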

Notes

  1. The Unicode specification defines this pattern as "dd/MM/y", which is rather unintuitive, but worth including here for search engines.
  2. The Unicode specification defines this pattern as "y-MM-dd".

Redswitch

Redshift is a program that adjusts the color temperature of the screen based on time and location. It can automatically fetch one’s location via GeoClue. I’ve used it for years. It works most of the time. But, more often than I’d like, it fails to fetch my location from GeoClue. When this happens, I find GeoClue impossible to debug. Redshift does not cache location information, so when it fails to fetch my location the result is an eye-meltingly bright screen at night. To address this, I wrote a small shell script to avoid GeoClue entirely.

Redswitch fetches the current location via the Mozilla Location Service (using GeoClue’s API key, which may go away). The result is stored and compared against the previous location to determine if the device has moved. If a change in location is detected, Redshift is killed and relaunched with the new location (this will result in a noticeable flash, but there seems to be no alternative since Redshift cannot reload its settings while running). If Redshift is not running, it is launched. If no change in location is detected and Redshift is already running, nothing happens. Because the location information is stored, this can safely be used to launch Redshift when the machine is offline (or when the Mozilla Location Service API is down or rate-limited).
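
The decision logic described above can be sketched as a pair of small functions. This is not the Redswitch source. It is an illustration that treats a location as an opaque "LAT:LON" string and compares locations by exact string match, which is an assumption. Fetching the location and launching Redshift are left out, and the function names are hypothetical.

```sh
# effective_location STORED FETCHED: prefer a fresh fix, fall back to the stored one.
effective_location() {
    if [ -n "$2" ]; then echo "$2"; else echo "$1"; fi
}

# decide_action STORED FETCHED RUNNING: print restart, start, or none.
# FETCHED is empty when the lookup failed; RUNNING is "yes" if Redshift is up.
decide_action() {
    stored=$1
    fetched=$2
    running=$3
    if [ "$running" != yes ]; then
        echo start        # not running: always launch
    elif [ -n "$fetched" ] && [ "$fetched" != "$stored" ]; then
        echo restart      # running, but the location moved
    else
        echo none         # running and nothing changed, or the lookup failed
    fi
}

decide_action "45.52:-122.68" "47.61:-122.33" yes   # prints "restart"
decide_action "45.52:-122.68" ""              no    # prints "start"
decide_action "45.52:-122.68" "45.52:-122.68" yes   # prints "none"
```

A caller would then act on the result, for example by killing the old process on "restart" and launching `redshift -l` with the output of `effective_location`.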

My laptop does not experience frequent, drastic changes in location. I find that having the script automatically execute once upon login is adequate for my needs. If you’re jetting around the world, you could periodically execute the script via cron or a systemd timer.

This solves all my problems with Redshift. I can go back to forgetting about its existence, which is my goal for software of this sort.