Optimizing Local Munitions

As previously mentioned, I use myrepos to keep local copies of useful code repositories. While working with backups yesterday I noticed that this directory had gotten quite large. I realized that in the 8 years that I’ve been using this system, I’ve never once run git gc in any of the repos.

Fortunately this is the sort of thing that myrepos makes simple – even providing it as an example on its homepage. I added two new lines to the [DEFAULT] section of my ~/library/src/myrepos.conf file: one telling it that it can run 3 parallel jobs, and one teaching it how to run git gc.

[DEFAULT]
skip = [ "$1" = update ] && ! hours_since "$1" 24
jobs = 3
git_gc = git gc "$@"

That allowed me to use my existing lmr alias to clean up all the git repositories. The software knows which repositories are git, and only attempts to run the command in those.

$ lmr gc
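
The lmr alias itself is not shown here. A minimal sketch of what such a wrapper might look like, written as a shell function so that it also works in scripts. The --config and --directory values are assumptions based on the paths mentioned above, not necessarily the real alias:

```shell
# Hypothetical stand-in for the lmr alias: run mr against the library's
# own config file and directory. Both paths are assumptions.
lmr() {
    mr --config "$HOME/library/src/myrepos.conf" \
       --directory "$HOME/library/src" \
       "$@"
}
```

With this in place, lmr gc passes the gc action through to mr, which runs the git_gc definition from [DEFAULT] in each git repository.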

After completing this process – which burned through a lot of CPU – my ~/library/src directory dropped from 70 GB to 15 GB.

So that helped.

Wherein the Author Learns to Compact Borg Archives

I noticed that my Borg directory on The Cloud was 239 GB. This struck me as problematic, as I could see in my local logs that Borg itself reported the deduplicated size of all archives to be 86 GB.

A web search revealed borg compact, which apparently I have been meant to run manually since 2019. Oops. After compacting, the directory dropped from 239 GB to 81 GB.

My borg wrapper script now looks like this:

#!/bin/sh
# '.' is the portable spelling of 'source' under /bin/sh
. ~/.keys/borg.sh
export BORG_REPO='borg-rsync:borg/nous'
export BORG_REMOTE_PATH='borg1'

# Create backups
echo "Creating backups..."
borg create --verbose --stats --compression=lz4             \
    --exclude ~/projects/foo/bar/baz                        \
    --exclude ~/projects/xyz/bigfatbinaries                 \
    ::'{hostname}-{user}-{utcnow:%Y-%m-%dT%H:%M:%S}'        \
    ~/documents                                             \
    ~/projects                                              \
    ~/mail                                                  \
    # ...etc

# Prune
echo "Pruning backups..."
borg prune --verbose --list --glob-archives '{hostname}-{user}-*'   \
    --keep-within=1d                                                \
    --keep-daily=14                                                 \
    --keep-weekly=8                                                 \
    --keep-monthly=12                                               \

# Compact
echo "Compacting repository..."
backitup                                \
    -p 604800                           \
    -l ~/.borg_compact-repo.lastrun     \
    -b "borg compact --verbose"         \

# Check
echo "Checking repository..."
backitup -a                                                         \
    -p 172800                                                       \
    -l ~/.borg_check-repo.lastrun                                   \
    -b "borg check --verbose --repository-only --max-duration=1200" \

echo "Checking archives..."
backitup -a                                             \
    -p 259200                                           \
    -l ~/.borg_check-arch.lastrun                       \
    -b "borg check --verbose --archives-only --last 18" \

Other than the addition of a weekly compact, my setup is the same as it ever was.
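
As used above, backitup appears to take a period in seconds (-p), a last-run file (-l), and a command (-b), and to run the command only once the period has elapsed. The idea can be sketched in plain shell. This is not backitup's implementation; run_if_due and its arguments are hypothetical names, and date +%s is assumed to be available (it is on GNU and BSD systems, though not required by POSIX):

```shell
# run_if_due PERIOD_SECONDS STAMP_FILE COMMAND [ARGS...]
# Run COMMAND only if at least PERIOD_SECONDS have passed since the
# time recorded in STAMP_FILE. Hypothetical helper, not part of backitup.
run_if_due() {
    period=$1
    stamp=$2
    shift 2

    now=$(date +%s)
    last=0
    if [ -r "$stamp" ]; then
        last=$(cat "$stamp")
    fi

    # Not due yet: do nothing and report success.
    if [ $((now - last)) -lt "$period" ]; then
        return 0
    fi

    # Record the run time only if the command succeeded, so a failed
    # run is retried on the next invocation.
    "$@" && echo "$now" > "$stamp"
}
```

A weekly compact would then look like run_if_due 604800 ~/.borg_compact-repo.lastrun borg compact --verbose.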

Working with ACSM Files on Linux

I acquire books from various OverDrive instances. OverDrive provides an ACSM file, which is not a book, but instead an XML ticket meant to be exchanged for the actual book file – similar to requesting a book in meatspace by turning in a catalog card to a librarian. Adobe Digital Editions is used to perform this exchange. As one would expect from Adobe, this software does not support Linux.

Back in 2013 I set up a Windows 7 virtual machine with Adobe Digital Editions v2.0.1.78765, which I used exclusively for turning ACSM files into EPUB files. A few months ago I was finally able to retire that VM thanks to the discovery of libgourou, which is both a library and a suite of utilities that can be used to work with ACSM files.

To use it, I first register an anonymous account with Adobe.

$ adept_activate -a

Next I export the private key that the files will be encrypted to.

$ acsmdownloader --export-private-key

This key can then be imported into the DeDRM_tools plugin of Calibre.

Whenever I receive an ACSM file, I can just pass it to the acsmdownloader utility from libgourou.

$ acsmdownloader -f foobar.acsm

This spits out the EPUB, which may be imported into my standard Calibre library.
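
When several tickets pile up, the same command can be run in a loop. A minimal sketch, assuming the tickets sit together in one directory; fetch_acsm_dir is a hypothetical helper name, and the EPUBs end up wherever acsmdownloader writes them by default:

```shell
# fetch_acsm_dir DIR
# Exchange every .acsm ticket in DIR using acsmdownloader.
# Hypothetical helper; only the "acsmdownloader -f" call comes from above.
fetch_acsm_dir() {
    dir=$1
    for ticket in "$dir"/*.acsm; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$ticket" ] || continue
        acsmdownloader -f "$ticket" || echo "failed: $ticket" >&2
    done
}
```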

The Things I Do for Time

I am a believer in the sacred word as defined in ISO 8601, and the later revelations such as RFC 3339. Numerical dates should be formatted as YYYY-MM-DD. Hours should be written in 24-hour time. I will die on this hill.

Since time immemorial, this has been accomplished on Linux systems by setting LC_TIME to the en_DK locale. More specifically, the git history for glibc shows that en_DK was added (with ISO 8601 date formatting) by Ulrich Drepper on 1997-03-05.

A few years ago, this stopped working in Firefox. Instead Firefox started to think that numerical dates were supposed to be formatted as DD/MM/YYYY, which is at least as asinine as the typical American MM-DD-YYYY format. I finally got fed up with this and decided to investigate.

The best discussion of the issue is in Thunderbird bug 1426907. Here I learned that the problem is caused by Thunderbird (and by extension Firefox) no longer respecting glibc locales. Mozilla software simply takes the name of the system locale, ignores its definition, and looks up formatting in the Unicode CLDR. The CLDR has redefined en_DK to use DD/MM/YYYY [1].

The hack to address the problem was also documented in the Thunderbird bug report. The CLDR includes a definition for en_SE which uses YYYY-MM-DD [2] and 24-hour time. (It also separates the time from the date with a comma, which is weird, but Sweden is weird, so I’ll allow it.) There is no en_SE locale in glibc. But it can be created by linking to the en_DK locale. This new locale can then be used for LC_TIME.

$ sudo ln -s /usr/share/i18n/locales/en_DK /usr/share/i18n/locales/en_SE
$ echo 'en_SE.UTF-8 UTF-8' | sudo tee -a /etc/locale.gen
$ sudo locale-gen
$ sudo sed -i 's/^LC_TIME=.*/LC_TIME=en_SE.UTF-8/' /etc/locale.conf

Now anything that respects glibc locales will effectively use en_DK, albeit under a different name. Anything that uses CLDR will just see that it is supposed to use a locale named en_SE, which still results in sane formatting. Thus one can use HTML date input fields without going crazy.
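
To confirm that the new locale behaves as intended on the glibc side, one can ask date for the locale's own date format and check its shape. A small sketch; is_iso_date and check_lc_time are hypothetical helper names, and the check only makes sense after locale-gen has run:

```shell
# Succeed if the argument looks like YYYY-MM-DD.
is_iso_date() {
    case $1 in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# check_lc_time LOCALE
# Print today's date as LOCALE formats it (%x is the locale's date
# representation) and report whether that is ISO 8601.
# LC_ALL is emptied because a non-empty LC_ALL overrides LC_TIME.
check_lc_time() {
    formatted=$(LC_ALL= LC_TIME="$1" date +%x)
    if is_iso_date "$formatted"; then
        echo "$1: $formatted (ISO 8601)"
    else
        echo "$1: $formatted (not ISO 8601)"
        return 1
    fi
}
```

Running check_lc_time en_SE.UTF-8 should report an ISO 8601 date once the locale has been generated. If the locale is missing, date falls back to the C locale and the check fails.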

Notes

  1. The Unicode specification defines this pattern as "dd/MM/y", which is rather unintuitive, but worth including here for search engines.
  2. The Unicode specification defines this pattern as "y-MM-dd".

Redswitch

Redshift is a program that adjusts the color temperature of the screen based on time and location. It can automatically fetch one’s location via GeoClue. I’ve used it for years. It works most of the time. But, more often than I’d like, it fails to fetch my location from GeoClue. When this happens, I find GeoClue impossible to debug. Redshift does not cache location information, so when it fails to fetch my location the result is an eye-meltingly bright screen at night. To address this, I wrote a small shell script to avoid GeoClue entirely.

Redswitch fetches the current location via the Mozilla Location Service (using GeoClue’s API key, which may go away). The result is stored and compared against the previous location to determine if the device has moved. If a change in location is detected, Redshift is killed and relaunched with the new location (this will result in a noticeable flash, but there seems to be no alternative since Redshift cannot reload its settings while running). If Redshift is not running, it is launched. If no change in location is detected and Redshift is already running, nothing happens. Because the location information is stored, this can safely be used to launch Redshift when the machine is offline (or when the Mozilla Location Service API is down or rate-limited).
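
The decision logic described above can be sketched as a small function. This is an illustration rather than the Redswitch source; decide_action, its arguments, and the LAT:LON strings are hypothetical, and the fetching and storing of the location are left out:

```shell
# decide_action STORED_LOCATION FETCHED_LOCATION RUNNING
# Print what should happen to Redshift: "start", "restart" or "none".
# FETCHED_LOCATION may be empty (offline, API down, rate-limited), in
# which case the stored location is reused. RUNNING is "yes" or "no".
decide_action() {
    stored=$1
    fetched=$2
    running=$3

    # Fall back to the stored location when the fetch failed.
    [ -n "$fetched" ] || fetched=$stored

    if [ "$running" != yes ]; then
        echo start
    elif [ "$fetched" != "$stored" ]; then
        echo restart
    else
        echo none
    fi
}

# Wiring it up might look like this (not executed here):
#   pgrep -x redshift >/dev/null && running=yes || running=no
#   case $(decide_action "$stored" "$fetched" "$running") in
#       restart) pkill -x redshift; redshift -l "$fetched" & ;;
#       start)   redshift -l "${fetched:-$stored}" & ;;
#   esac
```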

My laptop does not experience frequent, drastic changes in location. I find that having the script automatically execute once upon login is adequate for my needs. If you’re jetting around the world, you could periodically execute the script via cron or a systemd timer.

This solves all my problems with Redshift. I can go back to forgetting about its existence, which is my goal for software of this sort.

Searching Books

ripgrep-all is a small wrapper around ripgrep that adds support for additional file formats.

I discovered it while looking for a program that would allow me to search my e-book library without needing to open individual books and search their contents via Calibre. ripgrep-all accomplishes this by using Pandoc to convert files to plain text and then running ripgrep on the output. One of the numerous formats supported by Pandoc is EPUB, which is the format I use to store books.

Running Pandoc on every book in my library to extract its text can take some time, but ripgrep-all caches the extracted text so that subsequent runs are about as fast as searching plain text – which, thanks to ripgrep, is blazing fast. It takes around two seconds to search 1,706 books.

$ time(rga -li 'pandemic' ~/library/books/ | wc -l)
33

real    0m1.225s
user    0m2.458s
sys     0m1.759s

I published my script for creating optical backups.

Optician archives a directory, optionally encrypts it, records the integrity of all the things, and burns it to disc. I created it last year after writing about the steps I took to create optical backups of financial archives. Since then I’ve used it to create my monthly password database backups, yearly e-book library backups, and this year’s annual financial backup.
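
The archive and integrity steps can be illustrated with standard tools. This is a simplified sketch, not Optician itself: it skips encryption and burning entirely, the function names are hypothetical, and it assumes tar and GNU sha256sum are installed:

```shell
# archive_dir SRC_DIR OUT_TAR
# Pack SRC_DIR into OUT_TAR and write a SHA-256 checksum file next to it.
archive_dir() {
    src=$1
    out=$2
    tar -C "$(dirname "$src")" -cf "$out" "$(basename "$src")" || return 1
    # Record the checksum with a relative file name so the pair can be
    # verified from wherever the disc ends up mounted.
    ( cd "$(dirname "$out")" &&
      sha256sum "$(basename "$out")" > "$(basename "$out").sha256" )
}

# verify_archive OUT_TAR
# Succeed only if OUT_TAR still matches its recorded checksum.
verify_archive() {
    ( cd "$(dirname "$1")" &&
      sha256sum --status -c "$(basename "$1").sha256" )
}
```

Running verify_archive against the copy on disc after burning is a cheap way to confirm that what was written matches what was archived.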

Personal Information Management

pimutils is a collection of software for personal information management. The core piece is vdirsyncer, which synchronizes calendars and contacts between the local filesystem and CalDAV and CardDAV servers. Calendars may then be interacted with via khal, and contacts via khard. There’s not much to say about these three programs, other than they all just work. Having offline access to my calendars and contacts is critical, as is the ability to synchronize that data across machines.

Khard integrates easily with mutt to provide autocomplete when composing emails. I find its interface for creating, editing and reading contacts to be intuitive. It can also output a calendar of birthdays, which can then be imported into khal.

Khal’s interface for adding new calendar events is much simpler and quicker than all the mousing required by GUI calendar programs.

$ khal new 2019-11-16 21:30 5h Alessandro Cortini at Public Works :: 161 Erie St

There are times when a more complex user interface makes calendaring tasks easier. For this Khal offers the interactive option, which provides a TUI for creating, editing and reading events.

Khal can also import iCalendar files, which is a simple way of getting existing events into my world.

$ khal import invite.ics

Vdirsyncer has maintenance problems that may call its future into question, but the whole point of modular tools that operate on open data formats is that they are replaceable.

I have a simple and often-used script which calls khal calendar and task list (the latter being a Taskwarrior command), answering the question: what am I supposed to be doing right now?
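
A minimal sketch of such a script, assuming khal and Taskwarrior are installed and configured. The function name whatnow and the section labels are made up for illustration:

```shell
# Show today's calendar followed by the pending task list.
# Hypothetical wrapper; only the two commands come from the text above.
whatnow() {
    echo "== Calendar =="
    khal calendar
    echo
    echo "== Tasks =="
    task list
}
```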