
2025-06-27

Calibre News

I follow local news – news from my city – daily via RSS. For anything wider in scope, I find that weekly is the correct cadence. Anything more frequent is generally a waste of time and not conducive to living life. I get my non-local news via Calibre News.

Calibre ships with a large number of recipes, which are Python modules that tell it how to download content from websites. (One can create their own recipes, but I have not bothered to do so.) When a recipe is run, Calibre fetches all content and creates a nicely formatted EPUB. Often the recipe is able to bypass paywalls, making this the best way to freely read online news.

The news functionality has a scheduler which can be used to fetch content from selected recipes in an automated and periodic fashion. It can take some experimentation to figure out what schedule makes sense for which source, as there are no deduplication controls. If the source only posts updates weekly, but you have Calibre scheduled to run the recipe daily, you will end up with 7 identical EPUBs at the end of the week.

Recipes can also be executed via the command line by passing a recipe name and output filename to ebook-convert. This allows you to set up your own scheduler using cron or systemd timers.

$ ebook-convert "The Economist.recipe" .epub
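
A minimal sketch of what such a scheduled job might look like, assuming each download is written to a dated file in a plain directory rather than into the Calibre library. The function names and the filename pattern are illustrative, not anything Calibre prescribes. Skipping the fetch when today's file already exists avoids duplicates from repeated runs on the same day, but not across days.

```shell
#!/bin/sh
# Sketch only: news_output_path, fetch_recipe and the filename pattern are
# illustrative names, not part of Calibre.

# Build the output path <dir>/<recipe>-<YYYY-MM-DD>.epub
#   $1 recipe name, $2 target directory, $3 date
news_output_path() {
    printf '%s/%s-%s.epub\n' "$2" "$1" "$3"
}

# Fetch one built-in recipe unless today's file already exists.
fetch_recipe() {
    recipe=$1
    dir=$2
    out=$(news_output_path "$recipe" "$dir" "$(date +%Y-%m-%d)")
    if [ -e "$out" ]; then
        return 0
    fi
    mkdir -p "$dir" && ebook-convert "$recipe.recipe" "$out"
}
```

Saved as a script that ends with fetch_recipe "$@", this could be run from a crontab entry whose schedule fields are 0 6 * * 5 (Fridays at 06:00), with the recipe name and a target directory as arguments.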

Calibre includes configurable controls for how many issues of a news source you want to store. You can tell it to only keep up to 3 issues, or keep all issues up to 30 days old, for instance.

Once the EPUB is in the library, Calibre takes care of automatically pushing it to connected devices and deleting old files.

The author on these files is set to calibre, causing them to be stored within the library in a calibre/ directory. My library is stored as a git-annex, but unlike all the actual books in my library, I consider these downloads to be ephemeral. I do not want them tracked by git, or pushed to my special remotes. I achieve this by adding calibre/ to my .gitignore file.
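
For illustration, the ignore rule can be added idempotently from the shell. This is a small sketch; ignore_once is a made-up helper name, and it assumes it is run from the root of the library repository.

```shell
#!/bin/sh
# Append a pattern to an ignore file only if that exact line is not already
# present. ignore_once is a hypothetical helper, not a git command.
#   $1 pattern, $2 ignore file (defaults to .gitignore)
ignore_once() {
    pattern=$1
    file=${2:-.gitignore}
    touch "$file"
    grep -qxF -- "$pattern" "$file" || printf '%s\n' "$pattern" >> "$file"
}
```

Calling ignore_once calibre/ any number of times leaves a single calibre/ line in .gitignore.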

Each of the files is tagged with news, so I can easily exclude them from my book searches, or filter the library for only them.

The two recipes I keep scheduled are those for The Economist and Foreign Affairs. The Economist is scheduled for every Friday. Foreign Affairs is scheduled for every 60 days. What this means in practice is that I open Calibre every Friday morning and plug in my eReader. Within a few minutes Calibre will download my weekly news from The Economist, and Foreign Affairs every other month, and sync them to the device. I read those EPUBs over the next week.

Previously I also scheduled downloads for The Diplomat, but I found that The Economist’s Asia coverage was adequate for my needs. I’ve also used Calibre to download The Atlantic and Harper’s, but these days I rarely find myself in the mood for long-form articles – I’d rather spend that time reading a book. Foreign Affairs is the exception here, but it is a worthy one. Between the one and the other I am mostly consuming facts, which I gather is not the case for many people.

Excepting the city news in my RSS reader, my news consumption outside of these EPUBs is almost zero. This has been working well for me. I judge my success by the number of memes I do not understand.

2024-08-10

books, toolchain

In Which Graphic Novels Are Optimized for Portability

Kindle Comic Converter is a program that optimizes comic book files for e-readers. I have not read many graphic novels in the past, but I think that is likely to change now that I have found a good workflow for consuming them digitally. The portability of my Kobo Libra 2 makes it more convenient than reading on my laptop. Its 7” screen is large enough for me to enjoy comics when properly formatted, unlike the 6” screen of my old Kindle Paperwhite. (8” would probably be the best screen size for this type of content, but I am not sure that I would be pleased with the decrease in the packability of the device.)

I am using the command line version of KCC (surprising no one), typically as such:

$ kcc-c2e --profile KoL --upscale --cropping 2 --splitter 2 input-file.cbr

--profile KoL specifies that the target device is my Kobo Libra 2. The program will optimize the output file for the resolution and color profile of this device.
--upscale instructs the program to enhance images smaller than the device’s resolution.
--cropping 2 will attempt to crop out margins and page numbers.
--splitter 2 instructs the program to duplicate double page spreads. The spread will first be displayed as a single rotated page (so that I can see the whole image at once, as the illustrator intended), and then split into two pages (so that I can see details and read text without zooming). This sometimes makes poor decisions on filler pages – pages of credits, praise blurbs, etc – but it seems to always do the right thing when you’re in the pages of the comic itself.
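
When several files need converting, the same invocation can be wrapped in a loop. The following is a sketch under assumptions: convert_comics and the KCC variable are hypothetical names, and the flags are simply the ones listed above.

```shell
#!/bin/sh
# Sketch: run the same KCC invocation over every CBR/CBZ file in a directory.
# KCC can be overridden (for example with echo) to preview the commands.
convert_comics() {
    dir=$1
    for f in "$dir"/*.cbr "$dir"/*.cbz; do
        [ -e "$f" ] || continue   # skip glob patterns that matched nothing
        ${KCC:-kcc-c2e} --profile KoL --upscale --cropping 2 --splitter 2 "$f"
    done
}
```

Running KCC=echo convert_comics ~/comics prints the commands without executing them, which is a cheap way to check the file list first.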

I import the original source file into my Calibre library, and then add the KCC-generated EPUB as an additional file to the same book record. When loading the book onto my reader, I explicitly tell Calibre to send the EPUB. I do not allow Calibre to do any further conversion to this file.

The resulting files do not look great when viewed on my computer. The lack of margins from the --cropping 2 flag is annoying, and the images look dark and jagged. But on the E Ink screen they look great.

I used this process to read Craig Thompson’s Blankets, which I learned about thanks to Utah’s attempt to ban it. This book was fantastic. For maximum teenage angst, I recommend reading it while listening to The Cure. (I don’t even especially like The Cure, but when I finished Blankets I was struck with the strange desire to spend the next day repeatedly listening to the few albums of The Cure that I do own – so I did.) The minimalist, black-and-white art style of Blankets lends itself perfectly to a grayscale E Ink screen. I was impressed at how much emotion he can communicate with so few lines.

I have recently begun to read Monstress by Marjorie Liu and Sana Takeda. This one is weird. (Utahns are going to lose their shit when they learn about it.) Unlike Blankets, Monstress is drawn with lush, full-color artwork. It also has a lot more text, whereas Blankets was much more about emotion than exposition. I read the first issue on the Libra 2, and then borrowed the dead-tree version from the library and reread it to compare. I think it still looks great on the grayscale E Ink screen – it gives the impression of being extremely detailed graphite pencil work – but the color does add a little something extra (gore, mostly). The text is legible without zooming, but on the small side. I am jumping back and forth between reading further issues on the Libra 2 and on color paper. The portability of the Libra 2 counts for a lot – I carry it with me every day – but I think Monstress is probably better consumed on paper – or digitally on a larger color screen.

2024-07-03

books, toolchain, linux

Working with ACSM Files on Linux

I acquire books from various OverDrive instances. OverDrive provides an ACSM file, which is not a book, but instead an XML ticket meant to be exchanged for the actual book file – similar to requesting a book in meatspace by turning in a catalog card to a librarian. Adobe Digital Editions is used to perform this exchange. As one would expect from Adobe, this software does not support Linux.

Back in 2013 I set up a Windows 7 virtual machine with Adobe Digital Editions v2.0.1.78765, which I used exclusively for turning ACSM files into EPUB files. A few months ago I was finally able to retire that VM thanks to the discovery of libgourou, which is both a library and a suite of utilities that can be used to work with ACSM files.

To use it, I first register an anonymous account with Adobe.

$ adept_activate -a

Next I export the private key that the files will be encrypted to.

$ acsmdownloader --export-private-key

This key can then be imported into the DeDRM_tools plugin of Calibre.

Whenever I receive an ACSM file, I can just pass it to the acsmdownloader utility from libgourou.

$ acsmdownloader -f foobar.acsm

This spits out the EPUB, which may be imported into my standard Calibre library.

2021-10-17

toolchain, linux, shell

Redswitch

Redshift is a program that adjusts the color temperature of the screen based on time and location. It can automatically fetch one’s location via GeoClue. I’ve used it for years. It works most of the time. But, more often than I’d like, it fails to fetch my location from GeoClue. When this happens, I find GeoClue impossible to debug. Redshift does not cache location information, so when it fails to fetch my location the result is an eye-meltingly bright screen at night. To address this, I wrote a small shell script to avoid GeoClue entirely.

Redswitch fetches the current location via the Mozilla Location Service (using GeoClue’s API key, which may go away). The result is stored and compared against the previous location to determine if the device has moved. If a change in location is detected, Redshift is killed and relaunched with the new location (this will result in a noticeable flash, but there seems to be no alternative since Redshift cannot reload its settings while running). If Redshift is not running, it is launched. If no change in location is detected and Redshift is already running, nothing happens. Because the location information is stored, this can safely be used to launch Redshift when the machine is offline (or when the Mozilla Location Service API is down or rate-limited).
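
The decision logic described above can be sketched roughly as follows. This is not the actual Redswitch script: decide_action, pick_location and apply_action are hypothetical names, fetching the location is left to the caller, and locations are assumed to be LAT:LON strings of the kind redshift -l accepts.

```shell
#!/bin/sh
# Sketch of the decision logic only; not the real Redswitch script.

# Print "start", "restart" or "none".
#   $1 stored location
#   $2 freshly fetched location (empty if the fetch failed)
#   $3 "yes" if Redshift is currently running
decide_action() {
    if [ "$3" != yes ]; then
        echo start
    elif [ -n "$2" ] && [ "$2" != "$1" ]; then
        echo restart
    else
        echo none
    fi
}

# Choose the location to use: the fresh one if available, else the stored one.
pick_location() {
    if [ -n "$2" ]; then
        echo "$2"
    else
        echo "$1"
    fi
}

# Act on the decision by (re)launching Redshift with a fixed location.
apply_action() {
    action=$1
    location=$2
    case $action in
        restart)
            pkill -x redshift
            redshift -l "$location" &
            ;;
        start)
            redshift -l "$location" &
            ;;
    esac
}
```

A caller would read the stored location from a file, attempt the fetch, test for a running process with something like pgrep -x redshift, and then pass the results of decide_action and pick_location to apply_action. Falling back to the stored location is what makes the offline case work.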

My laptop does not experience frequent, drastic changes in location. I find that having the script automatically execute once upon login is adequate for my needs. If you’re jetting around the world, you could periodically execute the script via cron or a systemd timer.

This solves all my problems with Redshift. I can go back to forgetting about its existence, which is my goal for software of this sort.

2021-10-01

toolchain

Browser Extensions

I try to keep the number of browser extensions I use to a minimum. The following are what I find necessary in Firefox.

ClearURLs

ClearURLs removes extra cruft from URLs. I don’t really have a problem with things like UTM parameters. Such things seem reasonable to me. But, more broadly, digital advertising has proved itself hostile to my interests, so I choose to be hostile right back.

Cookie AutoDelete

Cookie AutoDelete deletes cookies after a tab is closed or the domain changes. I whitelist cookies for some of the services I run, like my RSS reader, but every other cookie gets deleted 10 seconds after I leave the site. The extension can also manage other data stores, like IndexedDB and Local Storage.

Feed Preview

Feed Preview adds an icon to the address bar when a page includes an RSS or Atom feed in its header. This used to be built in to Firefox, but for some inexplicable reason they removed it some years ago now. Removing the icon broke one of the core ways that I use a web browser. As the name suggests, the extension can also render a preview of the feed. I don’t use it for that. I just want my icon back.

Firefox Multi-Account Containers

Firefox Multi-Account Containers is a Mozilla-provided extension to create different containers and assign domains to them. In modern web browser parlance, a container means isolated storage. So a cookie in container A is not visible within container B, and vice versa.

Temporary Containers

Temporary Containers is the real workhorse of my containment strategy. It generates a new, temporary container for every domain. It automatically deletes the containers it generates 5 minutes after the last tab in that container is closed. This effectively isolates all domains from one another.

History Cleaner

History Cleaner deletes browser history that is older than 200 days. History is useful, but if I haven’t visited a URL in more than 200 days, I probably no longer care about it. Having all that cruft automatically cleaned out makes it easier to find what I’m looking for in the remaining history, and speeds up autocomplete in the address bar.

Redirector

Redirector lets you create pattern-based URL redirects. I use it to redirect Reddit URLs to Teddit, Twitter URLs to Nitter, and Wikipedia mobile URLs to the normal Wikipedia site.
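
To illustrate what a pattern-based rewrite looks like, here is the Wikipedia case expressed as a regular expression. This is plain sed rather than Redirector's own rule format, and desktop_wikipedia is a made-up name; only the idea of capture-and-substitute carries over.

```shell
#!/bin/sh
# Illustration only: plain sed, not Redirector's rule format.
# Rewrite a mobile Wikipedia URL (xx.m.wikipedia.org) to the desktop host.
# URLs that do not match the pattern pass through unchanged.
desktop_wikipedia() {
    printf '%s\n' "$1" | sed -E 's#^(https?://[a-z-]+)\.m\.(wikipedia\.org)#\1.\2#'
}
```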

Stylus

Stylus allows custom CSS to be applied to websites. I use it to make websites less eye-burningly-bright. Dark Reader is another solution to this problem, but I found it to be somewhat resource intensive. Stylus lets me darken websites with no performance penalty.

Tree Style Tab

Tree Style Tab moves tabs from the default horizontal bar across the top of the browser chrome to a vertical sidebar, and allows the tabs to be placed into a nested tree-like hierarchy. In a recent-ish version of Firefox, Mozilla uglified the default horizontal tab bar. This was what finally pushed me into adopting tree style tabs. It took me a couple weeks to get used to it, but now I’m a convert. I wouldn’t want to use a browser without it. Unfortunately, the extension does seem to have a performance penalty. Not so much during normal use, but it definitely increases the time required to launch the browser. To me, it is worth it.

uBlock Origin

uBlock Origin blocks advertisements, malware, and other waste. This extension should need no introduction. The modern web is unusable without it. Until recently I used this in combination with uMatrix. I removed uMatrix when it was abandoned by the author, but was pleasantly surprised to find that current versions of uBlock Origin satisfy my needs in this department by themselves.

User-Agent Switcher

User-Agent Switcher allows the user-agent string to be changed. It seems odd that the user would need an extension to change the user-agent string in their user agent, but here we are. I mostly use this for testing things.

Vim Vixen

Vim Vixen allows the browser to be controlled using vim-like keys. Back in those halcyon days before Mozilla broke their extension system, I switched between two extensions called Vimperator and Pentadactyl to accomplish this. Those were both complete extensions that were able to improve every interaction point with the browser. Vim Vixen is an inferior experience, but seems to be the best current solution. It’s mostly alright.

Wallabagger

Wallabagger lets me save articles to my Wallabag instance with a single click.

Web Archives

Web Archives allows web pages to be looked up in various archives. I just use it for quick access to the Internet Archive’s Wayback Machine.

2020-08-03

toolchain, annex, finance, plaintextaccounting

Organizing Ledger

Ledger is a double-entry accounting system that stores data in plain text. I began using it in 2012. Almost every dollar that has passed through my world since then is tracked by Ledger.¹

Ledger is not the only plain text accounting system out there. It has inspired others, such as hledger and beancount. I began with Ledger for lack of a compelling argument in favor of the alternatives. After close to a decade of use, my only regret is that I didn’t start using it earlier.

My Ledger repository is stored at ~/library/ledger. This repository contains a data directory, which includes yearly Ledger journal files such as data/2019.ldg and data/2020.ldg. Ledger files don’t necessarily need to be split at all, but I like having one file per year. In January, after I clear the last transaction from the previous year, I know the year is locked and the file never gets touched again (unless I go back in to rejigger my account structure).

The root of the directory has a .ledger file which includes all of these data files, plus a special journal file with periodic transactions that I sometimes use for budgeting. My ~/.ledgerrc file tells Ledger to use the .ledger file as the primary journal, which has the effect of including all the yearly files.

$ cat ~/.ledgerrc
--file ~/library/ledger/.ledger
--date-format=%Y-%m-%d

$ cat ~/library/ledger/.ledger
include data/periodic.ldg
include data/2012.ldg
include data/2013.ldg
include data/2014.ldg
include data/2015.ldg
include data/2016.ldg
include data/2017.ldg
include data/2018.ldg
include data/2019.ldg
include data/2020.ldg

Ledger’s include format does support globbing (i.e. include data/*.ldg) but the ordering of the transactions can get weird, so I prefer to be explicit.
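
An explicit list does not have to be maintained by hand. As a sketch, assuming the layout described above (data/periodic.ldg plus one data/YYYY.ldg per year), a small function can print the include lines in ascending year order. write_includes is a hypothetical name.

```shell
#!/bin/sh
# Sketch: print an explicit include list for a Ledger repository laid out as
# data/periodic.ldg plus one data/YYYY.ldg file per year.
write_includes() {
    repo=$1
    (
        cd "$repo" || exit 1
        echo "include data/periodic.ldg"
        # The shell expands the glob in sorted order, so years ascend.
        for f in data/[0-9][0-9][0-9][0-9].ldg; do
            [ -e "$f" ] || continue
            echo "include $f"
        done
    )
}
```

Redirecting the output of write_includes ~/library/ledger into the .ledger file each January would keep the list current while still leaving the ordering explicit on disk.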

The repository also contains receipts in the receipts directory, invoices in the invoices directory, scans of checks (remember those?) in the checks directory, and CSV dumps from banks in the dump directory.

$ tree -d ~/library/ledger
/home/pigmonkey/library/ledger
├── checks
├── data
├── dump
├── invoices
└── receipts

5 directories

The repository is managed using a mix of vanilla git and git-annex.² It is important to me that the Ledger journal files in the data directory are stored directly in git. I want the ability to diff changes before committing them, and to be able to pull the history of those files. Every other file I want stored in git-annex. I don’t care about the history of files like PDF receipts. They never change. In fact, I want to make them read-only so I can’t accidentally change them. I want encrypted versions of them distributed to my numerous special remotes for safekeeping, and someday I may even want to drop old receipts or invoices from my local store so that they don’t take up disk space until I actually need to read them. That sounds like asking a lot, but git-annex magically solves all the problems with its largefiles configuration option.

$ cat ~/library/ledger/.gitattributes
*.ldg annex.largefiles=nothing

This tells git-annex that any file matching *.ldg should not be treated as a “large file”, which means it should be added directly to git. Any other file should be added to git-annex and locked, making it read-only. Having this configured means that I can just blindly git annex add . or git add . within the repository and git-annex will always do the right thing.

I don’t run the git-annex assistant in this repository because I don’t want any automatic commits. As in a traditional git repository, I only commit changes to Ledger’s journal files after reviewing the diffs, and I want those commits to have meaningful messages.
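
One way to check which attribute value git sees for a given path is git check-attr. The wrapper below is only a convenience sketch, and show_largefiles is a made-up name. It reports the attribute itself and says nothing about how git-annex then acts on it.

```shell
#!/bin/sh
# Sketch: report the annex.largefiles attribute git resolves for each path.
# Must be run from inside the repository.
show_largefiles() {
    git check-attr annex.largefiles -- "$@"
}
```

With the .gitattributes shown above, a journal such as data/2020.ldg should report the value nothing, while a path with no matching rule reports unspecified.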

Notes

¹ I do not always track miscellaneous cash transactions less than $20. If a thing costs more than that, it is worth tracking, regardless of what it is or how it was purchased. If it costs less than that, and it isn't part of a meaningful expense account, I'll probably let laziness win out. If I buy an $8 sandwich for lunch with cash, it'll get logged, because I care about tracking dining expenses. If I buy a $1 pencil eraser, I probably won't log it, because it isn't part of an account worth considering.
² I bet you saw that coming.

This article was modified 2020-08-05.

2020-06-11

toolchain, linux, shell

Searching Books

ripgrep-all is a small wrapper around ripgrep that adds support for additional file formats.

I discovered it while looking for a program that would allow me to search my e-book library without needing to open individual books and search their contents via Calibre. ripgrep-all accomplishes this by using Pandoc to convert files to plain text and then running ripgrep on the output. One of the numerous formats supported by Pandoc is EPUB, which is the format I use to store books.

Running Pandoc on every book in my library to extract its text can take some time, but ripgrep-all caches the extracted text so that subsequent runs are similar in speed to simply searching plain text – which is blazing fast thanks to ripgrep’s speed. It takes around two seconds to search 1,706 books.

$ time(rga -li 'pandemic' ~/library/books/ | wc -l)
33

real    0m1.225s
user    0m2.458s
sys     0m1.759s
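
The underlying idea, as described above, can be sketched for a single book without any caching: convert the EPUB to plain text with Pandoc and search the result. This is only an approximation of what ripgrep-all does. search_epub and the PANDOC override are hypothetical, and grep stands in for ripgrep to keep the sketch dependency-light.

```shell
#!/bin/sh
# Sketch of the idea only, without ripgrep-all's cache:
# convert one EPUB to plain text with Pandoc and search the output.
# Succeeds (exit status 0) if the pattern is found, case-insensitively.
#   $1 pattern, $2 path to an EPUB
search_epub() {
    pattern=$1
    book=$2
    ${PANDOC:-pandoc} --from epub --to plain "$book" | grep -qi -- "$pattern"
}
```

Because every call re-runs the conversion, looping this over a whole library would be far slower than the cached timing shown above, which is the gap ripgrep-all's cache closes.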

2020-01-31

osint, toolchain, shell

Monitoring Legible News

I was sent a link to Legible News last November by someone who had read my post on the now-defunct Breaking News. Legible News is a website that simply scrapes headlines from Wikipedia’s Current Events once per day and presents them in a legible format. This seems like a simple thing, but is far beyond the capabilities of most news organizations today.

Legible News provides no update notification mechanism. I addressed this by plugging it into my urlwatch system. Initially this presented two problems: the email notification included the HTML markup, which I didn’t care about, and it included both the old and new content of every changed line – effectively sending me the news from today and yesterday.

The first problem was easily solved by using the html2text filter provided by urlwatch. This strips out all markup, which is what I thought I wanted. I ran this for a bit before deciding that I did want the output to contain links. What I really wanted was some sort of html2markdown filter.

I also realized I did not just want to be sent new lines, but every line anytime there was a change. If the news yesterday included a section titled “Armed conflicts and attacks”, and the news today included a section with the same title, I wanted that in my output despite it not having changed.

I solved both of these problems using the diff_tool argument of urlwatch. This allows the user to pass in a special tool to replace the default use of diff to generate the notification output. The tool will be called with two arguments: the filename of the previously downloaded version of the URL and the filename of the current version. I wrote a simple script called html2markdown.sh which ignores the first argument and simply passes the second argument to Pandoc for formatting.

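#!/bin/sh
# urlwatch calls the diff_tool with two filenames: the previous version ($1)
# and the current version ($2). Only the current version is rendered.

pandoc --from html \
--to markdown_strict \
--reference-links \
--reference-location=block \
"$2"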

This script is used as the diff_tool in the urlwatch job definition.

kind: url
name: Legible News
url: https://legiblenews.com/
diff_tool: /home/pigmonkey/bin/html2markdown.sh

The result is the latest version of Legible News, nicely converted to Markdown, delivered to my inbox every day. The output would be even better if Legible News used semantic markup – specifically heading elements – but it is perfectly serviceable as is.

After I built this I discovered that somebody had created an RSS feed for Legible News using a service called Feed43.

This article was modified 2020-06-11.