pig-monkey.com


Without an OCR layer, PDF files are of limited use.

OCRmyPDF is a tool that applies optical character recognition to PDFs. It uses Tesseract to perform the OCR, and unpaper to clean, deskew and optimize the input files. It outputs PDF/A files, optimized for long-term storage. This isn’t a tool I use frequently, but it is one I greatly appreciate having when I need it. If you ever find yourself scanning or photographing documents, you want OCRmyPDF.
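As a rough sketch of typical use, assuming OCRmyPDF and unpaper are both installed, the function below runs every PDF in a directory through `ocrmypdf` with deskewing and cleaning enabled. The function name `ocr_dir`, the `ocr` output subdirectory, and the choice of flags are illustrative assumptions, not something taken from the post.

```shell
# Hypothetical helper: OCR every PDF found directly inside a directory.
# Assumes ocrmypdf is on PATH; --clean additionally requires unpaper.
ocr_dir() {
    src="${1:-.}"            # directory holding the scanned PDFs
    out="${2:-$src/ocr}"     # where the OCRed copies are written
    mkdir -p "$out"
    for pdf in "$src"/*.pdf; do
        [ -e "$pdf" ] || continue   # skip when the glob matches nothing
        ocrmypdf --deskew --clean "$pdf" "$out/$(basename "$pdf")"
    done
}
```

Calling `ocr_dir ~/scans` would leave the originals untouched and write the processed copies to `~/scans/ocr`.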

Date Manipulation

Dateutils is a collection of tools for the quick manipulation of dates. The tool I use most frequently is datediff. This program answers questions like: “How many days has it been since a date?” or “How many days are left in summer?”

$ datediff 2019-03-21 now
131
$ datediff now 2019-09-23
55

My second most frequently used program is dateadd, which is used to add a duration to a date. It can answer questions like: “What will the date be in 3 weeks?”

$ dateadd now +3w
2019-08-20T02:02:23

The tools are much more powerful than these examples, but hardly a week goes by when I don’t use datediff or dateadd for simple tasks like this.
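On a machine where dateutils is not installed, the same kind of day count can be approximated with GNU date and shell arithmetic. This is only a sketch of a fallback, not a description of how datediff works internally. The function name `days_between` is hypothetical, and `date -d` is a GNU extension that is not available in every `date` implementation.

```shell
# Hypothetical fallback: whole days from date $1 to date $2.
# Relies on GNU date's -d option; -u sidesteps daylight saving offsets.
days_between() {
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    echo $(( (end - start) / 86400 ))
}
```

For example, `days_between 2019-03-21 2019-07-30` prints 131.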

Unit Wrangling

I use GNU Units to convert measurements.

The program knows about many obscure and antiquated units, but I mostly use it for boring things like converting currencies or converting between metric and imperial units. It can be used directly from the command line, or via a prompted interactive mode.

$ units 57EUR USD
        * 63.526262
        / 0.015741521

$ units
Currency exchange rates from FloatRates (USD base) on 2019-07-24
3460 units, 109 prefixes, 109 nonlinear units

You have: 16 floz
You want: ml
        * 473.17647
        / 0.0021133764
You have: tempC(30)
You want: tempF
        86

GNU Units is picky about its unit definitions, and they are case sensitive. For example, it knows what USD is, but usd is undefined. It supports tab completion of units in interactive mode, which can be helpful. It knows the difference between a US fluid ounce and a British fluid ounce.

$ units "1 usfloz" ml
        * 29.57353
        / 0.033814023

$ units "1 brfloz" ml
        * 28.413063
        / 0.03519508

The unit definitions are stored at /usr/share/units/definitions.units. Occasionally I’ll need to peruse this file to find the correct formatting for the unit I’m interested in. Sometimes when doing this I’ll run into one of the more obscure definitions, such as beespace. Apparently this unit is used in beekeeping when designing hive boxes. It is described in the definition file thusly: “Bees will fill any space that is smaller than the bee space and leave open spaces that are larger. The size of the space varies with species.”

$ units 12inches beespace
        * 48
        / 0.020833333

Every so often you need to know how many Earth days are in one Martian year. With GNU Units that information is a few keystrokes away.

$ units 1marsyear days
        * 686.97959
        / 0.0014556473

Currency definitions are stored in /var/lib/units/currency.units. They are updated using the units_cur program. In the past I would update currencies whenever I needed them, but recently I set up a systemd timer to update these definitions roughly once per day (depending on network connectivity). This provides me with conversion rates that are current enough for my own use, which I can take advantage of even when offline, and does not require me to let a third party know which currencies or quantities I am interested in.
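A timer along these lines could look like the following sketch. The unit names, the `/usr/bin/units_cur` path, and the `OnCalendar=daily` schedule are assumptions rather than the actual configuration described here. The hypothetical helper only writes the two unit files into a directory of your choosing, so they can be reviewed before being installed as root.

```shell
# Hypothetical helper: write a units-cur.service/.timer pair into $1.
# Unit names, the units_cur path and the schedule are all assumptions.
write_units_cur_timer() {
    dir="${1:?usage: write_units_cur_timer DIR}"
    mkdir -p "$dir"
    # One-shot service that refreshes the currency definitions.
    cat > "$dir/units-cur.service" <<'EOF'
[Unit]
Description=Update GNU Units currency definitions

[Service]
Type=oneshot
ExecStart=/usr/bin/units_cur
EOF
    # Timer that triggers the service once per day.
    cat > "$dir/units-cur.timer" <<'EOF'
[Unit]
Description=Refresh GNU Units currency definitions daily

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

After copying the files to /etc/systemd/system, `systemctl enable --now units-cur.timer` would activate the timer. `Persistent=true` asks systemd to catch up on a run that was missed while the machine was off; it does not wait for the network, so an update attempted while offline would simply fail until the next run.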

Astute readers will have noted that I am big on this offline computing thing.

Undertime

Undertime is a simple program that assists in coordinating events across time zones. It prints a table of your system’s local time zone, along with any other specified zones. The output is colorized based on the start and end hour of the working day. If you want to talk to someone in Paris tomorrow, and you want the conversation to happen at an hour that is reasonable for both parties, Undertime can help.

Undertime Paris Meeting Example

I often find myself converting between local time and UTC. Usually this happens when working with system logs. If I have a specific date and time I want to translate, I’ll use date.

# Convert a time from PDT to UTC:
$ env TZ="UTC" date -d "2016-03-25T11:33 PDT"
# Convert a time from UTC to local:
$ date -d '2016-03-24T12:00 UTC'

If I’m not looking to convert an exact time, but just want to answer a more generalized question like “Approximately when was 14:00 UTC?” without doing the mental math, I find that Undertime is the quickest solution.

$ undertime UTC
╔═══════╦═══════╗
║  PDT  ║  UTC  ║
╠═══════╬═══════╣
║ 00:00 ║ 07:00 ║
║ 01:00 ║ 08:00 ║
║ 02:00 ║ 09:00 ║
║ 03:00 ║ 10:00 ║
║ 04:00 ║ 11:00 ║
║ 05:00 ║ 12:00 ║
║ 06:00 ║ 13:00 ║
║ 07:00 ║ 14:00 ║
║ 08:00 ║ 15:00 ║
║ 09:00 ║ 16:00 ║
║ 10:00 ║ 17:00 ║
║ 11:00 ║ 18:00 ║
║ 12:00 ║ 19:00 ║
║ 13:00 ║ 20:00 ║
║ 14:00 ║ 21:00 ║
║ 15:00 ║ 22:00 ║
║ 16:00 ║ 23:00 ║
║ 17:00 ║ 00:00 ║
║ 18:00 ║ 01:00 ║
║ 19:00 ║ 02:00 ║
║ 19:04 ║ 02:04 ║
║ 20:00 ║ 03:00 ║
║ 21:00 ║ 04:00 ║
║ 22:00 ║ 05:00 ║
║ 23:00 ║ 06:00 ║
╚═══════╩═══════╝
Table generated for time: 2019-07-23 19:04:00-07:00

Music Organization with Beets

I organize my music with Beets.

Beets imports music into my library, warns me if I’m missing tracks, identifies tracks based on their acoustic fingerprint, scrubs extraneous metadata, fetches and stores album art, cleans genres, fetches lyrics, and – most importantly – fetches metadata from MusicBrainz. After some basic configuration, all of this happens automatically when I import new files into my library.

After the files have been imported, Beets makes it easy to query my library based on any of the clean, consistent, high-quality, crowd-sourced metadata.

$ beet stats genre:ambient
Tracks: 649
Total time: 2.7 days
Approximate total size: 22.4 GiB
Artists: 76
Albums: 53
Album artists: 34

$ beet ls -a 'added:2019-07-01..'
Deathcount in Silicon Valley - Acheron
Dlareme - Compass
The Higher Intelligence Agency & Biosphere - Polar Sequences
JK/47 - Tokyo Empires
Matt Morton - Apollo 11 Soundtrack

$ beet ls -ap albumartist:joplin
/home/pigmonkey/library/audio/music/Janis Joplin/Full Tilt Boogie
/home/pigmonkey/library/audio/music/Janis Joplin/I Got Dem Ol' Kozmic Blues Again Mama!

As regular readers will have surmised, the files themselves are stored in git-annex.

Terminal Countdown

Termdown is a program that provides a countdown timer and stopwatch in the terminal. It uses FIGlet for its display. Its most attractive feature, I think, is the ability to support arbitrary script execution.

I use it most often as a countdown timer. One of my frequent applications is as a meditation timer. For this I want an 11 minute timer, with alerts when 10.5 minutes, 30 seconds, and 1 second remain. This gives me a 10 minute session with 30 seconds preparation and 30 seconds to return. Termdown makes this easy.

$ termdown --exec-cmd "case {0} in 630|30) mpv ~/library/audio/sounds/bell.mp3;; 1) mpv ~/library/audio/sounds/ring.mp3;; esac" 11m
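The quoting in that one-liner is dense, so here is the same dispatch logic unrolled as a sketch. It assumes, as the command above implies, that termdown substitutes the number of seconds remaining for `{0}`. The function name `sound_for` is hypothetical, and it only prints the name of the sound file instead of playing it with mpv.

```shell
# Hypothetical unrolling of the --exec-cmd case statement.
# $1 is the number of seconds remaining on the 11 minute (660 s) timer.
sound_for() {
    case "$1" in
        630|30) echo "bell.mp3" ;;  # after 30 s of preparation, and 10 min later
        1)      echo "ring.mp3" ;;  # final second of the timer
    esac
}
```

`sound_for 630` prints `bell.mp3`, and any value other than 630, 30, or 1 prints nothing, which is why the timer stays silent for the rest of the session.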

An Offline Lexicon

dictd is a dictionary database server and client. It can be used to look up word definitions over a network. I don’t use it for that. I use the program to provide an offline dictionary. Depending on a network connection, web browser and third-party websites just to define a word strikes me as dumb.

To make this go, dictionary files must be installed. I use the GNU Collaborative International Dictionary of English (GCIDE), WordNet, and the Moby Thesaurus. The GCIDE is derived from Noah Webster’s famous American dictionary. WordNet is a more modern (one might say “dry”) resource. The Moby Thesaurus is a public domain thesaurus originally built by Grady Ward. Between these three sources I can have a pretty good grasp on the English language. No network connectivity required.

I use a shell function to always pipe the definitions through less.

def () {
    dict "$1" | less
}
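With several dictionaries installed, it can be handy to query just one of them. The variant below is a sketch that relies on the `-d` option of the dict client to select a database. The function name `defin` is hypothetical, and database names such as `wn` or `gcide` are assumptions that depend on the installed packages (`dict -D` lists what is actually available).

```shell
# Hypothetical variant of def: look a word up in a single database.
# Database names vary by installation; `dict -D` lists them.
defin() {
    db="${1:?usage: defin DATABASE WORD}"
    word="${2:?usage: defin DATABASE WORD}"
    # Page the result with $PAGER when set, otherwise with less.
    dict -d "$db" "$word" | ${PAGER:-less}
}
```

For example, `defin wn lexicon` would show only the WordNet entry, assuming that database is registered under the name `wn`.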