pig-monkey.com - toolchainhttps://pig-monkey.com/2024-08-10T21:19:46-07:00In Which Graphic Novels Are Optimized for Portability2024-08-10T00:00:00-07:002024-08-10T21:19:46-07:00Pig Monkeytag:pig-monkey.com,2024-08-10:/2024/08/kindle-comic-converter/<p><a href="https://github.com/ciromattia/kcc">Kindle Comic Converter</a> is a program that optimizes comic book files for e-readers. I have not read many graphic novels in the past, but I think that is likely to change now that I have found a good workflow for consuming them digitally. The portability of my <a href="https://us.kobobooks.com/products/kobo-libra-2">Kobo Libra 2 …</a></p><p><a href="https://github.com/ciromattia/kcc">Kindle Comic Converter</a> is a program that optimizes comic book files for e-readers. I have not read many graphic novels in the past, but I think that is likely to change now that I have found a good workflow for consuming them digitally. The portability of my <a href="https://us.kobobooks.com/products/kobo-libra-2">Kobo Libra 2</a> makes it more convenient than <a href="/2019/01/portrait-rotate/">reading on my laptop</a>. Its 7” screen is large enough for me to enjoy comics when properly formatted, unlike the 6” screen of my old Kindle Paperwhite. (8” would probably be the best screen size for this type of content, but I am not sure that I would be pleased with the decrease in the packability of the device.)</p>
<p>I am using the command line version of KCC (surprising no one), typically as such:</p>
<div class="highlight"><pre><span></span><code>$ kcc-c2e --profile KoL --upscale --cropping <span class="m">2</span> --splitter <span class="m">2</span> input-file.cbr
</code></pre></div>
<ul>
<li><code>--profile KoL</code> specifies that the target device is my Kobo Libra 2. The program will optimize the output file for the resolution and color profile of this device.</li>
<li><code>--upscale</code> instructs the program to <a href="https://knowyourmeme.com/memes/zoom-and-enhance">enhance</a> images smaller than the device’s resolution.</li>
<li><code>--cropping 2</code> will attempt to crop out margins and page numbers.</li>
<li><code>--splitter 2</code> instructs the program to duplicate double page spreads. The spread will first be displayed as a single rotated page (so that I can see the whole image at once, as the illustrator intended), and then split into two pages (so that I can see details and read text without zooming). This sometimes makes poor decisions on filler pages – pages of credits, praise blurbs, etc – but it seems to always do the right thing when you’re in the pages of the comic itself.</li>
</ul>
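<p>When several files need converting, the same flags can be wrapped in a small loop. This is only a sketch: <code>convert_all</code> and <code>KCC_BIN</code> are hypothetical names introduced for illustration, and only the <code>kcc-c2e</code> flags come from the command above.</p>

```shell
#!/bin/sh
# Sketch: run kcc-c2e with the flags described above on every CBR/CBZ file
# in a directory. KCC_BIN and convert_all are hypothetical names.
KCC_BIN="${KCC_BIN:-kcc-c2e}"

convert_all() {
    dir="$1"
    for f in "$dir"/*.cbr "$dir"/*.cbz; do
        # An unmatched glob is left as a literal pattern; skip it.
        [ -e "$f" ] || continue
        "$KCC_BIN" --profile KoL --upscale --cropping 2 --splitter 2 "$f"
    done
}
```

<p>Calling <code>convert_all ~/comics</code> would then process each archive in turn. Setting <code>KCC_BIN=echo</code> first prints the commands instead of running them, which is a cheap way to check the file list.</p>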
<p>I import the original source file into <a href="/2018/11/ebooks/">my Calibre library</a>, and then add the KCC-generated EPUB as an additional file to the same book record. When loading the book onto my reader, I explicitly tell Calibre to send the EPUB. I do not allow Calibre to do any further conversion to this file.</p>
<p>The resulting files do not look great when viewed on my computer. The lack of margins from the <code>--cropping 2</code> flag is annoying, and the images look <a href="https://github.com/ciromattia/kcc/wiki/FAQ#images-inside-file-created-by-kcc-dont-look-very-well-on-my-pc-did-i-do-something-wrong">dark and jagged</a>. But on the E Ink screen they look great.</p>
<p>I used this process to read Craig Thompson’s <a href="https://en.wikipedia.org/wiki/Blankets_(comics)">Blankets</a>, which I <a href="https://en.wikipedia.org/wiki/Streisand_effect">learned</a> about thanks to <a href="https://www.sltrib.com/news/education/2024/08/02/utah-book-ban-list-these-titles/">Utah’s attempt to ban it</a>. This book was fantastic. For maximum teenage angst, I recommend reading it while listening to The Cure. (I don’t even especially like The Cure, but when I finished Blankets I was struck with the strange desire to spend the next day repeatedly listening to <a href="https://en.wikipedia.org/wiki/Kiss_Me,_Kiss_Me,_Kiss_Me">the</a> <a href="https://en.wikipedia.org/wiki/Disintegration_(The_Cure_album)">few</a> <a href="https://en.wikipedia.org/wiki/Wish_(The_Cure_album)">albums</a> of The Cure that I do own – so I did.) The minimalist, black-and-white art style of Blankets lends itself perfectly to a grayscale E Ink screen. I was impressed at how much emotion he can communicate with so few lines.</p>
<p><a href="https://www.flickr.com/photos/pigmonkey/53914380312/in/dateposted/" title="Blankets by Craig Thompson"><img src="https://live.staticflickr.com/65535/53914380312_f19d4b4b04_c.jpg" width="800" height="600" alt="Blankets by Craig Thompson"/></a></p>
<p>I have recently begun to read <a href="https://en.wikipedia.org/wiki/Monstress_(comics)">Monstress</a> by Marjorie Liu and Sana Takeda. This one is weird. (Utahns are going to lose their shit when they learn about it.) Unlike Blankets, Monstress is drawn with lush, full-color artwork. It also has a lot more text, whereas Blankets was much more about emotion than exposition. I read the first issue on the Libra 2, and then borrowed the dead-tree version from the library and reread it to compare. I think it still looks great on the grayscale E Ink screen – it gives the impression of being extremely detailed graphite pencil work – but the color does add a little something extra (gore, mostly). The text is legible without zooming, but on the small side. I am jumping back and forth between reading further issues on the Libra 2 and on color paper. The portability of the Libra 2 counts for a lot – I carry it with me every day – but I think Monstress is probably better consumed on paper – or digitally on a larger color screen.</p>
<p><a href="https://www.flickr.com/photos/pigmonkey/53915266761/in/dateposted/" title="Monstress by Marjorie Liu & Sana Takeda"><img src="https://live.staticflickr.com/65535/53915266761_391800ec2b_c.jpg" width="800" height="600" alt="Monstress by Marjorie Liu & Sana Takeda"/></a></p>Working with ACSM Files on Linux2024-07-03T00:00:00-07:002024-07-03T20:14:52-07:00Pig Monkeytag:pig-monkey.com,2024-07-03:/2024/07/libgourou/<p>I acquire books from various <a href="https://www.overdrive.com/">OverDrive</a> instances. OverDrive provides an <a href="https://en.wikipedia.org/wiki/Adobe_Content_Server">ACSM</a> file, which is not a book, but instead an XML ticket meant to be exchanged for the actual book file – similar to requesting a book in meatspace by turning in a catalog card to a librarian. <a href="https://www.adobe.com/solutions/ebook/digital-editions.html">Adobe Digital Editions …</a></p><p>I acquire books from various <a href="https://www.overdrive.com/">OverDrive</a> instances. OverDrive provides an <a href="https://en.wikipedia.org/wiki/Adobe_Content_Server">ACSM</a> file, which is not a book, but instead an XML ticket meant to be exchanged for the actual book file – similar to requesting a book in meatspace by turning in a catalog card to a librarian. <a href="https://www.adobe.com/solutions/ebook/digital-editions.html">Adobe Digital Editions</a> is used to perform this exchange. As one would expect from Adobe, this software does not support Linux.</p>
<p>Back in 2013 I set up a Windows 7 virtual machine with Adobe Digital Editions v2.0.1.78765, which I used exclusively for turning ACSM files into EPUB files. A few months ago I was finally able to retire that VM thanks to the discovery of <a href="https://forge.soutade.fr/soutade/libgourou/">libgourou</a>, which is both a library and a suite of utilities that can be used to work with ACSM files.</p>
<p>To use it, I first register an anonymous account with Adobe.</p>
<div class="highlight"><pre><span></span><code>$ adept_activate -a
</code></pre></div>
<p>Next I export the private key that the files will be encrypted to.</p>
<div class="highlight"><pre><span></span><code>$ acsmdownloader --export-private-key
</code></pre></div>
<p>This key can then be imported into the <a href="https://github.com/noDRM/DeDRM_tools">DeDRM_tools</a> plugin of <a href="https://calibre-ebook.com/">Calibre</a>.</p>
<p>Whenever I receive an ACSM file, I can just pass it to the <code>acsmdownloader</code> utility from libgourou.</p>
<div class="highlight"><pre><span></span><code>$ acsmdownloader -f foobar.acsm
</code></pre></div>
<p>This spits out the EPUB, which may be imported into <a href="/2018/11/ebooks/">my standard Calibre library</a>.</p>Redswitch2021-10-17T00:00:00-07:002021-10-17T10:35:49-07:00Pig Monkeytag:pig-monkey.com,2021-10-17:/2021/10/redswitch/<p><a href="http://jonls.dk/redshift/">Redshift</a> is a program that adjusts the color temperature of the screen based on time and location. It can automatically fetch one’s location via <a href="https://gitlab.freedesktop.org/geoclue/geoclue/-/wikis/home">GeoClue</a>. I’ve used it for years. It works most of the time. But, more often than I’d like, it fails to fetch my …</p><p><a href="http://jonls.dk/redshift/">Redshift</a> is a program that adjusts the color temperature of the screen based on time and location. It can automatically fetch one’s location via <a href="https://gitlab.freedesktop.org/geoclue/geoclue/-/wikis/home">GeoClue</a>. I’ve used it for years. It works most of the time. But, more often than I’d like, it fails to fetch my location from GeoClue. When this happens, I find GeoClue impossible to debug. Redshift <a href="https://github.com/jonls/redshift/issues/393">does not cache location information</a>, so when it fails to fetch my location the result is an eye-meltingly bright screen at night. To address this, I wrote a small shell script to avoid GeoClue entirely.</p>
<p><a href="https://github.com/pigmonkey/redswitch">Redswitch</a> fetches the current location via the <a href="https://location.services.mozilla.com/">Mozilla Location Service</a> (using GeoClue’s API key, which <a href="https://gitlab.freedesktop.org/geoclue/geoclue/-/issues/136">may go away</a>). The result is stored and compared against the previous location to determine if the device has moved. If a change in location is detected, Redshift is killed and relaunched with the new location (this will result in a noticeable flash, but there seems to be no alternative since <a href="https://github.com/jonls/redshift/pull/96">Redshift cannot reload its settings while running</a>). If Redshift is not running, it is launched. If no change in location is detected and Redshift is already running, nothing happens. Because the location information is stored, this can safely be used to launch Redshift when the machine is offline (or when the Mozilla Location Service API is down or rate-limited).</p>
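<p>The store-and-compare step described above might look something like the following. This is a simplified sketch, not the actual Redswitch code: <code>CACHE_FILE</code> and <code>location_changed</code> are hypothetical names, and the location is assumed to be a <code>LAT:LON</code> string.</p>

```shell
#!/bin/sh
# Sketch of the store-and-compare step; not the actual Redswitch code.
# CACHE_FILE and location_changed are hypothetical names.
CACHE_FILE="${CACHE_FILE:-${XDG_CACHE_HOME:-$HOME/.cache}/redswitch-location}"

# Return 0 (and update the cache) if the given "LAT:LON" differs from the
# cached value; return 1 if it is unchanged.
location_changed() {
    new="$1"
    old=""
    [ -f "$CACHE_FILE" ] && old=$(cat "$CACHE_FILE")
    if [ "$new" = "$old" ]; then
        return 1
    fi
    printf '%s\n' "$new" > "$CACHE_FILE"
    return 0
}
```

<p>A caller would restart Redshift (for example with <code>redshift -l LAT:LON</code>) only when <code>location_changed</code> succeeds. Because the last known value stays in the cache file, it remains available when a fresh lookup fails.</p>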
<p>My laptop does not experience frequent, drastic changes in location. I find that having the script automatically execute once upon login is adequate for my needs. If you’re jetting around the world, you could periodically execute the script via cron or a systemd timer.</p>
<p>This solves all my problems with Redshift. I can go back to forgetting about its existence, which is my goal for software of this sort.</p>Browser Extensions2021-10-01T00:00:00-07:002021-10-01T21:02:46-07:00Pig Monkeytag:pig-monkey.com,2021-10-01:/2021/10/browser-extensions/<p>I try to keep the number of browser extensions I use to a minimum. The following are what I find necessary in <a href="https://www.firefox.com">Firefox</a>.</p>
<h2><a href="https://gitlab.com/KevinRoebert/ClearUrls">ClearURLs</a></h2>
<p>ClearURLs removes extra cruft from URLs. I don’t really have a problem with things like UTM parameters. Such things seem reasonable to me. But, more broadly, digital advertising has proved itself hostile to my interests, so I choose to be hostile right back.</p>
<h2><a href="https://github.com/Cookie-AutoDelete/Cookie-AutoDelete">Cookie AutoDelete</a></h2>
<p>Cookie AutoDelete deletes cookies after a tab is closed or the domain changes. I whitelist cookies for some of the services I run, like my RSS reader, but every other cookie gets deleted 10 seconds after I leave the site. The extension can also manage other data stores, like IndexedDB and Local Storage.</p>
<h2><a href="https://code.guido-berhoerster.org/addons/firefox-addons/feed-preview/">Feed Preview</a></h2>
<p>Feed Preview adds an icon to the address bar when a page includes an RSS or Atom feed in its header. This used to be built in to Firefox, but for some inexplicable reason they removed it some years ago now. Removing the icon broke one of the core ways that I use a web browser. As the name suggests, the extension can also render a preview of the feed. I don’t use it for that. I just want my icon back.</p>
<h2><a href="https://github.com/mozilla/multi-account-containers">Firefox Multi-Account Containers</a></h2>
<p>Firefox Multi-Account Containers is a Mozilla-provided extension to create different containers and assign domains to them. In modern web browser parlance, a container means isolated storage. So a cookie in container A is not visible within container B, and vice versa.</p>
<h2><a href="https://github.com/stoically/temporary-containers">Temporary Containers</a></h2>
<p>Temporary Containers is the real workhorse of my containment strategy. It generates a new, temporary container for every domain. It automatically deletes the containers it generates 5 minutes after the last tab in that container is closed. This effectively isolates all domains from one another.</p>
<h2><a href="https://github.com/Rayquaza01/HistoryCleaner">History Cleaner</a></h2>
<p>History Cleaner deletes browser history that is older than 200 days. History is useful, but if I haven’t visited a URL in more than 200 days, I probably no longer care about it. Having all that cruft automatically cleaned out makes it easier to find what I’m looking for in the remaining history, and speeds up autocomplete in the address bar.</p>
<h2><a href="https://github.com/einaregilsson/Redirector">Redirector</a></h2>
<p>Redirector lets you create pattern-based URL redirects. I use it to redirect Reddit URLs to <a href="https://github.com/teddit-net/teddit">Teddit</a>, Twitter URLs to <a href="https://github.com/zedeus/nitter/">Nitter</a>, and Wikipedia mobile URLs to the normal Wikipedia site.</p>
<h2><a href="https://github.com/openstyles/stylus">Stylus</a></h2>
<p>Stylus allows custom CSS to be applied to websites. I use it to make websites less eye-burningly-bright. <a href="https://addons.mozilla.org/en-US/firefox/addon/darkreader/">Dark Reader</a> is another solution to this problem, but I found it to be somewhat resource intensive. Stylus lets me darken websites with no performance penalty.</p>
<h2><a href="https://github.com/piroor/treestyletab">Tree Style Tab</a></h2>
<p>Tree Style Tab moves tabs from the default horizontal bar across the top of the browser chrome to a vertical sidebar, and allows the tabs to be placed into a nested tree-like hierarchy. In a recent-ish version of Firefox, Mozilla uglified the default horizontal tab bar. This was what finally pushed me into adopting tree style tabs. It took me a couple weeks to get used to it, but now I’m a convert. I wouldn’t want to use a browser without it. Unfortunately, the extension does seem to have a performance penalty. Not so much during normal use, but it definitely increases the time required to launch the browser. To me, it is worth it.</p>
<h2><a href="https://github.com/gorhill/uBlock">uBlock Origin</a></h2>
<p>uBlock Origin blocks advertisements, malware, and other waste. This extension should need no introduction. The modern web is unusable without it. Until recently I used this in combination with <a href="https://github.com/gorhill/uMatrix">uMatrix</a>. I removed uMatrix when it was abandoned by the author, but was pleasantly surprised to find that current versions of uBlock Origin satisfy my needs in this department on their own.</p>
<h2><a href="https://gitlab.com/ntninja/user-agent-switcher">User-Agent Switcher</a></h2>
<p>User-Agent Switcher allows the user-agent string to be changed. It seems odd that the user would need an extension to change the user-agent string in their user agent, but here we are. I mostly use this for testing things.</p>
<h2><a href="https://github.com/ueokande/vim-vixen">Vim Vixen</a></h2>
<p>Vim Vixen allows the browser to be controlled using vim-like keys. Back in those halcyon days before Mozilla broke their extension system, I switched between two extensions called Vimperator and Pentadactyl to accomplish this. Those were both complete extensions that were able to improve every interaction point with the browser. Vim Vixen is an inferior experience, but seems to be the best current solution. It’s mostly alright.</p>
<h2><a href="https://github.com/wallabag/wallabagger">Wallabagger</a></h2>
<p>Wallabagger lets me save articles to my <a href="https://wallabag.org/">Wallabag</a> instance with a single click.</p>
<h2><a href="https://github.com/dessant/web-archives">Web Archives</a></h2>
<p>Web Archives allows web pages to be looked up in various archives. I just use it for quick access to the <a href="http://web.archive.org/">Internet Archive’s Wayback Machine</a>.</p>Organizing Ledger2020-08-03T00:00:00-07:002020-08-05T15:17:03-07:00Pig Monkeytag:pig-monkey.com,2020-08-03:/2020/08/organizing-ledger/<p><a href="https://www.ledger-cli.org/">Ledger</a> is a <a href="https://en.wikipedia.org/wiki/Double-entry_bookkeeping">double-entry accounting system</a> that stores data in plain text. I began using it in 2012. Almost every dollar that has passed through my world since then is tracked by Ledger.<sup class="footnote-ref" id="fnref:cash"><a rel="footnote" href="#fn:cash" title="see footnote">1</a></sup></p>
<p>Ledger is not the only <a href="https://plaintextaccounting.org/">plain text accounting system</a> out there. It has inspired others, such as <a href="https://hledger.org/">hledger</a> and <a href="http://furius.ca/beancount/">beancount</a>. I began with Ledger for lack of a compelling argument in favor of the alternatives. After close to a decade of use, my only regret is that I didn’t start using it earlier.</p>
<p>My Ledger repository is stored at <code>~/library/ledger</code>. This repository contains a <code>data</code> directory, which includes yearly Ledger journal files such as <code>data/2019.ldg</code> and <code>data/2020.ldg</code>. Ledger files don’t necessarily need to be split at all, but I like having one file per year. In January, after I clear the last transaction from the previous year, I know the year is locked and the file never gets touched again (unless I go back in to rejigger my account structure).</p>
<p>The root of the directory has a <code>.ledger</code> file which includes all of these data files, plus a special journal file with periodic transactions that I sometimes use for budgeting. My <code>~/.ledgerrc</code> file tells Ledger to use the <code>.ledger</code> file as the primary journal, which has the effect of including all the yearly files.</p>
<div class="highlight"><pre><span></span><code>$ cat ~/.ledgerrc
--file ~/library/ledger/.ledger
--date-format<span class="o">=</span>%Y-%m-%d
$ cat ~/library/ledger/.ledger
include data/periodic.ldg
include data/2012.ldg
include data/2013.ldg
include data/2014.ldg
include data/2015.ldg
include data/2016.ldg
include data/2017.ldg
include data/2018.ldg
include data/2019.ldg
include data/2020.ldg
</code></pre></div>
<p>Ledger’s include format does support globbing (i.e. <code>include data/*.ldg</code>) but the ordering of the transactions can get weird, so I prefer to be explicit.</p>
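<p>If typing the list by hand ever gets tedious, the explicit includes could be generated instead. The following is a sketch under the assumption that the yearly journals are named <code>data/YYYY.ldg</code> as above; <code>generate_includes</code> is a hypothetical helper, not part of Ledger.</p>

```shell
#!/bin/sh
# Sketch: print explicit include lines for a Ledger repository laid out as
# described above. generate_includes is a hypothetical helper.
generate_includes() {
    repo="$1"
    # The periodic journal is always listed first in this sketch.
    echo "include data/periodic.ldg"
    # The shell expands the glob in sorted order, so years come out in sequence.
    for f in "$repo"/data/[0-9][0-9][0-9][0-9].ldg; do
        [ -e "$f" ] || continue
        echo "include data/$(basename "$f")"
    done
}
```

<p>Redirecting the output of <code>generate_includes ~/library/ledger</code> into the <code>.ledger</code> file would keep the ordering explicit while avoiding the manual edit each January.</p>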
<p>The repository also contains receipts in the <code>receipts</code> directory, invoices in the <code>invoices</code> directory, scans of checks (remember those?) in the <code>checks</code> directory, and CSV dumps from banks in the <code>dump</code> directory.</p>
<div class="highlight"><pre><span></span><code>$ tree -d ~/library/ledger
/home/pigmonkey/library/ledger
├── checks
├── data
├── dump
├── invoices
└── receipts
<span class="m">5</span> directories
</code></pre></div>
<p>The repository is managed using a mix of vanilla git and <a href="https://git-annex.branchable.com/">git-annex</a>.<sup class="footnote-ref" id="fnref:annex"><a rel="footnote" href="#fn:annex" title="see footnote">2</a></sup> It is important to me that the Ledger journal files in the <code>data</code> directory are stored directly in git. I want the ability to diff changes before committing them, and to be able to pull the history of those files. Every other file I want stored in git-annex. I don’t care about the history of files like PDF receipts. They never change. In fact, I want to make them read-only so I can’t accidentally change them. I want encrypted versions of them distributed to my numerous <a href="/2016/08/rclone/">special remotes</a> for safekeeping, and someday I may even want to drop old receipts or invoices from my local store so that they don’t take up disk space until I actually need to read them. That sounds like asking a lot, but git-annex magically solves all the problems with its <a href="https://git-annex.branchable.com/tips/largefiles/"><code>largefiles</code> configuration option</a>.</p>
<div class="highlight"><pre><span></span><code>$ cat ~/library/ledger/.gitattributes
*.ldg annex.largefiles<span class="o">=</span>nothing
</code></pre></div>
<p>This tells git-annex that any file matching <code>*.ldg</code> should not be treated as a “large file”, which means it should be added directly to git. Any other file should be added to git-annex and <a href="https://git-annex.branchable.com/git-annex-lock/">locked</a>, making it read-only. Having this configured means that I can just blindly <code>git annex add .</code> or <code>git add .</code> within the repository and git-annex will always do the right thing.</p>
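<p>One way to check which rule applies to a given path is to ask git for the attribute directly. This is a small sketch: <code>show_largefiles</code> is a hypothetical wrapper and the file names used with it are illustrative.</p>

```shell
#!/bin/sh
# Sketch: report the annex.largefiles attribute that git resolves for each
# path. show_largefiles is a hypothetical wrapper around git check-attr.
show_largefiles() {
    git check-attr annex.largefiles -- "$@"
}
```

<p>Run inside the repository, <code>show_largefiles data/2020.ldg receipts/example.pdf</code> should report <code>nothing</code> for the journal and <code>unspecified</code> for the PDF, since only <code>*.ldg</code> has the attribute set.</p>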
<p>I don’t run the <a href="https://git-annex.branchable.com/assistant/">git-annex assistant</a> in this repository because I don’t want any automatic commits. Like a traditional git repository, I only commit changes to Ledger’s journal files after reviewing the diffs, and I want those commits to have meaningful messages.</p>
<div id="footnotes">
<h2>Notes</h2>
<ol>
<li id="fn:cash"><a rev="footnote" href="#fnref:cash" class="footnote-return" title="return to article">↵</a> I do not always track miscellaneous cash transactions less than $20. If a thing costs more than that, it is worth tracking, regardless of what it is or how it was purchased. If it costs less than that, and it isn't part of a meaningful expense account, I'll probably let laziness win out. If I buy an $8 sandwich for lunch with cash, it'll get logged, because I care about tracking dining expenses. If I buy a $1 pencil eraser, I probably won't log it, because it isn't part of an account worth considering.</li>
<li id="fn:annex"><a rev="footnote" href="#fnref:annex" class="footnote-return" title="return to article">↵</a> I bet you <a href="/tag/annex/">saw that coming</a>.</li>
</ol>
</div>Searching Books2020-06-11T00:00:00-07:002020-06-11T21:30:50-07:00Pig Monkeytag:pig-monkey.com,2020-06-11:/2020/06/ripgrep-all/<p><a href="https://github.com/phiresky/ripgrep-all/">ripgrep-all</a> is a small wrapper around <a href="https://github.com/BurntSushi/ripgrep">ripgrep</a> that adds support for additional file formats.</p>
<p>I discovered it while looking for a program that would allow me to search <a href="/2018/11/ebooks/">my e-book library</a> without needing to open individual books and search their contents via <a href="https://calibre-ebook.com/">Calibre</a>. ripgrep-all accomplishes this by using <a href="https://pandoc.org/">Pandoc</a> to convert files to plain text and then running ripgrep on the output. One of the numerous formats supported by Pandoc is <a href="https://en.wikipedia.org/wiki/EPUB">EPUB</a>, which is the format I use to store books.</p>
<p>Running Pandoc on every book in my library to extract its text can take some time, but ripgrep-all caches the extracted text so that subsequent runs are similar in speed to simply searching plain text – which is blazing fast thanks to ripgrep’s speed. It takes around two seconds to search 1,706 books.</p>
<div class="highlight"><pre><span></span><code>$ time<span class="o">(</span>rga -li <span class="s1">'pandemic'</span> ~/library/books/ <span class="p">|</span> wc -l<span class="o">)</span>
<span class="m">33</span>
real 0m1.225s
user 0m2.458s
sys 0m1.759s
</code></pre></div>Monitoring Legible News2020-01-31T00:00:00-08:002020-06-11T21:14:25-07:00Pig Monkeytag:pig-monkey.com,2020-01-31:/2020/01/monitoring-legible-news/<p>I was sent a link to <a href="https://legiblenews.com/">Legible News</a> last November by someone who had read my post on the now-defunct <a href="/2016/07/breaking/">Breaking News</a>. Legible News is a website that simply scrapes headlines from <a href="https://en.wikipedia.org/w/index.php?title=Portal:Current_events">Wikipedia’s Current Events</a> once per day and presents them in a legible format. This seems like a …</p><p>I was sent a link to <a href="https://legiblenews.com/">Legible News</a> last November by someone who had read my post on the now-defunct <a href="/2016/07/breaking/">Breaking News</a>. Legible News is a website that simply scrapes headlines from <a href="https://en.wikipedia.org/w/index.php?title=Portal:Current_events">Wikipedia’s Current Events</a> once per day and presents them in a legible format. This seems like a simple thing, but is <a href="https://zainamro.com/notes/unbearable-news">far beyond the capabilities of most news organizations today</a>.</p>
<p>Legible News provides no update notification mechanism. I addressed this by plugging it into <a href="/2019/12/urlwatch/">my urlwatch system</a>. Initially this presented two problems: the email notification included the HTML markup, which I didn’t care about, and it included both the old and new content of every changed line – effectively sending me the news from today and yesterday.</p>
<p>The first problem was easily solved by using the <code>html2text</code> filter provided by <a href="https://github.com/thp/urlwatch">urlwatch</a>. This strips out all markup, which is what I thought I wanted. I ran this for a bit before deciding that I did want the output to contain links. What I really wanted was some sort of <code>html2markdown</code> filter.</p>
<p>I also realized I did not just want to be sent new lines, but every line anytime there was a change. If the news yesterday included a section titled “Armed conflicts and attacks”, and the news today included a section with the same title, I wanted that in my output despite it not having changed.</p>
<p>I solved both of these problems using the <code>diff_tool</code> argument of urlwatch. This allows the user to pass in a special tool to replace the default use of <code>diff</code> to generate the notification output. The tool will be called with two arguments: the filename of the previously downloaded version of the URL and the filename of the current version. I wrote a simple script called <code>html2markdown.sh</code> which ignores the first argument and simply passes the second argument to <a href="https://pandoc.org">Pandoc</a> for formatting. </p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/bin/sh</span>
<span class="c1"># $1 is the previous version (ignored); $2 is the current version.</span>
pandoc --from html <span class="se">\</span>
    --to markdown_strict <span class="se">\</span>
    --reference-links <span class="se">\</span>
    --reference-location<span class="o">=</span>block <span class="se">\</span>
    <span class="s2">"$2"</span>
</code></pre></div>
<p>This script is used as the <code>diff_tool</code> in the urlwatch job definition.</p>
<div class="highlight"><pre><span></span><code><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">url</span>
<span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Legible News</span>
<span class="nt">url</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">https://legiblenews.com/</span>
<span class="nt">diff_tool</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">/home/pigmonkey/bin/html2markdown.sh</span>
</code></pre></div>
<p>The result is the latest version of Legible News, nicely converted to Markdown, delivered to my inbox every day. The output would be even better if Legible News used semantic markup – specifically heading elements – but it is perfectly serviceable as is.</p>
<p>After I built this I discovered that <a href="https://feed43.com/3068865104604836.xml">somebody had created an RSS feed for Legible News</a> using a service called <a href="http://feed43.com/">Feed43</a>.</p>I use FeedIron to repair neutered RSS feeds.2019-12-17T00:00:00-08:002019-12-17T19:38:23-08:00Pig Monkeytag:pig-monkey.com,2019-12-17:/2019/12/feediron/<p><a href="https://github.com/feediron/ttrss_plugin-feediron/">FeedIron</a> is a plugin for my feed reader, <a href="https://tt-rss.org/">Tiny Tiny RSS</a>. It takes broken, partial feeds and extracts the full article content, allowing me to read the article in my feed reader the way god intended. The plugin can be configured to extract content using <a href="https://github.com/feediron/ttrss_plugin-feediron/#filters">a number of filters</a>. I …</p><p><a href="https://github.com/feediron/ttrss_plugin-feediron/">FeedIron</a> is a plugin for my feed reader, <a href="https://tt-rss.org/">Tiny Tiny RSS</a>. It takes broken, partial feeds and extracts the full article content, allowing me to read the article in my feed reader the way god intended. The plugin can be configured to extract content using <a href="https://github.com/feediron/ttrss_plugin-feediron/#filters">a number of filters</a>. I find that using <a href="https://github.com/feediron/ttrss_plugin-feediron/tree/master/filters/fi_mod_xpath">the xpath filter</a> to specify an element on the page like <code>div[@class='entry-content']</code> corrects most neutered feeds.</p>I use urlwatch to monitor the global information super highway.2019-12-15T00:00:00-08:002019-12-15T21:37:24-08:00Pig Monkeytag:pig-monkey.com,2019-12-15:/2019/12/urlwatch/<p><a href="https://github.com/thp/urlwatch/">urlwatch</a> is a simple program that monitors a list of URLs and sends an alert when it detects a change. It can be configured to only look for changes within certain HTML elements, or to grep for certain strings. I configure it to send me the changes via email. 
As …</p><p><a href="https://github.com/thp/urlwatch/">urlwatch</a> is a simple program that monitors a list of URLs and sends an alert when it detects a change. It can be configured to only look for changes within certain HTML elements, or to grep for certain strings. I configure it to send me the changes via email. As with <a href="/2019/10/rss-bridge/">RSS-Bridge</a>, this tool is part of my strategy to liberate content from toxic silos and Make the Internet Great Again™.</p>Personal Information Management2019-12-14T00:00:00-08:002019-12-14T18:22:36-08:00Pig Monkeytag:pig-monkey.com,2019-12-14:/2019/12/pim-utils/<p><a href="https://pimutils.org/">pimutils</a> is a collection of software for personal information management. The core piece is <a href="https://vdirsyncer.pimutils.org/">vdirsyncer</a>, which synchronizes calendars and contacts between the local filesystem and CalDAV and CardDAV servers. Calendars may then be interacted with via <a href="https://lostpackets.de/khal/">khal</a>, and contacts via <a href="https://github.com/scheibler/khard/">khard</a>. There’s not much to say about these three …</p><p><a href="https://pimutils.org/">pimutils</a> is a collection of software for personal information management. The core piece is <a href="https://vdirsyncer.pimutils.org/">vdirsyncer</a>, which synchronizes calendars and contacts between the local filesystem and CalDAV and CardDAV servers. Calendars may then be interacted with via <a href="https://lostpackets.de/khal/">khal</a>, and contacts via <a href="https://github.com/scheibler/khard/">khard</a>. There’s not much to say about these three programs, other than they all just work. Having offline access to my calendars and contacts is critical, as is the ability to synchronize that data across machines.</p>
<p>Khard integrates easily with <a href="https://neomutt.org/">mutt</a> to provide autocomplete when composing emails. I find its interface for creating, editing and reading contacts to be intuitive. It can also output a calendar of birthdays, which can then be imported into khal.</p>
<p>Khal’s interface for adding new calendar events is much simpler and quicker than all the mousing required by GUI calendar programs.</p>
<div class="highlight"><pre><span></span><code>$ khal new <span class="m">2019</span>-11-16 <span class="m">21</span>:30 5h Alessandro Cortini at Public Works :: <span class="m">161</span> Erie St
</code></pre></div>
<p>There are times when a more complex user interface makes calendaring tasks easier. For this Khal offers the <a href="https://lostpackets.de/khal/usage.html#interactive">interactive option</a>, which provides a <a href="https://en.wikipedia.org/wiki/Text-based_user_interface">TUI</a> for creating, editing and reading events.</p>
<p>Khal can also <a href="https://lostpackets.de/khal/usage.html#import">import</a> <a href="https://en.wikipedia.org/wiki/ICalendar">iCalendar</a> files, which is a simple way of getting existing events into my world.</p>
<div class="highlight"><pre><span></span><code>$ khal import invite.ics
</code></pre></div>
<p>Vdirsyncer has <a href="https://github.com/pimutils/vdirsyncer/issues/790">maintenance problems</a> that may call its future into question, but the whole point of modular tools that operate on open data formats is that they are replaceable.</p>
<p>I have a simple and often used script which calls <code>khal calendar</code> and <code>task list</code> (the latter command being <a href="https://taskwarrior.org/">taskwarrior</a>), answering the question: what am I supposed to be doing right now?</p>Terminal Calculations2019-12-12T00:00:00-08:002019-12-12T18:52:46-08:00Pig Monkeytag:pig-monkey.com,2019-12-12:/2019/12/qalculate/<p><a href="https://qalculate.github.io/">Qalculate!</a> is a well known GTK-based GUI calculator. For years I ignored it because I failed to realize that it included a terminal interface, <code>qalc</code>. Since learning about <code>qalc</code> last year it has become my go-to calculator. It supports <a href="https://qalculate.github.io/features.html">all the same features</a> as the GUI, including <a href="https://en.wikipedia.org/wiki/Reverse_Polish_notation">RPN</a> and unit …</p><p><a href="https://qalculate.github.io/">Qalculate!</a> is a well known GTK-based GUI calculator. For years I ignored it because I failed to realize that it included a terminal interface, <code>qalc</code>. Since learning about <code>qalc</code> last year it has become my go-to calculator. It supports <a href="https://qalculate.github.io/features.html">all the same features</a> as the GUI, including <a href="https://en.wikipedia.org/wiki/Reverse_Polish_notation">RPN</a> and unit conversions. I <a href="/2019/07/gnu-units/">primarily use GNU Units for unit wrangling</a>, but being able to perform unit conversions within my calculator is sometimes useful.</p>
<div class="highlight"><pre><span></span><code>$ qalc
> 1EUR to USD
It has been <span class="m">20</span> day<span class="o">(</span>s<span class="o">)</span> since the exchange rates last were updated
Do you wish to update the exchange rates now? y
<span class="m">1</span> * <span class="nv">euro</span> <span class="o">=</span> approx. <span class="nv">$1</span>.1137000
> 32oC to oF
<span class="m">32</span> * <span class="nv">celsius</span> <span class="o">=</span> <span class="m">89</span>.6 oF
</code></pre></div>
<p><a href="https://qalculate.github.io/manual/qalculate-mode.html#qalculate-rpn">The RPN mode</a> is not quite as intuitive as a purpose-built RPN calculator like <a href="https://github.com/pelzlpj/orpie">Orpie</a>, but it is adequate for my uses. My most frequent use of RPN mode is totaling a long list of numbers without bothering with all those tedious <code>+</code> symbols.</p>
<div class="highlight"><pre><span></span><code>> rpn on
> stack
The RPN stack is empty
> 85
85 = 85
> 42
42 = 42
> 198
198 = 198
> 5
5 = 5
> 659
659 = 659
> stack
1: 659
2: 5
3: 198
4: 42
5: 85
> total
total([659, 5, 198, 42, 85]) = 989
> stack
1: 989
</code></pre></div>
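<p>As a rough cross-check outside of <code>qalc</code>, the same total can be reproduced in a few lines of Python. This is only a sketch: the <code>stack</code> list and the <code>total</code> helper are hypothetical stand-ins for the RPN stack and command shown above, not part of Qalculate!.</p>

```python
# Hypothetical stand-in for the RPN stack shown above (top of the stack first).
stack = [659, 5, 198, 42, 85]


def total(values):
    """Sum every value, mirroring the result qalc reports for the stack."""
    return sum(values)


print(total(stack))
```

<p>The printed value should match the single item left on the stack after <code>total</code>.</p>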
<p>Also provided are some basic <a href="https://qalculate.github.io/manual/qalculate-definitions-functions.html#qalculate-definitions-functions-1-Statistics">statistics functions</a> that can help save time.</p>
<div class="highlight"><pre><span></span><code>> mean(2,12,5,3,1)
mean([2, 12, 5, 3, 1]) = 4.6
</code></pre></div>
<p>And of course there are <a href="https://qalculate.github.io/manual/qalculate-definitions-variables.html">the variables and constants you would expect</a>.</p>
<div class="highlight"><pre><span></span><code>> (12+3*8)/2
(12 + (3 * 8)) / 2 = 18
> ans*pi
ans * pi = 56.548668
</code></pre></div>
<p>I reach for <code>qalc</code> more frequently than alternative calculators like <a href="https://www.gnu.org/software/bc/">bc</a>, <a href="https://github.com/sharkdp/insect">insect</a>, or the Python shell.</p>I use Blokada to reduce the amount of advertisements on my telephone.2019-11-04T00:00:00-08:002019-11-04T19:01:31-08:00Pig Monkeytag:pig-monkey.com,2019-11-04:/2019/11/blokada/<p><a href="https://blokada.org/">Blokada</a> registers itself as a VPN service on the phone so that it can intercept all network traffic. It then downloads filter lists to route the domains of known advertisers, trackers, etc to a black hole, exactly like what I do on my real computer with <a href="https://github.com/pigmonkey/hostsctl">hostsctl</a>. For me it …</p><p><a href="https://blokada.org/">Blokada</a> registers itself as a VPN service on the phone so that it can intercept all network traffic. It then downloads filter lists to route the domains of known advertisers, trackers, etc to a black hole, exactly like what I do on my real computer with <a href="https://github.com/pigmonkey/hostsctl">hostsctl</a>. For me it has had no noticeable impact on battery life. I have found it especially useful when travelling internationally and purchasing cellular plans with small data caps. The only disadvantage I have found is that Blokada must be disabled when I want to connect to a real VPN via WireGuard or OpenVPN.</p>
<p>Blokada must be installed <a href="https://f-droid.org/en/packages/org.blokada.alarm/">via F-Droid</a> (or directly through the APK) because Google frowns upon blocking advertisements (but at least Google allows you to install software on your telephone outside of their walled garden, <a href="https://en.wikipedia.org/wiki/HKmap.live#iOS_app">unlike their competitor</a>).</p>I use RSS-Bridge to stitch together the balkanized web.2019-10-14T00:00:00-07:002019-11-04T19:01:18-08:00Pig Monkeytag:pig-monkey.com,2019-10-14:/2019/10/rss-bridge/<p><a href="https://github.com/RSS-Bridge/rss-bridge">RSS-Bridge</a> is an open-source project that liberates content from toxic walled gardens, allowing it to be shared and syndicated in my feed reader. The project can generate RSS or Atom feeds for <a href="https://github.com/RSS-Bridge/rss-bridge/tree/master/bridges">a number of sites</a>. It lets me pretend that we live in a better time.</p>Mutt is my mail user agent of choice.2019-07-31T00:00:00-07:002019-07-31T21:12:06-07:00Pig Monkeytag:pig-monkey.com,2019-07-31:/2019/07/mutt-ics/<p>More specifically, <a href="https://neomutt.org/">Neomutt</a>. <a href="https://github.com/dmedvinsky/mutt-ics">Mutt ICS</a> is a python script which takes an <a href="https://en.wikipedia.org/wiki/ICalendar">iCalendar file</a> and outputs the contents in a human friendly format. I use it in <a href="https://github.com/pigmonkey/dotfiles/blob/master/config/mutt/mailcap">my mailcap</a> so that I can see details of calendar attachments when reading email. It’s a simple script that improves my email …</p><p>More specifically, <a href="https://neomutt.org/">Neomutt</a>. <a href="https://github.com/dmedvinsky/mutt-ics">Mutt ICS</a> is a python script which takes an <a href="https://en.wikipedia.org/wiki/ICalendar">iCalendar file</a> and outputs the contents in a human friendly format. 
I use it in <a href="https://github.com/pigmonkey/dotfiles/blob/master/config/mutt/mailcap">my mailcap</a> so that I can see details of calendar attachments when reading email. It’s a simple script that improves my email workflow.</p>Without an OCR layer, PDF files are of limited use.2019-07-30T00:00:00-07:002019-07-30T19:47:30-07:00Pig Monkeytag:pig-monkey.com,2019-07-30:/2019/07/ocrmypdf/<p><a href="https://ocrmypdf.readthedocs.io/">OCRmyPDF</a> is a tool that applies <a href="https://en.wikipedia.org/wiki/Optical_character_recognition">optical character recognition</a> to PDFs. It uses <a href="https://github.com/tesseract-ocr/tesseract">Tesseract</a> to perform the OCR, and <a href="https://github.com/Flameeyes/unpaper">unpaper</a> to clean, deskew and optimize the input files. It outputs <a href="https://en.wikipedia.org/?title=PDF/A">PDF/A</a> files, optimized for long-term storage. This isn’t a tool I use frequently, but it is one I …</p><p><a href="https://ocrmypdf.readthedocs.io/">OCRmyPDF</a> is a tool that applies <a href="https://en.wikipedia.org/wiki/Optical_character_recognition">optical character recognition</a> to PDFs. It uses <a href="https://github.com/tesseract-ocr/tesseract">Tesseract</a> to perform the OCR, and <a href="https://github.com/Flameeyes/unpaper">unpaper</a> to clean, deskew and optimize the input files. It outputs <a href="https://en.wikipedia.org/?title=PDF/A">PDF/A</a> files, optimized for long-term storage. This isn’t a tool I use frequently, but it is one I greatly appreciate having when I need it. If you ever find yourself scanning or photographing documents, you want OCRmyPDF.</p>Date Manipulation2019-07-29T00:00:00-07:002019-07-29T19:02:29-07:00Pig Monkeytag:pig-monkey.com,2019-07-29:/2019/07/dateutils/<p><a href="http://www.fresse.org/dateutils/">Dateutils</a> is a collection of tools for the quick manipulation of dates. The tool I use most frequently is <code>datediff</code>. 
This program answers questions like: “How many days has it been since a date?” or “How many days are left in summer?”</p>
<div class="highlight"><pre><span></span><code>$ datediff <span class="m">2019</span>-03-21 now
<span class="m">131</span>
$ datediff now <span class="m">2019 …</span></code></pre></div><p><a href="http://www.fresse.org/dateutils/">Dateutils</a> is a collection of tools for the quick manipulation of dates. The tool I use most frequently is <code>datediff</code>. This program answers questions like: “How many days has it been since a date?” or “How many days are left in summer?”</p>
<div class="highlight"><pre><span></span><code>$ datediff <span class="m">2019</span>-03-21 now
<span class="m">131</span>
$ datediff now <span class="m">2019</span>-09-23
<span class="m">55</span>
</code></pre></div>
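<p>For comparison, the same day counts can be sketched with Python's standard <code>datetime</code> module. This is an illustration rather than how <code>datediff</code> works internally. The <code>days_between</code> helper is hypothetical, and <code>assumed_now</code> is an assumption: it pins "now" to 2019-07-30, the date implied by the output above.</p>

```python
from datetime import date


def days_between(start, end):
    """Return the number of whole days from start to end."""
    return (end - start).days


# Assumption: "now" was 2019-07-30 when the commands above were run.
assumed_now = date(2019, 7, 30)

print(days_between(date(2019, 3, 21), assumed_now))  # days since the given date
print(days_between(assumed_now, date(2019, 9, 23)))  # days remaining until the given date
```

<p>With that assumed date, the two printed values agree with the <code>datediff</code> output.</p>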
<p>My second most frequently used program is <code>dateadd</code>, which is used to add a duration to a date. It can answer questions like: “What will the date be in 3 weeks?”</p>
<div class="highlight"><pre><span></span><code>$ dateadd now +3w
<span class="m">2019</span>-08-20T02:02:23
</code></pre></div>
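<p>The three-week addition can likewise be approximated in Python. Again this is only a sketch under stated assumptions: <code>add_weeks</code> is a hypothetical helper, and <code>assumed_now</code> is fixed at 2019-07-30T02:02:23, the timestamp implied by the <code>dateadd</code> output above.</p>

```python
from datetime import datetime, timedelta


def add_weeks(moment, weeks):
    """Return the given moment shifted forward by a number of weeks."""
    return moment + timedelta(weeks=weeks)


# Assumption: "now" was 2019-07-30T02:02:23 when the command above was run.
assumed_now = datetime(2019, 7, 30, 2, 2, 23)

print(add_weeks(assumed_now, 3).isoformat())
```

<p>Unlike <code>dateadd</code>, this sketch takes the starting time as an explicit value instead of reading the clock, which keeps the result reproducible.</p>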
<p>The tools are much more powerful than these examples, but hardly a week goes by when I don’t use <code>datediff</code> or <code>dateadd</code> for simple tasks like this.</p>