Book Scanning

As previously mentioned, I like e-books. Unfortunately, many books are still only available as dead trees. Fortunately, the internet provides book scanning services.

These services will professionally scan a book and run the images through an OCR program. The output is usually a PDF. This is a poor format for something like a novel, where you want the text to be able to dynamically reformat itself and flow across pages, but it remains a good choice for technical and reference books, where the layout of the page tends to be fixed around things like tables and graphs.

A couple of years ago I tried two book scanning services: Custom Book Scanning and 1DollarScan. Both offer destructive book scanning, meaning they cut the spine off the book to ensure a well-oriented scan. The output from both services was similar, but since that first trial I’ve come back to Custom Book Scanning rather than 1DollarScan. I appreciate that they perform the scan at 1200 dpi, which is higher than necessary for text but can be useful for documents that include photographs. In addition to the customary PDF, they also include a Microsoft Word document, and will provide e-book formats such as EPUB and MOBI for an additional cost.

In my experience the OCR performed on these scans is completely adequate for searchability, which is my main requirement for the scans to be useful. It is not good enough to produce a clean EPUB or MOBI. Don’t expect to run pdftotext on the document and extract anything that does not require heavy editing by a human, but you’ll certainly be able to point pdfgrep at the file and get useful output.
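To illustrate the kind of searching that works on these scans, here is a minimal Python sketch. It assumes poppler's pdftotext is installed and on the PATH; the function names and the file name in the usage note are made up for the example, and this is a rough stand-in for what pdfgrep already does, not a replacement for it.

```python
import re
import subprocess


def extract_text(pdf_path):
    """Run pdftotext (assumed to be installed) and return the text layer.

    The "-" argument tells pdftotext to write to standard output, and
    -layout asks it to keep the original physical layout where it can.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def search_pages(text, pattern):
    """Return (page_number, line) pairs for every line matching pattern.

    pdftotext separates pages with a form feed character, so splitting
    on it gives one chunk of text per page.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for page_number, page in enumerate(text.split("\f"), start=1):
        for line in page.splitlines():
            if regex.search(line):
                hits.append((page_number, line.strip()))
    return hits
```

With a scan saved under a hypothetical name such as botany.pdf, calling search_pages(extract_text("botany.pdf"), "mint") would list each page and line where the word turns up. OCR mistakes mean some occurrences will be missed, so treat the results as a pointer into the book rather than a complete index.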

As an example, here is a PDF extract of the first few pages of Botany in a Day by Thomas J Elpel (7MB). It demonstrates the sort of output one can expect from these services. The full book, with all of its figures and color drawings, is 155MB. Botany in a Day is also a good example of the type of book I find it worthwhile to scan. It’s a book I first read years ago and will probably never read again cover-to-cover, but it has remained on my bookshelf for over a decade because it is an occasionally useful reference. It is worth keeping around, and a digital copy makes it even more valuable: it can be searched, and easily carried with no space or weight penalty.

So far I have not actually sent any of my own books in to be scanned. Instead, I’ve purchased new – that is to say, new to me – copies of the books online and had them shipped directly to the scanner. Used books in like-new condition can generally be found fairly cheaply. In the case of reference books, this has often let me upgrade to a newer edition than the one that I previously owned (such was the case with Botany in a Day). But mostly this is just so that I get a clean scan, without worrying about any notes or dog-eared pages that I may have in my old copies. After I receive the PDF, I give away my old hard copy.

Scanning has allowed me to reduce my physical book collection more than would otherwise be possible. I still own books that have yet to be published digitally and that don’t lend themselves to scanning – I am patiently waiting for whatever luddite owns the publishing rights to AB Guthrie Jr to produce digital versions of his books, as I have no expectation that OCR would be able to deal with the mountain man slang – but I’m glad to have these services available.