pig-monkey.com

Without an OCR layer, scanned PDF files are of limited use: their text can't be searched, selected, or copied.

OCRmyPDF is a tool that applies optical character recognition to PDFs. It uses Tesseract to perform the OCR, and unpaper to clean, deskew and optimize the input files. It outputs PDF/A files, optimized for long-term storage. This isn’t a tool I use frequently, but it is one I greatly appreciate having when I need it. If you ever find yourself scanning or photographing documents, you want OCRmyPDF.
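For anyone scripting this, here is a minimal Python sketch that wraps the `ocrmypdf` command-line tool. It assumes a recent OCRmyPDF release with Tesseract and unpaper installed and on `PATH`. The flags used (`-l`, `--output-type pdfa`, `--deskew`, `--clean`) are OCRmyPDF options, but older releases may spell them differently, so check `ocrmypdf --help` for your version. The helper names `build_ocrmypdf_command` and `ocr_pdf` and the file names are hypothetical.

```python
from __future__ import annotations

import shutil
import subprocess


def build_ocrmypdf_command(
    input_pdf: str,
    output_pdf: str,
    language: str = "eng",
    deskew: bool = True,
    clean: bool = True,
) -> list[str]:
    """Assemble an ocrmypdf command line (hypothetical helper; assumes a recent OCRmyPDF)."""
    # -l selects the Tesseract language; pdfa requests PDF/A output.
    cmd = ["ocrmypdf", "-l", language, "--output-type", "pdfa"]
    if deskew:
        # Straighten pages that were scanned or photographed at an angle.
        cmd.append("--deskew")
    if clean:
        # Clean page images with unpaper before OCR; unpaper must be installed.
        cmd.append("--clean")
    cmd += [input_pdf, output_pdf]
    return cmd


def ocr_pdf(input_pdf: str, output_pdf: str, **options) -> None:
    """Run OCRmyPDF on one file, raising if the tool is missing or exits non-zero."""
    if shutil.which("ocrmypdf") is None:
        raise FileNotFoundError("ocrmypdf is not on PATH")
    subprocess.run(build_ocrmypdf_command(input_pdf, output_pdf, **options), check=True)


if __name__ == "__main__":
    # Show the command that would be run for a hypothetical scan.
    print(" ".join(build_ocrmypdf_command("scan.pdf", "scan-ocr.pdf")))
```

Building the argument list separately from running it makes it easy to inspect the exact command before pointing it at a folder of scans, and passing a list to `subprocess.run` avoids shell-quoting problems with odd file names.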