Mac OS: OCR on a PDF file
I recently discovered a way to determine whether a PDF file contains an Optical Character Recognition (OCR) text layer, or even a native text layer, and thought this board might benefit.

Basically, if the text of a PDF file contains the word "Font", it has likely either been printed natively to PDF or already been run through an OCR program, such as Adobe Acrobat, ABBYY FineReader, or the OCR capabilities of PDFpen(Pro). By "text", I mean the raw, binary-level code of the PDF, not the readable text of the document itself.

To prove this to yourself, open a freshly scanned (non-OCR'd) PDF in TextMate, or your favorite text editor, and search for "Font"; you should find no matches. Then run the PDF through your favorite OCR program, open it in TextMate, and search again. This time, you should find words like "FontName", "BaseFont", and/or "Font".

I automated this check with the "grep" command. Even better, the first rule for one of my watched folders contains a shell script that automatically tests a PDF to see if it needs to be OCR'd, then OCRs it if necessary. Here's a shell script example:

```bash
#!/bin/bash
# Report whether the PDF given as $1 appears to lack a text layer.
# -q suppresses grep's output; only its exit status is used.
if ! grep -q Font "$1"
then
    echo "This file needs to be OCRd"
else
    echo "This file does not need to be OCRd"
fi
```

Since my version of ABBYY FineReader came bundled with my ScanSnap, this code assumes your freshly scanned PDF originated from a Fujitsu ScanSnap scanner. If it didn't, and you still want to use the version of ABBYY FineReader that came with your ScanSnap, you'll need to use PDFInfo, PDFAuxInfo, or pdftk to set the PDF's Creator to "ScanSnap Manager".

(Question: Why don't I just let the ScanSnap software run the PDF through FineReader automatically? Answer: I don't feel like waiting for it.)
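
If you want to run the same "Font" test over a whole folder instead of one file at a time, here is a minimal sketch of how that might look. The function names needs_ocr and list_unocrd are my own hypothetical helpers, not part of any tool mentioned above. Keep in mind this is still only a heuristic on the raw bytes: a PDF that stores its font dictionaries inside compressed streams may be misreported as needing OCR.

```shell
#!/bin/bash
# Sketch: apply the "Font" test to every PDF in a folder.
# needs_ocr and list_unocrd are hypothetical helper names.

# Succeeds (exit status 0) when the file contains no "Font" string,
# i.e. it probably has no text layer yet.
needs_ocr() {
    ! grep -q Font "$1"
}

# Print the path of each PDF in directory $1 that appears to need OCR.
list_unocrd() {
    local dir="$1" pdf
    for pdf in "$dir"/*.pdf; do
        [ -e "$pdf" ] || continue   # skip the unexpanded glob when no PDFs exist
        if needs_ocr "$pdf"; then
            echo "$pdf"
        fi
    done
}

# Run only when a directory is passed on the command line.
if [ "$#" -gt 0 ]; then
    list_unocrd "$1"
fi
```

Saved under a name of your choosing (say, check_ocr.sh), it could be run as `bash check_ocr.sh ~/Scans`, and the list it prints could then be fed to whatever OCR step you use.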