Four types of PDF files are pertinent to this discussion:

  • Image-only PDF: This is a PDF that contains only an image, typically one created by a scanner. It contains no computer-readable text.
  • Rendered PDF: This is a PDF created by a computer (e.g. a Word document converted to PDF). It contains computer-readable text by default, so it is fully text searchable as is, without the need for OCR.
  • Hybrid PDF: This is a PDF that contains both images and rendered content or annotations. For example, if you scan a document and then use Adobe's markup tools to annotate the image, the resulting PDF is a hybrid PDF.
  • Image + text PDF: This is the PDF created when the OCR engine 'reads' an image-only PDF and adds a layer of invisible, computer-readable text to the original image. These files retain the exact original image, but also let you perform a context-sensitive search for text inside the PDF and copy text to the Windows clipboard.
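
The four types can be told apart by what each page contains. The following is a minimal sketch in Python, not Symphony OCR's actual logic. The Page record and its three flags are hypothetical stand-ins for whatever a PDF library would report about a page, and the precedence of the checks is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    # Hypothetical per-page flags; a real PDF library would have to supply them.
    has_image: bool = False           # page shows a scanned image
    has_visible_text: bool = False    # rendered text or annotations
    has_invisible_text: bool = False  # hidden text layer added by OCR


def classify_pdf(pages: List[Page]) -> str:
    """Return one of the four file types described above."""
    any_image = any(p.has_image for p in pages)
    any_rendered = any(p.has_visible_text for p in pages)
    any_ocr_layer = any(p.has_invisible_text for p in pages)

    if any_image and any_rendered:
        return "hybrid"        # images plus rendered content or annotations
    if any_image and any_ocr_layer:
        return "image + text"  # scanned image with an invisible text layer
    if any_image:
        return "image-only"    # scanned image, no text of any kind
    return "rendered"          # no scanned images (assumed default)
```

For example, classify_pdf([Page(has_image=True)]) yields "image-only", while a file that mixes a scanned page with a rendered page is reported as "hybrid".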

What Symphony OCR Will Process by Default

  • Symphony OCR will process image-only PDF files (and TIFF files if you choose to do so) and convert them to image + text PDF files.
  • Symphony OCR will process image-only pages within a hybrid PDF and convert them to an image + text PDF (but will not process the rendered pages as they are already text searchable).

What Symphony OCR Will Not Process

  • Symphony OCR will not process rendered PDFs, as these documents are already text searchable. Instead, it will place these files into the "Contains Text" or "Already OCRed" lists.
  • To preserve the integrity of the original PDF content, Symphony OCR will not process encrypted PDFs.
  • Symphony OCR will not recognize handwriting.
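
Taken together, the rules in the two lists above amount to a per-page decision. The sketch below illustrates that decision in Python under stated assumptions: PdfFile and pages_to_ocr are hypothetical names, not part of Symphony OCR, and the sketch only models the encryption and image-only rules described here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PdfFile:
    # Hypothetical summary of a PDF; the field names are illustrative only.
    encrypted: bool = False
    # One entry per page: True if the page is an image with no text layer.
    image_only_pages: List[bool] = field(default_factory=list)


def pages_to_ocr(pdf: PdfFile) -> List[int]:
    """Return the zero-based indexes of the pages that would be sent to OCR."""
    if pdf.encrypted:
        return []  # encrypted PDFs are left untouched
    # Only image-only pages qualify; rendered pages are already searchable.
    return [i for i, image_only in enumerate(pdf.image_only_pages) if image_only]
```

Under these assumptions, a three-page hybrid PDF whose first and last pages are scans gives pages_to_ocr(PdfFile(image_only_pages=[True, False, True])) == [0, 2], and a rendered or encrypted PDF gives an empty list.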
