
18.11. How to Process No Image or Text Documents

Documents that contain neither an image nor text generally don't need to be processed; one example is your company's PDF letterhead template.  By default, these PDF documents are not processed and are placed in the "No Image or Text" Document List.  How can a PDF have no image and no text?

Content can be drawn in a PDF using one of three mechanisms:
  1. A bitmap image can be placed on the page. 
  2. Text can be rendered on the page using fonts. 
  3. Content can be drawn using simple drawing operations (e.g. line or curve segments).
Scanned documents are always bitmaps drawn on the page, so they are not text searchable without OCR.  If a document is created electronically (e.g. print to PDF from Word), it will typically consist of text rendered using fonts, and such documents are generally text searchable without OCR.  However, in some cases the font can't be embedded in the PDF (for font licensing or other reasons), and the text content is rendered using simple drawing operations instead.  When this happens, the documents are not text searchable without OCR.

Certain tools, like AutoCAD, always use drawing operations to render all content, including text (line segments in an architectural drawing, for example).
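
To make the three mechanisms concrete, here is a rough sketch that looks for the standard PDF operators associated with each one in a page's decoded content stream.  It is an illustration only and is not how Symphony OCR analyzes documents: the function names are hypothetical, the tokenizer is deliberately naive (it ignores nested parentheses in strings and inline image data), and it assumes every "Do" operator paints an image, although "Do" can also paint other objects.

```python
import re

# Standard PDF content-stream operators, grouped by the mechanism they indicate.
TEXT_OPS = {"Tj", "TJ", "'", '"'}   # show text using the current font
IMAGE_OPS = {"Do", "BI"}            # paint an XObject / begin an inline image
PATH_PAINT_OPS = {"S", "s", "f", "F", "f*", "B", "B*", "b", "b*"}  # stroke or fill a path

# String operands are removed first so their contents are not mistaken for operators.
_LITERAL_STRING = re.compile(r"\((?:\\.|[^\\()])*\)", re.S)
_HEX_STRING = re.compile(r"<[0-9A-Fa-f\s]*>")


def mechanisms(content_stream):
    """Return the set of drawing mechanisms ("text", "image", "drawing") found."""
    stripped = _LITERAL_STRING.sub(" ", content_stream)
    stripped = _HEX_STRING.sub(" ", stripped)
    stripped = re.sub(r"[\[\]<>]", " ", stripped)  # array and dictionary delimiters
    found = set()
    for token in stripped.split():
        if token in TEXT_OPS:
            found.add("text")
        elif token in IMAGE_OPS:
            found.add("image")  # assumption: the painted object is a bitmap
        elif token in PATH_PAINT_OPS:
            found.add("drawing")
    return found


def is_no_image_no_text(content_stream):
    """True when the page uses neither bitmap images nor font-rendered text."""
    return not (mechanisms(content_stream) & {"text", "image"})
```

For example, a stream such as "0 0 m 100 100 l S" contains only path operators, so is_no_image_no_text returns True for it whether those lines form a letter or part of a floor plan.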

How does Symphony OCR handle no-image/no-text documents?  It is not possible for Symphony to reliably differentiate between drawn line segments that are part of words and drawn line segments that are just lines in an architectural drawing, table lines, etc.  As such, we mark these documents for special handling as 'No Image/No Text', and allow the user (that's you!) to force OCR of a given document if desired.

If you would like Symphony OCR to process a specific document (or set of documents), you may force processing.  Here's how:

To process only a single document in the list:

  • Select the "No Image or Text" item from the navigation panel on the left
  • Click on the appropriate document in the document path to open the Details view
  • Click "Enable Processing"

To process all No Image or Text files on a "per document" level:

  • Select the "No Image or Text" item from the navigation panel on the left
  • Optionally use the Filter box to get a sub-set of the documents
  • Select "Show Bulk Operations"
  • Choose "Enable Processing"

To process all No Image or Text files permanently going forward:

  • Close Symphony OCR by using the Quit button in the bottom left corner of the interface (or, if Symphony OCR is installed as a service, stop the service)
  • Go to C:\Program Files (x86)\Trumpet\SymphonyOCR\config (may vary if you installed in a different location)
  • Right-click on the settings.xml file and select Open With > Notepad
  • Locate this line: <heuristicComputerProvider alwaysAnalyzeAndProcessNoImageNoTextPages="false"/>
  • Change the "false" to "true"
  • Save and close the file
  • Re-launch Symphony OCR (start the service if Symphony OCR is installed as a service)
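
If you need to make this change on several machines, the edit to settings.xml can be scripted.  The following is a minimal sketch, not a Trumpet-supplied tool: the function names are hypothetical, the path is the default location given above, and it assumes the file is UTF-8 encoded and contains the attribute exactly as shown.  Stop Symphony OCR (or its service) before running it, and restart it afterwards.

```python
import re
import shutil
from pathlib import Path

# Default location from the steps above; adjust if installed elsewhere.
SETTINGS_PATH = Path(r"C:\Program Files (x86)\Trumpet\SymphonyOCR\config\settings.xml")

# Matches the attribute with either value so the change can be re-applied safely.
_ATTRIBUTE = re.compile(
    r'(alwaysAnalyzeAndProcessNoImageNoTextPages\s*=\s*")(?:true|false)(")'
)


def enable_no_image_no_text(xml_text):
    """Return xml_text with the attribute set to "true", leaving all else untouched."""
    if not _ATTRIBUTE.search(xml_text):
        raise ValueError("alwaysAnalyzeAndProcessNoImageNoTextPages not found")
    return _ATTRIBUTE.sub(r"\g<1>true\g<2>", xml_text)


def update_settings_file(path=SETTINGS_PATH):
    """Edit the file in place, keeping a .bak copy. Returns True if it changed."""
    path = Path(path)
    # newline="" preserves the file's existing line endings on read and write.
    with open(path, encoding="utf-8", newline="") as handle:  # assumption: UTF-8
        original = handle.read()
    updated = enable_no_image_no_text(original)
    if updated == original:
        return False  # already set to "true"
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(updated)
    return True
```

Calling update_settings_file() with no arguments targets the default path; writing under Program Files may require an elevated (administrator) prompt.  A targeted text substitution is used here rather than an XML parser so that comments and formatting in the rest of the file are left exactly as they were.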

You can confirm this setting has been applied by viewing the Analyzer and Processor pages.



© 2012 Trumpet, Inc., All Rights Reserved