
11.7. Symphony OCR Workflow, Tools & Document Lists

Common Workflow Diagram

Symphony OCR searches the document repository for documents to process and then sorts those documents into one of several lists (these lists are available on the left side of the Symphony OCR web interface). The following diagram shows the tools and lists and how they interact:

Symphony OCR Tools

Symphony OCR consists of three main tools that interact to provide full OCR services:

Finder - locates documents in your document repository

Analyzer - determines if a given document is a candidate for OCR

Processor - performs the actual OCR

As documents flow through Symphony OCR, each of the above components works on the document, then places it in a particular Document List, as described in the next section.

Symphony OCR Document Lists

Backlog Lists

The backlog consists of documents that have not been analyzed or OCRed. These are documents that Symphony OCR is still working on.  The following document lists represent the backlog:

Analyzing – Documents waiting for the Analyzer to determine if they are candidates for OCR or not

In Process – Documents that are in the process of being analyzed

Processing – Documents that are candidates for OCR but have not been processed yet

Reprocessing – Documents that had a recoverable problem during OCR and will be processed again later. Typical causes are that the document was open by a user or was modified while OCR was taking place

Processed Lists

These lists represent documents that were successfully analyzed or OCRed.  They are documents that have either been OCRed or were already text searchable (and thus not in need of OCR):

Processed – documents that have been successfully OCRed by Symphony OCR

In Process – documents that are currently being OCRed

Already OCRed – documents that were already OCRed (by some other processor or by an earlier version of Symphony OCR)

Contains text – documents that are already text searchable (no OCR needed)

No image or text – some rare documents contain neither text nor images. These generally do not need to be OCRed; however, you may choose to process them on a one-off basis. See "How to Process No Image or Text Documents" for instructions

Email Messages – email messages with attachments that were processed by Symphony OCR

Not Processed Lists

These lists represent documents that could not be processed for some reason. In most cases, an administrator will want to glance over these lists from time to time to ensure that there are no issues with the documents that didn't get processed:

Needs Attention:  Documents in the Needs attention list are those that appear to be eligible for OCR, but encountered problems during processing. Files in this list could be corrupted or contain invalid images (try opening them in an image viewer to be sure), or they may be images that Symphony OCR does not handle yet. If the image appears in your viewer, contact your Symphony reseller to see if handling for the file can be added.  If the document is corrupted, you can either remove the document from Worldox, or manually tell Symphony OCR to 'ignore' it, which will put it on the Ignored list. If the Needs Attention list contains any documents, the overall system condition will show as "Warn." Ignoring a document that you have already checked is a good way to change the system condition back to "OK".

New:  Documents in the New list are those that have been found by the Finder tool but not yet assigned to another document list (documents are only in the New state for a very short period of time).

Deleted:  Documents in the Deleted list are those whose document records are in the process of being purged from the database. Documents should only be in this state for a very short period of time.

Too Old:  Documents in the Too Old list are those whose file modified date is older than the cutoff age defined in the Processor configuration.
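As an illustration only, the Too Old rule can be sketched as a comparison between a file's last-modified time and a cutoff age. This is a minimal sketch, not Symphony OCR's implementation: the `cutoff_days` parameter is a hypothetical stand-in for the Processor's cutoff age setting, and expressing that age in days is an assumption.

```python
import os
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def is_too_old(path: str, cutoff_days: float, now: Optional[float] = None) -> bool:
    """Return True if the file at `path` was last modified more than `cutoff_days` ago.

    `cutoff_days` is a hypothetical stand-in for the cutoff age in the
    Processor configuration; the real setting name and unit may differ.
    """
    if now is None:
        now = time.time()
    # Age of the file, measured from its last-modified timestamp.
    age_days = (now - os.path.getmtime(path)) / SECONDS_PER_DAY
    return age_days > cutoff_days
```

Passing `now` explicitly makes the check repeatable, which is handy when testing a proposed cutoff against a sample folder.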

Inaccessible:  Documents in the Inaccessible list are those that could not be processed because of file system security, Worldox security, read-only attributes or other conditions that prevent the document from being accessed and worked on. In addition, documents are shown in the Inaccessible list if the profile group in which they reside has an invalid base path (for example, one containing a space), or if the file name has a space immediately before the document extension.
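To find files that match the file-name condition described above, a short script can scan a folder for names with a space immediately before the extension (for example, `memo .pdf`). This is a hypothetical helper, not part of Symphony OCR, and it does not check profile group base paths, which are configured in Worldox.

```python
from pathlib import Path
from typing import Iterator


def space_before_extension(root: Path) -> Iterator[Path]:
    """Yield files under `root` whose name has a space just before the extension.

    Example match: "memo .pdf" (stem "memo ", suffix ".pdf").
    """
    for path in root.rglob("*"):
        # Skip directories and names without an extension.
        if path.is_file() and path.suffix and path.stem.endswith(" "):
            yield path
```

Call it with the root folder of a profile group and rename any files it reports before re-analyzing them.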

Corrupted Documents

Documents in the corrupted list are those that Symphony OCR does not recognize as valid files. The most common reason is that the file is an invalid or corrupted PDF (try opening in Adobe to be sure).  Another possibility is that there is some characteristic of the PDF that the Symphony OCR parsing algorithm isn't handling properly.  Trumpet does periodically update the PDF parsing algorithms to address corner cases that have not been encountered before.

What to do?

Try opening the file in Acrobat, then click Save (Acrobat will try to open and auto-repair corrupted files, and when you save the document, the repaired version is written back). After saving and closing the document, click the Re-Analyze button on the document record in Symphony OCR. This works only if the file is lightly corrupted, but it is worth a try.

If that doesn't help, check whether the file is already text searchable (i.e. can you already search for text inside the PDF?). If it is, the document isn't a candidate for OCR anyway, and you can simply move it to the Ignored list.

If the document does need to be OCRed, and the Adobe repair doesn't help, then you may want to submit the document to us for analysis.  Open a support ticket by emailing support@trumpetinc.com and we will send information on how to securely upload the document to us.  If we find a problem in our parsing algorithms, we'll fix the issue and get you a patch.

If there are a large number of files that have the same corruption reason, and the files don't appear to actually be corrupted, please open a support ticket by emailing support@trumpetinc.com and we will send information on how to securely upload a sample document to us. If we find a problem in our parsing algorithms, we'll fix the issue and get you a patch. Alternatively, you can use a bulk Ignore operation to move the documents to the Ignored list.

Encrypted / Restricted:  Documents in the Encrypted/Restricted list are those that are restricted from being processed because of some characteristic of the file itself (for example, an encrypted or partially restricted PDF file will not be processed).

Ignored:  Documents in the Ignored list are documents that a Symphony OCR administrator has explicitly told Symphony OCR not to process. Any document on this list was explicitly placed there by human intervention.

Wrong Type:  Documents in the Wrong Type list are TIFF documents that were found while TIFF processing is not enabled.

Moved / Unavailable:  Documents in the Moved / Unavailable list are no longer available in the Document Management System (DMS). This could mean that the DMS has gone "offline" or that the DMS settings have been adjusted so that the documents would no longer be found for processing (e.g., if a user selects a profile group to analyze and OCR, and later un-checks that profile group or stops processing it). Document records in the Moved / Unavailable list are deleted from the database after 15 days. Documents can also appear in this list if they are no longer at the location where Symphony OCR originally found them.

Digitally Signed:  Documents that are digitally signed will not be processed by Symphony OCR because adding OCR information to these documents would invalidate the digital signature.  If you wish to have these documents OCRed anyway (and are OK with invalidating the digital signature), please send an email to support@trumpetinc.com and request that functionality be added.

Too Big: Documents in this list contain one or more pages whose pixel dimensions exceed a configured limit. The limits are declared in the settings.xml file, and the defaults differ depending on the version you're running.

For versions 6.5.32 and newer

Default: If any individual page has a total pixel count (width x height) higher than 36,000,000 pixels, the entire document is filed under the "Too Big" list.
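The default rule above can be sketched as follows: a document is filed as Too Big as soon as any one page's width multiplied by its height exceeds maxPixels. This is an illustrative sketch that assumes a strict "greater than" comparison and that page sizes are available as (width, height) pixel pairs; it is not Symphony OCR's actual code.

```python
from typing import Iterable, Tuple

DEFAULT_MAX_PIXELS = 36_000_000  # default maxPixels value described in this section


def is_too_big(page_sizes: Iterable[Tuple[int, int]], max_pixels: int = DEFAULT_MAX_PIXELS) -> bool:
    """Return True if any page's total pixel count (width x height) exceeds max_pixels."""
    return any(width * height > max_pixels for width, height in page_sizes)


# A letter page and a 22 x 34 inch drawing, both scanned at 300 DPI.
document = [(2550, 3300), (6600, 10200)]
print(is_too_big(document))  # True: 6600 x 10200 = 67,320,000 pixels
```

One oversized page is enough to send the whole document to the list, which matches the "entire document" wording above.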

Advanced Configuration Setting

If you wish to attempt to process documents that have a total pixel count larger than 36,000,000 pixels, you may opt to do so by updating the settings.xml file.  Here's how:

  • Close Symphony OCR (stop the service if Symphony OCR is installed as a service).
  • Navigate to C:\Program Files\Trumpet\SymphonyOCR\Config\ and open the settings.xml file in Notepad.
  • The setting to adjust is the maxPixels attribute of the documentPreProcessor element:
          <documentPreProcessor ..... maxPixels="36000000" ..... />
  • Change the maxPixels value (the number inside the quotation marks) to whatever you feel is appropriate.
    • Tip: Check the page dimensions that Symphony OCR reports in the document's details, and set maxPixels to a value equal to or greater than that page's total pixel count.
  • Save the settings.xml file.
  • Launch Symphony OCR (start the service if it is installed as a service).

Note:  If Symphony OCR is not able to process these documents they may end up in the Needs Attention list.
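If you make this change on several servers, the edit can also be scripted. The following is a minimal sketch, not a Trumpet-supplied tool: it assumes settings.xml uses an ASCII-compatible encoding such as UTF-8 and that the attribute appears as `name="digits"` on the `<documentPreProcessor>` element. It changes only that one number, leaves the rest of the file byte-for-byte intact, and keeps a `.bak` copy. Stop Symphony OCR before running it and start it again afterwards, as in the steps above.

```python
import re
from pathlib import Path

# Default location named in this guide; adjust if Symphony OCR is installed elsewhere.
SETTINGS = Path(r"C:\Program Files\Trumpet\SymphonyOCR\Config\settings.xml")


def set_preprocessor_attribute(path: Path, name: str, value: int) -> int:
    """Set a numeric attribute on <documentPreProcessor> and return its old value.

    Works on raw bytes so that comments, whitespace and encoding are preserved.
    A copy of the original file is written next to it with a .bak suffix first.
    """
    if value <= 0:
        raise ValueError(f"{name} must be a positive integer")
    # Find name="digits" inside the opening <documentPreProcessor ...> tag.
    pattern = re.compile(
        rb"<documentPreProcessor\b[^>]*?\b" + re.escape(name.encode("ascii")) + rb'="(\d+)"'
    )
    data = path.read_bytes()
    match = pattern.search(data)
    if match is None:
        raise ValueError(f"no {name} attribute found on <documentPreProcessor>")
    old_value = int(match.group(1))
    path.with_name(path.name + ".bak").write_bytes(data)
    # Splice in the new number; everything else stays exactly as it was.
    updated = data[: match.start(1)] + str(value).encode("ascii") + data[match.end(1):]
    path.write_bytes(updated)
    return old_value
```

For example, `set_preprocessor_attribute(SETTINGS, "maxPixels", 67_320_000)` raises the limit enough for D size pages scanned at 300 DPI. On a version older than 6.5.32, the same function can be called with "maxHeightPixels" or "maxWidthPixels".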


General reference guide for page sizes (in inches) to total pixels, for pages scanned at 300 DPI:

A size (8.5 x 11) = 8,415,000 pixels

Legal (8.5 x 14) = 10,710,000 pixels

B size (two A sizes, 17 x 11) = 16,830,000 pixels

C size (two B sizes, 17 x 22) = 33,660,000 pixels

Default limit (20 x 20) = 36,000,000 pixels

D size (two C sizes, 22 x 34) = 67,320,000 pixels
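Each figure in this table equals (width in inches x 300) multiplied by (height in inches x 300), so the table corresponds to a 300 DPI scan. The sketch below reproduces the table under that assumption, which is inferred from the numbers rather than stated by Symphony OCR, and lets you check other resolutions.

```python
from typing import Dict, Tuple

# Page sizes from the table above, as (width, height) in inches.
PAGE_SIZES_IN: Dict[str, Tuple[float, float]] = {
    "A size": (8.5, 11),
    "Legal": (8.5, 14),
    "B size": (17, 11),
    "C size": (17, 22),
    "Default": (20, 20),
    "D size": (22, 34),
}


def total_pixels(width_in: float, height_in: float, dpi: int = 300) -> int:
    """Total pixel count of a page scanned at `dpi`: (width x dpi) * (height x dpi)."""
    return round(width_in * dpi) * round(height_in * dpi)


for name, (w, h) in PAGE_SIZES_IN.items():
    print(f"{name}: {total_pixels(w, h):,} pixels at 300 DPI")
```

Resolution matters: at 600 DPI a Legal page is 5,100 x 8,400 = 42,840,000 pixels, which exceeds the 36,000,000 default even though the same page is well under the limit at 300 DPI.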


For versions OLDER than 6.5.32

Default: If an individual page is larger than 10,000 x 12,000 pixels (maxHeightPixels x maxWidthPixels), the entire document is filed under the "Too Big" list.

Advanced Configuration Setting

If you wish to attempt to process documents that have an individual page larger than 10,000 x 12,000 pixels, you may opt to do so by updating the settings.xml file. Here's how:

  • Close Symphony OCR (stop the service if Symphony OCR is installed as a service).
  • Navigate to C:\Program Files\Trumpet\SymphonyOCR\Config\ and open the settings.xml file in Notepad.
  • The settings to adjust are the maxHeightPixels and maxWidthPixels attributes of the documentPreProcessor element:
          <documentPreProcessor ..... maxHeightPixels="10000" maxWidthPixels="12000" ..... />
  • Change the maxHeightPixels and maxWidthPixels values (the numbers inside the quotation marks) to whatever you feel is appropriate.
    • Tip: Check the page dimensions that Symphony OCR reports in the document's details, and set each maximum just above the corresponding dimension.
  • Save the settings.xml file.
  • Launch Symphony OCR (start the service if it is installed as a service).

Note:  If Symphony OCR is not able to process these documents they may end up in the Needs Attention list.

Note: If you upgrade to version 6.5.32 or above, tell Symphony OCR to re-analyze the documents in the Too Big list. When it re-analyzes them, it will evaluate each page's total pixel count instead of its separate height and width limits.
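Re-analysis matters because the two rules can disagree about the same page. The sketch below compares them using the default values from this section; it assumes the older rule flags a page when either its height or its width exceeds its own limit, and that both rules use a strict "greater than" comparison. Neither function is Symphony OCR's actual code.

```python
def too_big_legacy(width: int, height: int, max_width: int = 12_000, max_height: int = 10_000) -> bool:
    """Older rule (before 6.5.32): each dimension is checked against its own limit."""
    return width > max_width or height > max_height


def too_big_total(width: int, height: int, max_pixels: int = 36_000_000) -> bool:
    """Newer rule: the page's total pixel count is checked against maxPixels."""
    return width * height > max_pixels


# A tall, narrow page: over the old height limit, but only 33,000,000 pixels in total.
print(too_big_legacy(3_000, 11_000), too_big_total(3_000, 11_000))
# A wide page: within both old limits, but 37,200,000 pixels in total.
print(too_big_legacy(12_000, 3_100), too_big_total(12_000, 3_100))
```

Under these assumptions, a document held back only by a page like the first one would leave the Too Big list after re-analysis, while a page like the second one is flagged only by the newer rule.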


See Manipulating Document Lists for more information on how to manage these lists.


© 2012 Trumpet, Inc., All Rights Reserved