Symphony OCR is part of Symphony Suite, The Complete Imaging Solution. Symphony OCR is a back-end OCR engine. It will locate all image-only PDF and TIF files in your document management system and convert them to fully text searchable PDFs by adding an invisible layer of text over the image. Symphony OCR typically runs on a back-end PC or server (for Worldox sites, this is typically the Indexer PC).
Tip: Turn OCR off at your scanners and see significant (2 to 5x) improvement in scanning speeds. In fact, Adobe Acrobat turns OCR on by default, and we strongly recommend turning it off. Let Symphony OCR take care of the OCR in the background.
Document Repository - Stores the documents that Symphony OCR processes.
Symphony OCR - Monitors the Document Repository for image-only PDF and TIFF files, processes those files by adding an invisible layer of OCRed text, then saves them back to the Worldox Document Repository.
Web Browser - Displays the Symphony OCR user interface. This can be viewed from any browser in your local area network, though most commonly it is displayed in a browser on the monitored PC.
To assist you with installing and configuring Symphony OCR, we recommend that you complete a short site survey for the firm so that you know which Symphony OCR features you wish to enable.
Symphony OCR Site Survey for Worldox
Symphony OCR Site Survey for NetDocuments
For a video showing these steps see: Symphony OCR - How to Install
Note to Channel Partners: Installers can be downloaded from the Channel Partner Resource Center - Implementation Resources
When you launch Symphony OCR for the very first time, you will be prompted to walk through a few quick steps. Follow the steps below to get SOCR up and running.
Note: You do not have to follow the wizard's steps, but they cover just about everything you need. If you leave the wizard, you can continue configuring manually using the links in the left side panel. See the Configuration Guide for details.
If you are unable to open Symphony OCR, try copying the URL to a different browser, or see this article: Unable to Open Symphony Interface.
Paste in your Symphony OCR license and click "Save and Continue".
(See also Licensing)
Input an email address and select a notification type (see Notifications for more info)
If you'd like to process TIFF files and/or email attachments then check the appropriate boxes to enable processing (see Basic Settings section in Processor for more info)
Your license dictates which document management systems your software can integrate with. You'll need to configure each one in order for Symphony OCR to find your documents and begin working. The "Configure" buttons take you to the SOCR settings for that system, while the "Quick Start Guide" buttons take you to articles with instructions. You can also refer to the links below.
Each document management system that Symphony OCR integrates with has different instructions for configuring. Click on the appropriate chapter links below to configure SOCR for your document management system:
Configuration Guide - NetDocuments
Configuration Guide - ShareFile
Configuration Guide - Open Text
Configuration Guide - Practice Master
Configuration Guide - Microsoft One Drive
Configuration Guide - Google Drive
For more detailed information and advanced settings for configuring Symphony OCR, visit Configuration Guide - Worldox
Symphony OCR should be off and running now! The Finder is looking through your repository and sending documents to the Analyzer to determine what needs to be processed. The Analyzer sends any documents eligible for OCR to the Processor which applies an invisible layer of text to the document.
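As a rough illustration of the Finder, Analyzer, and Processor flow described above, the following Python sketch models each stage as a simple function. It is not Symphony OCR's actual code: the Document class, the has_text_layer flag, and the function names are all hypothetical, and the OCR step is replaced by a stand-in that only flips the flag.

```python
from dataclasses import dataclass


@dataclass
class Document:
    name: str
    has_text_layer: bool  # hypothetical flag: True if the file is already text searchable


def finder(repository):
    """Yield candidate files (PDF and TIFF) found in the repository."""
    for doc in repository:
        if doc.name.lower().endswith((".pdf", ".tif", ".tiff")):
            yield doc


def analyzer(candidates):
    """Pass along only image-only documents, which are eligible for OCR."""
    for doc in candidates:
        if not doc.has_text_layer:
            yield doc


def processor(eligible):
    """Stand-in for OCR: mark each document as searchable and report its name."""
    processed = []
    for doc in eligible:
        doc.has_text_layer = True
        processed.append(doc.name)
    return processed


repository = [
    Document("scan1.pdf", has_text_layer=False),
    Document("memo.docx", has_text_layer=True),
    Document("report.pdf", has_text_layer=True),
    Document("fax.tif", has_text_layer=False),
]
print(processor(analyzer(finder(repository))))  # ['scan1.pdf', 'fax.tif']
```

Only the two image-only files reach the Processor stage; the Word document is never picked up, and the already searchable PDF is filtered out by the Analyzer stage.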
By default, Symphony OCR queries the Worldox document repository for newly saved and modified files every 15 minutes. Generally speaking, newly saved files will be OCRed within about 15 minutes. Depending on the volume of image-only documents already filed to Worldox, it may take a while for Symphony OCR to process the backlog (legacy files). Symphony OCR gives precedence to newer files, so documents that are scanned today will be processed before the backlog.
Refer to the section, Configuration Guide - Worldox - Finder for further information on finder settings that determine when Symphony OCR locates files for processing.
Refer to the section, Configuration Guide - Worldox - Processor for further information on configuration settings that determine which files are processed.
Note: While Symphony OCR will likely process documents within about 15 minutes, they will not be immediately available from within Worldox via 'Text in file' searches. Within 15 minutes, you will be able to open the file and search for text that way (i.e., Ctrl+F). But for the documents to be returned in 'Text in file' searches, the Worldox text index needs to be updated (which normally happens every night). So even though your files have been OCRed, you will typically need to wait until the next day to do full 'Text in file' searching via Worldox.
License
This is where your Symphony OCR license is set. To change your Symphony OCR license, simply click "Licensing" from the Configuration side bar, enter your new license, and select "Save Changes."
License Details
Provides you details of the license.
Features Allowed by your License
This area tells you which features are allowed by your license.
Updating your License
Starting with version 6.4.96, Symphony OCR will have an 'Automatic License Update' feature. Basically, after you've paid your renewal invoice with Trumpet, a new license is automatically generated. So if your installation has access to the Trumpet servers, Symphony OCR will automatically see this new license, download it and install it.
Note: Symphony will check for a new license once every 3 days under normal circumstances, and once per day when your license is within 30 days of expiring.
If you've paid your invoice (and received notification of a new license) and don't want to wait for the automatic update to kick in, you can click the "Check for Updated License" link on this page. This will manually trigger Symphony OCR to retrieve the updated license from Trumpet's servers. As mentioned, all of this assumes your installation has access to Trumpet's servers. If a connection cannot be established, you can always copy/paste your new license into this screen.
When you receive notification from Trumpet that your new license is generated, it is still highly recommended that you A) update your installation to the latest version of the software, and B) verify your license has been updated.
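The automatic check schedule in the note above (every 3 days normally, daily once the license is within 30 days of expiring) can be sketched as a small Python function. This is an illustration only: the function name is hypothetical, and treating exactly 30 days remaining as "within 30 days" is an assumption.

```python
from datetime import date, timedelta


def license_check_interval(today: date, expires: date) -> timedelta:
    """Return how long to wait between automatic license checks."""
    days_left = (expires - today).days
    if days_left <= 30:  # assumption: the 30-day boundary is inclusive
        return timedelta(days=1)
    return timedelta(days=3)


print(license_check_interval(date(2024, 1, 1), date(2024, 6, 1)).days)   # 3
print(license_check_interval(date(2024, 1, 1), date(2024, 1, 20)).days)  # 1
```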
Notifications allow users to be emailed nightly based on the status of Symphony OCR.
Each email address may be configured with one of four types:
Never - nightly emails will never be sent to this recipient (instead, after entering an email address you can select "Send Now" and deliver an email to the recipient on demand).
When there are errors - the nightly email will only be sent to the recipient if the overall system condition is Error. This is useful for recipients who only need to know when the system is not processing documents because of some major error (licensing issues are the most common major error).
When there are warnings or errors - the nightly email will only be sent to the recipient if the overall system condition is Warning or Error. The warning condition is triggered by documents in the Needs Attention list, configuration problems or other system level issues that should be looked at, even though they haven't completely stopped processing from occurring.
Always (aka Daily) - the nightly email will be sent to the recipient every night regardless of system status. This is useful for firms who want to monitor the 'Not Processed' lists to ensure that every document that couldn't be OCRed (e.g., because of security or corruption) has been reviewed. Users can review documents in the various 'Not Processed' lists and either correct the underlying issue, or move the documents to the Ignore list using Bulk Operations >Ignore.
If you have a user leave the firm or you no longer wish for a particular user to be notified, you can change the Notification Type to "Never" or remove the user entirely by selecting "Remove" to the right of the address.
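The four notification types above amount to a simple decision table. The Python sketch below shows one way to express it; the Condition enum (including the name "OK" for a healthy system) and the function name are assumptions made for illustration, not part of Symphony OCR.

```python
from enum import Enum


class Condition(Enum):
    OK = 0        # assumed name for a system with no warnings or errors
    WARNING = 1
    ERROR = 2


def should_send_nightly(notification_type: str, condition: Condition) -> bool:
    """Decide whether a recipient gets tonight's email."""
    if notification_type == "Never":
        return False  # "Send Now" remains available on demand
    if notification_type == "When there are errors":
        return condition is Condition.ERROR
    if notification_type == "When there are warnings or errors":
        return condition in (Condition.WARNING, Condition.ERROR)
    if notification_type == "Always":
        return True
    raise ValueError(f"unknown notification type: {notification_type!r}")


print(should_send_nightly("When there are errors", Condition.WARNING))  # False
print(should_send_nightly("Always", Condition.OK))                      # True
```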
Basic settings
Worldox User Code - This is where the Worldox user is specified. This is the user that Symphony OCR should search for documents as (note that Symphony OCR does not actually use a Worldox license). Symphony OCR will have access to all profile groups that the specified Worldox user has access to. This user should have Worldox Manager Rights. We recommend using the 000000 user.
Worldox Network Folder - This is the network folder in which Worldox is installed. It can be identified as a UNC path or a mapped network drive (e.g. \\server1\DMS\Worldox, or X:\Worldox) unless you are running Symphony OCR as a service, in which case it must be identified as a UNC path.
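If you want to check a Worldox Network Folder value before entering it, the rule above (UNC path or mapped drive, but UNC only when running as a service) can be sketched in Python as follows. The function names are hypothetical, and the check looks only at the form of the path, not at whether the folder actually exists.

```python
from pathlib import PureWindowsPath


def is_unc_path(folder: str) -> bool:
    """True for paths of the form \\\\server\\share\\..."""
    return PureWindowsPath(folder).drive.startswith("\\\\")


def validate_network_folder(folder: str, runs_as_service: bool) -> bool:
    """Return True if the path form is acceptable for the given run mode."""
    if is_unc_path(folder):
        return True
    is_mapped_drive = PureWindowsPath(folder).drive.endswith(":")
    # A mapped drive letter is only acceptable when not running as a service.
    return is_mapped_drive and not runs_as_service


print(validate_network_folder(r"\\server1\DMS\Worldox", runs_as_service=True))  # True
print(validate_network_folder(r"X:\Worldox", runs_as_service=True))             # False
print(validate_network_folder(r"X:\Worldox", runs_as_service=False))            # True
```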
Profile Groups to Monitor
This is the list of profile groups that the specified user has access to. If a profile group does not appear in the list, this user does not have access to that profile group (or the profile group has not been properly configured in Worldox). You can select the checkbox in the header area to automatically select and process all documents in all profile groups. If you wish to only process certain profile groups, you can simply select the applicable ones. Be sure to select "Save Changes" at the bottom of the screen.
Default Priority - There are 6 processing priorities, which range from Very Low to Very High and include "Analyzer Only". By default, all profile groups are processed with a "Normal" priority. To change the priority for a particular profile group, select the appropriate item from the drop-down list. If you wish to re-prioritize documents that have already been found in that profile group as well as new documents, select the "Reprioritize existing documents" checkbox. For more information on Processing Priorities see: Processing Priorities
Refresh - Allows you to refresh the list of available profile groups. For example, if you have added a new profile group to Worldox, and wish to process that group, you can select this which will provide you with the newly added profile groups.
View Detailed Progress - Selecting this will take you to the Progress Details page. This will provide you with a list of profile groups, the number of documents and pages that have been processed / not processed per profile group.
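To make the effect of the Default Priority setting more concrete, here is a minimal Python sketch of how a processing queue might be ordered. It rests on several assumptions that the guide does not confirm: the intermediate priority names "Low" and "High", the rule that priority is compared before document age, and the treatment of "Analyzer Only" documents as never queued for OCR. Only the preference for newer files comes from the guide itself.

```python
from datetime import datetime

# Assumed ranking; only "Very Low", "Normal", "Very High" and "Analyzer Only"
# are named in the guide.
PRIORITY_RANK = {"Very Low": 0, "Low": 1, "Normal": 2, "High": 3, "Very High": 4}


def order_queue(documents):
    """Order (name, priority, modified) tuples: highest priority, then newest first."""
    processable = [d for d in documents if d[1] != "Analyzer Only"]
    return sorted(processable, key=lambda d: (PRIORITY_RANK[d[1]], d[2]), reverse=True)


queue = order_queue([
    ("old.pdf", "Normal", datetime(2020, 1, 1)),
    ("new.pdf", "Normal", datetime(2024, 1, 1)),
    ("urgent.pdf", "Very High", datetime(2019, 1, 1)),
    ("skip.pdf", "Analyzer Only", datetime(2024, 1, 1)),
])
print([name for name, _, _ in queue])  # ['urgent.pdf', 'new.pdf', 'old.pdf']
```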
Advanced settings
Process Read Only Files - If you wish to process read-only files, you should check this checkbox.
Indexed Search Frequency - By default Symphony OCR will search for documents in selected profile groups once every 15 minutes using Indexed Searches. This should be sufficient for your needs, however you can change this to search more or less frequently.
Non-indexed Search Frequency - By default Symphony OCR will search for documents in selected profile groups once every 12 hours without using Worldox indexes. Because it takes a significant amount of time to crawl through the directory structure to find files, once every 12 hours should be sufficient.
Debugging
Reset Worldox Session - Selecting "Reset Worldox Session" will reset the Worldox session for the user defined in the Basic Settings above.
To OCR files that are marked as Read-Only:
Status
Worldox Indexed Search performs an indexed search to find documents that have been created or modified *today* that are eligible for OCR. By default, it performs the query every 15 minutes. This can be adjusted by selecting "Manage". This will take you to the Worldox page where you can adjust the search frequency under Advanced Settings.
Worldox Non-Indexed Finder performs a non-indexed search to find all documents in Worldox that are eligible for OCR, regardless of how recently the document has been created or modified. By default, it performs this search once every 12 hours. This can be adjusted by selecting "Manage". This will take you to the Worldox page where you can adjust the search frequency under Advanced Settings.
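The two default search frequencies (15 minutes for the indexed search, 12 hours for the non-indexed search) can be pictured with a small Python sketch that reports which searches are due. The dictionary keys and the function name are hypothetical; Symphony OCR's own scheduling may work differently.

```python
from datetime import datetime, timedelta

# Default frequencies from the guide; both can be changed under Advanced Settings.
SEARCH_INTERVALS = {
    "indexed": timedelta(minutes=15),
    "non_indexed": timedelta(hours=12),
}


def due_searches(last_run: dict, now: datetime) -> list:
    """Return the names of searches whose interval has elapsed since their last run."""
    return [
        name
        for name, interval in SEARCH_INTERVALS.items()
        if now - last_run[name] >= interval
    ]


now = datetime(2024, 1, 1, 12, 0)
last_run = {
    "indexed": now - timedelta(minutes=20),
    "non_indexed": now - timedelta(hours=1),
}
print(due_searches(last_run, now))  # ['indexed']
```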
The Analyzer is responsible for looking at each document and determining if it is eligible for OCR. If a document is eligible it is placed in the Processing list. If a document is not eligible, it is placed in the appropriate list (for more information on why a document might not be eligible for OCR, refer to the section, Not Processed List).
Control
In the control area, you can choose to refresh the Analyzer or stop the Analyzer:
Refresh - Selecting Refresh will refresh the Status of the Analyzer page.
Stop Analyzer - Selecting this option will stop the Analyzer from Analyzing documents in the document repository.
Status
Displays the status of the Analyzer.
Information
Machine Processors - Indicates how many logical processors the workstation running Symphony OCR contains.
Licensed parallel processing - Indicates how many documents will be analyzed at a time based on your license features.
Recent Performance (since last restart)
Provides performance statistics such as the total number of documents and pages that Symphony OCR has found eligible for OCR, and the average speed of analysis per document since the last restart of Symphony OCR.
Overall Performance (since last restart)
Provides performance statistics such as the total number of documents and pages that Symphony OCR has found eligible for OCR, and the average speed of analysis per document since the last restart of Symphony OCR.
Settings
Do not analyze documents younger than - The default setting is 30 seconds. If you wish to have the Analyzer wait longer to analyze documents, simply change the value in the field. Trumpet recommends that this value is not decreased to less than 30 seconds to ensure that documents are fully written to the disk before processing.
To change this setting, simply type in the number of seconds, and then select "Save Changes".
Select Processor in the navigation panel:
The Processor manages the actual OCR processes. Once a document has been identified as eligible for OCR by the Analyzer, the Processor confirms that the file is still eligible for OCR, and then OCRs the file. If a document is successfully OCRed, it is moved to the Processed list (for more information about the flow of documents throughout Symphony OCR, refer to the section Symphony Workflow, Tools & Document Lists).
Control
In the control area, you can choose to refresh the Processor or stop the Processor:
Refresh - Selecting Refresh will refresh the status of the Processor page.
Stop Processor - Selecting this option will stop the Processor from processing documents in the document repository.
Status
The status of the Processor (what it is currently processing).
Information
Processing Capacity Remaining - If you have a license that limits the number of pages you can process per year, the number of pages remaining will appear here.
Machine Processors - Indicates how many logical processors the workstation running Symphony OCR contains.
Licensed parallel processing - Indicates the number of documents that will be processed by the processor simultaneously.
Recent Performance
Provides performance statistics such as the number of documents and pages that Symphony OCR has processed in a smaller sample size and the average speed of processing per page, the average number of pages per document and the effective throughput of the documents.
Overall Performance
Provides performance statistics such as the total number of documents and pages that Symphony OCR has processed and the average speed of processing per page, the average number of pages per document and the effective throughput of the documents.
Basic Settings
Process TIFFs (OCR and convert to PDF) - Symphony OCR can process TIFF files and convert them to image + text PDF files. This is an optional setting. If you wish to process TIFF documents, simply check this checkbox.
Note: If the firm opts to process TIFF documents, the file extension will change from .tif to .pdf. This will "break" any relationships or projects that include this file.
Process MSG (email) attachments - Symphony OCR can process email message attachments. This is an optional setting. If you wish to process email message attachments, check this checkbox.
<Big fat scary warning:
Due to a limitation in newer versions of Office, Microsoft prevents us from accessing the DLLs that allow us to read/process emails under the following conditions:
> Symphony OCR is configured to run as a service
> 'Process MSG (email) attachments' is checked
> Outlook 2013 (or possibly Outlook 2016) is open
In these circumstances, you're likely to see the following error:
Therefore, if Symphony OCR is being installed to run as a service *and* will be configured to process email attachments, it is our recommendation to install it on a machine that will not normally have Outlook 2013 (or possibly 2016) open. On the bright side, our testing has shown that in these situations, Symphony is still processing normal documents and WILL eventually recover and process emails after Office is closed. But if you can, we recommend avoiding this situation. If your experience is different, we'd like to hear about it.
End of big fat scary warning>
Do not process documents younger than - The default setting is 30 seconds. If you wish to have the Processor wait longer to process documents, simply change the value in the field. Trumpet recommends that this value is not decreased to less than 30 seconds to ensure that documents are fully written to the disk before processing.
Do not process documents older than - If you have older documents that you do not want Symphony OCR to process, enter a specific number of days for which the software should process backlog.
Automatically rotate pages to proper orientation - If selected, the pages will rotate either landscape or portrait according to the text on the page.
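The two age settings above ("Do not process documents younger than" and "Do not process documents older than") together define a window of eligible documents. The Python sketch below shows that logic under stated assumptions: the function and parameter names are hypothetical, both boundaries are treated as inclusive, and leaving the older-than value unset is taken to mean there is no upper limit.

```python
from datetime import datetime, timedelta


def in_processing_window(modified, now, min_age_seconds=30, max_age_days=None):
    """Return True if a document's age falls inside the configured window."""
    age = now - modified
    if age < timedelta(seconds=min_age_seconds):
        return False  # too fresh: the file may not be fully written to disk yet
    if max_age_days is not None and age > timedelta(days=max_age_days):
        return False  # older than the backlog the firm wants processed
    return True


now = datetime(2024, 6, 1, 12, 0, 0)
print(in_processing_window(now - timedelta(seconds=10), now))                  # False
print(in_processing_window(now - timedelta(days=2), now, max_age_days=365))    # True
print(in_processing_window(now - timedelta(days=400), now, max_age_days=365))  # False
```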
Original retention settings
Retain originals of processed files - If selected, Symphony OCR retains copies of the documents that it has processed. These copies appear as versions (if Symphony OCR processes a document 3 times, it will maintain copies of all 3 versions of the document). The user can restore previous versions of a document from the Symphony OCR backup using the document Details screen.
Purge originals of processed files after - The default setting is to retain the originals of processed files for 7 days after which they will be purged. If you wish to change this setting, you can change the value to the appropriate number of days for your firm.
Backlog throttling settings (only needed when your license does NOT have unlimited pages for processing)
Default processing capacity reserved for new documents (based on the actual number of new pages added each day) - This is calculated from the number of pages that were added to the site in the past year.
Override the default processing capacity reserve - This will determine the number of pages you would like to reserve for new documents, evenly spreading the page count capacity across the entire year. To determine a reasonable reserve, allow the Symphony OCR Analyzer module to run, then look at the timeline for the Processing Queue. Adding the number of pages in the first 52 weeks and dividing by 365 will give the average number of pages added to the system per day. Trumpet recommends adding an additional 10% to accommodate future growth or above average filing. This value should be a reasonable overclocking reserve.
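The reserve calculation described above (sum the pages in the first 52 weeks of the timeline, divide by 365, then add 10%) is worked through in the Python sketch below. The function name and the sample page counts are made up for illustration, and rounding to a whole page is an assumption.

```python
def daily_reserve(weekly_pages, growth=0.10):
    """Estimate the daily page reserve for new documents.

    weekly_pages: page counts for the first 52 weeks shown in the
    Processing Queue timeline.
    """
    if len(weekly_pages) != 52:
        raise ValueError("expected 52 weekly page counts")
    average_per_day = sum(weekly_pages) / 365
    return round(average_per_day * (1 + growth))


# Hypothetical site that adds 3,650 pages every week:
# 52 * 3,650 = 189,800 pages, / 365 = 520 pages per day, + 10% = 572.
print(daily_reserve([3650] * 52))  # 572
```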
Advanced Settings:
Create thumbnails (if not already present) - Checking this checkbox will create thumbnails if they are not already present.
Enable OCR debug logging - This enables debug logging, which helps our support team address issues if necessary. To conserve disk space, we recommend not enabling this unless requested by our support team.
Limit parallel processing to X documents - This allows you to limit the number of cores that Symphony OCR will utilize. It uses 1 core per document. For example, if you input 3, Symphony OCR will only use 3 cores and will process 3 documents simultaneously. See: How does Symphony OCR impact the performance of the server or indexer PC
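As a rough way to reason about this setting, the Python sketch below estimates how many documents would be processed at once. It assumes, without confirmation from the guide, that the effective number is the smallest of the configured limit, the licensed parallel processing count, and the machine's logical processors; the function name is hypothetical.

```python
import os


def effective_parallelism(configured_limit, licensed_parallel, logical_processors=None):
    """Estimate simultaneous documents, at one core per document."""
    if logical_processors is None:
        logical_processors = os.cpu_count() or 1
    # Assumption: the tightest of the three constraints wins.
    return max(1, min(configured_limit, licensed_parallel, logical_processors))


print(effective_parallelism(3, licensed_parallel=8, logical_processors=16))   # 3
print(effective_parallelism(10, licensed_parallel=4, logical_processors=16))  # 4
```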
Special Note: See Indexing Email Attachments to enable the Worldox Indexer to process text.
Symphony OCR honors the Worldox security model, so it will only be able to process documents that are fully accessible by the Worldox user that Symphony runs as.
If users want documents that have been restricted in Worldox to be OCRed, they must configure Worldox's security features to allow Symphony to modify the document.
If you do not do this configuration, documents will be placed on the Inaccessible list and will not be OCRed.
In the ethical wall configuration, be sure that you have added the Symphony Worldox user (usually 000000) to the users list for the ethical wall, and that you have configured that user to have full access to documents covered by the ethical wall.
When users classify individual documents, they need to make sure that they include the Symphony Worldox user (usually 000000) in the classification and give full access to that user. To make this easier, we suggest creating the following two security classifications (this is done from WDAdmin, Security->Classifications):
First, create the <Private - Symphony documents> classification:
Now create the <Read Only - Symphony documents> classification:
If you already have documents in the Inaccessible list, after making classification changes you will need to click Re-Analyze All to make those documents available for processing.
First, you will want to determine if these files have the potential to be migrated to the Worldox document repository.
If the firm does not want to migrate these documents to the Worldox document repository, but would like to OCR them, you can enable Folder Processing (assuming that the firm has the appropriate licensing). See: Configuration Guide - Folder.
If the firm will want to migrate these documents (or a subset of these documents) to the Worldox document repository, Trumpet recommends that you create a "Legacy Cabinet" that points at the legacy area. This ensures that the document record for these files is nicely maintained. Here are some tips and tricks for setting up the Legacy Profile:
Once the legacy cabinet has been created, it will be listed in the Worldox Configuration page of Symphony OCR so that you can enable processing.
Tip: Trumpet offers an 8.3 Filename Tool to assist you in ensuring the 8.3 filenames are intact. Contact support@trumpetinc.com for more information on the tool.
For a video showing the configuration, visit: Symphony OCR - How to configure for NetDocuments
For more detailed information and advanced settings for configuring Symphony OCR, visit Configuration Guide - NetDocuments
*When Symphony OCR processes documents, the 'modified date' of the document will be updated to the date that the OCR occurs, and the 'modified by' user will change to the user that Symphony OCR is connected to NetDocuments under.
Symphony OCR should be off and running now! The Finder is looking through your repository and sending documents to the Analyzer to determine what needs to be processed. The Analyzer sends any documents eligible for OCR to the Processor which applies an invisible layer of text to the document.
By default, Symphony OCR queries the document repository for newly saved and modified files every 15 minutes. Generally speaking, newly saved files will be OCRed within about 15 minutes. Symphony OCR can also optionally process the files already stored in NetDocuments. By default, it performs a query for these files every 7 days. Symphony OCR gives precedence to newer files, so documents that are scanned today will be processed before the legacy documents.
Note that Text-in-File searches within NetDocuments may not return the text in your recently OCRed files for up to 6-8 hours. This is because NetDocuments has designed its API to run API operations at a lower priority. You will, however, be able to open the file and benefit from the OCR immediately after Symphony processes it.
Refer to the section, Configuration Guide - NetDocuments - Finder, for further information on finder settings that determine when Symphony OCR locates files for processing.
Refer to the section, Configuration Guide - NetDocuments - Processor, for further information on settings that determine which files are processed.
License
This is where your Symphony OCR license is set. To change your Symphony OCR license, simply click "Licensing" from the Configuration side bar, enter your new license, and select "Save Changes."
License Details
Provides you details of the license.
Features Allowed by your License
This area tells you which features are allowed by your license.
Updating your License
Starting with version 6.4.96, Symphony OCR will have an 'Automatic License Update' feature. Basically, after you've paid your renewal invoice with Trumpet, a new license is automatically generated. So if your installation has access to the Trumpet servers, Symphony OCR will automatically see this new license, download it and install it.
Note: Symphony will check for a new license once every 3 days under normal circumstances, and once per day when your license is within 30 days of expiring.
If you've paid your invoice (and received notification of a new license) and don't want to wait for the automatic update to kick in, you can click the "Check for Updated License" link on this page. This will manually trigger Symphony OCR to retrieve the updated license from Trumpet's servers. As mentioned, all of this assumes your installation has access to Trumpet's servers. If a connection cannot be established, you can always copy/paste your new license into this screen.
When you receive notification from Trumpet that your new license is generated, it is still highly recommended that you A) update your installation to the latest version of the software, and B) verify your license has been updated.
Notifications allow users to be emailed nightly based on the status of Symphony OCR.
Each email address may be configured with one of four types:
Never - nightly emails will never be sent to this recipient (instead, after entering an email address you can select "Send Now" and deliver an email to the recipient on demand).
When there are errors - the nightly email will only be sent to the recipient if the overall system condition is Error. This is useful for recipients who only need to know when the system is not processing documents because of some major error (licensing issues are the most common major error).
When there are warnings or errors - the nightly email will only be sent to the recipient if the overall system condition is Warning or Error. The warning condition is triggered by documents in the Needs Attention list, configuration problems or other system level issues that should be looked at, even though they haven't completely stopped processing from occurring.
Always (aka Daily) - the nightly email will be sent to the recipient every night regardless of system status. This is useful for firms who want to monitor the 'Not Processed' lists to ensure that every document that couldn't be OCRed (e.g., because of security or corruption) has been reviewed. Users can review documents in the various 'Not Processed' lists and either correct the underlying issue, or move the documents to the Ignore list using Bulk Operations >Ignore.
If you have a user leave the firm or you no longer wish for a particular user to be notified, you can change the Notification Type to "Never" or remove the user entirely by selecting "Remove" to the right of the address.
Connect to NetDocuments
Enable NetDocuments Integration
Available Repositories
This is the list of repositories the specified user has access to. If you wish to only process certain repositories, you can simply select the applicable ones. Select "Activate" to activate the repository for processing and confirm by selecting "Yes - Activate this repository". If you do not wish to activate the repository, choose "Cancel".
Important Note: Adding a repository to Symphony OCR is a one-way action; once added, a repository cannot be removed. Be sure you only activate processing of repositories that you want to permanently tie to your license. Activating a repository will increase your NetDocuments user count.
Basic Settings
Connected to NetDocuments as - Displays the user that Symphony OCR uses to connect to NetDocuments.
Active Repository - Displays the active repository name(s).
Preserve modified user and date - Should be selected by default. This will ensure that when Symphony OCR processes a document it does not change the modified date or the user of the document.
Process legacy documents - When checked, Symphony OCR will process eligible documents that are already stored in NetDocuments. If the "Preserve modified user and date" checkbox is not set, this will update the modified date to the date that the OCR occurs. Note also that the 'modified by' user will be changed to the user that Symphony OCR is connected to NetDocuments under. If the "Preserve modified user and date" checkbox is checked, the dates and users will not be modified.
Create versions of OCRed results - When selected, Symphony OCR will save the OCRed document as a new version. If the "Preserve modified user and date" checkbox is not set, this will allow you to see the modified date of the document if you choose to process legacy documents; however, this will significantly increase your storage (see Modified Dates of Documents in NetDocuments for further details). If the "Preserve modified user and date" checkbox is set, you may still opt to check this checkbox, again understanding that this will significantly increase your storage.
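The effect of the "Preserve modified user and date" checkbox on a document's metadata can be summarized in a few lines of Python. This is only an illustration of the behavior described above; the function name, parameter names, and sample user names are hypothetical, and no NetDocuments API is being called.

```python
from datetime import datetime


def metadata_after_ocr(original_user, original_date, ocr_user, ocr_date, preserve):
    """Return the (modified by, modified date) pair a document shows after OCR."""
    if preserve:
        # Checkbox set: the document keeps its existing user and date.
        return original_user, original_date
    # Checkbox not set: both reflect the Symphony OCR connection and the OCR run.
    return ocr_user, ocr_date


original = ("jsmith", datetime(2019, 3, 14))
ocr_run = ("symphony.ocr", datetime(2024, 6, 1))
print(metadata_after_ocr(*original, *ocr_run, preserve=True)[0])   # jsmith
print(metadata_after_ocr(*original, *ocr_run, preserve=False)[0])  # symphony.ocr
```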
Cabinets to Monitor
This is the list of cabinets the Symphony OCR user has access to. If a cabinet does not appear in the list, this user does not have access to that cabinet. You can select the checkbox in the header area to automatically select and process all documents in all cabinets. If you wish to only process certain cabinets, you can simply select the applicable ones. Be sure to select "Save Changes" at the bottom of the screen. If you wish to process certain cabinets at a higher priority than others, you can do so by selecting the appropriate drop-down in the list. For more information see: Processing Priorities
View detailed progress - Selecting this link will take you to the Progress Details page. This will provide you with a list of cabinets and the number of documents and pages that have been processed / not processed per cabinet.
Advanced Settings
Create versions of OCRed results - When checked, Symphony OCR will save the OCRed document as a second version of the original. This will significantly increase the amount of storage used, as it in essence duplicates documents. We strongly recommend against enabling this feature, as Symphony OCR has several mechanisms available for recovering pre-OCRed versions of documents.
New documents search frequency - By default, Symphony OCR will perform a search for new documents every 15 minutes. The value on the right may be adjusted if you require searching for documents less frequently.
Legacy documents search frequency - By default, Symphony OCR will perform a search for legacy documents (documents existing prior to installing Symphony OCR) every 7 days.
Status
NetDocuments Recent Documents Search - performs a search to find documents that have been created or modified *today* that are eligible for OCR. By default it performs the query every 15 minutes. This can be adjusted by selecting "Manage". This will take you to the NetDocuments page where you can adjust the search frequency under Advanced Settings.
NetDocuments Legacy Documents Search - performs a search to find legacy documents that are eligible for OCR. By default it performs the query every 7 days. This can be adjusted by selecting "Manage". This will take you to the NetDocuments page where you can adjust the search frequency under Advanced Settings.
The Analyzer is responsible for looking at each document and determining if it is eligible for OCR. If a document is eligible it is placed in the Processing list. If a document is not eligible, it is placed in the appropriate list (for more information on why a document might not be eligible for OCR, refer to the section, Not Processed List).
Control
In the control area, you can choose to refresh the Analyzer or stop the Analyzer:
Refresh - Selecting Refresh will refresh the Status of the Analyzer page.
Stop Analyzer - Selecting this option will stop the Analyzer from Analyzing documents in the document repository.
Status
Displays the status of the Analyzer.
Information
Machine Processors - Indicates how many logical processors the workstation running Symphony OCR contains.
Licensed parallel processing - Indicates how many documents will be analyzed at a time based on your license features.
Recent Performance (since last restart)
Provides performance statistics such as the total number of documents and pages that Symphony OCR has found eligible for OCR, and the average speed of analysis per document since the last restart of Symphony OCR.
Overall Performance (since last restart)
Provides performance statistics such as the total number of documents and pages that Symphony OCR has found eligible for OCR, and the average speed of analysis per document since the last restart of Symphony OCR.
Settings
Do not analyze documents younger than - The default setting is 30 seconds. If you wish to have the Analyzer wait longer to analyze documents, simply change the value in the field. Trumpet recommends that this value is not decreased to less than 30 seconds to ensure that documents are fully written to the disk before processing.
To change this setting, simply type in the number of seconds, and then select "Save Changes".
Select Processor in the navigation panel:
The Processor manages the actual OCR processes. Once a document has been identified as eligible for OCR by the Analyzer, the Processor confirms that the file is still eligible for OCR, and then OCRs the file. If a document is successfully OCRed, it is moved to the Processed list (for more information about the flow of documents throughout Symphony OCR, refer to the section Symphony Workflow, Tools & Document Lists).
Control
In the control area, you can choose to refresh the Processor or stop the Processor:
Refresh - Selecting Refresh will refresh the status of the Processor page.
Stop Processor - Selecting this option will stop the Processor from processing documents in the document repository.
Status
The status of the Processor (what it is currently processing).
Information
Processing Capacity Remaining - If you have a license that limits the number of pages you can process per year, the number of pages remaining will appear here.
Machine Processors - Indicates how many logical processors the workstation running SymphonyOCR contains.
Licensed parallel processing - Indicates the number of documents that will be processed by the processor simultaneously.
Recent Performance
Provides performance statistics such as the number of documents and pages that Symphony OCR has processed in a smaller sample size and the average speed of processing per page, the average number of pages per document and the effective throughput of the documents.
Overall Performance
Provides performance statistics such as the total number of documents and pages that Symphony OCR has processed and the average speed of processing per page, the average number of pages per document and the effective throughput of the documents.
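As a rough illustration of how the three Processor figures relate to one another, here is a minimal sketch. The field names and the exact definitions (in particular, measuring throughput against elapsed clock time) are assumptions made for this example, not Symphony OCR's documented formulas.

```python
from dataclasses import dataclass


@dataclass
class ProcessorStats:
    # All four fields are hypothetical inputs chosen for this sketch.
    documents: int        # documents processed
    pages: int            # pages processed
    ocr_seconds: float    # OCR time summed across all documents
    wall_seconds: float   # elapsed clock time for the same work

    def seconds_per_page(self) -> float:
        """Average speed of processing per page."""
        return self.ocr_seconds / self.pages if self.pages else 0.0

    def pages_per_document(self) -> float:
        """Average number of pages per document."""
        return self.pages / self.documents if self.documents else 0.0

    def pages_per_hour(self) -> float:
        """Effective throughput, based on elapsed time rather than summed OCR time."""
        return self.pages / self.wall_seconds * 3600 if self.wall_seconds else 0.0


stats = ProcessorStats(documents=40, pages=200, ocr_seconds=1200.0, wall_seconds=400.0)
print(stats.seconds_per_page(), stats.pages_per_document(), stats.pages_per_hour())
```

Because throughput is measured against elapsed time in this sketch, processing several documents in parallel raises it even when the per-page speed stays the same.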
Basic Settings
Process TIFFs (OCR and convert to PDF) - Symphony OCR can process TIFF files and convert them to image + text PDF files. This is an optional setting. If you wish to process TIFF documents, simply check this checkbox.
Note: If the firm opts to process TIFF documents, this will change the file extension to .pdf. This will "break" any relationships or projects that include this file.
Process MSG (email) attachments - Symphony OCR can process email message attachments. This is an optional setting. If you wish to process email message attachments, check this checkbox.
<Big fat scary warning:
Due to a limitation in newer versions of Office, Microsoft prevents us from accessing the DLLs that allow us to read/process emails under the following conditions:
> Symphony OCR is configured to run as a service
> 'Process MSG (email) attachments' is checked
> Outlook 2013 (or possibly Outlook 2016) is open
In these circumstances, you're likely to see the following error:
Therefore, if Symphony OCR is being installed to run as a service *and* will be configured to process email attachments, it is our recommendation to install it on a machine that will not normally have Outlook 2013 (or possibly 2016) open. On the bright side, our testing has shown that in these situations, Symphony is still processing normal documents and WILL eventually recover and process emails after Office is closed. But if you can, we recommend avoiding this situation. If your experience is different, we'd like to hear about it.
End of big fat scary warning>
Do not process documents younger than - The default setting is 30 seconds. If you wish to have the Processor wait longer to process documents, simply change the value in the field. Trumpet recommends that this value is not decreased to less than 30 seconds to ensure that documents are fully written to the disk before processing.
Do not process documents older than - If you have older documents that you do not want Symphony OCR to process, enter a specific number of days for which the software should process backlog.
Automatically rotate pages to proper orientation - If selected, the pages will rotate either landscape or portrait according to the text on the page.
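The two age settings above ("younger than" and "older than") amount to a simple window check on a document's modified time. The following is a minimal sketch of that idea; the function and parameter names are hypothetical and are not taken from Symphony OCR itself.

```python
from datetime import datetime, timedelta
from typing import Optional


def within_age_window(modified: datetime, now: datetime,
                      min_age_seconds: int = 30,
                      max_age_days: Optional[int] = None) -> bool:
    """Return True if a document is old enough, and not too old, to process."""
    age = now - modified
    if age < timedelta(seconds=min_age_seconds):
        return False  # too young: the file may not be fully written to disk yet
    if max_age_days is not None and age > timedelta(days=max_age_days):
        return False  # older than the backlog the firm wants processed
    return True


now = datetime(2024, 1, 10, 12, 0, 0)
print(within_age_window(now - timedelta(seconds=10), now))  # saved 10 seconds ago
print(within_age_window(now - timedelta(minutes=5), now))   # saved 5 minutes ago
```

Leaving max_age_days unset in this sketch means no upper limit, so the whole backlog qualifies.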
Original retention settings
Retain originals of processed files - If selected, Symphony OCR retains copies of the documents that it has processed. These copies appear as versions (if Symphony OCR processes a document 3 times, it will maintain copies of all 3 versions of the document). The user can restore previous versions of a document from the Symphony OCR backup using the document Details screen.
Purge originals of processed files after - The default setting is to retain the originals of processed files for 7 days after which they will be purged. If you wish to change this setting, you can change the value to the appropriate number of days for your firm.
Backlog throttling settings (only needed when your license does NOT have unlimited pages for processing)
Default processing capacity reserved for new documents (based on the actual number of new pages added each day) - This is calculated from the number of pages that were added to the site in the past year.
Override the default processing capacity reserve - This sets the number of pages you would like to reserve for new documents, spreading the page count capacity evenly across the entire year. To determine a reasonable reserve, allow the Symphony OCR Analyzer module to run, then look at the timeline for the Processing Queue. Adding the number of pages in the first 52 weeks and dividing by 365 gives the average number of pages added to the system per day. Trumpet recommends adding an additional 10% to accommodate future growth or above-average filing. This value should be a reasonable overclocking reserve.
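To make the reserve arithmetic concrete, here is a small worked sketch: take the pages shown in the first 52 weeks of the Processing Queue timeline, divide by 365, then add 10%. The function name and the sample figure of 73,000 pages are illustrative assumptions only.

```python
def daily_reserve(pages_in_first_52_weeks: int, growth: float = 0.10) -> int:
    """Estimate the pages to reserve per day for new documents."""
    average_per_day = pages_in_first_52_weeks / 365
    # Add the recommended 10% headroom and round to a whole number of pages.
    return round(average_per_day * (1 + growth))


# Example: 73,000 pages in the first 52 weeks is 200 pages per day before headroom.
print(daily_reserve(73_000))
```

With these sample numbers the sketch yields a reserve of 220 pages per day.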
Advanced Settings:
Create thumbnails (if not already present) - Checking this checkbox will create thumbnails if they are not already present.
Enable OCR debug logging - This enables debug logging to help our support team address issues if necessary. To conserve disk space, we recommend not enabling this unless requested by our support team.
Limit parallel processing to X documents - This allows you to limit the number of cores that Symphony OCR will utilize. It uses 1 core per document. For example, if you enter 3, Symphony OCR will only use 3 cores and will process 3 documents simultaneously. See: How does Symphony OCR impact the performance of the server or indexer PC
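One way to picture how this limit interacts with the "Machine Processors" and "Licensed parallel processing" values is that the smallest of the three wins. That combination rule is an assumption made for this sketch, as are the function and parameter names; the guide only states that one core is used per document.

```python
from typing import Optional


def effective_parallelism(logical_processors: int, licensed_parallel: int,
                          user_limit: Optional[int] = None) -> int:
    """Number of documents processed at once (one core per document)."""
    candidates = [logical_processors, licensed_parallel]
    if user_limit is not None:
        candidates.append(user_limit)  # the optional "Limit parallel processing" value
    # Assumed rule: the tightest constraint applies, but never fewer than one document.
    return max(1, min(candidates))


print(effective_parallelism(logical_processors=8, licensed_parallel=4, user_limit=3))
```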
On a new installation, by default Symphony OCR will preserve the modified date of the document (assuming the "Preserve modified user and date" checkbox has been selected in the NetDocuments Integration Settings page).
On a new installation, by default Symphony OCR will preserve the modified user of the document (assuming the "Preserve modified user and date" checkbox has been selected in the NetDocuments Integration Settings page).
There are four options regarding creating new versions of documents in Symphony OCR:
Do not create versions: Symphony OCR will not create a new version of the OCRed results with the exception of .msg files. NetDocuments requires that Version 1 of emails not be overwritten.
Create versions for all documents: Symphony OCR will create a new version of each document as it processes the documents.
Create versions for PDF files only: Symphony OCR will create a new version of PDF files (not .tiff files).
Create versions for non-PDF documents only: Symphony OCR will create versions for .msg files (default behavior) and .tiff files as well.
Note: Symphony OCR converts .tiff files to .pdf before processing those. If you opt to process .tiff files they cannot be converted back to .tiff after processing.
Note: NetDocuments disallows modifications to version 1 of any MSG document. Therefore, the first change Symphony makes to an MSG will produce a second version, regardless of which of the above options you select.
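The four versioning options and the MSG rule described above can be summarized as a small decision function. This is only a sketch of the behavior as described in this guide; the enum and function names are hypothetical, and the MSG case is simplified to "always versions" even though the rule strictly concerns the first change to version 1.

```python
from enum import Enum


class VersionOption(Enum):
    NONE = "Do not create versions"
    ALL = "Create versions for all documents"
    PDF_ONLY = "Create versions for PDF files only"
    NON_PDF_ONLY = "Create versions for non-PDF documents only"


def creates_new_version(option: VersionOption, extension: str) -> bool:
    """Return True if OCR results would be saved as a new version."""
    ext = extension.lower().lstrip(".")
    if ext == "msg":
        # NetDocuments does not allow version 1 of an email to be overwritten.
        return True
    if option is VersionOption.ALL:
        return True
    if option is VersionOption.PDF_ONLY:
        return ext == "pdf"
    if option is VersionOption.NON_PDF_ONLY:
        return ext != "pdf"  # for example .tif / .tiff
    return False  # VersionOption.NONE


for opt in VersionOption:
    print(opt.value, {e: creates_new_version(opt, e) for e in ("pdf", "tiff", "msg")})
```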
(Update already, why don't 'cha :) )
Symphony OCR changed the modified dates of documents.
Symphony OCR exactly preserved all of the original content in a document. When Symphony OCR adds the invisible layer of text, the underlying file did get modified because non-visual information was added. This means that the modified date of those documents would have changed to the date that Symphony OCR had processed them. When processing documents in "real time" this only changed the modified date slightly (by a few minutes / hours). When processing your legacy store of documents (Symphony OCR treats documents modified more than 7 days ago as 'legacy'), however, it may have significantly changed the modified date.
For example, you may have had an image-only PDF document stored in NetDocuments with the modified date of 6/17/2013. When Symphony OCR was installed and processed that document, the modified date was changed to the date the document was processed. So if it was OCRed on 9/18/2014, the modified date would have shown as 9/18/2014.
If tracking the original modified date of your legacy store was important, you could have opted to enable versioning of your documents. When Symphony OCR processed your documents, it created a new version of the document, ensuring the original modified date stayed intact. If you opted to have Symphony OCR save the processed document as a new version, the amount of storage you were using will have increased (because there are multiple versions of the same document in your document repository). To enable versioning of your documents, see: NetDocuments - Basic Settings.
Because Symphony OCR actually "changed" the PDF documents (and TIFF documents, if you chose to process them) by adding an invisible layer of text, the modified user of those documents changed to the user that Symphony OCR was running as.
Upon updating to Version 6.6.22 you will be prompted in the Summary Page that NetDocuments Integration has warnings. Select "Manage" to see the warnings.
At the top of the screen, you will see the "Issue" presented is as follows:
NetDocuments is currently configured to NOT preserve modified user and date for OCRed documents. We strongly recommend that you enable the 'Preserved modified user and date' option. Tip: You may also want to consider disabling 'Create versions of OCRed results' when you make this change.
To update this, select the 'Preserve modified user and date' checkbox and, optionally, the 'Create versions of OCRed results' checkbox (if you wish to do so).
If you prefer NOT to have the Modified User and Dates preserved, you can select the "IGNORE" link which will remove the warning from the NetDocuments Integration Settings and the Summary pages.
SymphonyOCR will process documents that were stored in the NetDocuments repository prior to SymphonyOCR having been installed if you opt to enable the functionality in the NetDocuments Configuration screen. SymphonyOCR recognizes documents modified more than 7 days ago as 'legacy' documents. When you enable 'Process legacy documents', Symphony will periodically search through all documents for files that are eligible for OCR. The frequency of this search is controlled in the 'Advanced settings'. The default behavior is to perform the search every 7 days.
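The "legacy" rule and the default search frequencies described in this guide can be expressed as a couple of small checks. This is a minimal sketch; the constant and function names are hypothetical, and only the 7-day and 15-minute values come from the text.

```python
from datetime import datetime, timedelta

LEGACY_THRESHOLD = timedelta(days=7)      # modified longer ago than this means 'legacy'
NEW_SEARCH_EVERY = timedelta(minutes=15)  # default new documents search frequency
LEGACY_SEARCH_EVERY = timedelta(days=7)   # default legacy documents search frequency


def is_legacy(modified: datetime, now: datetime) -> bool:
    """A document modified more than 7 days ago is treated as legacy."""
    return now - modified > LEGACY_THRESHOLD


def search_due(last_search: datetime, now: datetime, every: timedelta) -> bool:
    """True once the configured interval has passed since the last search."""
    return now - last_search >= every


now = datetime(2024, 3, 15, 9, 0)
print(is_legacy(datetime(2024, 3, 1, 9, 0), now))                    # two weeks old
print(search_due(now - timedelta(minutes=20), now, NEW_SEARCH_EVERY))
```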
To reset the NetDocuments user
Symphony OCR is installed as an on-premise Windows service.
Symphony OCR consists of a back-end service that monitors NetDocuments for new and changed documents, analyzes documents, and OCRs documents. The Symphony OCR service also presents a web based interface for administration. This web interface is only exposed on the firm’s internal network.
Symphony OCR interacts with NetDocuments via the standard NetDocuments REST API (full details of the NetDocuments API integration can be found here: https://support.netdocuments.com/hc/en-us/articles/205219850-API-Documentation ).
Symphony OCR uses the Internet standard OAuth2 authentication protocol to request permission from the user. Once the user has approved integration, NetDocuments provides SymphonyOCR with an access token that is used for operations that interact with NetDocuments (querying for document meta data, downloading document content, uploading document content). A NetDocuments administrative user must give explicit permission for this access to be configured, and the administrative user may revoke the access via the NetDocuments administrative interface at any time.
All network communication between Symphony OCR and NetDocuments is encrypted using standard HTTPS protocols.
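For readers unfamiliar with OAuth2 access tokens, the sketch below shows the general pattern of attaching a bearer token to an HTTPS request. It does not contact NetDocuments: the URL and token are placeholders invented for this example, and the real endpoints and parameters are defined in the NetDocuments API documentation rather than here.

```python
import urllib.request


def build_authorized_request(url: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying an OAuth2 bearer token."""
    if not url.lower().startswith("https://"):
        # Tokens should only ever travel over an encrypted connection.
        raise ValueError("refusing to send an access token over a non-HTTPS URL")
    return urllib.request.Request(url, headers={
        "Authorization": f"Bearer {access_token}",
        "Accept": "application/json",
    })


# Placeholder values for illustration only; no network call is made.
request = build_authorized_request(
    "https://api.example.invalid/documents/123", "token-from-oauth2-flow"
)
print(request.get_header("Authorization"))
```

If the administrator revokes access, the token stops being accepted and the approval step has to be repeated.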
For more detailed information and advanced settings for configuring Symphony OCR, visit Configuration Guide - ShareFile.
License
This is where your Symphony OCR license is set. To change your Symphony OCR license, simply click "Licensing" from the Configuration side bar, enter your new license, and select "Save Changes."
License Details
Provides you details of the license.
Features Allowed by your License
This area tells you which features are allowed by your license.
Updating your License
Starting with version 6.4.96, Symphony OCR will have an 'Automatic License Update' feature. Basically, after you've paid your renewal invoice with Trumpet, a new license is automatically generated. So if your installation has access to the Trumpet servers, Symphony OCR will automatically see this new license, download it and install it.
Note: Symphony will check for a new license once every 3 days under normal circumstances, and once per day when your license is within 30 days of expiring.
If you've paid your invoice (and received notification of a new license) and don't want to wait for the automatic update to kick in, you can click the "Check for Updated License" link on this page. This will manually trigger Symphony OCR to retrieve the updated license from Trumpet's servers. As mentioned, all of this assumes your installation has access to Trumpet's servers. If a connection cannot be established, you can always copy/paste your new license into this screen.
When you receive notification from Trumpet that your new license is generated, it is still highly recommended that you A) update your installation to the latest version of the software, and B) verify your license has been updated.
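The automatic license check schedule described in the note above (every 3 days normally, daily once the license is within 30 days of expiring) can be sketched as follows. The function name is hypothetical, and treating exactly 30 days as "within 30 days" is an assumption.

```python
from datetime import date, timedelta


def license_check_interval(today: date, expires: date) -> timedelta:
    """How long to wait before checking for an updated license again."""
    days_left = (expires - today).days
    if days_left <= 30:
        return timedelta(days=1)  # close to (or past) expiry: check once per day
    return timedelta(days=3)      # normal circumstances: check every 3 days


print(license_check_interval(date(2024, 1, 1), date(2024, 12, 31)))
print(license_check_interval(date(2024, 12, 10), date(2024, 12, 31)))
```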
Notifications allow users to be emailed nightly based on the status of Symphony OCR.
Each email address may be configured with one of four types:
Never - nightly emails will never be sent to this recipient (instead, after entering an email address you can select "Send Now" and deliver an email to the recipient on demand).
When there are errors - the nightly email will only be sent to the recipient if the overall system condition is Error. This is useful for recipients who only need to know when the system is not processing documents because of some major error (licensing issues are the most common major error).
When there are warnings or errors - the nightly email will only be sent to the recipient if the overall system condition is Warning or Error. The warning condition is triggered by documents in the Needs Attention list, configuration problems or other system level issues that should be looked at, even though they haven't completely stopped processing from occurring.
Always (aka Daily) - the nightly email will be sent to the recipient every night regardless of system status. This is useful for firms who want to monitor the 'Not Processed' lists to ensure that every document that couldn't be OCRed (e.g., because of security or corruption) has been reviewed. Users can review documents in the various 'Not Processed' lists and either correct the underlying issue, or move the documents to the Ignore list using Bulk Operations >Ignore.
If you have a user leave the firm or you no longer wish for a particular user to be notified, you can change the Notification Type to "Never" or remove the user entirely by selecting "Remove" to the right of the address.
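The four notification types boil down to a comparison between the recipient's setting and the overall system condition. Here is a minimal sketch of that decision; the enum and function names are hypothetical and chosen only for this example.

```python
from enum import Enum, IntEnum


class Condition(IntEnum):
    OK = 0
    WARNING = 1
    ERROR = 2


class NotificationType(Enum):
    NEVER = "Never"
    ERRORS = "When there are errors"
    WARNINGS_OR_ERRORS = "When there are warnings or errors"
    ALWAYS = "Always"


def should_send_nightly(kind: NotificationType, condition: Condition) -> bool:
    """Decide whether the nightly email goes to a recipient."""
    if kind is NotificationType.NEVER:
        return False  # these recipients only receive mail via "Send Now"
    if kind is NotificationType.ALWAYS:
        return True
    if kind is NotificationType.ERRORS:
        return condition == Condition.ERROR
    return condition >= Condition.WARNING  # WARNINGS_OR_ERRORS


print(should_send_nightly(NotificationType.WARNINGS_OR_ERRORS, Condition.WARNING))
```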
Connect to ShareFile
Basic Settings
ShareFile Account - Displays the user that Symphony OCR connects to ShareFile as
Folders to Monitor
This is the list of folders that the specified user has access to. If a folder does not appear in the list, this user does not have access to that folder.
View detailed progress - Selecting this link will take you to the Progress Details page. This will provide you with a list of cabinets and the number of documents and pages that have been processed / not processed per cabinet.
Advanced Settings
Search frequency - By default, Symphony OCR will perform a search for new documents every 60 minutes. The value on the right may be adjusted if you require searching for documents less frequently.
Status
ShareFile Search - performs a search in the monitored folder structure to find all documents that are eligible for OCR regardless of how recently the document has been created or modified. By default it performs this search once every 60 minutes. This can be adjusted by selecting "Manage". This will take you to the Folders page where you can adjust the search frequency for each folder.
The Analyzer is responsible for looking at each document and determining if it is eligible for OCR. If a document is eligible it is placed in the Processing list. If a document is not eligible, it is placed in the appropriate list (for more information on why a document might not be eligible for OCR, refer to the section, Not Processed List).
Control
In the control area, you can choose to refresh the Analyzer or stop the Analyzer:
Refresh - Selecting Refresh will refresh the Status of the Analyzer page.
Stop Analyzer - Selecting this option will stop the Analyzer from Analyzing documents in the document repository.
Status
Displays the status of the Analyzer.
Information
Machine Processors - Indicates how many logical processors the workstation running Symphony OCR contains.
Licensed parallel processing - Indicates how many documents will be analyzed at a time based on your license features.
Recent Performance (since last restart)
Provides performance statistics such as the total number of documents and pages that Symphony OCR has found eligible for OCR, and the average speed of analysis per document since the last restart of Symphony OCR.
Overall Performance (since last restart)
Provides performance statistics such as the total number of documents and pages that Symphony OCR has found eligible for OCR, and the average speed of analysis per document since the last restart of Symphony OCR.
Settings
Do not analyze documents younger than - The default setting is 30 seconds. If you wish to have the Analyzer wait longer to analyze documents, simply change the value in the field. Trumpet recommends that this value is not decreased to less than 30 seconds to ensure that documents are fully written to the disk before processing.
To change this setting, simply type in the number of seconds, and then select "Save Changes".
Select Processor in the navigation panel:
The Processor manages the actual OCR processes. Once a document has been identified as eligible for OCR by the Analyzer, the Processor confirms that the file is still eligible for OCR, and then OCRs the file. If a document is successfully OCRed, it is moved to the Processed list (for more information about the flow of documents throughout Symphony OCR, refer to the section Symphony Workflow, Tools & Document Lists).
Control
In the control area, you can choose to refresh the Processor or stop the Processor:
Refresh - Selecting Refresh will refresh the status of the Processor page.
Stop Processor - Selecting this option will stop the Processor from processing documents in the document repository.
Status
The status of the Processor (what it is currently processing).
Information
Processing Capacity Remaining - If you have a license that limits the number of pages you can process per year, the number of pages remaining will appear here.
Machine Processors - Indicates how many logical processors the workstation running SymphonyOCR contains.
Licensed parallel processing - Indicates the number of documents that will be processed by the processor simultaneously.
Recent Performance
Provides performance statistics such as the number of documents and pages that Symphony OCR has processed in a smaller sample size and the average speed of processing per page, the average number of pages per document and the effective throughput of the documents.
Overall Performance
Provides performance statistics such as the total number of documents and pages that Symphony OCR has processed and the average speed of processing per page, the average number of pages per document and the effective throughput of the documents.
Basic Settings
Process TIFFs (OCR and convert to PDF) - Symphony OCR can process TIFF files and convert them to image + text PDF files. This is an optional setting. If you wish to process TIFF documents, simply check this checkbox.
Note: If the firm opts to process TIFF documents, this will change the file extension to .pdf. This will "break" any relationships or projects that include this file.
Process MSG (email) attachments - Symphony OCR can process email message attachments. This is an optional setting. If you wish to process email message attachments, check this checkbox.
<Big fat scary warning:
Due to a limitation in newer versions of Office, Microsoft prevents us from accessing the DLLs that allow us to read/process emails under the following conditions:
> Symphony OCR is configured to run as a service
> 'Process MSG (email) attachments' is checked
> Outlook 2013 (or possibly Outlook 2016) is open
In these circumstances, you're likely to see the following error:
Therefore, if Symphony OCR is being installed to run as a service *and* will be configured to process email attachments, it is our recommendation to install it on a machine that will not normally have Outlook 2013 (or possibly 2016) open. On the bright side, our testing has shown that in these situations, Symphony is still processing normal documents and WILL eventually recover and process emails after Office is closed. But if you can, we recommend avoiding this situation. If your experience is different, we'd like to hear about it.
End of big fat scary warning>
Do not process documents younger than - The default setting is 30 seconds. If you wish to have the Processor wait longer to process documents, simply change the value in the field. Trumpet recommends that this value is not decreased to less than 30 seconds to ensure that documents are fully written to the disk before processing.
Do not process documents older than - If you have older documents that you do not want Symphony OCR to process, enter a specific number of days for which the software should process backlog.
Automatically rotate pages to proper orientation - If selected, the pages will rotate either landscape or portrait according to the text on the page.
Original retention settings
Retain originals of processed files - If selected, Symphony OCR retains copies of the documents that it has processed. These copies appear as versions (if Symphony OCR processes a document 3 times, it will maintain copies of all 3 versions of the document). The user can restore previous versions of a document from the Symphony OCR backup using the document Details screen.
Purge originals of processed files after - The default setting is to retain the originals of processed files for 7 days after which they will be purged. If you wish to change this setting, you can change the value to the appropriate number of days for your firm.
Backlog throttling settings (only needed when your license does NOT have unlimited pages for processing)
Default processing capacity reserved for new documents (based on the actual number of new pages added each day) - This is calculated from the number of pages that were added to the site in the past year.
Override the default processing capacity reserve - This sets the number of pages you would like to reserve for new documents, spreading the page count capacity evenly across the entire year. To determine a reasonable reserve, allow the Symphony OCR Analyzer module to run, then look at the timeline for the Processing Queue. Adding the number of pages in the first 52 weeks and dividing by 365 gives the average number of pages added to the system per day. Trumpet recommends adding an additional 10% to accommodate future growth or above-average filing. This value should be a reasonable overclocking reserve.
Advanced Settings:
Create thumbnails (if not already present) - Checking this checkbox will create thumbnails if they are not already present.
Enable OCR debug logging - This enables debug logging to help our support team address issues if necessary. To conserve disk space, we recommend not enabling this unless requested by our support team.
Limit parallel processing to X documents - This allows you to limit the number of cores that Symphony OCR will utilize. It uses 1 core per document. For example, if you enter 3, Symphony OCR will only use 3 cores and will process 3 documents simultaneously. See: How does Symphony OCR impact the performance of the server or indexer PC
Enter the database credentials:
Database login credentials
Login to database with username: Enter the username for the database
Login to database with password: Enter the database password
Database computer name: Enter the database computer name
Database server instance name: This is optional and is required only if there is more than one database on the server. In that case, enter the instance name you wish to process.
Database name: enter the name of the database
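As an illustration of how these five fields fit together, the sketch below groups them into one settings object and builds a server address from the computer name and the optional instance name. The class, the field names, and the "computer\instance" address form (a convention used by SQL Server named instances) are assumptions for this example; this guide does not state which database product or address format Symphony OCR uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatabaseSettings:
    # Hypothetical container mirroring the fields on the credentials screen.
    username: str
    password: str
    computer_name: str
    database_name: str
    instance_name: Optional[str] = None  # optional on the screen as well

    def server_address(self) -> str:
        """Combine computer and instance name, assuming 'computer\\instance' style."""
        if self.instance_name:
            return f"{self.computer_name}\\{self.instance_name}"
        return self.computer_name


settings = DatabaseSettings("ocr_user", "secret", "DBSERVER", "Documents", "EDOCS")
print(settings.server_address())
```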
License
This is where your Symphony OCR license is set. To change your Symphony OCR license, simply click "Licensing" from the Configuration side bar, enter your new license, and select "Save Changes."
License Details
Provides you details of the license.
Features Allowed by your License
This area tells you which features are allowed by your license.
Updating your License
Starting with version 6.4.96, Symphony OCR will have an 'Automatic License Update' feature. Basically, after you've paid your renewal invoice with Trumpet, a new license is automatically generated. So if your installation has access to the Trumpet servers, Symphony OCR will automatically see this new license, download it and install it.
Note: Symphony will check for a new license once every 3 days under normal circumstances, and once per day when your license is within 30 days of expiring.
If you've paid your invoice (and received notification of a new license) and don't want to wait for the automatic update to kick in, you can click the "Check for Updated License" link on this page. This will manually trigger Symphony OCR to retrieve the updated license from Trumpet's servers. As mentioned, all of this assumes your installation has access to Trumpet's servers. If a connection cannot be established, you can always copy/paste your new license into this screen.
When you receive notification from Trumpet that your new license is generated, it is still highly recommended that you A) update your installation to the latest version of the software, and B) verify your license has been updated.
Database login credentials
Login to database with username: Enter the username for the database
Login to database with password: Enter the database password
Database computer name: Enter the database computer name
Database server instance name: This is optional and is required only if there is more than one database on the server. In that case, enter the instance name you wish to process.
Database name: enter the name of the database
Advanced settings:
New document search frequency: By default, Symphony OCR will search for documents once every 15 minutes. This should be sufficient for your needs; however, you can change this to search more or less frequently.
Legacy document search frequency: By default, Symphony OCR will perform a search for legacy documents (documents existing prior to installing Symphony OCR) every 7 days.
OpenText New Document Search - performs a search for new documents that are eligible for OCR. By default it performs this search once every 15 minutes. This can be adjusted by selecting "Manage". This will take you to the OpenText page where you can adjust the search frequency.
OpenText Legacy Document Search - performs a search for legacy documents that are eligible for OCR. By default it performs this search once every 7 days. This can be adjusted by selecting "Manage". This will take you to the OpenText page where you can adjust the search frequency.
The Analyzer is responsible for looking at each document and determining if it is eligible for OCR. If a document is eligible it is placed in the Processing list. If a document is not eligible, it is placed in the appropriate list (for more information on why a document might not be eligible for OCR, refer to the section, Not Processed List).
Control
In the control area, you can choose to refresh the Analyzer or stop the Analyzer:
Refresh - Selecting Refresh will refresh the Status of the Analyzer page.
Stop Analyzer - Selecting this option will stop the Analyzer from Analyzing documents in the document repository.
Status
Displays the status of the Analyzer.
Information
Machine Processors - Indicates how many logical processors the workstation running Symphony OCR contains.
Licensed parallel processing - Indicates how many documents will be analyzed at a time based on your license features.
Recent Performance (since last restart)
Provides performance statistics such as the total number of documents and pages that Symphony OCR has found eligible for OCR, and the average speed of analysis per document since the last restart of Symphony OCR.
Overall Performance (since last restart)
Provides performance statistics such as the total number of documents and pages that Symphony OCR has found eligible for OCR, and the average speed of analysis per document since the last restart of Symphony OCR.
Settings
Do not analyze documents younger than - The default setting is 30 seconds. If you wish to have the Analyzer wait longer to analyze documents, simply change the value in the field. Trumpet recommends that this value is not decreased to less than 30 seconds to ensure that documents are fully written to the disk before processing.
To change this setting, simply type in the number of seconds, and then select "Save Changes".
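The minimum-age check described above can be sketched as follows. This is only an illustration of the idea, not Symphony OCR's actual implementation: the helper name `is_old_enough` and the use of the file's last-modified time as its "age" are assumptions.

```python
import os
import time

DEFAULT_MIN_AGE_SECONDS = 30  # default value stated in this guide


def is_old_enough(path, min_age_seconds=DEFAULT_MIN_AGE_SECONDS, now=None):
    """Return True if the file was last modified at least min_age_seconds ago.

    Hypothetical helper: treating the modification time as the document's
    age is an assumption made for illustration.
    """
    if now is None:
        now = time.time()
    age_seconds = now - os.path.getmtime(path)
    return age_seconds >= min_age_seconds
```

Waiting before touching a file gives the scanner or application that created it time to finish writing it to disk.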
Select Processor in the navigation panel:
The Processor manages the actual OCR processes. Once a document has been identified as eligible for OCR by the Analyzer, the Processor confirms that the file is still eligible for OCR, and then OCRs the file. If a document is successfully OCRed, it is moved to the Processed list (for more information about the flow of documents throughout Symphony OCR, refer to the section Symphony Workflow, Tools & Document Lists).
Control
In the control area, you can choose to refresh the Processor or stop the Processor:
Refresh - Selecting Refresh will refresh the status of the Processor page.
Stop Processor - Selecting this option will stop the Processor from processing documents in the document repository.
Status
The status of the Processor (what it is currently processing).
Information
Processing Capacity Remaining - If you have a license that limits the number of pages you can process per year, the number of pages remaining will appear here.
Machine Processors - Indicates how many logical processors the workstation running Symphony OCR contains.
Licensed parallel processing - Indicates the number of documents that will be processed by the processor simultaneously.
Recent Performance
Provides performance statistics such as the number of documents and pages that Symphony OCR has processed in a smaller sample size and the average speed of processing per page, the average number of pages per document and the effective throughput of the documents.
Overall Performance
Provides performance statistics such as the total number of documents and pages that Symphony OCR has processed and the average speed of processing per page, the average number of pages per document and the effective throughput of the documents.
Basic Settings
Process TIFFs (OCR and convert to PDF) - Symphony OCR can process TIFF files and convert them to image + text PDF files. This is an optional setting. If you wish to process TIFF documents, simply check this checkbox.
Note: If the firm opts to process TIFF documents, this will change the file extension from .tif to .pdf. This will "break" any relationships or projects that include this file.
Process MSG (email) attachments - Symphony OCR can process email message attachments. This is an optional setting. If you wish to process email message attachments, check this checkbox.
<Big fat scary warning:
Due to a limitation in newer versions of Office, Microsoft prevents us from accessing the DLLs that allow us to read/process emails under the following conditions:
> Symphony OCR is configured to run as a service
> 'Process MSG (email) attachments' is checked
> Outlook 2013 (or possibly Outlook 2016) is open
In these circumstances, you're likely to see the following error:
Therefore, if Symphony OCR is being installed to run as a service *and* will be configured to process email attachments, it is our recommendation to install it on a machine that will not normally have Outlook 2013 (or possibly 2016) open. On the bright side, our testing has shown that in these situations, Symphony is still processing normal documents and WILL eventually recover and process emails after Office is closed. But if you can, we recommend avoiding this situation. If your experience is different, we'd like to hear about it.
End of big fat scary warning>
Do not process documents younger than - The default setting is 30 seconds. If you wish to have the Processor wait longer to process documents, simply change the value in the field. Trumpet recommends that this value is not decreased to less than 30 seconds to ensure that documents are fully written to the disk before processing.
Do not process documents older than - If you have older documents that you do not want Symphony OCR to process, enter a specific number of days for which the software should process backlog.
Automatically rotate pages to proper orientation - If selected, the pages will rotate either landscape or portrait according to the text on the page.
Original retention settings
Retain originals of processed files - If selected, Symphony OCR retains copies of the documents that it has processed. These copies appear as versions (if Symphony OCR processes a document 3 times, it will maintain copies of all 3 versions of the document). The user can restore previous versions of a document from the Symphony OCR backup using the document Details screen.
Purge originals of processed files after - The default setting is to retain the originals of processed files for 7 days after which they will be purged. If you wish to change this setting, you can change the value to the appropriate number of days for your firm.
Backlog throttling settings (only needed when your license does NOT have unlimited pages for processing)
Default processing capacity reserved for new documents (based on the actual number of new pages added each day) - This is calculated from the number of pages that were added to the site in the past year.
Override the default processing capacity reserve - This sets the number of pages you would like to reserve for new documents, spreading the page count capacity evenly across the entire year. To determine a reasonable reserve, allow the Symphony OCR Analyzer module to run, then look at the timeline for the Processing Queue. Adding the number of pages in the first 52 weeks and dividing by 365 gives the average number of pages added to the system per day. Trumpet recommends adding an additional 10% to accommodate future growth or above-average filing. The result should be a reasonable reserve.
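As a worked example of the reserve calculation: a year of pages divided by 365 gives a daily average, and the recommended 10% is added on top. The function name and the weekly page counts below are made up for illustration; in practice you would read the counts from the Processing Queue timeline.

```python
def suggested_daily_reserve(weekly_page_counts, growth=0.10):
    """Average pages per day over the first 52 weeks, plus a growth margin."""
    first_year = weekly_page_counts[:52]
    pages_per_day = sum(first_year) / 365
    return pages_per_day * (1 + growth)


# Made-up example: 700 pages added in each of 52 weeks.
weekly = [700] * 52
print(round(suggested_daily_reserve(weekly)))  # prints 110
```

Here 36,400 pages over 365 days is roughly 100 pages per day, and the 10% margin brings the suggested reserve to about 110 pages per day.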
Advanced Settings:
Create thumbnails (if not already present) - Checking this checkbox will create thumbnails if they are not already present.
Enable OCR debug logging - This will help our support team address issues if necessary. In order to conserve disk space, we recommend not enabling this unless requested by our support team.
Limit parallel processing to X documents - This allows you to limit the number of cores that Symphony OCR will utilize. It uses 1 core per document. For example, if you enter 3, Symphony OCR will only use 3 cores, and will process 3 documents simultaneously. See: How does Symphony OCR impact the performance of the server or indexer PC
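One way to picture how the parallel processing values interact is sketched below. The guide does not state how Symphony OCR combines them, so taking the smallest of the machine's logical processors, the licensed parallel processing value, and the optional limit is an assumption, as is the function name.

```python
def effective_parallel_documents(machine_processors, licensed_parallel, user_limit=None):
    """Estimate how many documents are processed at once (1 core per document).

    Assumption for illustration: the smallest of the three values governs.
    """
    limits = [machine_processors, licensed_parallel]
    if user_limit is not None:
        limits.append(user_limit)
    # Never report fewer than one document at a time.
    return max(1, min(limits))


# 8 logical processors, license allows 4, limit set to 3.
print(effective_parallel_documents(8, 4, user_limit=3))  # prints 3
```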
Copy and paste the root of the path where the documents reside within PracticeMaster
Select "Save Changes"
For more detailed information and advanced settings for configuring Symphony OCR, visit Configuration Guide - PracticeMaster
License
This is where your Symphony OCR license is set. To change your Symphony OCR license, simply click "Licensing" from the Configuration side bar, enter your new license, and select "Save Changes."
License Details
Provides you details of the license.
Features Allowed by your License
This area tells you which features are allowed by your license.
Updating your License
Starting with version 6.4.96, Symphony OCR will have an 'Automatic License Update' feature. Basically, after you've paid your renewal invoice with Trumpet, a new license is automatically generated. So if your installation has access to the Trumpet servers, Symphony OCR will automatically see this new license, download it and install it.
Note: Symphony will check for a new license once every 3 days under normal circumstances, and once per day when your license is within 30 days of expiring.
If you've paid your invoice (and received notification of a new license) and don't want to wait for the automatic update to kick in, you can click the "Check for Updated License" link on this page. This will manually trigger Symphony OCR to retrieve the updated license from Trumpet's servers. As mentioned, all of this assumes your installation has access to Trumpet's servers. If a connection cannot be established, you can always copy/paste your new license into this screen.
When you receive notification from Trumpet that your new license is generated, it is still highly recommended that you A) update your installation to the latest version of the software, and B) verify your license has been updated.
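The license check schedule from the note above can be expressed as a small rule. This is a sketch only: the function name is hypothetical, and treating "within 30 days of expiring" as 30 days or fewer remaining is an assumption.

```python
from datetime import date


def license_check_interval_days(today, expiry):
    """Return how many days to wait between automatic license checks.

    Daily when the license is within 30 days of expiring, otherwise
    every 3 days, as described in this guide.
    """
    days_left = (expiry - today).days
    return 1 if days_left <= 30 else 3


print(license_check_interval_days(date(2024, 1, 1), date(2024, 1, 20)))  # prints 1
print(license_check_interval_days(date(2024, 1, 1), date(2024, 6, 1)))   # prints 3
```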
Notifications allow users to be emailed nightly based on the status of Symphony OCR.
Each email address may be configured with one of four types:
Never - nightly emails will never be sent to this recipient (instead, after entering an email address you can select "Send Now" and deliver an email to the recipient on demand).
When there are errors - the nightly email will only be sent to the recipient if the overall system condition is Error. This is useful for recipients who only need to know when the system is not processing documents because of some major error (licensing issues are the most common major error).
When there are warnings or errors - the nightly email will only be sent to the recipient if the overall system condition is Warning or Error. The warning condition is triggered by documents in the Needs Attention list, configuration problems or other system level issues that should be looked at, even though they haven't completely stopped processing from occurring.
Always (aka Daily) - the nightly email will be sent to the recipient every night regardless of system status. This is useful for firms who want to monitor the 'Not Processed' lists to ensure that every document that couldn't be OCRed (e.g., because of security or corruption) has been reviewed. Users can review documents in the various 'Not Processed' lists and either correct the underlying issue, or move the documents to the Ignore list using Bulk Operations > Ignore.
If you have a user leave the firm or you no longer wish for a particular user to be notified, you can change the Notification Type to "Never" or remove the user entirely by selecting "Remove" to the right of the address.
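The four notification types amount to a simple decision per recipient each night. The sketch below mirrors the descriptions above; the function name and the "OK" label for a healthy system are assumptions, while the type names and the Warning and Error conditions come from this guide.

```python
NEVER = "Never"
ON_ERRORS = "When there are errors"
ON_WARNINGS_OR_ERRORS = "When there are warnings or errors"
ALWAYS = "Always"


def should_send_nightly(notification_type, system_condition):
    """Decide whether a recipient gets tonight's email.

    system_condition is "Error", "Warning", or "OK" (the last is an
    assumed label for a system with nothing to report).
    """
    if notification_type == ALWAYS:
        return True
    if notification_type == ON_WARNINGS_OR_ERRORS:
        return system_condition in ("Warning", "Error")
    if notification_type == ON_ERRORS:
        return system_condition == "Error"
    return False  # NEVER: only on-demand "Send Now" emails


print(should_send_nightly(ON_ERRORS, "Warning"))              # prints False
print(should_send_nightly(ON_WARNINGS_OR_ERRORS, "Warning"))  # prints True
```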
Basic settings
PracticeMaster network folder/current working directory - this is where the PracticeMaster network folder is identified. Copy and paste the path to the network folder into the field.
Documents folder - this is the root of where the documents reside within PracticeMaster. Copy and paste the path to the folder into the field.
Advanced settings
Process Read Only Files - if you wish to process read-only files, check this checkbox.
Finder Scan Frequency - by default Symphony OCR will search for documents once every 120 minutes. This should be sufficient for most firms; however, you can change this to search more or less frequently.
The Analyzer is responsible for looking at each document and determining if it is eligible for OCR. If a document is eligible it is placed in the Processing list. If a document is not eligible, it is placed in the appropriate list (for more information on why a document might not be eligible for OCR, refer to the section, Not Processed List).
Control
In the control area, you can choose to refresh the Analyzer or stop the Analyzer:
Refresh - Selecting Refresh will refresh the Status of the Analyzer page.
Stop Analyzer - Selecting this option will stop the Analyzer from analyzing documents in the document repository.
Status
Displays the status of the Analyzer.
Information
Machine Processors - Indicates how many logical processors the workstation running Symphony OCR contains.
Licensed parallel processing - Indicates how many documents will be analyzed at a time based on your license features.
Recent Performance (since last restart)
Provides performance statistics such as the total number of documents and pages that Symphony OCR has found eligible for OCR, and the average speed of analysis per document since the last restart of Symphony OCR.
Overall Performance (since last restart)
Provides performance statistics such as the total number of documents and pages that Symphony OCR has found eligible for OCR, and the average speed of analysis per document since the last restart of Symphony OCR.
Settings
Do not analyze documents younger than - The default setting is 30 seconds. If you wish to have the Analyzer wait longer to analyze documents, simply change the value in the field. Trumpet recommends that this value is not decreased to less than 30 seconds to ensure that documents are fully written to the disk before processing.
To change this setting, simply type in the number of seconds, and then select "Save Changes".
Select Processor in the navigation panel:
The Processor manages the actual OCR processes. Once a document has been identified as eligible for OCR by the Analyzer, the Processor confirms that the file is still eligible for OCR, and then OCRs the file. If a document is successfully OCRed, it is moved to the Processed list (for more information about the flow of documents throughout Symphony OCR, refer to the section Symphony Workflow, Tools & Document Lists).
Control
In the control area, you can choose to refresh the Processor or stop the Processor:
Refresh - Selecting Refresh will refresh the status of the Processor page.
Stop Processor - Selecting this option will stop the Processor from processing documents in the document repository.
Status
The status of the Processor (what it is currently processing).
Information
Processing Capacity Remaining - If you have a license that limits the number of pages you can process per year, the number of pages remaining will appear here.
Machine Processors - Indicates how many logical processors the workstation running Symphony OCR contains.
Licensed parallel processing - Indicates the number of documents that will be processed by the processor simultaneously.
Recent Performance
Provides performance statistics such as the number of documents and pages that Symphony OCR has processed in a smaller sample size and the average speed of processing per page, the average number of pages per document and the effective throughput of the documents.
Overall Performance
Provides performance statistics such as the total number of documents and pages that Symphony OCR has processed and the average speed of processing per page, the average number of pages per document and the effective throughput of the documents.
Basic Settings
Process TIFFs (OCR and convert to PDF) - Symphony OCR can process TIFF files and convert them to image + text PDF files. This is an optional setting. If you wish to process TIFF documents, simply check this checkbox.
Note: If the firm opts to process TIFF documents, this will change the file extension from .tif to .pdf. This will "break" any relationships or projects that include this file.
Process MSG (email) attachments - Symphony OCR can process email message attachments. This is an optional setting. If you wish to process email message attachments, check this checkbox.
<Big fat scary warning:
Due to a limitation in newer versions of Office, Microsoft prevents us from accessing the DLLs that allow us to read/process emails under the following conditions:
> Symphony OCR is configured to run as a service
> 'Process MSG (email) attachments' is checked
> Outlook 2013 (or possibly Outlook 2016) is open
In these circumstances, you're likely to see the following error:
Therefore, if Symphony OCR is being installed to run as a service *and* will be configured to process email attachments, it is our recommendation to install it on a machine that will not normally have Outlook 2013 (or possibly 2016) open. On the bright side, our testing has shown that in these situations, Symphony is still processing normal documents and WILL eventually recover and process emails after Office is closed. But if you can, we recommend avoiding this situation. If your experience is different, we'd like to hear about it.
End of big fat scary warning>
Do not process documents younger than - The default setting is 30 seconds. If you wish to have the Processor wait longer to process documents, simply change the value in the field. Trumpet recommends that this value is not decreased to less than 30 seconds to ensure that documents are fully written to the disk before processing.
Do not process documents older than - If you have older documents that you do not want Symphony OCR to process, enter a specific number of days for which the software should process backlog.
Automatically rotate pages to proper orientation - If selected, the pages will rotate either landscape or portrait according to the text on the page.
Original retention settings
Retain originals of processed files - If selected, Symphony OCR retains copies of the documents that it has processed. These copies appear as versions (if Symphony OCR processes a document 3 times, it will maintain copies of all 3 versions of the document). The user can restore previous versions of a document from the Symphony OCR backup using the document Details screen.
Purge originals of processed files after - The default setting is to retain the originals of processed files for 7 days after which they will be purged. If you wish to change this setting, you can change the value to the appropriate number of days for your firm.
Backlog throttling settings (only needed when your license does NOT have unlimited pages for processing)
Default processing capacity reserved for new documents (based on the actual number of new pages added each day) - This is calculated from the number of pages that were added to the site in the past year.
Override the default processing capacity reserve - This sets the number of pages you would like to reserve for new documents, spreading the page count capacity evenly across the entire year. To determine a reasonable reserve, allow the Symphony OCR Analyzer module to run, then look at the timeline for the Processing Queue. Adding the number of pages in the first 52 weeks and dividing by 365 gives the average number of pages added to the system per day. Trumpet recommends adding an additional 10% to accommodate future growth or above-average filing. The result should be a reasonable reserve.
Advanced Settings:
Create thumbnails (if not already present) - Checking this checkbox will create thumbnails if they are not already present.
Enable OCR debug logging - This will help our support team address issues if necessary. In order to conserve disk space, we recommend not enabling this unless requested by our support team.
Limit parallel processing to X documents - This allows you to limit the number of cores that Symphony OCR will utilize. It uses 1 core per document. For example, if you enter 3, Symphony OCR will only use 3 cores, and will process 3 documents simultaneously. See: How does Symphony OCR impact the performance of the server or indexer PC
For more detailed information and advanced settings for configuring Symphony OCR, visit Configuration Guide - LSSe64
License
This is where your Symphony OCR license is set. To change your Symphony OCR license, simply click "Licensing" from the Configuration side bar, enter your new license, and select "Save Changes."
License Details
Provides you details of the license.
Features Allowed by your License
This area tells you which features are allowed by your license.
Updating your License
Starting with version 6.4.96, Symphony OCR will have an 'Automatic License Update' feature. Basically, after you've paid your renewal invoice with Trumpet, a new license is automatically generated. So if your installation has access to the Trumpet servers, Symphony OCR will automatically see this new license, download it and install it.
Note: Symphony will check for a new license once every 3 days under normal circumstances, and once per day when your license is within 30 days of expiring.
If you've paid your invoice (and received notification of a new license) and don't want to wait for the automatic update to kick in, you can click the "Check for Updated License" link on this page. This will manually trigger Symphony OCR to retrieve the updated license from Trumpet's servers. As mentioned, all of this assumes your installation has access to Trumpet's servers. If a connection cannot be established, you can always copy/paste your new license into this screen.
When you receive notification from Trumpet that your new license is generated, it is still highly recommended that you A) update your installation to the latest version of the software, and B) verify your license has been updated.
Notifications allow users to be emailed nightly based on the status of Symphony OCR.
Each email address may be configured with one of four types:
Never - nightly emails will never be sent to this recipient (instead, after entering an email address you can select "Send Now" and deliver an email to the recipient on demand).
When there are errors - the nightly email will only be sent to the recipient if the overall system condition is Error. This is useful for recipients who only need to know when the system is not processing documents because of some major error (licensing issues are the most common major error).
When there are warnings or errors - the nightly email will only be sent to the recipient if the overall system condition is Warning or Error. The warning condition is triggered by documents in the Needs Attention list, configuration problems or other system level issues that should be looked at, even though they haven't completely stopped processing from occurring.
Always (aka Daily) - the nightly email will be sent to the recipient every night regardless of system status. This is useful for firms who want to monitor the 'Not Processed' lists to ensure that every document that couldn't be OCRed (e.g., because of security or corruption) has been reviewed. Users can review documents in the various 'Not Processed' lists and either correct the underlying issue, or move the documents to the Ignore list using Bulk Operations > Ignore.
If you have a user leave the firm or you no longer wish for a particular user to be notified, you can change the Notification Type to "Never" or remove the user entirely by selecting "Remove" to the right of the address.
Database login credentials
Login to database with username - enter the LSSe64 database username in this field
Login to database with password - enter the LSSe64 database password in this field
Database computer name - enter the name of the computer / workstation
Database server instance name - (Optional) If an instance name is defined, enter the name of the SQL server instance (on the Database computer name workstation) that is running LSSe64. If no instance name is defined, leave this field blank.
Database name - enter the name of the SQL Database for LSSe64
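For orientation, SQL Server conventionally addresses a named instance as the computer name followed by a backslash and the instance name, and a default instance by the computer name alone. The sketch below shows that convention only; how Symphony OCR combines the two fields internally is not documented here, and the function name and sample values are hypothetical.

```python
def sql_server_address(computer_name, instance_name=""):
    """Combine computer and optional instance name in the usual SQL Server form."""
    instance_name = instance_name.strip()
    if instance_name:
        return f"{computer_name}\\{instance_name}"
    # No instance name defined: the default instance is addressed by computer name.
    return computer_name


print(sql_server_address("DBSERVER"))                # prints DBSERVER
print(sql_server_address("DBSERVER", "SQLEXPRESS"))  # prints DBSERVER\SQLEXPRESS
```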
Advanced Settings
New documents search frequency - by default Symphony OCR will query the LSSe64 database for newly saved documents every 15 minutes. This is typically sufficient, but you may adjust that accordingly.
Legacy documents search frequency - by default Symphony OCR will query the LSSe64 database for legacy documents (documents saved to the database prior to installing Symphony OCR) every 7 days. This is typically sufficient for handling the backlog, but you may adjust that according to the firm's specific needs.
The Analyzer is responsible for looking at each document and determining if it is eligible for OCR. If a document is eligible it is placed in the Processing list. If a document is not eligible, it is placed in the appropriate list (for more information on why a document might not be eligible for OCR, refer to the section, Not Processed List).
Control
In the control area, you can choose to refresh the Analyzer or stop the Analyzer:
Refresh - Selecting Refresh will refresh the Status of the Analyzer page.
Stop Analyzer - Selecting this option will stop the Analyzer from analyzing documents in the document repository.
Status
Displays the status of the Analyzer.
Information
Machine Processors - Indicates how many logical processors the workstation running Symphony OCR contains.
Licensed parallel processing - Indicates how many documents will be analyzed at a time based on your license features.
Recent Performance (since last restart)
Provides performance statistics such as the total number of documents and pages that Symphony OCR has found eligible for OCR, and the average speed of analysis per document since the last restart of Symphony OCR.
Overall Performance (since last restart)
Provides performance statistics such as the total number of documents and pages that Symphony OCR has found eligible for OCR, and the average speed of analysis per document since the last restart of Symphony OCR.
Settings
Do not analyze documents younger than - The default setting is 30 seconds. If you wish to have the Analyzer wait longer to analyze documents, simply change the value in the field. Trumpet recommends that this value is not decreased to less than 30 seconds to ensure that documents are fully written to the disk before processing.
To change this setting, simply type in the number of seconds, and then select "Save Changes".
Select Processor in the navigation panel:
The Processor manages the actual OCR processes. Once a document has been identified as eligible for OCR by the Analyzer, the Processor confirms that the file is still eligible for OCR, and then OCRs the file. If a document is successfully OCRed, it is moved to the Processed list (for more information about the flow of documents throughout Symphony OCR, refer to the section Symphony Workflow, Tools & Document Lists).
Control
In the control area, you can choose to refresh the Processor or stop the Processor:
Refresh - Selecting Refresh will refresh the status of the Processor page.
Stop Processor - Selecting this option will stop the Processor from processing documents in the document repository.
Status
The status of the Processor (what it is currently processing).
Information
Processing Capacity Remaining - If you have a license that limits the number of pages you can process per year, the number of pages remaining will appear here.
Machine Processors - Indicates how many logical processors the workstation running Symphony OCR contains.
Licensed parallel processing - Indicates the number of documents that will be processed by the processor simultaneously.
Recent Performance
Provides performance statistics such as the number of documents and pages that Symphony OCR has processed in a smaller sample size and the average speed of processing per page, the average number of pages per document and the effective throughput of the documents.
Overall Performance
Provides performance statistics such as the total number of documents and pages that Symphony OCR has processed and the average speed of processing per page, the average number of pages per document and the effective throughput of the documents.
Basic Settings
Process TIFFs (OCR and convert to PDF) - Symphony OCR can process TIFF files and convert them to image + text PDF files. This is an optional setting. If you wish to process TIFF documents, simply check this checkbox.
Note: If the firm opts to process TIFF documents, this will change the file extension from .tif to .pdf. This will "break" any relationships or projects that include this file.
Process MSG (email) attachments - Symphony OCR can process email message attachments. This is an optional setting. If you wish to process email message attachments, check this checkbox.
<Big fat scary warning:
Due to a limitation in newer versions of Office, Microsoft prevents us from accessing the DLLs that allow us to read/process emails under the following conditions:
> Symphony OCR is configured to run as a service
> 'Process MSG (email) attachments' is checked
> Outlook 2013 (or possibly Outlook 2016) is open
In these circumstances, you're likely to see the following error:
Therefore, if Symphony OCR is being installed to run as a service *and* will be configured to process email attachments, it is our recommendation to install it on a machine that will not normally have Outlook 2013 (or possibly 2016) open. On the bright side, our testing has shown that in these situations, Symphony is still processing normal documents and WILL eventually recover and process emails after Office is closed. But if you can, we recommend avoiding this situation. If your experience is different, we'd like to hear about it.
End of big fat scary warning>
Do not process documents younger than - The default setting is 30 seconds. If you wish to have the Processor wait longer to process documents, simply change the value in the field. Trumpet recommends that this value is not decreased to less than 30 seconds to ensure that documents are fully written to the disk before processing.
Do not process documents older than - If you have older documents that you do not want Symphony OCR to process, enter a specific number of days for which the software should process backlog.
Automatically rotate pages to proper orientation - If selected, the pages will rotate either landscape or portrait according to the text on the page.
Original retention settings
Retain originals of processed files - If selected, Symphony OCR retains copies of the documents that it has processed. These copies appear as versions (if Symphony OCR processes a document 3 times, it will maintain copies of all 3 versions of the document). The user can restore previous versions of a document from the Symphony OCR backup using the document Details screen.
Purge originals of processed files after - The default setting is to retain the originals of processed files for 7 days after which they will be purged. If you wish to change this setting, you can change the value to the appropriate number of days for your firm.
Backlog throttling settings (only needed when your license does NOT have unlimited pages for processing)
Default processing capacity reserved for new documents (based on the actual number of new pages added each day) - This is calculated from the number of pages that were added to the site in the past year.
Override the default processing capacity reserve - This sets the number of pages you would like to reserve for new documents, spreading the page count capacity evenly across the entire year. To determine a reasonable reserve, allow the Symphony OCR Analyzer module to run, then look at the timeline for the Processing Queue. Adding the number of pages in the first 52 weeks and dividing by 365 gives the average number of pages added to the system per day. Trumpet recommends adding an additional 10% to accommodate future growth or above-average filing. The result should be a reasonable reserve.
Advanced Settings:
Create thumbnails (if not already present) - Checking this checkbox will create thumbnails if they are not already present.
Enable OCR debug logging - This will help our support team address issues if necessary. In order to conserve disk space, we recommend not enabling this unless requested by our support team.
Limit parallel processing to X documents - This allows you to limit the number of cores that Symphony OCR will utilize. It uses 1 core per document. For example, if you enter 3, Symphony OCR will only use 3 cores, and will process 3 documents simultaneously. See: How does Symphony OCR impact the performance of the server or indexer PC
For more detailed information and advanced settings for configuring Symphony OCR, visit Configuration Guide - Folders
*Symphony OCR will process the entire directory tree of the path you provide (e.g. X:\Clients will process all documents in the subfolders beneath X:\Clients, like X:\Clients\Anderson, Matthew and X:\Clients\Anderson, Matthew\Agreements)
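To illustrate what processing the entire directory tree means, the sketch below walks every subfolder beneath a root path and collects PDF and TIFF files. It is not the Finder's actual code: the function name and the extension list are assumptions, and it does not check whether a PDF is image-only.

```python
import os

# Assumed set of file types of interest (PDF and TIFF, per this guide).
CANDIDATE_EXTENSIONS = {".pdf", ".tif", ".tiff"}


def find_candidate_files(root):
    """Yield PDF/TIFF paths in root and in every subfolder beneath it."""
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            extension = os.path.splitext(name)[1].lower()
            if extension in CANDIDATE_EXTENSIONS:
                yield os.path.join(folder, name)
```

Pointing this at a root such as X:\Clients would also return files stored several levels down, which is why a single root path is enough to cover all client folders.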
Symphony OCR should be off and running now! The Finder is looking through your repository and sending documents to the Analyzer to determine what needs to be processed. The Analyzer sends any documents eligible for OCR to the Processor which applies an invisible layer of text to the document.
By default, Symphony OCR queries the Folder document repository for newly saved and modified files every 120 minutes, so newly saved files will generally be OCRed within about 120 minutes. Depending on the volume of image-only documents already in the repository, it may take a while for Symphony OCR to process the backlog (legacy files). Symphony OCR gives precedence to newer files, so documents that are scanned today will be processed before the backlog.
Refer to the section, Configuration Guide - Folder - Finder, for further information on finder settings that determine when Symphony OCR locates files for processing.
Refer to the section, Configuration Guide - Folder - Processor, for further information on configuration settings that determine which files are processed.
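The "newer files first" behavior described above can be pictured as a simple sort on modification time. This is a conceptual sketch only: the QueuedDocument type and processing_order function are hypothetical, and the real queue also takes folder priorities into account.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class QueuedDocument:
    path: str            # hypothetical: location of the file
    modified: datetime   # hypothetical: last saved/modified time

def processing_order(queue):
    """Return documents newest first, so today's scans precede the backlog."""
    return sorted(queue, key=lambda doc: doc.modified, reverse=True)
```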
License
This is where your Symphony OCR license is set. To change your Symphony OCR license, simply click "Licensing" from the Configuration side bar, enter your new license, and select "Save Changes."
License Details
Provides you details of the license.
Features Allowed by your License
This area tells you which features are allowed by your license.
Updating your License
Starting with version 6.4.96, Symphony OCR will have an 'Automatic License Update' feature. Basically, after you've paid your renewal invoice with Trumpet, a new license is automatically generated. So if your installation has access to the Trumpet servers, Symphony OCR will automatically see this new license, download it and install it.
Note: Symphony will check for a new license once every 3 days under normal circumstances, and once per day when your license is within 30 days of expiring.
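The check schedule in the note above amounts to a simple rule. The sketch below is an illustration under stated assumptions, not Symphony OCR code: the function name is hypothetical, and treating exactly 30 days remaining (and an already expired license) as "check daily" is an assumption.

```python
from datetime import date, timedelta

def license_check_interval(today, expires_on):
    """Return how long to wait before the next license check."""
    days_left = (expires_on - today).days
    # Within 30 days of expiring (assumed inclusive): check once per day.
    if days_left <= 30:
        return timedelta(days=1)
    # Normal circumstances: check once every 3 days.
    return timedelta(days=3)
```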
If you've paid your invoice (and received notification of a new license) and don't want to wait for the automatic update to kick in, you can click the "Check for Updated License" link on this page. This will manually trigger Symphony OCR to retrieve the updated license from Trumpet's servers. As mentioned, all of this assumes your installation has access to Trumpet's servers. If a connection cannot be established, you can always copy/paste your new license into this screen.
When you receive notification from Trumpet that your new license is generated, it is still highly recommended that you A) update your installation to the latest version of the software, and B) verify your license has been updated.
Notifications allow users to be emailed nightly based on the status of Symphony OCR.
Each email address may be configured with one of four types:
Never - nightly emails will never be sent to this recipient (instead, after entering an email address you can select "Send Now" and deliver an email to the recipient on demand).
When there are errors - the nightly email will only be sent to the recipient if the overall system condition is Error. This is useful for recipients who only need to know when the system is not processing documents because of some major error (licensing issues are the most common major error).
When there are warnings or errors - the nightly email will only be sent to the recipient if the overall system condition is Warning or Error. The warning condition is triggered by documents in the Needs Attention list, configuration problems or other system level issues that should be looked at, even though they haven't completely stopped processing from occurring.
Always (aka Daily) - the nightly email will be sent to the recipient every night regardless of system status. This is useful for firms who want to monitor the 'Not Processed' lists to ensure that every document that couldn't be OCRed (e.g., because of security or corruption) has been reviewed. Users can review documents in the various 'Not Processed' lists and either correct the underlying issue, or move the documents to the Ignore list using Bulk Operations >Ignore.
If you have a user leave the firm or you no longer wish for a particular user to be notified, you can change the Notification Type to "Never" or remove the user entirely by selecting "Remove" to the right of the address.
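The four notification types above reduce to a small decision table. The sketch below only restates that table; the enum and function names are hypothetical, and the "OK" condition is an assumed name for a system with no warnings or errors.

```python
from enum import Enum

class NotificationType(Enum):
    NEVER = "Never"
    ERRORS = "When there are errors"
    WARNINGS_OR_ERRORS = "When there are warnings or errors"
    ALWAYS = "Always"

class Condition(Enum):
    OK = "OK"            # assumed name for the healthy state
    WARNING = "Warning"
    ERROR = "Error"

def should_send_nightly(notification_type, condition):
    """Decide whether a recipient gets tonight's email."""
    if notification_type is NotificationType.ALWAYS:
        return True
    if notification_type is NotificationType.WARNINGS_OR_ERRORS:
        return condition in (Condition.WARNING, Condition.ERROR)
    if notification_type is NotificationType.ERRORS:
        return condition is Condition.ERROR
    # NEVER: only on-demand "Send Now" emails reach this recipient.
    return False
```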
Folders to Monitor
This is the list of folders that Symphony OCR is monitoring.
Search Frequency - The frequency in which the Finder will query this directory tree for new pdf & tif documents.
Default Priority - The priority level in which this directory will be processed. For more information on setting document priorities see: Processing Priorities
Add a folder
To add a folder or directory tree to the list of folders that should be monitored by Symphony OCR, enter the path in the field and select the "Add" button on the right. Symphony OCR will process the entire directory tree of the path you provide (e.g. X:\Clients will process all documents in the subfolders beneath X:\Clients, like X:\Clients\Anderson, Matthew and X:\Clients\Anderson, Matthew\Agreements). This adds the directory tree to the list of folders that Symphony OCR is monitoring.
Note: If you wish to process files in a hidden folder, you must explicitly indicate that folder. For example, if you have a root folder like X:\Clients and under that a hidden folder called "Inactive" (e.g. X:\Clients\Inactive), you must explicitly add that folder to the Monitored folders.
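To preview which files fall inside a directory tree before adding it, a short script along these lines may help. It is a rough sketch, not how the Finder is implemented: the function name and extension list are assumptions, and it does not model the hidden-folder rule in the note above (os.walk descends into hidden folders as well).

```python
import os

# Assumed set of extensions; the guide mentions PDF and TIF/TIFF files.
OCR_EXTENSIONS = {".pdf", ".tif", ".tiff"}

def find_candidates(root):
    """List PDF/TIFF files anywhere beneath root, including subfolders."""
    matches = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            if os.path.splitext(name)[1].lower() in OCR_EXTENSIONS:
                matches.append(os.path.join(folder, name))
    return sorted(matches)
```

Calling find_candidates on a root such as X:\Clients would list matching files in every client and matter subfolder beneath it.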
Advanced Settings
Process Read Only Files - if you wish to process read-only files, you should check this check box
The Scheduler determines when and how frequently Symphony OCR performs specific tasks, such as when to send a heartbeat, when to search for new documents, when to purge backup files, etc.
To adjust a setting select "Edit" to the left of the specific setting you would like to adjust.
To delete a specific Scheduler entry, select "Delete" on the right of the particular setting.
Most users will not require changing these items, however there are special cases when you may wish to do this. For example, if the firm runs their indexer software and Symphony OCR on a user's workstation, you may wish to only process items overnight.
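The overnight-only case mentioned above comes down to checking whether the current time falls inside a window that may wrap past midnight. The sketch below illustrates that check; the function name and the 8 PM to 6 AM window are hypothetical and are not Scheduler defaults.

```python
from datetime import time

def in_processing_window(now, start=time(20, 0), end=time(6, 0)):
    """Return True if 'now' falls inside the allowed processing window.

    The default window (8 PM to 6 AM) is only an example value.
    """
    if start <= end:
        # Window within a single day, e.g. 01:00 to 05:00.
        return start <= now < end
    # Window that wraps past midnight, e.g. 20:00 to 06:00.
    return now >= start or now < end
```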
Status
Folder Search - performs a search in the monitored folder structure to find all documents that are eligible for OCR regardless of how recently the document has been created or modified. By default it performs this search once every 120 minutes. This can be adjusted by selecting "Manage". This will take you to the Folders page where you can adjust the search frequency for each folder.
The Analyzer is responsible for looking at each document and determining if it is eligible for OCR. If a document is eligible it is placed in the Processing list. If a document is not eligible, it is placed in the appropriate list (for more information on why a document might not be eligible for OCR, refer to the section, Not Processed List).
Control
In the control area, you can choose to refresh the Analyzer or stop the Analyzer:
Refresh - Selecting Refresh will refresh the Status of the Analyzer page.
Stop Analyzer - Selecting this option will stop the Analyzer from Analyzing documents in the document repository.
Status
Displays the status of the Analyzer.
Information
Machine Processors - Indicates how many logical processors the workstation running Symphony OCR contains.
Licensed parallel processing - Indicates how many documents will be analyzed at a time based on your license features.
Recent Performance (since last restart)
Provides performance statistics such as the total number of documents and pages that Symphony OCR has found eligible for OCR, and the average speed of analysis per document since the last restart of Symphony OCR.
Overall Performance (since last restart)
Provides performance statistics such as the total number of documents and pages that Symphony OCR has found eligible for OCR, and the average speed of analysis per document since the last restart of Symphony OCR.
Settings
Do not analyze documents younger than - The default setting is 30 seconds. If you wish to have the Analyzer wait longer to analyze documents, simply change the value in the field. Trumpet recommends that this value is not decreased to less than 30 seconds to ensure that documents are fully written to the disk before processing.
To change this setting, simply type in the number of seconds, and then select "Save Changes".
Select Processor in the navigation panel:
The Processor manages the actual OCR processes. Once a document has been identified as eligible for OCR by the Analyzer, the Processor confirms that the file is still eligible for OCR, and then OCRs the file. If a document is successfully OCRed, it is moved to the Processed list (for more information about the flow of documents throughout Symphony OCR, refer to the section Symphony Workflow, Tools & Document Lists).
Control
In the control area, you can choose to refresh the Processor or stop the Processor:
Refresh - Selecting Refresh will refresh the status of the Processor page.
Stop Processor - Selecting this option will stop the Processor from processing documents in the document repository.
Status
The status of the Processor (what it is currently processing).
Information
Processing Capacity Remaining - If you have a license that limits the number of pages you can process per year, the number of pages remaining will appear here.
Machine Processors - Indicates how many logical processors the workstation running Symphony OCR contains.
Licensed parallel processing - Indicates the number of documents that will be processed by the processor simultaneously.
Recent Performance
Provides performance statistics such as the number of documents and pages that Symphony OCR has processed in a smaller sample size and the average speed of processing per page, the average number of pages per document and the effective throughput of the documents.
Overall Performance
Provides performance statistics such as the total number of documents and pages that Symphony OCR has processed and the average speed of processing per page, the average number of pages per document and the effective throughput of the documents.
Basic Settings
Process TIFFs (OCR and convert to PDF) - Symphony OCR can process TIFF files and convert them to image + text PDF files. This is an optional setting. If you wish to process TIFF documents, simply check this checkbox.
Note: If the firm opts to process TIFF documents, this will change the file extension from .tif to .pdf. This will "break" any relationships or projects that include this file.
Process MSG (email) attachments - Symphony OCR can process email message attachments. This is an optional setting. If you wish to process email message attachments, check this checkbox.
<Big fat scary warning:
Due to a limitation in newer versions of Office, Microsoft prevents us from accessing the DLLs that allow us to read/process emails under the following conditions:
> Symphony OCR is configured to run as a service
> 'Process MSG (email) attachments' is checked
> Outlook 2013 (or possibly Outlook 2016) is open
In these circumstances, you're likely to see the following error:
Therefore, if Symphony OCR is being installed to run as a service *and* will be configured to process email attachments, it is our recommendation to install it on a machine that will not normally have Outlook 2013 (or possibly 2016) open. On the bright side, our testing has shown that in these situations, Symphony is still processing normal documents and WILL eventually recover and process emails after Office is closed. But if you can, we recommend avoiding this situation. If your experience is different, we'd like to hear about it.
End of big fat scary warning>
Do not process documents younger than - The default setting is 30 seconds. If you wish to have the Processor wait longer to process documents, simply change the value in the field. Trumpet recommends that this value is not decreased to less than 30 seconds to ensure that documents are fully written to the disk before processing.
Do not process documents older than - If you have older documents that you do not want Symphony OCR to process, enter a specific number of days for which the software should process backlog.
Automatically rotate pages to proper orientation - If selected, the pages will rotate either landscape or portrait according to the text on the page.
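Taken together, the "younger than" and "older than" settings above define an age window for documents. A minimal sketch of that window follows; the function and parameter names are hypothetical, and leaving the upper limit unset to mean "no limit" is an assumption.

```python
SECONDS_PER_DAY = 86400

def within_age_window(age_seconds, min_age_seconds=30, max_age_days=None):
    """Return True if a document's age allows it to be processed.

    min_age_seconds: the 30-second default gives files time to be
    fully written to disk.
    max_age_days: None (assumed) means the backlog has no age limit.
    """
    if age_seconds < min_age_seconds:
        return False
    if max_age_days is not None and age_seconds > max_age_days * SECONDS_PER_DAY:
        return False
    return True
```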
Original retention settings
Retain originals of processed files - If selected, Symphony OCR retains copies of the documents that it has processed. These copies appear as versions (if Symphony OCR processes a document 3 times, it will maintain copies of all 3 versions of the document). The user can restore previous versions of a document from the Symphony OCR backup using the document Details screen.
Purge originals of processed files after - The default setting is to retain the originals of processed files for 7 days after which they will be purged. If you wish to change this setting, you can change the value to the appropriate number of days for your firm.
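The retention rule above can be expressed as a single comparison. This is an illustrative sketch only: the function name is hypothetical, and purging strictly after the retention period has elapsed is an assumption about the boundary.

```python
from datetime import datetime, timedelta

def should_purge(backup_created, now, retention_days=7):
    """Return True once a retained original is older than the retention period."""
    return now - backup_created > timedelta(days=retention_days)
```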
Backlog throttling settings (only needed when your license does NOT have unlimited pages for processing)
Default processing capacity reserved for new documents (based on the actual number of new pages added each day) - This is calculated from the number of pages that were added to the site in the past year.
Override the default processing capacity reserve - This sets the number of pages you would like to reserve for new documents, spreading the page count capacity evenly across the entire year. To determine a reasonable reserve, allow the Symphony OCR Analyzer module to run, then look at the timeline for the Processing Queue. Adding the number of pages in the first 52 weeks and dividing by 365 gives the average number of pages added to the system per day. Trumpet recommends adding an additional 10% to accommodate future growth or above-average filing. The resulting value should be a reasonable reserve.
Advanced Settings:
Enable OCR debug logging - This enables debug logging to help our support team address issues if necessary. To conserve disk space, we recommend not enabling this unless requested by our support team.
Create thumbnails (if not already present) - Checking this checkbox will create thumbnails if they are not already present.
Limit parallel processing to X documents - This allows you to limit the number of cores that Symphony OCR will utilize. It uses 1 core per document. For example, if you enter 3, Symphony OCR will use only 3 cores and will process 3 documents simultaneously. See: How does Symphony OCR impact the performance of the server or indexer PC
In order to enable a 64-bit version of Windows to search for text in PDF files, there are a few additional steps to take. You will need to download and install Adobe's PDF iFilter for 64-bit version of Windows.
To get started, visit: http://www.adobe.com/support/downloads/detail.jsp?ftpID=5542 to download and install the Adobe iFilter.
Once you have installed the Adobe iFilter, open your Control Panel and go to "Indexing Options".
Click "Advanced" at the bottom and then select the "File Types" tab.
Scroll down to "pdf" under "Extensions" and it should now say "PDF Filter".
Symphony OCR's Folder integration can be used to point at your Time Matters repository directory. Simply input the path to the repository into the Folders configuration using the instructions found earlier in this chapter. Remember, if Symphony is installed as a service be sure to input this as a UNC path.
To get the absolute most out of Symphony OCR, make sure your Time Matters' text indexing functionality is enabled so that you can do 'text-in-file' searches. Refer to your Time Matters rep/support for assistance with that.
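Because a service installation needs a UNC path rather than a mapped drive letter, it can be worth checking a path before entering it. The sketch below shows one way to do that with Python's standard library; the function name and the server and share names in the example are hypothetical.

```python
from pathlib import PureWindowsPath

def is_unc_path(path):
    """Return True for UNC paths such as \\\\server\\share\\folder."""
    # For a UNC path, the "drive" part is \\server\share; for a mapped
    # drive it is a letter such as X:.
    return PureWindowsPath(path).drive.startswith("\\\\")
```

For example, is_unc_path(r"\\fileserver\timematters\docs") returns True, while is_unc_path(r"X:\Clients") returns False.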
Note: Symphony OCR works with the locally synced (either desktop or server) folder tree of Box and uses a Windows Folder Tree License. For more information on Box's sync tool visit: Box Sync Installation Information
For more detailed information and advanced settings for configuring Symphony OCR, visit Configuration Guide - Box
*Symphony OCR will process the entire directory tree of the path you provide (e.g. X:\Clients will process all documents in the subfolders beneath X:\Clients, like X:\Clients\Anderson, Matthew and X:\Clients\Anderson, Matthew\Agreements)
Symphony OCR should be off and running now! The Finder is looking through your repository and sending documents to the Analyzer to determine what needs to be processed. The Analyzer sends any documents eligible for OCR to the Processor which applies an invisible layer of text to the document.
By default, Symphony OCR queries the Folder document repository for newly saved and modified files every 120 minutes, so newly saved files will generally be OCRed within about 120 minutes. Depending on the volume of image-only documents already in the repository, it may take a while for Symphony OCR to process the backlog (legacy files). Symphony OCR gives precedence to newer files, so documents that are scanned today will be processed before the backlog.
Refer to the section, Configuration Guide - Box - Finder, for further information on finder settings that determine when Symphony OCR locates files for processing.
Refer to the section, Configuration Guide - Box - Processor, for further information on configuration settings that determine which files are processed.
License
This is where your Symphony OCR license is set. To change your Symphony OCR license, simply click "Licensing" from the Configuration side bar, enter your new license, and select "Save Changes."
License Details
Provides you details of the license.
Features Allowed by your License
This area tells you which features are allowed by your license.
Updating your License
Starting with version 6.4.96, Symphony OCR will have an 'Automatic License Update' feature. Basically, after you've paid your renewal invoice with Trumpet, a new license is automatically generated. So if your installation has access to the Trumpet servers, Symphony OCR will automatically see this new license, download it and install it.
Note: Symphony will check for a new license once every 3 days under normal circumstances, and once per day when your license is within 30 days of expiring.
If you've paid your invoice (and received notification of a new license) and don't want to wait for the automatic update to kick in, you can click the "Check for Updated License" link on this page. This will manually trigger Symphony OCR to retrieve the updated license from Trumpet's servers. As mentioned, all of this assumes your installation has access to Trumpet's servers. If a connection cannot be established, you can always copy/paste your new license into this screen.
When you receive notification from Trumpet that your new license is generated, it is still highly recommended that you A) update your installation to the latest version of the software, and B) verify your license has been updated.
Notifications allow users to be emailed nightly based on the status of Symphony OCR.
Each email address may be configured with one of four types:
Never - nightly emails will never be sent to this recipient (instead, after entering an email address you can select "Send Now" and deliver an email to the recipient on demand).
When there are errors - the nightly email will only be sent to the recipient if the overall system condition is Error. This is useful for recipients who only need to know when the system is not processing documents because of some major error (licensing issues are the most common major error).
When there are warnings or errors - the nightly email will only be sent to the recipient if the overall system condition is Warning or Error. The warning condition is triggered by documents in the Needs Attention list, configuration problems or other system level issues that should be looked at, even though they haven't completely stopped processing from occurring.
Always (aka Daily) - the nightly email will be sent to the recipient every night regardless of system status. This is useful for firms who want to monitor the 'Not Processed' lists to ensure that every document that couldn't be OCRed (e.g., because of security or corruption) has been reviewed. Users can review documents in the various 'Not Processed' lists and either correct the underlying issue, or move the documents to the Ignore list using Bulk Operations >Ignore.
If you have a user leave the firm or you no longer wish for a particular user to be notified, you can change the Notification Type to "Never" or remove the user entirely by selecting "Remove" to the right of the address.
Folders to Monitor
This is the list of folders that Symphony OCR is monitoring.
Search Frequency - The frequency in which the Finder will query this directory tree for new pdf & tif documents.
Default Priority - The priority level in which this directory will be processed. For more information on setting document priorities see: Processing Priorities
Add a folder
To add a folder or directory tree to the list of folders that should be monitored by Symphony OCR, enter the path in the field and select the "Add" button on the right. Symphony OCR will process the entire directory tree of the path you provide (e.g. X:\Clients will process all documents in the subfolders beneath X:\Clients, like X:\Clients\Anderson, Matthew and X:\Clients\Anderson, Matthew\Agreements). This adds the directory tree to the list of folders that Symphony OCR is monitoring.
Note: If you wish to process files in a hidden folder, you must explicitly indicate that folder. For example, if you have a root folder like X:\Clients and under that a hidden folder called "Inactive" (e.g. X:\Clients\Inactive), you must explicitly add that folder to the Monitored folders.
Advanced Settings
Process Read Only Files - if you wish to process read-only files, you should check this check box
The Scheduler determines when and how frequently Symphony OCR performs specific tasks, such as when to send a heartbeat, when to search for new documents, when to purge backup files, etc.
To adjust a setting select "Edit" to the left of the specific setting you would like to adjust.
To delete a specific Scheduler entry, select "Delete" on the right of the particular setting.
Most users will not require changing these items, however there are special cases when you may wish to do this. For example, if the firm runs their indexer software and Symphony OCR on a user's workstation, you may wish to only process items overnight.
Status
Folder Search - performs a search in the monitored folder structure to find all documents that are eligible for OCR regardless of how recently the document has been created or modified. By default it performs this search once every 120 minutes. This can be adjusted by selecting "Manage". This will take you to the Folders page where you can adjust the search frequency for each folder.
The Analyzer is responsible for looking at each document and determining if it is eligible for OCR. If a document is eligible it is placed in the Processing list. If a document is not eligible, it is placed in the appropriate list (for more information on why a document might not be eligible for OCR, refer to the section, Not Processed List).
Control
In the control area, you can choose to refresh the Analyzer or stop the Analyzer:
Refresh - Selecting Refresh will refresh the Status of the Analyzer page.
Stop Analyzer - Selecting this option will stop the Analyzer from Analyzing documents in the document repository.
Status
Displays the status of the Analyzer.
Information
Machine Processors - Indicates how many logical processors the workstation running Symphony OCR contains.
Licensed parallel processing - Indicates how many documents will be analyzed at a time based on your license features.
Recent Performance (since last restart)
Provides performance statistics such as the total number of documents and pages that Symphony OCR has found eligible for OCR, and the average speed of analysis per document since the last restart of Symphony OCR.
Overall Performance (since last restart)
Provides performance statistics such as the total number of documents and pages that Symphony OCR has found eligible for OCR, and the average speed of analysis per document since the last restart of Symphony OCR.
Settings
Do not analyze documents younger than - The default setting is 30 seconds. If you wish to have the Analyzer wait longer to analyze documents, simply change the value in the field. Trumpet recommends that this value is not decreased to less than 30 seconds to ensure that documents are fully written to the disk before processing.
To change this setting, simply type in the number of seconds, and then select "Save Changes".
Select Processor in the navigation panel:
The Processor manages the actual OCR processes. Once a document has been identified as eligible for OCR by the Analyzer, the Processor confirms that the file is still eligible for OCR, and then OCRs the file. If a document is successfully OCRed, it is moved to the Processed list (for more information about the flow of documents throughout Symphony OCR, refer to the section Symphony Workflow, Tools & Document Lists).
Control
In the control area, you can choose to refresh the Processor or stop the Processor:
Refresh - Selecting Refresh will refresh the status of the Processor page.
Stop Processor - Selecting this option will stop the Processor from processing documents in the document repository.
Status
The status of the Processor (what it is currently processing).
Information
Processing Capacity Remaining - If you have a license that limits the number of pages you can process per year, the number of pages remaining will appear here.
Machine Processors - Indicates how many logical processors the workstation running Symphony OCR contains.
Licensed parallel processing - Indicates the number of documents that will be processed by the processor simultaneously.
Recent Performance
Provides performance statistics such as the number of documents and pages that Symphony OCR has processed in a smaller sample size and the average speed of processing per page, the average number of pages per document and the effective throughput of the documents.
Overall Performance
Provides performance statistics such as the total number of documents and pages that Symphony OCR has processed and the average speed of processing per page, the average number of pages per document and the effective throughput of the documents.
Basic Settings
Process TIFFs (OCR and convert to PDF) - Symphony OCR can process TIFF files and convert them to image + text PDF files. This is an optional setting. If you wish to process TIFF documents, simply check this checkbox.
Note: If the firm opts to process TIFF documents, this will change the file extension from .tif to .pdf. This will "break" any relationships or projects that include this file.
Process MSG (email) attachments - Symphony OCR can process email message attachments. This is an optional setting. If you wish to process email message attachments, check this checkbox.
<Big fat scary warning:
Due to a limitation in newer versions of Office, Microsoft prevents us from accessing the DLLs that allow us to read/process emails under the following conditions:
> Symphony OCR is configured to run as a service
> 'Process MSG (email) attachments' is checked
> Outlook 2013 (or possibly Outlook 2016) is open
In these circumstances, you're likely to see the following error:
Therefore, if Symphony OCR is being installed to run as a service *and* will be configured to process email attachments, it is our recommendation to install it on a machine that will not normally have Outlook 2013 (or possibly 2016) open. On the bright side, our testing has shown that in these situations, Symphony is still processing normal documents and WILL eventually recover and process emails after Office is closed. But if you can, we recommend avoiding this situation. If your experience is different, we'd like to hear about it.
End of big fat scary warning>
Do not process documents younger than - The default setting is 30 seconds. If you wish to have the Processor wait longer to process documents, simply change the value in the field. Trumpet recommends that this value is not decreased to less than 30 seconds to ensure that documents are fully written to the disk before processing.
Do not process documents older than - If you have older documents that you do not want Symphony OCR to process, enter a specific number of days for which the software should process backlog.
Automatically rotate pages to proper orientation - If selected, the pages will rotate either landscape or portrait according to the text on the page.
Original retention settings
Retain originals of processed files - If selected, Symphony OCR retains copies of the documents that it has processed. These copies appear as versions (if Symphony OCR processes a document 3 times, it will maintain copies of all 3 versions of the document). The user can restore previous versions of a document from the Symphony OCR backup using the document Details screen.
Purge originals of processed files after - The default setting is to retain the originals of processed files for 7 days after which they will be purged. If you wish to change this setting, you can change the value to the appropriate number of days for your firm.
Backlog throttling settings (only needed when your license does NOT have unlimited pages for processing)
Default processing capacity reserved for new documents (based on the actual number of new pages added each day) - This is calculated from the number of pages that were added to the site in the past year.
Override the default processing capacity reserve - This sets the number of pages you would like to reserve for new documents, spreading the page count capacity evenly across the entire year. To determine a reasonable reserve, allow the Symphony OCR Analyzer module to run, then look at the timeline for the Processing Queue. Adding the number of pages in the first 52 weeks and dividing by 365 gives the average number of pages added to the system per day. Trumpet recommends adding an additional 10% to accommodate future growth or above-average filing. The resulting value should be a reasonable reserve.
Advanced Settings:
Enable OCR debug logging - This enables debug logging to help our support team address issues if necessary. To conserve disk space, we recommend not enabling this unless requested by our support team.
Create thumbnails (if not already present) - Checking this checkbox will create thumbnails if they are not already present.
Limit parallel processing to X documents - This allows you to limit the number of cores that Symphony OCR will utilize. It uses 1 core per document. For example, if you enter 3, Symphony OCR will use only 3 cores and will process 3 documents simultaneously. See: How does Symphony OCR impact the performance of the server or indexer PC
Note: Symphony OCR works with the locally synced (either desktop or server) folder tree of Dropbox and uses a Windows Folder Tree License.
For more detailed information and advanced settings for configuring Symphony OCR, visit Configuration Guide - Dropbox
*Symphony OCR will process the entire directory tree of the path you provide (e.g. X:\Clients will process all documents in the subfolders beneath X:\Clients, like X:\Clients\Anderson, Matthew and X:\Clients\Anderson, Matthew\Agreements)
Symphony OCR should be off and running now! The Finder is looking through your repository and sending documents to the Analyzer to determine what needs to be processed. The Analyzer sends any documents eligible for OCR to the Processor which applies an invisible layer of text to the document.
By default, Symphony OCR queries the Folder document repository for newly saved and modified files every 120 minutes, so newly saved files will generally be OCRed within about 120 minutes. Depending on the volume of image-only documents already in the repository, it may take a while for Symphony OCR to process the backlog (legacy files). Symphony OCR gives precedence to newer files, so documents that are scanned today will be processed before the backlog.
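The "newer files first" behavior can be pictured as a sort on modified time. The sketch below assumes each document is a (name, modified time) pair; the file names and dates are made up, and the real queue also takes the priority settings described under Processing Priorities into account.

```python
from datetime import datetime


def processing_order(documents):
    """Return (name, modified) pairs sorted newest first."""
    return sorted(documents, key=lambda doc: doc[1], reverse=True)


queue = [
    ("backlog-2019.pdf", datetime(2019, 3, 1)),
    ("scanned-today.pdf", datetime(2024, 6, 3)),
    ("backlog-2021.pdf", datetime(2021, 11, 15)),
]
print([name for name, _ in processing_order(queue)])
# ['scanned-today.pdf', 'backlog-2021.pdf', 'backlog-2019.pdf']
```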
Refer to the section, Configuration Guide - Dropbox - Finder, for further information on finder settings that determine when Symphony OCR locates files for processing.
Refer to the section, Configuration Guide - Dropbox - Processor, for further information on configuration settings that determine which files are processed.
License
This is where your Symphony OCR license is set. To change your Symphony OCR license, simply click "Licensing" from the Configuration side bar, enter your new license, and select "Save Changes."
License Details
Provides you details of the license.
Features Allowed by your License
This area tells you which features are allowed by your license.
Updating your License
Starting with version 6.4.96, Symphony OCR includes an 'Automatic License Update' feature. After you've paid your renewal invoice with Trumpet, a new license is automatically generated. If your installation has access to the Trumpet servers, Symphony OCR will automatically detect this new license, download it, and install it.
Note: Symphony will check for a new license once every 3 days under normal circumstances, and once per day when your license is within 30 days of expiring.
If you've paid your invoice (and received notification of a new license) and don't want to wait for the automatic update to kick in, you can click the "Check for Updated License" link on this page. This will manually trigger Symphony OCR to retrieve the updated license from Trumpet's servers. As mentioned, all of this assumes your installation has access to Trumpet's servers. If a connection cannot be established, you can always copy/paste your new license into this screen.
When you receive notification from Trumpet that your new license is generated, it is still highly recommended that you A) update your installation to the latest version of the software, and B) verify your license has been updated.
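The license check schedule from the note above (every 3 days normally, daily once the license is within 30 days of expiring) can be sketched as a small date calculation. The function name is hypothetical, and treating exactly 30 days remaining as "within 30 days" is an assumption.

```python
from datetime import date, timedelta


def next_license_check(last_check, expires_on):
    """Return the date of the next automatic license check."""
    days_left = (expires_on - last_check).days
    # Assumption: 30 days or fewer remaining counts as "within 30 days".
    interval = 1 if days_left <= 30 else 3
    return last_check + timedelta(days=interval)


print(next_license_check(date(2024, 1, 1), date(2024, 12, 31)))  # 2024-01-04
print(next_license_check(date(2024, 1, 1), date(2024, 1, 20)))   # 2024-01-02
```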
Notifications allow users to be emailed nightly based on the status of Symphony OCR.
Each email address may be configured with one of four types:
Never - nightly emails will never be sent to this recipient (instead, after entering an email address you can select "Send Now" and deliver an email to the recipient on demand).
When there are errors - the nightly email will only be sent to the recipient if the overall system condition is Error. This is useful for recipients who only need to know when the system is not processing documents because of some major error (licensing issues are the most common major error).
When there are warnings or errors - the nightly email will only be sent to the recipient if the overall system condition is Warning or Error. The warning condition is triggered by documents in the Needs Attention list, configuration problems or other system level issues that should be looked at, even though they haven't completely stopped processing from occurring.
Always (aka Daily) - the nightly email will be sent to the recipient every night regardless of system status. This is useful for firms who want to monitor the 'Not Processed' lists to ensure that every document that couldn't be OCRed (e.g., because of security or corruption) has been reviewed. Users can review documents in the various 'Not Processed' lists and either correct the underlying issue or move the documents to the Ignore list using Bulk Operations > Ignore.
If you have a user leave the firm or you no longer wish for a particular user to be notified, you can change the Notification Type to "Never" or remove the user entirely by selecting "Remove" to the right of the address.
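The four notification types reduce to a simple threshold on the overall system condition. The sketch below is one way to express that rule; the function name is hypothetical, and "OK" is an assumed label for the condition in which there are neither warnings nor errors.

```python
def should_send_nightly(notification_type, system_condition):
    """Decide whether a recipient gets tonight's status email."""
    severity = {"OK": 0, "Warning": 1, "Error": 2}  # "OK" is an assumed name
    threshold = {
        "Never": None,
        "When there are errors": 2,
        "When there are warnings or errors": 1,
        "Always": 0,
    }
    minimum = threshold[notification_type]
    if minimum is None:
        return False
    return severity[system_condition] >= minimum


print(should_send_nightly("When there are errors", "Warning"))              # False
print(should_send_nightly("When there are warnings or errors", "Warning"))  # True
```

A recipient set to "Never" can still receive an on-demand email through "Send Now", which this sketch does not model.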
Folders to Monitor
This is the list of folders that Symphony OCR is monitoring.
Search Frequency - How often the Finder will query this directory tree for new PDF and TIF documents.
Default Priority - The priority level at which this directory will be processed. For more information on setting document priorities see: Processing Priorities
Add a folder
To add a folder or directory tree to the list of folders that should be monitored by Symphony OCR, enter the path in the field and select the "Add" button on the right. Symphony OCR will process the entire directory tree of the path you provide (e.g. X:\Clients will process all documents in the subfolders beneath X:\Clients, like X:\Clients\Anderson, Matthew and X:\Clients\Anderson, Matthew\Agreements). This will add the directory tree to the list of folders that Symphony OCR is monitoring.
Note: If you wish to process files in a hidden folder, you must explicitly indicate that folder. For example, if you have a root folder like X:\Clients and under that a hidden folder called "Inactive" (e.g. X:\Clients\Inactive), you must explicitly add that folder to the Monitored folders.
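To picture what "the entire directory tree" covers, here is a minimal Python sketch that walks a root folder and lists every PDF and TIFF file beneath it. It is an illustration only: the function name and the extension set are assumptions, and unlike Symphony OCR it does not skip hidden folders.

```python
import os

# Assumed set of extensions; the guide mentions PDF and TIF/TIFF files.
IMAGE_EXTENSIONS = {".pdf", ".tif", ".tiff"}


def find_candidates(root):
    """Yield the path of every PDF or TIFF file anywhere under root."""
    for folder, _subfolders, files in os.walk(root):
        for name in sorted(files):
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                yield os.path.join(folder, name)
```

Called with a root such as X:\Clients, this would also return files in subfolders like X:\Clients\Anderson, Matthew\Agreements.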
Advanced Settings
Process Read Only Files - If you wish to process read-only files, check this checkbox.
The Scheduler determines when and how frequently Symphony OCR performs specific tasks, such as when to send a heartbeat, when to search for new documents, when to purge backup files, etc.
To adjust a setting select "Edit" to the left of the specific setting you would like to adjust.
To delete a specific Scheduler entry, select "Delete" on the right of the particular setting.
Most users will not need to change these items; however, there are special cases when you may wish to do so. For example, if the firm runs their indexer software and Symphony OCR on a user's workstation, you may wish to only process items overnight.
Status
Folder Search - performs a search in the monitored folder structure to find all documents that are eligible for OCR regardless of how recently the document has been created or modified. By default it performs this search once every 120 minutes. This can be adjusted by selecting "Manage". This will take you to the Folders page where you can adjust the search frequency for each folder.
The Analyzer is responsible for looking at each document and determining if it is eligible for OCR. If a document is eligible it is placed in the Processing list. If a document is not eligible, it is placed in the appropriate list (for more information on why a document might not be eligible for OCR, refer to the section, Not Processed List).
Control
In the control area, you can choose to refresh the Analyzer or stop the Analyzer:
Refresh - Selecting Refresh will refresh the Status of the Analyzer page.
Stop Analyzer - Selecting this option will stop the Analyzer from analyzing documents in the document repository.
Status
Displays the status of the Analyzer.
Information
Machine Processors - Indicates how many logical processors the workstation running Symphony OCR contains.
Licensed parallel processing - Indicates how many documents will be analyzed at a time based on your license features.
Recent Performance (since last restart)
Provides performance statistics such as the total number of documents and pages that Symphony OCR has found eligible for OCR, and the average speed of analysis per document since the last restart of Symphony OCR.
Overall Performance (since last restart)
Provides performance statistics such as the total number of documents and pages that Symphony OCR has found eligible for OCR, and the average speed of analysis per document since the last restart of Symphony OCR.
Settings
Do not analyze documents younger than - The default setting is 30 seconds. If you wish to have the Analyzer wait longer to analyze documents, simply change the value in the field. Trumpet recommends that this value is not decreased to less than 30 seconds to ensure that documents are fully written to the disk before processing.
To change this setting, simply type in the number of seconds, and then select "Save Changes".
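The minimum-age rule exists so that a file still being written is left alone. A minimal sketch of such a check, assuming the file's last-modified time is what gets compared (the guide does not say which timestamp Symphony OCR uses), could look like this:

```python
import os
import time

MIN_AGE_SECONDS = 30  # the default value described above


def old_enough(path, min_age=MIN_AGE_SECONDS, now=None):
    """Return True if the file was last modified at least min_age seconds ago."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) >= min_age
```

The optional `now` argument is there only to make the function easy to test with a fixed clock.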
Select Processor in the navigation panel:
The Processor manages the actual OCR processes. Once a document has been identified as eligible for OCR by the Analyzer, the Processor confirms that the file is still eligible for OCR, and then OCRs the file. If a document is successfully OCRed, it is moved to the Processed list (for more information about the flow of documents throughout Symphony OCR, refer to the section Symphony Workflow, Tools & Document Lists).
Control
In the control area, you can choose to refresh the Processor or stop the Processor:
Refresh - Selecting Refresh will refresh the status of the Processor page.
Stop Processor - Selecting this option will stop the Processor from processing documents in the document repository.
Status
The status of the Processor (what it is currently processing).
Information
Processing Capacity Remaining - If you have a license that limits the number of pages you can process per year, the number of pages remaining will appear here.
Machine Processors - Indicates how many logical processors the workstation running Symphony OCR contains.
Licensed parallel processing - Indicates the number of documents that will be processed by the processor simultaneously.
Recent Performance
Provides performance statistics such as the number of documents and pages that Symphony OCR has processed in a smaller sample size and the average speed of processing per page, the average number of pages per document and the effective throughput of the documents.
Overall Performance
Provides performance statistics such as the total number of documents and pages that Symphony OCR has processed and the average speed of processing per page, the average number of pages per document and the effective throughput of the documents.
Basic Settings
Process TIFFs (OCR and convert to PDF) - Symphony OCR can process TIFF files and convert them to image + text PDF files. This is an optional setting. If you wish to process TIFF documents, simply check this checkbox.
Note: If the firm opts to process TIFF documents, this will change the file extension from .tif to .pdf. This will "break" any relationships or projects that include this file.
Process MSG (email) attachments - Symphony OCR can process email message attachments. This is an optional setting. If you wish to process email message attachments, check this checkbox.
<Big fat scary warning:
Due to a limitation in newer versions of Office, Microsoft prevents us from accessing the DLLs that allow us to read/process emails under the following conditions:
> Symphony OCR is configured to run as a service
> 'Process MSG (email) attachments' is checked
> Outlook 2013 (or possibly Outlook 2016) is open
In these circumstances, you're likely to see the following error:
Therefore, if Symphony OCR is being installed to run as a service *and* will be configured to process email attachments, it is our recommendation to install it on a machine that will not normally have Outlook 2013 (or possibly 2016) open. On the bright side, our testing has shown that in these situations, Symphony is still processing normal documents and WILL eventually recover and process emails after Office is closed. But if you can, we recommend avoiding this situation. If your experience is different, we'd like to hear about it.
End of big fat scary warning>
Do not process documents younger than - The default setting is 30 seconds. If you wish to have the Processor wait longer to process documents, simply change the value in the field. Trumpet recommends that this value is not decreased to less than 30 seconds to ensure that documents are fully written to the disk before processing.
Do not process documents older than - If you have older documents that you do not want Symphony OCR to process, enter the maximum document age in days. Backlog documents older than this will not be processed.
Automatically rotate pages to proper orientation - If selected, the pages will rotate either landscape or portrait according to the text on the page.
Original retention settings
Retain originals of processed files - If selected, Symphony OCR retains copies of the documents that it has processed. These copies appear as versions (if Symphony OCR processes a document 3 times, it will maintain copies of all 3 versions of the document). The user can restore previous versions of a document from the Symphony OCR backup using the document Details screen.
Purge originals of processed files after - The default setting is to retain the originals of processed files for 7 days after which they will be purged. If you wish to change this setting, you can change the value to the appropriate number of days for your firm.
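The retention window amounts to a cutoff date: anything backed up before "now minus N days" is eligible for purging. The sketch below assumes each backup is a (name, processed time) pair; the names and dates are made up, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def originals_to_purge(backups, now, retain_days=7):
    """Return names of backups processed more than retain_days before now."""
    cutoff = now - timedelta(days=retain_days)
    return [name for name, processed_at in backups if processed_at < cutoff]


backups = [
    ("a.pdf", datetime(2024, 6, 1)),
    ("b.pdf", datetime(2024, 6, 8)),
]
print(originals_to_purge(backups, now=datetime(2024, 6, 10)))  # ['a.pdf']
```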
Backlog throttling settings (only needed when your license does NOT have unlimited pages for processing)
Default processing capacity reserved for new documents (based on the actual number of new pages added each day) - This is calculated from the number of pages that were added to the site in the past year.
Override the default processing capacity reserve - This sets the number of pages you would like to reserve for new documents, spreading the page count capacity evenly across the entire year. To determine a reasonable reserve, allow the Symphony OCR Analyzer module to run, then look at the timeline for the Processing Queue. Adding the number of pages in the first 52 weeks and dividing by 365 gives the average number of pages added to the system per day. Trumpet recommends adding an additional 10% to accommodate future growth or above-average filing. The result should be a reasonable processing capacity reserve.
Advanced Settings:
Create thumbnails (if not already present) - Checking this checkbox will create thumbnails if they are not already present.
Enable OCR debug logging - This enables debug logging, which helps our support team address issues if necessary. To conserve disk space, we recommend not enabling this unless requested by our support team.
Limit parallel processing to X documents - This allows you to limit the number of cores that Symphony OCR will utilize. It uses 1 core per document. For example, if you enter 3, Symphony OCR will use only 3 cores and will process 3 documents simultaneously. See: How does Symphony OCR impact the performance of the server or indexer PC
Note: Symphony OCR works with the locally synced (either desktop or server) folder tree of Google Drive and uses a Windows Folder Tree License.
For more detailed information and advanced settings for configuring Symphony OCR, visit Configuration Guide - Google Drive
*Symphony OCR will process the entire directory tree of the path you provide (e.g. X:\Clients will process all documents in the subfolders beneath X:\Clients, like X:\Clients\Anderson, Matthew and X:\Clients\Anderson, Matthew\Agreements)
Symphony OCR should be off and running now! The Finder is looking through your repository and sending documents to the Analyzer to determine what needs to be processed. The Analyzer sends any documents eligible for OCR to the Processor which applies an invisible layer of text to the document.
By default, Symphony OCR queries the Folder document repository for newly saved and modified files every 120 minutes, so newly saved files will generally be OCRed within about 120 minutes. Depending on the volume of image-only documents already in the repository, it may take a while for Symphony OCR to process the backlog (legacy files). Symphony OCR gives precedence to newer files, so documents that are scanned today will be processed before the backlog.
Refer to the section, Configuration Guide - Google Drive - Finder, for further information on finder settings that determine when Symphony OCR locates files for processing.
Refer to the section, Configuration Guide - Google Drive - Processor, for further information on configuration settings that determine which files are processed.
License
This is where your Symphony OCR license is set. To change your Symphony OCR license, simply click "Licensing" from the Configuration side bar, enter your new license, and select "Save Changes."
License Details
Provides you details of the license.
Features Allowed by your License
This area tells you which features are allowed by your license.
Updating your License
Starting with version 6.4.96, Symphony OCR includes an 'Automatic License Update' feature. After you've paid your renewal invoice with Trumpet, a new license is automatically generated. If your installation has access to the Trumpet servers, Symphony OCR will automatically detect this new license, download it, and install it.
Note: Symphony will check for a new license once every 3 days under normal circumstances, and once per day when your license is within 30 days of expiring.
If you've paid your invoice (and received notification of a new license) and don't want to wait for the automatic update to kick in, you can click the "Check for Updated License" link on this page. This will manually trigger Symphony OCR to retrieve the updated license from Trumpet's servers. As mentioned, all of this assumes your installation has access to Trumpet's servers. If a connection cannot be established, you can always copy/paste your new license into this screen.
When you receive notification from Trumpet that your new license is generated, it is still highly recommended that you A) update your installation to the latest version of the software, and B) verify your license has been updated.
Notifications allow users to be emailed nightly based on the status of Symphony OCR.
Each email address may be configured with one of four types:
Never - nightly emails will never be sent to this recipient (instead, after entering an email address you can select "Send Now" and deliver an email to the recipient on demand).
When there are errors - the nightly email will only be sent to the recipient if the overall system condition is Error. This is useful for recipients who only need to know when the system is not processing documents because of some major error (licensing issues are the most common major error).
When there are warnings or errors - the nightly email will only be sent to the recipient if the overall system condition is Warning or Error. The warning condition is triggered by documents in the Needs Attention list, configuration problems or other system level issues that should be looked at, even though they haven't completely stopped processing from occurring.
Always (aka Daily) - the nightly email will be sent to the recipient every night regardless of system status. This is useful for firms who want to monitor the 'Not Processed' lists to ensure that every document that couldn't be OCRed (e.g., because of security or corruption) has been reviewed. Users can review documents in the various 'Not Processed' lists and either correct the underlying issue or move the documents to the Ignore list using Bulk Operations > Ignore.
If you have a user leave the firm or you no longer wish for a particular user to be notified, you can change the Notification Type to "Never" or remove the user entirely by selecting "Remove" to the right of the address.
Folders to Monitor
This is the list of folders that Symphony OCR is monitoring.
Search Frequency - How often the Finder will query this directory tree for new PDF and TIF documents.
Default Priority - The priority level at which this directory will be processed. For more information on setting document priorities see: Processing Priorities
Add a folder
To add a folder or directory tree to the list of folders that should be monitored by Symphony OCR, enter the path in the field and select the "Add" button on the right. Symphony OCR will process the entire directory tree of the path you provide (e.g. X:\Clients will process all documents in the subfolders beneath X:\Clients, like X:\Clients\Anderson, Matthew and X:\Clients\Anderson, Matthew\Agreements). This will add the directory tree to the list of folders that Symphony OCR is monitoring.
Note: If you wish to process files in a hidden folder, you must explicitly indicate that folder. For example, if you have a root folder like X:\Clients and under that a hidden folder called "Inactive" (e.g. X:\Clients\Inactive), you must explicitly add that folder to the Monitored folders.
Advanced Settings
Process Read Only Files - If you wish to process read-only files, check this checkbox.
The Scheduler determines when and how frequently Symphony OCR performs specific tasks, such as when to send a heartbeat, when to search for new documents, when to purge backup files, etc.
To adjust a setting select "Edit" to the left of the specific setting you would like to adjust.
To delete a specific Scheduler entry, select "Delete" on the right of the particular setting.
Most users will not need to change these items; however, there are special cases when you may wish to do so. For example, if the firm runs their indexer software and Symphony OCR on a user's workstation, you may wish to only process items overnight.
Status
Folder Search - performs a search in the monitored folder structure to find all documents that are eligible for OCR regardless of how recently the document has been created or modified. By default it performs this search once every 120 minutes. This can be adjusted by selecting "Manage". This will take you to the Folders page where you can adjust the search frequency for each folder.
The Analyzer is responsible for looking at each document and determining if it is eligible for OCR. If a document is eligible it is placed in the Processing list. If a document is not eligible, it is placed in the appropriate list (for more information on why a document might not be eligible for OCR, refer to the section, Not Processed List).
Control
In the control area, you can choose to refresh the Analyzer or stop the Analyzer:
Refresh - Selecting Refresh will refresh the Status of the Analyzer page.
Stop Analyzer - Selecting this option will stop the Analyzer from analyzing documents in the document repository.
Status
Displays the status of the Analyzer.
Information
Machine Processors - Indicates how many logical processors the workstation running Symphony OCR contains.
Licensed parallel processing - Indicates how many documents will be analyzed at a time based on your license features.
Recent Performance (since last restart)
Provides performance statistics such as the total number of documents and pages that Symphony OCR has found eligible for OCR, and the average speed of analysis per document since the last restart of Symphony OCR.
Overall Performance (since last restart)
Provides performance statistics such as the total number of documents and pages that Symphony OCR has found eligible for OCR, and the average speed of analysis per document since the last restart of Symphony OCR.
Settings
Do not analyze documents younger than - The default setting is 30 seconds. If you wish to have the Analyzer wait longer to analyze documents, simply change the value in the field. Trumpet recommends that this value is not decreased to less than 30 seconds to ensure that documents are fully written to the disk before processing.
To change this setting, simply type in the number of seconds, and then select "Save Changes".
Select Processor in the navigation panel:
The Processor manages the actual OCR processes. Once a document has been identified as eligible for OCR by the Analyzer, the Processor confirms that the file is still eligible for OCR, and then OCRs the file. If a document is successfully OCRed, it is moved to the Processed list (for more information about the flow of documents throughout Symphony OCR, refer to the section Symphony Workflow, Tools & Document Lists).
Control
In the control area, you can choose to refresh the Processor or stop the Processor:
Refresh - Selecting Refresh will refresh the status of the Processor page.
Stop Processor - Selecting this option will stop the Processor from processing documents in the document repository.
Status
The status of the Processor (what it is currently processing).
Information
Processing Capacity Remaining - If you have a license that limits the number of pages you can process per year, the number of pages remaining will appear here.
Machine Processors - Indicates how many logical processors the workstation running Symphony OCR contains.
Licensed parallel processing - Indicates the number of documents that will be processed by the processor simultaneously.
Recent Performance
Provides performance statistics such as the number of documents and pages that Symphony OCR has processed in a smaller sample size and the average speed of processing per page, the average number of pages per document and the effective throughput of the documents.
Overall Performance
Provides performance statistics such as the total number of documents and pages that Symphony OCR has processed and the average speed of processing per page, the average number of pages per document and the effective throughput of the documents.
Basic Settings
Process TIFFs (OCR and convert to PDF) - Symphony OCR can process TIFF files and convert them to image + text PDF files. This is an optional setting. If you wish to process TIFF documents, simply check this checkbox.
Note: If the firm opts to process TIFF documents, this will change the file extension from .tif to .pdf. This will "break" any relationships or projects that include this file.
Process MSG (email) attachments - Symphony OCR can process email message attachments. This is an optional setting. If you wish to process email message attachments, check this checkbox.
<Big fat scary warning:
Due to a limitation in newer versions of Office, Microsoft prevents us from accessing the DLLs that allow us to read/process emails under the following conditions:
> Symphony OCR is configured to run as a service
> 'Process MSG (email) attachments' is checked
> Outlook 2013 (or possibly Outlook 2016) is open
In these circumstances, you're likely to see the following error:
Therefore, if Symphony OCR is being installed to run as a service *and* will be configured to process email attachments, it is our recommendation to install it on a machine that will not normally have Outlook 2013 (or possibly 2016) open. On the bright side, our testing has shown that in these situations, Symphony is still processing normal documents and WILL eventually recover and process emails after Office is closed. But if you can, we recommend avoiding this situation. If your experience is different, we'd like to hear about it.
End of big fat scary warning>
Do not process documents younger than - The default setting is 30 seconds. If you wish to have the Processor wait longer to process documents, simply change the value in the field. Trumpet recommends that this value is not decreased to less than 30 seconds to ensure that documents are fully written to the disk before processing.
Do not process documents older than - If you have older documents that you do not want Symphony OCR to process, enter the maximum document age in days. Backlog documents older than this will not be processed.
Automatically rotate pages to proper orientation - If selected, the pages will rotate either landscape or portrait according to the text on the page.
Original retention settings
Retain originals of processed files - If selected, Symphony OCR retains copies of the documents that it has processed. These copies appear as versions (if Symphony OCR processes a document 3 times, it will maintain copies of all 3 versions of the document). The user can restore previous versions of a document from the Symphony OCR backup using the document Details screen.
Purge originals of processed files after - The default setting is to retain the originals of processed files for 7 days after which they will be purged. If you wish to change this setting, you can change the value to the appropriate number of days for your firm.
Backlog throttling settings (only needed when your license does NOT have unlimited pages for processing)
Default processing capacity reserved for new documents (based on the actual number of new pages added each day) - This is calculated from the number of pages that were added to the site in the past year.
Override the default processing capacity reserve - This sets the number of pages you would like to reserve for new documents, spreading the page count capacity evenly across the entire year. To determine a reasonable reserve, allow the Symphony OCR Analyzer module to run, then look at the timeline for the Processing Queue. Adding the number of pages in the first 52 weeks and dividing by 365 gives the average number of pages added to the system per day. Trumpet recommends adding an additional 10% to accommodate future growth or above-average filing. The result should be a reasonable processing capacity reserve.
Advanced Settings:
Create thumbnails (if not already present) - Checking this checkbox will create thumbnails if they are not already present.
Enable OCR debug logging - This enables debug logging, which helps our support team address issues if necessary. To conserve disk space, we recommend not enabling this unless requested by our support team.
Limit parallel processing to X documents - This allows you to limit the number of cores that Symphony OCR will utilize. It uses 1 core per document. For example, if you enter 3, Symphony OCR will use only 3 cores and will process 3 documents simultaneously. See: How does Symphony OCR impact the performance of the server or indexer PC
Note: Symphony OCR works with the locally synced (either desktop or server) folder tree of Microsoft One Drive and uses a Windows Folder Tree License.
For more detailed information and advanced settings for configuring Symphony OCR, visit Configuration Guide - Microsoft One Drive
*Symphony OCR will process the entire directory tree of the path you provide (e.g. X:\Clients will process all documents in the subfolders beneath X:\Clients, like X:\Clients\Anderson, Matthew and X:\Clients\Anderson, Matthew\Agreements)
Symphony OCR should be off and running now! The Finder is looking through your repository and sending documents to the Analyzer to determine what needs to be processed. The Analyzer sends any documents eligible for OCR to the Processor which applies an invisible layer of text to the document.
By default, Symphony OCR queries the Folder document repository for newly saved and modified files every 120 minutes, so newly saved files will generally be OCRed within about 120 minutes. Depending on the volume of image-only documents already in the repository, it may take a while for Symphony OCR to process the backlog (legacy files). Symphony OCR gives precedence to newer files, so documents that are scanned today will be processed before the backlog.
Refer to the section, Configuration Guide - Microsoft One Drive - Finder, for further information on finder settings that determine when Symphony OCR locates files for processing.
Refer to the section, Configuration Guide - Microsoft One Drive - Processor, for further information on configuration settings that determine which files are processed.
License
This is where your Symphony OCR license is set. To change your Symphony OCR license, simply click "Licensing" from the Configuration side bar, enter your new license, and select "Save Changes."
License Details
Provides you details of the license.
Features Allowed by your License
This area tells you which features are allowed by your license.
Updating your License
Starting with version 6.4.96, Symphony OCR includes an 'Automatic License Update' feature. After you've paid your renewal invoice with Trumpet, a new license is automatically generated. If your installation has access to the Trumpet servers, Symphony OCR will automatically detect this new license, download it, and install it.
Note: Symphony will check for a new license once every 3 days under normal circumstances, and once per day when your license is within 30 days of expiring.
If you've paid your invoice (and received notification of a new license) and don't want to wait for the automatic update to kick in, you can click the "Check for Updated License" link on this page. This will manually trigger Symphony OCR to retrieve the updated license from Trumpet's servers. As mentioned, all of this assumes your installation has access to Trumpet's servers. If a connection cannot be established, you can always copy/paste your new license into this screen.
When you receive notification from Trumpet that your new license is generated, it is still highly recommended that you A) update your installation to the latest version of the software, and B) verify your license has been updated.
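As a rough illustration of the license check schedule described in the note above, the following Python sketch returns the wait between checks. The function name is hypothetical, and treating an already expired license the same as one that is about to expire is an assumption.

```python
from datetime import date, timedelta

def license_check_interval(today: date, expires: date) -> timedelta:
    """Return how long to wait before the next automatic license check."""
    days_left = (expires - today).days
    # Within 30 days of expiring (or already expired, by assumption): daily.
    if days_left <= 30:
        return timedelta(days=1)
    # Normal circumstances: once every 3 days.
    return timedelta(days=3)
```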
Notifications allow users to be emailed nightly based on the status of Symphony OCR.
Each email address may be configured with one of four types:
Never - nightly emails will never be sent to this recipient (instead, after entering an email address you can select "Send Now" and deliver an email to the recipient on demand).
When there are errors - the nightly email will only be sent to the recipient if the overall system condition is Error. This is useful for recipients who only need to know when the system is not processing documents because of some major error (licensing issues are the most common major error).
When there are warnings or errors - the nightly email will only be sent to the recipient if the overall system condition is Warning or Error. The warning condition is triggered by documents in the Needs Attention list, configuration problems or other system level issues that should be looked at, even though they haven't completely stopped processing from occurring.
Always (aka Daily) - the nightly email will be sent to the recipient every night regardless of system status. This is useful for firms who want to monitor the 'Not Processed' lists to ensure that every document that couldn't be OCRed (e.g., because of security or corruption) has been reviewed. Users can review documents in the various 'Not Processed' lists and either correct the underlying issue, or move the documents to the Ignore list using Bulk Operations >Ignore.
If you have a user leave the firm or you no longer wish for a particular user to be notified, you can change the Notification Type to "Never" or remove the user entirely by selecting "Remove" to the right of the address.
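The four notification types can be summarized as a small decision function. This Python sketch is illustrative only: the enum and function names are hypothetical, and the "OK" condition is an assumed name for a system with no warnings or errors.

```python
from enum import Enum

class Condition(Enum):
    OK = 0        # assumed name for a healthy system
    WARNING = 1
    ERROR = 2

class NotificationType(Enum):
    NEVER = "Never"
    ERRORS = "When there are errors"
    WARNINGS_OR_ERRORS = "When there are warnings or errors"
    ALWAYS = "Always"

def should_send_nightly(kind: NotificationType, condition: Condition) -> bool:
    """Decide whether a recipient receives tonight's email."""
    if kind is NotificationType.ALWAYS:
        return True
    if kind is NotificationType.WARNINGS_OR_ERRORS:
        return condition in (Condition.WARNING, Condition.ERROR)
    if kind is NotificationType.ERRORS:
        return condition is Condition.ERROR
    # NEVER: no nightly email ("Send Now" is a separate, on-demand action).
    return False
```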
Folders to Monitor
This is the list of folders that Symphony OCR is monitoring.
Search Frequency - The frequency at which the Finder queries this directory tree for new PDF and TIF documents.
Default Priority - The priority level at which this directory is processed. For more information on setting document priorities see: Processing Priorities
Add a folder
To add a folder or directory tree to the list of folders monitored by Symphony OCR, enter the path in the field and select the "Add" button on the right. Symphony OCR will process the entire directory tree of the path you provide (e.g. X:\Clients will process all documents in the subfolders beneath X:\Clients, like X:\Clients\Anderson, Matthew and X:\Clients\Anderson, Matthew\Agreements).
Note: If you wish to process files in a hidden folder, you must explicitly add that folder. For example, if you have a root folder like X:\Clients and under that a hidden folder called "Inactive" (e.g. X:\Clients\Inactive), you must explicitly add that folder to the monitored folders.
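To make "the entire directory tree" concrete, here is a minimal Python sketch that walks a root folder and lists the PDF and TIF files beneath it. It is not the Finder's implementation: the function name is hypothetical, including the ".tiff" spelling is an assumption, and the hidden-folder rule from the note is not modeled.

```python
import os

# Extensions treated as OCR candidates in this sketch (".tiff" is an assumption).
OCR_EXTENSIONS = {".pdf", ".tif", ".tiff"}

def find_candidates(root):
    """Yield the path of every PDF/TIF file anywhere beneath root."""
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            # Compare extensions case-insensitively so SCAN.PDF also matches.
            if os.path.splitext(name)[1].lower() in OCR_EXTENSIONS:
                yield os.path.join(folder, name)
```

Calling find_candidates on a root such as X:\Clients would also return files stored several levels down, for example under a client's Agreements subfolder.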
Advanced Settings
Process Read Only Files - if you wish to process read-only files, you should check this check box
The Scheduler determines when and how frequently Symphony OCR performs specific tasks, such as when to send a heartbeat, when to search for new documents, when to purge backup files, etc.
To adjust a setting, select "Edit" to the left of that setting.
To delete a specific Scheduler entry, select "Delete" to the right of that entry.
Most users will not need to change these items; however, there are special cases where you may wish to. For example, if the firm runs its indexer software and Symphony OCR on a user's workstation, you may wish to process items only overnight.
Status
Folder Search - performs a search in the monitored folder structure to find all documents that are eligible for OCR regardless of how recently the document has been created or modified. By default it performs this search once every 120 minutes. This can be adjusted by selecting "Manage". This will take you to the Folders page where you can adjust the search frequency for each folder.
The Analyzer is responsible for looking at each document and determining if it is eligible for OCR. If a document is eligible it is placed in the Processing list. If a document is not eligible, it is placed in the appropriate list (for more information on why a document might not be eligible for OCR, refer to the section, Not Processed List).
Control
In the control area, you can choose to refresh the Analyzer or stop the Analyzer:
Refresh - Selecting Refresh will refresh the status shown on the Analyzer page.
Stop Analyzer - Selecting this option will stop the Analyzer from analyzing documents in the document repository.
Status
Displays the status of the Analyzer.
Information
Machine Processors - Indicates how many logical processors the workstation running Symphony OCR contains.
Licensed parallel processing - Indicates how many documents will be analyzed at a time based on your license features.
Recent Performance (since last restart)
Provides performance statistics such as the total number of documents and pages that Symphony OCR has found eligible for OCR, and the average speed of analysis per document since the last restart of Symphony OCR.
Overall Performance (since last restart)
Provides performance statistics such as the total number of documents and pages that Symphony OCR has found eligible for OCR, and the average speed of analysis per document since the last restart of Symphony OCR.
Settings
Do not analyze documents younger than - The default setting is 30 seconds. If you wish to have the Analyzer wait longer to analyze documents, simply change the value in the field. Trumpet recommends that this value is not decreased to less than 30 seconds to ensure that documents are fully written to the disk before processing.
To change this setting, simply type in the number of seconds, and then select "Save Changes".
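The reason for the minimum age is that a file still being written should not be picked up. A simple way to picture the check is the Python sketch below, which compares a file's last-modified time with the current time. The function name and the use of the modification time are assumptions; the source does not say which timestamp Symphony OCR uses.

```python
import os
import time

MIN_AGE_SECONDS = 30  # default value given in the text

def is_old_enough(path, min_age_seconds=MIN_AGE_SECONDS, now=None):
    """Return True if the file was last modified at least min_age_seconds ago."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) >= min_age_seconds
```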
Select Processor in the navigation panel:
The Processor manages the actual OCR processes. Once a document has been identified as eligible for OCR by the Analyzer, the Processor confirms that the file is still eligible for OCR, and then OCRs the file. If a document is successfully OCRed, it is moved to the Processed list (for more information about the flow of documents throughout Symphony OCR, refer to the section Symphony Workflow, Tools & Document Lists).
Control
In the control area, you can choose to refresh the Processor or stop the Processor:
Refresh - Selecting Refresh will refresh the status of the Processor page.
Stop Processor - Selecting this option will stop the Processor from processing documents in the document repository.
Status
The status of the Processor (what it is currently processing).
Information
Processing Capacity Remaining - If you have a license that limits the number of pages you can process per year, the number of pages remaining will appear here.
Machine Processors - Indicates how many logical processors the workstation running Symphony OCR contains.
Licensed parallel processing - Indicates the number of documents that will be processed by the processor simultaneously.
Recent Performance
Provides performance statistics such as the number of documents and pages that Symphony OCR has processed in a smaller sample size and the average speed of processing per page, the average number of pages per document and the effective throughput of the documents.
Overall Performance
Provides performance statistics such as the total number of documents and pages that Symphony OCR has processed and the average speed of processing per page, the average number of pages per document and the effective throughput of the documents.
Basic Settings
Process TIFFs (OCR and convert to PDF) - Symphony OCR can process TIFF files and convert them to image + text PDF files. This is an optional setting. If you wish to process TIFF documents, simply check this checkbox.
Note: If the firm opts to process TIFF documents, this will change the file extension from .tif to .pdf. This will "break" any relationships or projects that include this file.
Process MSG (email) attachments - Symphony OCR can process email message attachments. This is an optional setting. If you wish to process email message attachments, check this checkbox.
<Big fat scary warning:
Due to a limitation in newer versions of Office, Microsoft prevents us from accessing the DLLs that allow us to read/process emails under the following conditions:
> Symphony OCR is configured to run as a service
> 'Process MSG (email) attachments' is checked
> Outlook 2013 (or possibly Outlook 2016) is open
In these circumstances, you're likely to see the following error:
Therefore, if Symphony OCR is being installed to run as a service *and* will be configured to process email attachments, it is our recommendation to install it on a machine that will not normally have Outlook 2013 (or possibly 2016) open. On the bright side, our testing has shown that in these situations, Symphony is still processing normal documents and WILL eventually recover and process emails after Office is closed. But if you can, we recommend avoiding this situation. If your experience is different, we'd like to hear about it.
End of big fat scary warning>
Do not process documents younger than - The default setting is 30 seconds. If you wish to have the Processor wait longer to process documents, simply change the value in the field. Trumpet recommends that this value is not decreased to less than 30 seconds to ensure that documents are fully written to the disk before processing.
Do not process documents older than - If you have older documents that you do not want Symphony OCR to process, enter a specific number of days for which the software should process backlog.
Automatically rotate pages to proper orientation - If selected, the pages will rotate either landscape or portrait according to the text on the page.
Original retention settings
Retain originals of processed files - If selected, Symphony OCR retains copies of the documents that it has processed. These copies appear as versions (if Symphony OCR processes a document 3 times, it will maintain copies of all 3 versions of the document). The user can restore previous versions of a document from the Symphony OCR backup using the document Details screen.
Purge originals of processed files after - The default setting is to retain the originals of processed files for 7 days after which they will be purged. If you wish to change this setting, you can change the value to the appropriate number of days for your firm.
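The retention window can be illustrated with a short Python sketch that picks out the backups older than the configured number of days. The function name and the mapping of backup names to processing times are hypothetical; how Symphony OCR actually tracks its backups is not described here.

```python
from datetime import datetime, timedelta

def backups_to_purge(backups, now, retention_days=7):
    """Return the names of backups processed more than retention_days ago.

    backups maps a backup name to the datetime the original was processed.
    """
    cutoff = now - timedelta(days=retention_days)
    return [name for name, processed_at in backups.items() if processed_at < cutoff]
```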
Backlog throttling settings (only needed when your license does NOT have unlimited pages for processing)
Default processing capacity reserved for new documents (based on the actual number of new pages added each day) - This is calculated from the number of pages that were added to the site in the past year.
Override the default processing capacity reserve - This sets the number of pages you would like to reserve for new documents, evenly spreading the page count capacity across the entire year. To determine a reasonable reserve, allow the Symphony OCR Analyzer module to run, then look at the timeline for the Processing Queue. Adding the number of pages in the first 52 weeks and dividing by 365 gives the average number of pages added to the system per day. Trumpet recommends adding an additional 10% to accommodate future growth or above-average filing. This value should be a reasonable overclocking reserve.
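The arithmetic for the override can be sketched in Python as follows. The function name is hypothetical and rounding up to a whole page is an assumption; the 365-day divisor and the 10% allowance come from the text above.

```python
def daily_reserve(pages_first_52_weeks: int, growth_percent: int = 10) -> int:
    """Average pages per day over the past year, plus a growth allowance.

    Integer arithmetic is used so the result is not affected by
    floating-point rounding; the final value is rounded up.
    """
    numerator = pages_first_52_weeks * (100 + growth_percent)
    denominator = 365 * 100
    return -(-numerator // denominator)  # ceiling division
```

For example, a site whose Processing Queue timeline shows 73,000 pages in the first 52 weeks averages 200 pages per day, so the suggested reserve with the 10% allowance is 220 pages.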
Advanced Settings:
Enable OCR debug logging - This will enable debugging for support purposes.
Create thumbnails (if not already present) - Checking this checkbox will create thumbnails if they are not already present.
Enable OCR debug logging - This will help our support team address issues if necessary. To conserve disk space, we recommend not enabling this unless requested by our support team.
Limit parallel processing to X documents - This allows you to limit the number of cores that Symphony OCR will utilize. It uses 1 core per document. For example, if you enter 3, Symphony OCR will use only 3 cores and will process 3 documents simultaneously. See: How does Symphony OCR impact the performance of the server or indexer PC
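One way to think about how the machine's processors, the licensed parallel processing value and this limit combine is to take the smallest of the three. The Python sketch below shows that reading; it is an assumption about how the values interact, not a documented formula, and the function name is hypothetical.

```python
import os

def effective_parallelism(licensed, limit=None, machine_cores=None):
    """Return how many documents would be processed at once in this sketch.

    licensed      - documents allowed in parallel by the license
    limit         - the optional "Limit parallel processing to X" value
    machine_cores - logical processors; detected if not supplied
    """
    cores = machine_cores if machine_cores is not None else (os.cpu_count() or 1)
    ceiling = min(cores, licensed)
    if limit is not None:
        ceiling = min(ceiling, limit)
    return max(1, ceiling)  # always allow at least one document
```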
Connect to SharePoint
For a quick video showing the installation and configuration of SharePoint visit: https://youtu.be/UNGbJiaRn9A
License
This is where your Symphony OCR license is set. To change your Symphony OCR license, simply click "Licensing" from the Configuration side bar, enter your new license, and select "Save Changes."
License Details
Provides you details of the license.
Features Allowed by your License
This area tells you which features are allowed by your license.
Updating your License
Starting with version 6.4.96, Symphony OCR will have an 'Automatic License Update' feature. Basically, after you've paid your renewal invoice with Trumpet, a new license is automatically generated. So if your installation has access to the Trumpet servers, Symphony OCR will automatically see this new license, download it and install it.
Note: Symphony will check for a new license once every 3 days under normal circumstances, and once per day when your license is within 30 days of expiring.
If you've paid your invoice (and received notification of a new license) and don't want to wait for the automatic update to kick in, you can click the "Check for Updated License" link on this page. This will manually trigger Symphony OCR to retrieve the updated license from Trumpet's servers. As mentioned, all of this assumes your installation has access to Trumpet's servers. If a connection cannot be established, you can always copy/paste your new license into this screen.
When you receive notification from Trumpet that your new license is generated, it is still highly recommended that you A) update your installation to the latest version of the software, and B) verify your license has been updated.
Notifications allow users to be emailed nightly based on the status of Symphony OCR.
Each email address may be configured with one of four types:
Never - nightly emails will never be sent to this recipient (instead, after entering an email address you can select "Send Now" and deliver an email to the recipient on demand).
When there are errors - the nightly email will only be sent to the recipient if the overall system condition is Error. This is useful for recipients who only need to know when the system is not processing documents because of some major error (licensing issues are the most common major error).
When there are warnings or errors - the nightly email will only be sent to the recipient if the overall system condition is Warning or Error. The warning condition is triggered by documents in the Needs Attention list, configuration problems or other system level issues that should be looked at, even though they haven't completely stopped processing from occurring.
Always (aka Daily) - the nightly email will be sent to the recipient every night regardless of system status. This is useful for firms who want to monitor the 'Not Processed' lists to ensure that every document that couldn't be OCRed (e.g., because of security or corruption) has been reviewed. Users can review documents in the various 'Not Processed' lists and either correct the underlying issue, or move the documents to the Ignore list using Bulk Operations >Ignore.
If you have a user leave the firm or you no longer wish for a particular user to be notified, you can change the Notification Type to "Never" or remove the user entirely by selecting "Remove" to the right of the address.
Connect to SharePoint
For a quick video showing the installation and configuration of SharePoint visit: https://youtu.be/UNGbJiaRn9A
Background:
This applies when you need to update the SharePoint credentials that Symphony OCR connects with.
This is typically needed when the SharePoint user's password has been changed, or when the user that Symphony OCR connected to SharePoint with has been deactivated. In either case, Symphony OCR enters an error state because it can no longer connect to SharePoint.
Solution:
For assistance with this process you can reach out to your SymphonyOCR Channel Partner or Trumpet at Support@trumpetinc.com.
The Scheduler determines when and how frequently Symphony OCR performs specific tasks, such as when to send a heartbeat, when to search for new documents, when to purge backup files, etc.
To adjust a setting, select "Edit" to the left of that setting.
To delete a specific Scheduler entry, select "Delete" to the right of that entry.
Most users will not need to change these items; however, there are special cases where you may wish to. For example, if the firm runs its indexer software and Symphony OCR on a user's workstation, you may wish to process items only overnight.
SharePoint New Document Search in "X" Folders: ("X", indicates the number of folders Symphony OCR will process) performs a search in the folder structure to find all documents that are eligible for OCR. By default the finder does its search every hour. This can be adjusted by selecting "Manage". This will take you to the SharePoint page where you can adjust the search frequency for each folder.
SharePoint Legacy Document Search in "X": ("X", indicates the number of folders Symphony OCR will process) performs a search in the folder structure to find legacy documents that are eligible for OCR. By default the finder searches for legacy documents every 12 hours. This can be adjusted by selecting "Manage". This will take you to the SharePoint page where you can adjust the frequency of legacy document searches.
The Analyzer is responsible for looking at each document and determining if it is eligible for OCR. If a document is eligible it is placed in the Processing list. If a document is not eligible, it is placed in the appropriate list (for more information on why a document might not be eligible for OCR, refer to the section, Not Processed List).
Control
In the control area, you can choose to refresh the Analyzer or stop the Analyzer:
Refresh - Selecting Refresh will refresh the status shown on the Analyzer page.
Stop Analyzer - Selecting this option will stop the Analyzer from analyzing documents in the document repository.
Status
Displays the status of the Analyzer.
Information
Machine Processors - Indicates how many logical processors the workstation running Symphony OCR contains.
Licensed parallel processing - Indicates how many documents will be analyzed at a time based on your license features.
Recent Performance (since last restart)
Provides performance statistics such as the total number of documents and pages that Symphony OCR has found eligible for OCR, and the average speed of analysis per document since the last restart of Symphony OCR.
Overall Performance (since last restart)
Provides performance statistics such as the total number of documents and pages that Symphony OCR has found eligible for OCR, and the average speed of analysis per document since the last restart of Symphony OCR.
Settings
Do not analyze documents younger than - The default setting is 30 seconds. If you wish to have the Analyzer wait longer to analyze documents, simply change the value in the field. Trumpet recommends that this value is not decreased to less than 30 seconds to ensure that documents are fully written to the disk before processing.
To change this setting, simply type in the number of seconds, and then select "Save Changes".
Select Processor in the navigation panel:
The Processor manages the actual OCR processes. Once a document has been identified as eligible for OCR by the Analyzer, the Processor confirms that the file is still eligible for OCR, and then OCRs the file. If a document is successfully OCRed, it is moved to the Processed list (for more information about the flow of documents throughout Symphony OCR, refer to the section Symphony Workflow, Tools & Document Lists).
Control
In the control area, you can choose to refresh the Processor or stop the Processor:
Refresh - Selecting Refresh will refresh the status of the Processor page.
Stop Processor - Selecting this option will stop the Processor from processing documents in the document repository.
Status
The status of the Processor (what it is currently processing).
Information
Processing Capacity Remaining - If you have a license that limits the number of pages you can process per year, the number of pages remaining will appear here.
Machine Processors - Indicates how many logical processors the workstation running Symphony OCR contains.
Licensed parallel processing - Indicates the number of documents that will be processed by the processor simultaneously.
Recent Performance
Provides performance statistics such as the number of documents and pages that Symphony OCR has processed in a smaller sample size and the average speed of processing per page, the average number of pages per document and the effective throughput of the documents.
Overall Performance
Provides performance statistics such as the total number of documents and pages that Symphony OCR has processed and the average speed of processing per page, the average number of pages per document and the effective throughput of the documents.
Basic Settings
Process TIFFs (OCR and convert to PDF) - Symphony OCR can process TIFF files and convert them to image + text PDF files. This is an optional setting. If you wish to process TIFF documents, simply check this checkbox.
Note: If the firm opts to process TIFF documents, this will change the file extension from .tif to .pdf. This will "break" any relationships or projects that include this file.
Process MSG (email) attachments - Symphony OCR can process email message attachments. This is an optional setting. If you wish to process email message attachments, check this checkbox.
<Big fat scary warning:
Due to a limitation in newer versions of Office, Microsoft prevents us from accessing the DLLs that allow us to read/process emails under the following conditions:
> Symphony OCR is configured to run as a service
> 'Process MSG (email) attachments' is checked
> Outlook 2013 (or possibly Outlook 2016) is open
In these circumstances, you're likely to see the following error:
Therefore, if Symphony OCR is being installed to run as a service *and* will be configured to process email attachments, it is our recommendation to install it on a machine that will not normally have Outlook 2013 (or possibly 2016) open. On the bright side, our testing has shown that in these situations, Symphony is still processing normal documents and WILL eventually recover and process emails after Office is closed. But if you can, we recommend avoiding this situation. If your experience is different, we'd like to hear about it.
End of big fat scary warning>
Do not process documents younger than - The default setting is 30 seconds. If you wish to have the Processor wait longer to process documents, simply change the value in the field. Trumpet recommends that this value is not decreased to less than 30 seconds to ensure that documents are fully written to the disk before processing.
Do not process documents older than - If you have older documents that you do not want Symphony OCR to process, enter a specific number of days for which the software should process backlog.
Automatically rotate pages to proper orientation - If selected, the pages will rotate either landscape or portrait according to the text on the page.
Original retention settings
Retain originals of processed files - If selected, Symphony OCR retains copies of the documents that it has processed. These copies appear as versions (if Symphony OCR processes a document 3 times, it will maintain copies of all 3 versions of the document). The user can restore previous versions of a document from the Symphony OCR backup using the document Details screen.
Purge originals of processed files after - The default setting is to retain the originals of processed files for 7 days after which they will be purged. If you wish to change this setting, you can change the value to the appropriate number of days for your firm.
Backlog throttling settings (only needed when your license does NOT have unlimited pages for processing)
Default processing capacity reserved for new documents (based on the actual number of new pages added each day) - This is calculated from the number of pages that were added to the site in the past year.
Override the default processing capacity reserve - This sets the number of pages you would like to reserve for new documents, evenly spreading the page count capacity across the entire year. To determine a reasonable reserve, allow the Symphony OCR Analyzer module to run, then look at the timeline for the Processing Queue. Adding the number of pages in the first 52 weeks and dividing by 365 gives the average number of pages added to the system per day. Trumpet recommends adding an additional 10% to accommodate future growth or above-average filing. This value should be a reasonable overclocking reserve.
Advanced Settings:
Enable OCR debug logging - This will enable debugging for support purposes.
Create thumbnails (if not already present) - Checking this checkbox will create thumbnails if they are not already present.
Enable OCR debug logging - This will help our support team address issues if necessary. To conserve disk space, we recommend not enabling this unless requested by our support team.
Limit parallel processing to X documents - This allows you to limit the number of cores that Symphony OCR will utilize. It uses 1 core per document. For example, if you enter 3, Symphony OCR will use only 3 cores and will process 3 documents simultaneously. See: How does Symphony OCR impact the performance of the server or indexer PC
To access Symphony OCR directly, log on to the workstation where Symphony OCR is installed.
You can also access Symphony OCR from your own workstation. See: Accessing Symphony OCR
Since Symphony OCR uses a web interface, the display may not automatically refresh as it performs its work. You can manually refresh by selecting the "Refresh" button in Symphony OCR or in the web browser. The Symphony OCR summary page refreshes automatically every 60 seconds. All other pages require a manual refresh.
Symphony OCR can be accessed from the web browser of any workstation connected to the network by typing in the address found in the web browser in which Symphony OCR runs:
If you would prefer that end users see only the Summary View of Symphony OCR, do the following:
In the main Symphony OCR page, select "Simple View"
This will open the Summary Screen without the additional navigation panel:
Copy and paste the URL from here to provide to those users.
The Symphony OCR Summary page can be considered the "Dashboard" for Symphony OCR to allow users to view/manage the system condition of Symphony OCR, including current and historical progress and many more items. Below we've highlighted the most common features for Symphony OCR's dashboard.
Deleted: A document in the Deleted list is one whose record is in the process of being purged from the database (documents should only be in this state for a very short period of time).
There are two methods for looking up the details of a particular document:
Lookup By Path - Enter the full path of the document and click "Query" (See also: Checking the Status of a Document).
Look up a document - NetDocuments
Look up Document- Worldox
Look up a document - Windows Folder Tree
Once on the details page, the user can perform these functions:
Refresh - Provides the most current details of a document. There are also various bits of data or history showing what's been found on the file, and what's been done to it. For example, the "History" section shows all of the events logged for that file.
Note: If you delete the details of this document, it will delete this history and start from scratch. The "Page Analysis Details (before processing)" indicates how many words per page were found within the file BEFORE Symphony OCRed it. Visible words are computer-readable words (like digital headers or footers, or text generated by Word, etc). Hidden (aka invisible) words would be words applied by something like Symphony OCR. Note that these numbers are PRE-processing and they do not update after Symphony OCRs the file.
Symphony OCR is a back-end processing engine. This means that very little user interaction is required, but as an administrator, you may wish to check on the status of the software.
For additional suggestions on monitoring Symphony OCR, visit Ongoing Care & Feeding.
There are four different types of files pertinent to this discussion:
By default, Symphony OCR queries the Worldox document repository for newly saved and modified files every 15 minutes. Generally speaking, newly saved files will be OCRed within about 15 minutes. Depending on the volume of image-only documents already filed to Worldox, it may take a while for Symphony OCR to process the backlog (legacy files). Symphony OCR gives precedence to newer files, so documents that are scanned today will be processed before the backlog.
Refer to the section, Configuration Guide - Worldox - Finder, for further information on finder settings that determine when Symphony OCR locates files for processing.
Refer to the section, Configuration Guide - Worldox - Processor, for further information on configuration settings that determine which files are processed.
Note: While Symphony OCR may process documents within about 15 minutes, you will need to wait until the text indexes are updated (typically overnight) in order to do full text in file searching for the documents using the Worldox document management system.
By default, Symphony OCR queries the Folder document repository for newly saved and modified files every 120 minutes. Generally speaking, newly saved files will be OCRed within about 120 minutes. Depending on the volume of image-only documents already filed to Worldox, it may take a while for Symphony OCR to process the backlog (legacy files). Symphony OCR gives precedence to newer files, so documents that are scanned today will be processed before the backlog.
Refer to the section, Configuration Guide - Folder - Finder, for further information on finder settings that determine when Symphony OCR locates files for processing.
Refer to the section, Configuration Guide - Folder - Processor, for further information on configuration settings that determine which files are processed.
By default, Symphony OCR queries the NetDocuments document repository for newly saved and modified files every 15 minutes. Generally speaking, newly saved files will be OCRed within about 15 minutes. Symphony OCR can also optionally process the files already stored in NetDocuments. By default, it performs a query for these files every 7 days. Symphony OCR gives precedence to newer files, so documents that are scanned today will be processed before the legacy documents.
Note: While Symphony OCR may process documents within about 15 minutes, NetDocuments may take up to 6-8 hours to update its text index. This means that if you run a NetDocuments search for words within a recently OCRed document, there may be a 6-8 hour delay before the document is found. This is due to how NetDocuments prioritizes API activity.
Refer to the section, Configuration Guide - NetDocuments - Finder, for further information on finder settings that determine when Symphony OCR locates files for processing.
Refer to the section, Configuration Guide - NetDocuments - Processor, for further information on settings that determine which files are processed.
Symphony OCR searches the document repository for documents to process. It then organizes those documents into one of several lists (these lists are available on the left side of the Symphony OCR web interface). The following diagram shows the tools and lists and explains how they interact:
Symphony OCR consists of three main tools that interact to provide full OCR services:
Finder - locates documents in your document repository
Analyzer - determines if a given document is a candidate for OCR
Processor - performs the actual OCR
As documents flow through Symphony OCR, each of the above components works on the document, then places it in a particular Document List, as described in the next section.
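The flow through the three tools can be pictured with a minimal Python sketch. This is only an illustration and not Symphony OCR's actual implementation: the class and function names are hypothetical, the has_text flag stands in for the real analysis, and only a few of the document lists named in this guide are modeled.

```python
from dataclasses import dataclass
from enum import Enum


class DocList(Enum):
    # A small subset of the document lists described in this guide
    NEW = "New"
    ANALYZING = "Analyzing"
    PROCESSING = "Processing"
    PROCESSED = "Processed"
    CONTAINS_TEXT = "Contains text"


@dataclass
class Document:
    path: str
    has_text: bool  # hypothetical flag: file is already text searchable
    doc_list: DocList = DocList.NEW


def finder(paths_with_text):
    """Finder: locate documents and queue them for analysis."""
    found = [Document(path, has_text) for path, has_text in paths_with_text]
    for doc in found:
        doc.doc_list = DocList.ANALYZING
    return found


def analyzer(doc):
    """Analyzer: decide whether the document is a candidate for OCR."""
    doc.doc_list = DocList.CONTAINS_TEXT if doc.has_text else DocList.PROCESSING


def processor(doc):
    """Processor: OCR the candidates (the OCR step itself is omitted)."""
    if doc.doc_list is DocList.PROCESSING:
        doc.doc_list = DocList.PROCESSED


docs = finder([("scan.pdf", False), ("report.pdf", True)])
for doc in docs:
    analyzer(doc)
    processor(doc)

for doc in docs:
    print(doc.path, "->", doc.doc_list.value)
```

In this sketch the image-only file ends in the Processed list, while the file that already contains text is set aside without OCR.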
The backlog consists of documents that have not been analyzed or OCRed. These are documents that Symphony OCR is still working on. The following document lists represent the backlog:
Analyzing - Documents waiting for the Analyzer to determine if they are candidates for OCR or not
In Process - Documents that are in the process of being Analyzed
Processing - Documents that are candidates for OCR, but have not been processed yet
Reprocessing - Documents that had some recoverable problem during OCR, and will be processed again later. Typical causes are that the document was open by a user, or was modified while OCR was taking place
These lists represent documents that were successfully analyzed or OCRed. They are documents that have either been OCRed or were already text searchable (and thus not in need of OCR):
Processed - documents that have been successfully OCRed by Symphony OCR
In Process - documents that are currently being OCRed
Already OCRed - documents that were already OCRed (by some other processor or by an earlier version of Symphony OCR)
Contains text - documents that are already text searchable (no OCR needed)
No image or text - some rare documents contain no text, but also contain no images. These generally do not need to be OCRed; however, you may choose to do these on a one-off basis. See "How to Process No Image or Text Documents" for instructions
Email Messages - contains the list of email messages that contained attachments that were processed by Symphony OCR
These lists represent documents that could not be processed for some reason. In most cases, an administrator will want to glance over these lists from time to time to ensure that there are no issues with the documents that didn't get processed:
Needs Attention: Documents in the 'Needs Attention' list are those that appear to be eligible for OCR, but encountered problems during processing. Files in this list could be corrupted or contain invalid images (try opening them in an image viewer to be sure), or they may be images that Symphony OCR does not handle yet.
Occasionally, a document can fall into the 'Needs Attention' list because of bad timing - Symphony OCR trying to process the document when it isn't fully available. So we always recommend clicking the "Show Bulk Operations" button and then "Re-Analyze All", just to ensure this isn't the case.
If the document is corrupted, you can either remove the document from Worldox, or manually tell Symphony OCR to "ignore" it, which will put it on the 'Ignored' list. If the 'Needs Attention' list contains any documents, the overall system condition will show as "Warn." Ignoring a document that you have already checked is a good way to change the system condition back to "OK".
If the document does not appear corrupted, the next step would be to allow us to see a copy of the file. Because PDFs can be generated in countless different ways, we occasionally run into a specific sub-type of PDF that we've not encountered before. If we can get a copy of the file that is falling into the 'Needs Attention' list, we can, in almost all cases, add support for the file. Please contact us at support@trumpetinc.com for instructions to upload documents to our secure site.
New: Documents in the New list are those that have been found by the Finder tool, but not yet allocated to another document list (documents are only in the New state for a very short period of time).
Deleted: Documents in the Deleted list are those whose document record is in the process of being purged from the database (documents should only be in this state for a very short period of time).
Too Old: Documents in the Too Old list are those that have a file modified date older than the cut-off age defined in the Processor configuration.
Inaccessible: Documents in the Inaccessible list are those that could not be processed because of file system security, Worldox security, read-only attributes or other conditions that prevent the document from being accessed and worked on. In addition, if the profile group in which the documents reside contains an invalid base path (containing a space, for example), or if the file has a space immediately prior to the document extension, the documents will be shown in the Inaccessible list.
Corrupted Documents - Documents in the corrupted list are those that Symphony OCR does not recognize as valid files. The most common reason is that the file is an invalid or corrupted PDF (try opening in Adobe to be sure). Another possibility is that there is some characteristic of the PDF that the Symphony OCR parsing algorithm isn't handling properly. Trumpet does periodically update the PDF parsing algorithms to address corner cases that have not been encountered before.
What to do?
Try opening the file in Acrobat, then hit Save (Acrobat will try to open and auto-repair corrupted files - when you save the document, it will save uncorrupted). After saving and closing the document, click the Re-Analyze button on the document record in Symphony OCR. This will only work if the file is only lightly corrupted, but is worth a shot.
If that doesn't help, next check to see if the file is already text searchable (i.e. can you search for text inside the PDF already?). If you can, then the document isn't a candidate for OCR anyway, and you can just move the document to the Ignore list.
If the document does need to be OCRed, and the Adobe repair doesn't help, then you may want to submit the document to us for analysis. Open a support ticket by emailing support@trumpetinc.com and we will send information on how to securely upload the document to us. If we find a problem in our parsing algorithms, we'll fix the issue and get you a patch.
If there are a large number of files that have the same corruption reason, and the files don't appear to actually be corrupted, please open a support ticket by emailing support@trumpetinc.com and we will send information on how to securely upload a sample document to us. If we find a problem in our parsing algorithms, we'll fix the issue and get you a patch. Alternatively, you can use a bulk Ignore operation to move the documents to Ignore.
Encrypted / Restricted: Documents in the Encrypted/Restricted list are those that are restricted from being processed because of some characteristic of the file itself (for example, an encrypted or partially restricted PDF file will not be processed).
Ignored: Documents in the Ignored list are documents that a Symphony OCR administrator has explicitly told Symphony OCR not to process. Any document on this list was explicitly placed there by human intervention.
Wrong Type: Documents in the Wrong Type list are TIFF (.tif) documents that were found while TIFF processing is not enabled.
Moved / Unavailable: Documents in the Moved / Unavailable list are no longer available in the Document Management System (DMS). This could mean that the DMS has gone "offline" or the DMS settings have been adjusted so that the documents would not have been found for processing (e.g., if a user selects a profile group to analyze and OCR, and then chooses to un-check that profile group or no longer process it). Document records in the Moved/Unavailable list will be deleted from the database after 15 days. Documents can also appear in the Moved / Unavailable list if they are no longer at that current location.
Digitally Signed: Documents that are digitally signed will not be processed by Symphony OCR because adding OCR information to these documents would invalidate the digital signature. If you wish to have these documents OCRed anyway (and are OK with invalidating the digital signature), please send an email to support@trumpetinc.com and request that functionality be added.
Too Big (8.0.0 and higher)
If a document falls into this list, it does NOT mean the document contains too many pages. Symphony OCR processes files one page at a time. So if a document falls into this list, it means the document contains one or more pages with pixel dimensions larger than a specified value. In this version of Symphony OCR that value is 32512 x 32512 pixels.
This is a hard limit and cannot be overridden.
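To make the distinction between "too many pages" and "a page that is too large" concrete, here is a small Python sketch of a per-page check. It is not Symphony OCR's own code: the function name is hypothetical, and treating the limit as applying to width and height separately is an assumption.

```python
MAX_PAGE_PIXELS = 32512  # limit stated above for version 8.0.0 and higher


def is_too_big(page_sizes, limit=MAX_PAGE_PIXELS):
    """Return True if any single page is wider or taller than the limit.

    page_sizes is a list of (width_px, height_px) tuples, one per page.
    Applying the limit to each dimension separately is an assumption.
    """
    return any(w > limit or h > limit for w, h in page_sizes)


letter_300dpi = (2550, 3300)   # 8.5 x 11 inch page scanned at 300 dpi
huge_drawing = (40000, 12000)  # hypothetical oversized page

# A long document made of ordinary pages is not "too big"
print(is_too_big([letter_300dpi] * 5000))
# A two-page document with one oversized page is
print(is_too_big([letter_300dpi, huge_drawing]))
```

The page count plays no part in the result; only the dimensions of the largest page matter.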
Too Big (Prior to 8.0.0)
If a document falls into this list, it does NOT mean the document is too big. Symphony OCR processes files one page at a time. So if a document falls into this list, it means the document contains one or more pages with pixel dimensions larger than a specified value (i.e. the page couldn't be loaded into memory). We usually see this in documents like blueprints or schematic drawings. If you find that such a document needs to be processed, there are some things we can do to try to get it processed.
Clicking on the document in the 'Too Big' list will tell you the size of the offending page.
1) Click on the 'Too big' list.
2) Click on the individual document in question.
3) The offending size of the document is available in the document details.
If you find you have a series of the same type of documents, it's usually the case that the same size file is exceeding the limit. You can attempt to process these documents by modifying the value(s) declared in the settings.xml file. (Defaults differ depending on the version you're running.)
See Manipulating Document Lists for more information on how to manage these lists
You are able to manipulate the document records for documents in the various document lists. You may wish to manipulate the records in bulk or on a per document basis.
Control
Refresh - refreshes the current list
Export as CSV - exports the current list to a .csv file
View Timeline - provides you with a timeline of documents processed. See Document Timelines for more information
Show Bulk Operations / Hide Bulk Operations - this is a 'toggle' button which will show or hide the bulk operations sections
Filter - filters the list based on criteria you enter. In the "Filter" field, enter the criteria you would like to use (e.g. X:\Docvault\Client\), and then select "Filter" to filter the list.
Bulk Operations - Bulk operations will be applied to the filtered (or unfiltered) list seen below
Reanalyze All - will reanalyze the documents in the list
Delete All - will delete the Symphony OCR database record for the documents in the list
Ignore - will place all the documents in the Ignore list
Adjust Priority to - will allow you to set the priority for all documents in the list. By default, Symphony OCR will analyze/process the most recent documents first, and then work backwards to process other documents. See Processing Priorities for more information.
Single Document Control - the following apply to the documents in the list and not bulk operations
Document Details - to see the details regarding the document (e.g. document history, visible words, hidden words, etc) select the document from the list
Reanalyze Document - will place the document in the Analyzing list for re-analysis
Ignore - will place the document in the Ignore list
Open - opens the document
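If you prefer to review a list outside the web interface, the file produced by "Export as CSV" can be filtered with a few lines of Python. The sketch below is only an illustration: the column names (Path, Pages) and the sample rows are hypothetical, the real export may use different headers, and the simple prefix match is an assumption about how you might want to narrow the list.

```python
import csv
import io

# Hypothetical export contents; the real CSV columns may differ
sample_csv = """Path,Pages
X:\\Docvault\\Client\\0001\\scan1.pdf,12
X:\\Docvault\\Admin\\memo.pdf,3
X:\\Docvault\\Client\\0002\\scan2.pdf,40
"""


def filter_by_prefix(csv_text, prefix, path_column="Path"):
    """Return the rows whose path starts with the given prefix."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [row for row in rows if row[path_column].startswith(prefix)]


matches = filter_by_prefix(sample_csv, "X:\\Docvault\\Client\\")
total_pages = sum(int(row["Pages"]) for row in matches)

print(len(matches), "documents,", total_pages, "pages")
```

To use it on a real export, read the file's text with open(...).read() and adjust path_column to match the header in your file.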
Some firms may wish to make a decision about which documents should be OCRed and which should not. Symphony OCR has always provided firms with the option to process only documents in particular Worldox profile groups, but in version 6.0 and higher, we have added finer-grained control.
As you may already know, your Symphony license allows you to OCR a certain number of pages each year. Therefore, the ability to control processing priority allows firms to ensure that their highest priority documents are processed first, saving lower priority documents for times when excess page count is available.
There are five levels of processing priority in Symphony OCR:
Very High - these documents are always processed first, and will be processed even if the system is throttling based on page count needs
High - these documents will be processed after those with the priority level of "Very High," but will still be processed even if the system is throttling based on page count needs
Normal - this is the default priority for documents
Low - these documents will not be processed if the system is throttling based on page count needs unless there is sufficient page count available
Very Low - this is the lowest priority, and these documents will not be processed if the system is throttling based on page count needs unless there is sufficient page count available and the "Low" priority documents have been processed
Note: Documents inside each of the priority levels will still get processed by age (most recent first).
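The ordering described above (priority level first, then most recent first within a level) can be sketched in Python as a two-part sort key. This is an illustration only; the numeric values given to each level and the sample documents are hypothetical.

```python
from datetime import date
from enum import IntEnum


class Priority(IntEnum):
    # Numeric values are an assumption; only their relative order matters
    VERY_HIGH = 5
    HIGH = 4
    NORMAL = 3
    LOW = 2
    VERY_LOW = 1


# (path, priority, modified date): hypothetical sample documents
queue = [
    ("legacy/old.pdf", Priority.VERY_LOW, date(2015, 3, 1)),
    ("client/a.pdf", Priority.NORMAL, date(2024, 5, 1)),
    ("case/urgent.pdf", Priority.VERY_HIGH, date(2023, 1, 10)),
    ("client/b.pdf", Priority.NORMAL, date(2024, 5, 20)),
]

# Highest priority first; within a priority level, most recent first
ordered = sorted(queue, key=lambda d: (d[1], d[2]), reverse=True)

for path, priority, modified in ordered:
    print(priority.name, modified, path)
```

Note that the Very High document is handled first even though it is older than both Normal documents.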
For further information on throttling, see: How Backlog Throttling Works
There is a difference between the priority level of a given document and the default priority level that is assigned at the Worldox profile group or folder level. For example, you may set the priority level for a particular profile group to be "High". This will automatically set the priority level for any new documents saved to that location as "High". However, documents that were found prior to the assignment will be processed at the default level (Normal) unless you re-prioritize them. In addition, you may opt to assign a particular document (or set of documents) in the profile group a "Low" priority level even though the default for newly saved documents is "High".
One example of utilizing this particular tool in Symphony OCR is assigning the documents in a particular profile group with a Very Low priority. For example, a firm may have a legacy store of documents and there is a Legacy profile group pointing to this legacy store. While the firm would like to process the legacy store, it's much more important to the firm to process the documents in the current "live" Worldox document repository before processing these. Therefore, you can set the Legacy profile group to have a Very Low or Low priority level.
For further information on configuring profile groups to process by priority levels see: Configuring Worldox
One example of utilizing this particular tool in Symphony OCR is assigning the documents in a particular monitored folder a Very High priority. For example, a firm may have a set of documents that it needs to process with a higher priority than others. While the firm would like to process the other documents, doing so is much less important than processing the documents for these particular clients and/or matters. Therefore, you can create a separate monitored folder for these documents and assign them a High priority.
For further information on configuring monitored folders to process by priority levels see: Configuring Monitored Folders
Another example may be that a particular firm has a need to process a certain set of documents very quickly. For example, perhaps the firm has an impending court case and needs these documents OCRed urgently. The default priority level for the location in which these documents reside may be set to "Normal", but the firm would like to process these particular files right away. In this instance, you can find this set of documents by filtering the Processing list. Once the documents have been found in the Processing list, you can reassign their priority to "Very High," for example. This will ensure that they are processed before other documents. These files may be in the same profile group or they may be in different profile groups.
For further information on re-prioritizing documents in existing lists see: Manipulating Document Lists and How to Adjust Processing Priorities
Use the Document Info tool to enter the full path of the document or the document id and click "Query."
You can also determine whether or not a document has been processed by checking the Audit Trail on the document. Simply display the AppName column in Worldox:
For more information on how to display Audit trail information for a particular document, see Audit a Single Document's Events
There are two methods for looking up the details of a particular document:
Lookup By Path - Enter the full path of the document and click "Query" (See also: Checking the Status of a Document).
Look up a document - NetDocuments
Look up Document- Worldox
Look up a document - Windows Folder Tree
Once on the details page, the user can perform these functions:
Refresh - Provides the most current details of a document. There are also various bits of data or history showing what's been found on the file, and what's been done to it. For example, the "History" section shows all of the events logged for that file.
Note: If you delete the details of this document, it will delete this history and start from scratch. The "Page Analysis Details (before processing)" indicates how many words per page were found within the file BEFORE Symphony OCRed it. Visible words are computer-readable words (like digital headers or footers, or text generated by Word, etc). Hidden (aka invisible) words would be words applied by something like Symphony OCR. Note that these numbers are PRE-processing and they do not update after Symphony OCRs the file.
Depending on how Symphony OCR Notifications are configured, Status Notifications may be sent to you nightly, when there are errors, or when there are warnings. See Notifications for how to set this up / edit the notification frequency.
The email notification will look and feel very similar to the Summary Page but without the large graph.
You can utilize the buttons in the Notifications to manage Symphony OCR providing that you have network connectivity to the Symphony OCR servers.
If you're not on the same network as Symphony OCR then you can't use the buttons, but the data presented can still give you a quick glance at its progress.
System Statistics tells you how many files are in the Analyzer or OCRing backlog, and how long it estimates it will take to complete those backlogs.
Document Lists give you the itemized numbers of documents found that were Processed or Not Processed. Read the article titled "Symphony OCR Workflow, Tools & Document Lists" for more information on those lists.
Symphony OCR is a back-end process that requires very little ongoing maintenance. However, as an administrator, there are a few tasks that should be performed on a monthly basis to ensure that everything continues to run smoothly. This will take maybe 5 minutes per month, and will help to detect and correct any systemic issues before they become problems:
Backlog throttling is an advanced feature in Symphony OCR that ensures that your system will have processing capacity to use on new documents as they are added to your document repository. If that capacity does not get used in a given day, then the backlog will be processed using that unused capacity. This only applies for folks that have a limited page count (if you have a Symphony OCR Trial license this applies to you). If you have an unlimited page count with Symphony OCR, then this won't be necessary.
Symphony OCR considers any document with a modified date older than 5 days to be part of the backlog. Any document with a modified date newer than 5 days will be processed regardless of any throttling.
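The 5-day rule can be expressed as a one-line date comparison. The Python sketch below is an illustration rather than the product's actual logic; the function name is hypothetical, and how a document modified exactly 5 days ago is treated is an assumption (here it is not counted as backlog).

```python
from datetime import datetime, timedelta

BACKLOG_AGE = timedelta(days=5)  # threshold stated above


def is_backlog(modified, now):
    """Return True if the document's modified date is older than 5 days."""
    return now - modified > BACKLOG_AGE


now = datetime(2024, 6, 10, 9, 0)

print(is_backlog(datetime(2024, 6, 1, 9, 0), now))  # modified 9 days ago
print(is_backlog(datetime(2024, 6, 8, 9, 0), now))  # modified 2 days ago
```

Only documents for which this check is true are subject to backlog throttling; newer documents are processed regardless.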
If you wish your backlog to be processed more quickly than the throttling allows, you have two choices:
Of these, increasing the processing capacity is almost always the correct answer. If you enable overclocking, you could wind up with your entire backlog being processed, but not being able to process any new documents added to your document repository.
If you are certain that adjusting the backlog throttling algorithm is what you need, here's how:
Important: Enabling overclocking can result in Symphony OCR having insufficient capacity to process your newly scanned documents - this is almost always not what you want to do if you would like to process documents for longer term (like the full 30 day duration of your trial license). Contact your Trumpet sales rep when you're ready to buy the full version with unlimited page processing!
By default, Symphony OCR will retain originals of documents it processes for between 7 and 14 days. The retention period can be configured in the Processor Settings screen.
Symphony OCR stores these originals in the Work\Backups folder beneath the Symphony OCR installation directory (normally on the C drive of the Symphony OCR workstation). If you wish to move the backup directory to a different volume, here is how:
Important: While you can technically change this storage location to a network drive, we strongly recommend against it.
From time to time you may need to roll back to the original of a particular document (the document that has not been OCR'd). This functionality has been available for some time but has been exposed in the user interface in builds of Symphony OCR that are version 6.4.44 and higher.
In order to "roll back" a file that has been OCRed to its original version:
Note: If you do not want the document to be reprocessed, you can move it to the "Ignore" list. This will ensure that Symphony OCR does not re-OCR the document.
Users may need a document found and processed prior to Symphony OCR finding the document in NetDocuments due to a lengthy backlog. To do so in NetDocuments, you can perform the following steps:
This will search NetDocuments for the document and add a document record so that Symphony OCR can begin its process of OCRing the file.
Recently, we have seen instances where certain anti-virus software has been interfering with Symphony OCR processing.
One specific example that has been clearly identified is WebRoot blocking access to one of the working folders ("C:\Program Files (x86)\Trumpet\SymphonyOCR\work\processor1"), or causing the error "Unable to create working folder C:\Program Files (x86)\Trumpet\SymphonyOCR\work\msg1". This is a new development, and one we find very curious because this portion of Symphony OCR has not been changed in years, so we're not sure why it is suddenly a problem for WebRoot. Despite this, we are trying to work with them to help alleviate this issue.
Until we are able to work out a resolution with WebRoot, we recommend starting by adding exclusions for all executables in the Symphony OCR folder and all subfolders ("C:\Program Files (x86)\Trumpet\SymphonyOCR"). There is no risk in adding exclusions for our executables.
NOTE: The names of the executables can change slightly between versions, so the exclusions will need to be checked and possibly updated with any update to Symphony OCR.
If the anti-virus software is not monitoring executables, but rather doing some type of file level access monitoring, it may be necessary to add a full exclusion to the "C:\Program Files (x86)\Trumpet" folder and all sub-folders. We will update this when and if we are able to get further information, or if we find any other anti-virus software causing similar issues.
Symphony OCR communicates with outside servers for the following purposes:
Status: OK
Service: false
User: admin.nd
AvailableCPUs: 6
ParallelDocuments: 4
ParallelPages: 1
Host: XXXXXXX.LOCAL
Pages left: 12344
OCR backlog: 3158221 pgs
Usage: {}
AnalyzerUsage: {6632=[6632,989948,9715768]}
ND preserve update info: true
ND versioning enabled: Do not create versions
If you require the list of servers to configure your firewall and ensure that Symphony OCR will work nicely on your network, please email us at support@trumpetinc.com requesting the confidential document entitled:
External Servers Used by Symphony OCR (134350)
In the event you should need to remove Symphony OCR, the steps to do so are below. Note: If you're looking to migrate Symphony OCR to another workstation, the instructions to do so can be found here: Migrating Symphony OCR to a new workstation
Open the Control Panel and navigate to Programs and Features.
From within the Programs and Features list either find or search for 'Symphony OCR (remove only)' and double click to launch the uninstaller.
When the uninstaller launches, simply click 'Uninstall', wait for it to complete and then hit 'Finish'.
Symphony OCR will then be uninstalled.
To enable MICR processing, edit the ocrHandlerProvider configuration in settings.xml and add enableMicrProcessing="true".
Once MICR processing is enabled, this will be indicated in the Advanced Settings section of the Processor Config screen.
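As a rough illustration of the kind of change involved, the Python sketch below adds the attribute to an in-memory XML fragment. Only the names ocrHandlerProvider and enableMicrProcessing come from this guide; the surrounding layout is an assumption (ocrHandlerProvider is assumed to be an XML element, and the settings root element is made up), so check your actual settings.xml before changing anything.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the real settings.xml layout may differ
sample = "<settings><ocrHandlerProvider/></settings>"

root = ET.fromstring(sample)

# Locate the (assumed) ocrHandlerProvider element and add the attribute
provider = root.find(".//ocrHandlerProvider")
provider.set("enableMicrProcessing", "true")

print(ET.tostring(root, encoding="unicode"))
```

Editing the file by hand in a text editor achieves the same result; the sketch simply shows where the attribute is expected to land under the stated assumption.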
To enable processing of Email (.msg) message attachments:
Notes:
- 32-bit Outlook will need to be installed and launched once on the workstation where SOCR is installed; however, you don't have to configure an actual email address. If this has not been done, SOCR will show an alert on your Summary page.
- Worldox Document Management System only: The above enables the OCR of the email attachments, but in order to enable full indexing of email attachments in Worldox, you will need to enable the feature. Here are the instructions: How to Enable Text Indexing of Email Attachments
To enable processing of TIFF (.tif or .tiff) documents:
Note: Enabling this feature will convert files with the extension .tif to .pdf so if you have any shortcuts that rely on the full file path of the document (including the file extension), you will need to update those.
To OCR files that are marked as Read-Only:
By default Symphony OCR will rotate processed pages so that the page orientation is correct when opening the files in the document repository. If you prefer for Symphony OCR not to rotate pages, you can disable the functionality.
By default, Symphony OCR will retain originals of documents it processes for between 7 and 14 days. Here are the steps for disabling the back up:
By default, Symphony OCR will retain originals of documents it processes for between 7 and 14 days. You can adjust the retention settings to maintain the documents for a longer period of time. To do so:
For sites that need to split processing between multiple installs of SOCR, the preferred method is to split by cabinet or folder (i.e. have one set of folders that one install of SOCR is responsible for, and another set of folders that another install is responsible for). This isn't always possible, and another strategy is to use date filters to segment the processing. The idea here is to allow configuration of a filter that completely blocks the instance of SOCR from finding the documents that it shouldn't process (they won't appear in the second Symphony OCR database at all).
Note: DMSes that force the modified date to change (e.g. NetDocuments) will ultimately end up with the document being discovered by the other SOCR install. So the document will get analyzed a second time and moved immediately to the Processing list. Not the end of the world, but it'll involve unnecessary downloading of the file.
Currently, this functionality cannot be configured through the UI. In order to configure this option:
This will ensure that one instance of the software is only processing documents with a later modified date than the one specified, and the other instance is only processing documents with an earlier date than the one specified.
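The effect of the date filter can be pictured as two complementary tests on the modified date. The Python sketch below is only an illustration: the cutoff date and function names are hypothetical, and which instance handles a document modified exactly on the cutoff date is an assumption.

```python
from datetime import date

CUTOFF = date(2020, 1, 1)  # hypothetical cutoff shared by both installs


def instance_a_accepts(modified):
    """Install A: only documents modified on or after the cutoff."""
    return modified >= CUTOFF


def instance_b_accepts(modified):
    """Install B: only documents modified before the cutoff."""
    return modified < CUTOFF


samples = [date(2018, 7, 4), date(2020, 1, 1), date(2023, 11, 30)]
for modified in samples:
    owner = "A" if instance_a_accepts(modified) else "B"
    print(modified, "-> install", owner)
```

Because the two tests are exact opposites, every document is found by one install and is invisible to the other, as long as its modified date does not change.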
The search summary provides a list of cabinets and the number of documents and pages within that cabinet (profile group).
To access the Progress Details page, select "View Detailed Progress" in the appropriate DMS's configuration page.
From the Progress Details, you can evaluate the number of documents / pages per cabinet (profile group) and determine the percentage of completion within the cabinet (profile group). This can assist you in understanding how many documents are eligible for OCR in a particular cabinet (profile group) and can lead to decision making regarding the priority in which you may want to process those documents.
Once you have accessed this page, you can select any of the cabinets (profile groups) to open the Processing Document List. This will automatically filter the Processing Document List to contain only the documents in this particular cabinet (profile group).
See Manipulating Document Lists for further information on how to manipulate the document records for these documents.
Document Timelines give a week-by-week summary of the number of documents and pages in a given document list. The timelines are organized around the document's modified date, so they represent approximately when the document was added to the system. To view the timeline for a given document list, click into the list then click the "View Timeline" button at the top of the list.
Timelines can be useful for determining how quickly new documents are added to your document management system. For example, the timeline of the Processing and Processed document lists can provide how many documents and pages that are eligible for OCR have been added to the system in the past 52 weeks. This will give an approximate rate of new documents per year.
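The same kind of estimate can be reproduced from exported data. The Python sketch below groups documents into weeks by modified date and totals the pages from the past 52 weeks. It is an illustration only: the sample documents are hypothetical, and starting each week on Monday is an assumption rather than a description of how the timeline screen groups its weeks.

```python
from collections import Counter
from datetime import date, timedelta

# (modified date, page count): hypothetical sample documents
docs = [
    (date(2024, 1, 2), 10),
    (date(2024, 1, 4), 5),
    (date(2024, 1, 9), 7),
    (date(2022, 6, 1), 100),
]


def week_start(d):
    """Return the Monday of the week containing d (assumed week boundary)."""
    return d - timedelta(days=d.weekday())


def weekly_pages(documents):
    """Total page count per week, keyed by the week's starting date."""
    totals = Counter()
    for modified, pages in documents:
        totals[week_start(modified)] += pages
    return totals


def pages_last_52_weeks(documents, today):
    """Pages in documents modified within the past 52 weeks."""
    cutoff = today - timedelta(weeks=52)
    return sum(pages for modified, pages in documents if modified > cutoff)


timeline = weekly_pages(docs)
for start in sorted(timeline):
    print(start, timeline[start])

print("Approximate pages per year:", pages_last_52_weeks(docs, date(2024, 6, 1)))
```

The older document is still counted in its own week of the timeline, but it is excluded from the 52-week total used for the yearly rate.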
To view the timeline for processed documents:
2. On the "Documents of Type Processed" screen, locate and select the "View Timeline" link.
3. This will take you to the page for "The Timeline of Processed" documents.
This screen will show how many documents and pages were processed from week to week (cumulatively as well). The timeline can also be exported as a CSV or Image file by selecting the appropriate button (this will give you the full history as opposed to going back only 100 weeks).
Symphony OCR can integrate with the following systems. The links will take you to the configuration instructions:
Configuration Guide - NetDocuments
Configuration Guide - ShareFile
Configuration Guide - Open Text
Configuration Guide - Practice Master
Configuration Guide - Microsoft One Drive
If the file is "web optimized" before Symphony processes it, then it will continue to be "web optimized" after, so this is more of a question of how your scanning is configured. Trumpet's mandate is to completely preserve the original. It is much more important to preserve the original than to do any sort of optimization on an existing file. It is just not worth the risk, especially when you consider that these processes are being done at high volume, unattended.
Symphony OCR is typically installed to a single machine.
For Worldox DMS integration, Symphony OCR is typically installed to the PC that also runs the Worldox indexer, but this is not required.
Note: The OCR process will automatically throttle itself if another process on the same computer needs to run.
Note: If you wish to OCR email attachments, then the 32-bit version of Outlook (2010 or newer) will need to be installed as well. This does not work with 64-bit Outlook.
4 core CPU, highest clock speed available (up to 16 cores if you purchase a license that supports additional cores)
At least 4 GB RAM
Fast network connection between the PC and server (1 Gbps is recommended)
100 GB+ Disk space
While not required, some clients like to back up the Symphony OCR database, which resides in the 'C:\Program Files (x86)\Trumpet\SymphonyOCR\data' folder, on a nightly basis. If the database is lost, the Symphony OCR machine can be recreated using the documents in your document management system.
OCR adds between 1 and 5% to the total size of the source file if the source file is scanned in black and white. For grayscale or color images, the increase in size is less than 1%.
If that's not making sense to you then here's an explanation and a metaphor:
Grayscale and color images are larger (in bits) than black and white images. This means that a 5-page document scanned to PDF in color/grayscale has more bits than the same 5-page document scanned to PDF in black and white. Symphony OCR, however, applies the same layer of text to both documents and that text would increase the same # of bits on each of the two scans. So, the percentage of size Symphony OCR adds actually goes DOWN the higher quality the scan gets.
Here's the metaphor: picture those scans as a sink of water (black/white scan) versus a tub of water (Color/GS). Now add a rubber duck (SOCR text) to each. The space the duck takes up in each body of water has a different percentage in relation to that body of water. The duck's percentage of space added in the tub is LESS than it is in the sink.
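The same point can be shown with simple arithmetic. This is a minimal sketch with made-up file sizes (the byte counts below are hypothetical, chosen only to illustrate the ratio); the function name is also just for illustration.

```python
def ocr_overhead_percent(source_bytes, text_layer_bytes):
    """Return the text layer's size as a percentage of the source file size."""
    return 100.0 * text_layer_bytes / source_bytes


# Hypothetical sizes: the same text layer is added to both scans.
text_layer = 10_000        # bytes of OCR text added
bw_scan = 250_000          # black and white scan
color_scan = 2_500_000     # color/grayscale scan of the same pages

print(ocr_overhead_percent(bw_scan, text_layer))     # 4.0
print(ocr_overhead_percent(color_scan, text_layer))  # 0.4
```

The added bytes are identical in both cases; only the denominator changes, which is why the percentage drops for larger scans.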
OCR accuracy is highly dependent on the quality of the scan. But given the same scan, the engine that Symphony OCR uses (ABBYY) is widely recognized as the most accurate in the industry.
Here is an analysis of OCR accuracy performed by outside sources
For best accuracy, we recommend scanning at:
Note: 200 dpi will still work for black and white, but accuracy will drop slightly. Below 200 dpi, accuracy drops off sharply for smaller fonts, and is not recommended. Above 300 dpi, accuracy does not measurably improve (and results in much larger files that are slower to process).
Symphony OCR's content preservation system is PDF/A aware. If the source document is PDF/A compliant, then the results after OCR will also be PDF/A compliant.
No, Symphony Suite is only compatible with the Windows operating system.
Symphony OCR is designed to automatically throttle down its CPU usage if another application on the computer needs the CPU, so the performance impact of Symphony OCR on the indexer PC is negligible.
Symphony OCR is designed to use as much of the computer's CPU capacity as is available (it will use up to 5 cores in full processing mode, 4 of which are used for the actual OCR operation), so you will see the CPU pegged at 100% utilization while there are documents to process. This processing will not prevent other processes on the computer from running normally.
If you wish to limit the number of cores that Symphony OCR utilizes, you can do so in the Advanced Settings of the Processor. See: Processor for more information.
If your firm uses a solution like Servers Alive to track the health of software and servers, you may want to have it monitor the Symphony OCR system status.
As of version 5.2.45, Symphony OCR adds a special status page that is easy for monitoring applications to parse:
http://your.symphony.workstation.name:14722/maestro/do/status
This will return a plain text response that looks like this:
Depending on the state of Symphony OCR, the status values could be one of:
Tip: To test a change in status, go to the Processor configuration screen and stop the processor - this will switch the system status from OK to WARN. Be sure to re-start the Processor after you are done testing.
Finally, here is the configuration screen for Servers Alive to check status of Symphony - if you use a different monitoring application, adjust as you see fit:
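If you would rather poll the status page from a script than from a monitoring product, a minimal sketch might look like the following. The exact layout of the plain text response is not reproduced here, so this sketch only assumes that the word OK appears when the system is healthy and WARN appears when it is not (the two values mentioned in the tip above); the function names are hypothetical, and any other status values would need their own handling.

```python
import re
import urllib.request

# Replace the host name with your Symphony OCR workstation name.
STATUS_URL = "http://your.symphony.workstation.name:14722/maestro/do/status"


def fetch_status_text(url=STATUS_URL, timeout=10):
    """Download the plain text status page and return it as a string."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def is_healthy(status_text):
    """Assumption: healthy means the word OK is present and WARN is absent."""
    words = set(re.findall(r"[A-Z]+", status_text.upper()))
    return "OK" in words and "WARN" not in words


# Example usage (requires network access to the Symphony OCR workstation):
#     if not is_healthy(fetch_status_text()):
#         print("Symphony OCR needs attention")
```

Matching whole words keeps an unrelated word that merely contains "ok" from being read as a healthy status.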
Optical Character Recognition (OCR) converts images into characters for text searching. Because OCR is time and resource intensive, performing OCR during scanning significantly reduces your efficiency. Symphony OCR performs the OCR task in a background process, allowing you to turn OCR off during scanning. This procedure covers how to disable OCR when scanning using Adobe Acrobat.
Note that this setting is made on a per-user basis, so you'll have to do it for each user on a given workstation (you can thank Adobe for that - if anyone does figure out a way to get this turned off using a registry setting or anything like that, please let us know).
For the most part documents that do not contain an image or text don't need to be processed. One example of this might be your company's PDF letterhead template. By default these PDF documents will not be processed and will be placed in the "No Image or Text" Document List. How can a PDF have no image and no text?
If you would like Symphony OCR to process a specific document (or set of documents), you may force processing. Here's how:
You can confirm this setting has been applied by viewing the Analyzer and Processor pages:
Yes! If you have a lot of checks that you scan and save into your repository then you may want to ensure that checking account numbers are getting OCR'd. Symphony OCR has the ability to enable MIC Recognition but it will require a Trumpet technician to enable for you. It's a quick process, just contact support@trumpetinc.com to schedule a call.
To enable processing of Email (.msg) message attachments:
Notes:
- 32-bit Outlook will need to be installed and launched once on the workstation that SOCR is installed on; however, you don't have to configure an actual email address. If this has not been done, then SOCR will provide an alert on your Summary page.
- Worldox Document Management System only: The above enables the OCR of the email attachments, but in order to enable full indexing of email attachments in Worldox, you will need to enable the feature. Here are the instructions: How to Enable Text Indexing of Email Attachments
**This article is for Worldox integration only. If you have NetDocuments or another document management solution, refer to the provider for more information to ensure Email Text Indexing is available and enabled.**
If your firm wishes to text index email attachments, you must enable the setting.
Here's how:
Note: In order for the Indexer to index email attachments (and .msg files) the Indexer must have Outlook installed.
From the Windows Start menu, type the following (where "X" is the network location of Worldox):
X:\Worldox\wdadmin.exe /ini
Hit Enter. This will launch the administrative properties dialog
Select the WDIndex tab
Select the category "Common Options" and set the "Index Email Attachments" setting to "Yes"
After enabling the feature, perform an INIT on the text indexes. The Worldox Indexer performs INITs once a week by default; you can find the scheduled time by checking the schedule in the Worldox Indexer.
If you wish to Index your email attachments, please notify Trumpet by sending an email to support@trumpetinc.com.
Symphony OCR assigns a unique identifier to each email attachment that it processes. The identifier will start with the path of the actual document and then include a unique identifier for that particular attachment:
In order to determine the "name" of the email attachment that is assigned the unique identifier, select the document in the "Document Path" of any of the Worldox lists
This will provide you with the "Details" of that particular document record, and the attachment name will be listed in parentheses:
There may be instances where you need to OCR a set of documents more quickly than others. For example, you may have a particular matter going to trial next week and need to OCR the discovery for that matter or perhaps you’d like to set a priority of processing all discovery documents in your document repository first.
Note: This procedure is a one-time adjustment of the priorities of filtered files. You can also adjust the default priority of the documents by setting up separate monitored folders for each and assigning different priorities as applicable by following the instructions found here: Processing Priorities
To adjust the processing priorities for Symphony OCR for a one-time instance, you can use the following procedure:
Because Worldox now runs as a service, Trumpet has received many requests to have Symphony OCR also run as a service. This is possible using version 6.6.13 and higher of Symphony OCR.
If you are performing this update in concert with updating Worldox to run as a service, after the Worldox update you'll want to ensure that you've launched the Worldox client on the machine running the Symphony OCR service (typically the Indexer) as the Symphony OCR user. You can determine the user by selecting the "Worldox" link in the navigation panel. It's typically 000000, but yours may be different.
To run Symphony OCR as a service:
Password Requirements: You must use the user's Windows password. PINs or other security keys won't work.
By default, Symphony OCR does not process .tif files. This is not enabled by default because it is not possible to add an invisible layer of text to a .tif file, so Symphony OCR actually converts the .tif files to .pdf. Firms have the option to enable .tif processing if they choose to do so.
Here's the life cycle of a .tif file in Symphony OCR:
The "Finder" tool in Symphony OCR is responsible for finding .tif files in the document repository (amongst other file types). It will search for .tif files regardless of whether or not the firm has chosen to process .tif files. Once the Finder has found the documents, it passes them to the Analyzer tool.
The "Analyzer" tool in Symphony OCR is responsible for analyzing documents to determine if they're eligible candidates for OCR. The Analyzer will determine if the .tif file can be OCR'ed regardless of whether or not the firm has chosen to process .tif files. If the .tif file is eligible for processing, it will place the .tif file in the "Processing" queue.
If .tif processing is not enabled: when the Processor determines that the file is a .tif file, it will immediately place the .tif file in the Not Processed \ Wrong Type list.
If .tif processing is enabled: the Processor will process the .tif file by converting it to a .pdf file. Why? The .tif format does not allow the invisible layer of text to be added. Therefore, it must be converted to a .pdf file when processed.
If you wish to simply determine how many .tif documents are eligible for OCR in a particular document repository you can simply *not* enable .tif processing. Then check the "Wrong Type" list's Timeline to determine the number of documents / pages that could be processed.
There are four different levels of versioning available for NetDocuments Symphony OCR users, and the type of versioning you choose will determine the behavior that Symphony OCR uses.
The levels and how they work with .msg files are listed below:
Let's discuss the behavior with versioning for .tif files and the four applicable settings:
If Symphony OCR has indicated in your Document Details that it has processed (OCR'd) a file but you'd like to see it with your own eyes, here are a couple of tips/tricks we use to do that:
Remember: Symphony OCR applies an invisible layer of text to your files - it does NOT control the search mechanisms within your document repository. If you're trying to do text-in-file searches inside your document repository and you're getting no hits, then check to make sure file text/content is being indexed by your repository tools. Again, Symphony doesn't control text searches, it only puts text in your files.
Find:
You can open your PDF and use Ctrl+F (find) and then type out a word you see on the page. Presuming the file has text now, your PDF text finder should highlight the word you're searching for. Be sure you're searching for a word that you already know exists within your document.
Copy Paste:
Alternatively, you can try copying and pasting text from your file as well. This will show you that the text is there and is accurate.
Once a document has been OCR'd you can copy the OCR'd results of a document to Microsoft Word or other word processing editor.
To do so:
Note: This will not preserve the formatting of the document (Headers, fonts, etc) which can be adjusted in Microsoft Word.
After installing Symphony OCR, you will almost certainly notice that the machine running Symphony OCR is at 100% CPU utilization.
Symphony OCR is designed to consume all CPU resources available - Up to 16 cores (depending on the number of cores and the license you purchased) during OCR. OCRing documents is an *extremely* CPU intensive operation, which means that it will use far more CPU than almost any other application you may be familiar with. With many applications, seeing the CPU spike for a long period of time is cause for concern - but with Symphony OCR it is absolutely expected and desirable behavior.
That said, it is important that Symphony OCR be a good digital citizen and allow other applications to use those CPUs when they need to. Symphony OCR is designed to allow exactly that to happen. Symphony OCR runs at a lower priority than all other tasks, so it will always yield when another task needs the CPU. You may notice a little delay when other applications need the CPU, but we've had no reports of Symphony preventing anything else from running as needed. If you are seeing other apps hung up, we'd definitely like to know about it.
To limit the cores that Symphony OCR utilizes, use the "Advanced settings" section of the 'Processor' tab (very bottom of the page). Just enter the maximum number of CPUs for Symphony OCR to use in that field. See Processor for more information.
In some extremely rare situations (we've seen this twice now), if thermal management of the CPU is not designed properly (e.g. incorrectly applied thermal paste between CPU and heatsink), it is possible for a machine running at 100% of CPU to overheat and shut down. The two times we saw this, the machine powered itself down without any warning or user interaction. After fixing the thermal paste, the problem never recurred.
Note: if you have other CPU-intensive or time-sensitive apps that need to run and you feel that Symphony OCR is interfering, you can add events in the Symphony OCR Scheduler to stop processing documents during time periods. In practice, we've seen very few sites that require this type of schedule management.
Symphony OCR is intelligently designed to be a "good digital citizen." Bottom line, other applications get first priority after which Symphony OCR will use all available resources to process, so it throttles based on demand from other apps.
These are some of the frequently asked questions with regards to how Symphony OCR works with SharePoint:
Symphony OCR supports OneDrive for Business (SharePoint Online), not personal OneDrive accounts.
Symphony OCR integrates with your Office365 SharePoint tenant. It does not actually run on the Office365 cloud, but runs on a workstation and integrates with SharePoint directly.
Symphony OCR has a scheduler component which is incorporated in the software. You can determine which days of the week and what times of day it will run. The OCR process is CPU intensive. Data transfer accounts for approximately 3% of the processing time for a given document. A fast server typically sees around 2-3 seconds per page throughput; workstation class operating systems are approximately twice that (4-6 seconds per page throughput). This is very dependent on hardware and network speeds.
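To turn the per-page figures above into a rough time estimate for a backlog, a minimal sketch is shown below. The page count is hypothetical, the function name is just for illustration, and the sketch assumes the quoted seconds-per-page figures describe the overall throughput of the machine; actual results depend on hardware and network speeds.

```python
def estimated_hours(pages, seconds_per_page):
    """Estimate processing time in hours for a given number of pages."""
    return pages * seconds_per_page / 3600


backlog_pages = 100_000  # hypothetical backlog size

# Ranges taken from the text: server 2-3 s/page, workstation 4-6 s/page.
ranges = {"server": (2, 3), "workstation": (4, 6)}
for label, (fast, slow) in ranges.items():
    low = estimated_hours(backlog_pages, fast)
    high = estimated_hours(backlog_pages, slow)
    print(f"{label}: {low:.1f} to {high:.1f} hours")
```

Dividing the result by the number of hours per day that the scheduler allows processing gives an approximate number of days.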
The Symphony OCR license must be greater than or equal to the SharePoint user count. To determine your user count:
<d:element m:type="SP.KeyValue">
<d:Key>AccountName</d:Key>
<d:Value>i:0#.f|membership|kevin@trumpetinc.onmicrosoft.com</d:Value>
<d:ValueType>Edm.String</d:ValueType>
</d:element>
In the above example, the username is kevin@trumpetinc.onmicrosoft.com
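If you need to pull the usernames out of many such values, the following minimal sketch shows one way to do it. It assumes every AccountName value follows the same pipe-delimited pattern as the example above, with the username as the last segment; the function name is hypothetical.

```python
def username_from_account_name(account_name):
    """Return the text after the last '|' in an AccountName value."""
    return account_name.rsplit("|", 1)[-1]


value = "i:0#.f|membership|kevin@trumpetinc.onmicrosoft.com"
print(username_from_account_name(value))  # kevin@trumpetinc.onmicrosoft.com
```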
If you add an additional user, Symphony OCR will send the person you indicate in the Notification center an email notification indicating you have exceeded your license count. Symphony OCR will continue to process during a 10-day grace period to allow the software to continue running and ensure you can order additional users for Symphony OCR.
If you need to adjust your SharePoint Tenant, you'll need to do the following:
The SharePoint API calls do not allow Symphony OCR to preserve the modified date of files. Therefore, when Symphony OCR processes a document, the modified date will be adjusted to the date the document was processed by Symphony OCR.
You have integrated your Symphony OCR software with your NetDocuments repository and you're getting emails saying something to the effect of "NetDocuments Download Threshold Exceeded".
This email comes from NetDocuments. As far as we know, it is based on repository settings that alert an admin when a user's download activity exceeds a defined number. We do not control these settings or email notifications, but we've located an article published by NetDocuments that discusses this and may be of use.
Symphony OCR must download your documents in order to analyze and OCR them. This process does count towards the NetDocuments warning threshold.
The setting to adjust the threshold is located in the NetDocuments Admin Settings under "Edit name, logo and billing information".
Please note that Trumpet, Inc. does not in any way support or represent NetDocuments, and if you would like to learn more about the email you received, or to adjust the settings that control it, please reach out to NetDocuments directly.
If your license allows for our maximum core count of 16, then 16 cores are used for OCR itself, and we'll use another 8 cores for pre-analysis. Pre-analysis takes 1/50th the time of OCR, so it's only important to have 16 dedicated physical cores if you are trying to maximize Symphony performance.
Application bit-ness has no influence on our performance for multi-core operation, so there is no performance degradation from that as you add cores.
What's more likely is that Symphony will eventually saturate the bandwidth available for transferring files (either because of limitations of your internet connection or of the server bandwidth). Other potential limiting factors would be free RAM and disk performance (but on any modern server, these are negligible compared to internet bandwidth).
The document retention setting controls a short term backup that we can keep on the local disks on the Symphony machine, just in case something goes horribly wrong. This functionality was added to the original Symphony application 15 years ago as a belt-and-suspenders just-in-case feature. We've actually never had to use this backup, and there are better ways to restore original documents, so very few of our sites bother to use it anymore.
Firms that want to be able to recover the original document (prior to OCR) generally use two strategies:
Symphony OCR may not yet have processed some documents, and / or may not be able to process some documents. You can set up Notifications to track which documents are in the various Not Processed lists. See: Configuring Notifications for instructions.
If you do not currently receive notifications but want to troubleshoot why a particular document is not text searchable, here is a quick procedure:
The Symphony OCR installer downloads and runs the Symphony OCR Engine installer. If it is unable to download the Symphony OCR Engine you will receive an error message like this:
This type of error almost always means that you have a hardware firewall or border router that is interfering with the download of a required component.
1) Configure your firewall to allow traffic coming from the www.trumpetinc.com domain.
2) If you are unable to configure the firewall, you can download the required component from the URL provided by the installer error message.
After you download it, install it to the folder that Symphony OCR is located in (this is typically 'C:\Program Files\Trumpet\SymphonyOCR', but may be called 'Maestro' in older installations). Once you complete this install, run the regular installer. The regular installer will detect the engine and will not attempt to download it again.
Symphony OCR is in either a warning or error state with the error message indicating that there is Insufficient Disk Space.
As a "belts and suspenders" operation, Symphony OCR creates a backup copy of files prior to processing them and saves that copy to the local workstation that runs Symphony OCR. For more information regarding setting up the retention see: Configuration Guide: Processor
If the workstation's amount of available disk space falls below 1.5 GB, Symphony OCR will enter a "Warning" state.
If the workstation's amount of available disk space falls below 1.0 GB, Symphony OCR will enter an "Error" state.
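To check ahead of time how close a drive is to these thresholds, a minimal sketch such as the following can be used. It is an independent check written for illustration, not the product's own logic, and it assumes "GB" means 1024^3 bytes; the function names and the example drive path are hypothetical.

```python
import shutil

GIB = 1024 ** 3
WARN_BYTES = int(1.5 * GIB)   # below this: "Warning"
ERROR_BYTES = 1 * GIB         # below this: "Error"


def disk_state(free_bytes):
    """Classify free space using the default thresholds described above."""
    if free_bytes < ERROR_BYTES:
        return "Error"
    if free_bytes < WARN_BYTES:
        return "Warning"
    return "OK"


def check_drive(path):
    """Look up the free space for the drive holding `path` and classify it."""
    return disk_state(shutil.disk_usage(path).free)


# Example usage on the Symphony OCR workstation (path is an assumption):
#     print(check_drive("C:\\"))
```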
There are a couple of options for you:
If the firm has opted to keep files for longer than the standard 7 days, you may wish to adjust the firm's retention period to something smaller to purge the documents more frequently. See the section Configuration Guide: Processor to find out how to adjust it.
You can also change the location where the originals are stored by adjusting the retention location. Here are instructions for doing so: Changing the Retain Originals (Backup) Location
You can also edit the settings.xml of the firm to warn and error at different rates. Levels can be adjusted by manually editing settings.xml and adding errorUsableSpace and warnUsableSpace parameters to the <backupManager .... /> element
Symphony OCR utilizes the Worldox API to find the applicable profile groups to search for documents and also uses the Worldox API to determine if the Worldox license count matches the Symphony OCR license count as the two must be the same for Symphony to Process documents.
If you see error messages like the above, it's typically caused by Worldox not having been launched (mirrored) as the user that Symphony OCR is configured to use.
Launch Worldox (in Mirrored Mode) as the user that Symphony OCR is configured to use, then close Worldox. (You do not need to leave Worldox running - launching one time is sufficient to get the Worldox API to register properly)
We've seen at least one instance where a site received the following error message and all processing stopped until it was corrected:
"Unable to determine the number of NetDocuments users - java.lang.NullPointerException: path is 'null'."
The problem appeared to be caused by strict Internet Security settings, which were preventing the machine from accessing the necessary NetDocuments integration points. The solution was to reset the Advanced Settings in Internet Options.
To do this follow these steps:
Symphony OCR utilizes the Worldox API to find the applicable profile groups to search for documents and also uses the Worldox API to determine if the Worldox license count matches the Symphony OCR license count as the two must be the same for Symphony to Process documents.
If you see error messages like the above, it's typically because you have more Worldox licenses than Symphony licenses. Symphony OCR provides you with a 15 day grace period to get your Symphony OCR license up to date.
Contact your Channel Partner to request additional Symphony licenses.
Communication errors between your machine and our servers, such as the error above, can be caused by a discrepancy between your machine’s clock and our server’s clock. If the two times do not match to within a few seconds, our server will not allow access for security reasons.
Change the time on the machine to the correct time. If your computer is part of a network that is all on the same time then make sure the changes are made globally.
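One generic way to measure how far off a machine's clock is, sketched below, is to compare the local UTC time with the Date header returned by any reliable web server. This is only an illustration: the function name is hypothetical, the sample header value is made up, and the exact tolerance that Trumpet's servers enforce is not stated beyond "a few seconds".

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_skew_seconds(http_date_header, local_now=None):
    """Return local time minus server time, in seconds.

    `http_date_header` is the value of an HTTP Date header, for example
    "Wed, 30 Jan 2013 15:17:45 GMT". A positive result means the local
    clock is ahead of the server.
    """
    server_time = parsedate_to_datetime(http_date_header)
    if local_now is None:
        local_now = datetime.now(timezone.utc)
    return (local_now - server_time).total_seconds()


# Example with a fixed local time so the result is reproducible.
local = datetime(2013, 1, 30, 15, 17, 50, tzinfo=timezone.utc)
print(clock_skew_seconds("Wed, 30 Jan 2013 15:17:45 GMT", local))  # 5.0
```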
If you need to move Symphony OCR to a new workstation, you have two choices:
This procedure covers the first approach.
For the technical details, here's a link that tells a bit more about this error message: http://www.duckware.com/tech/java6msvcr71.html
An update to the latest version of Symphony Profiler or Symphony OCR should resolve the issue. Please use your Symphony OCR / Symphony Profiler Installation Guide, or contact Trumpet support (http://support.trumpetinc.com) for update instructions.
If the internal Symphony OCR database becomes corrupted, the system will throw all sorts of interesting errors. This sort of problem should not be happening very much, so if you see it multiple times, be sure to request support. If you are directed by Trumpet support to either repair, reset or compact the database, here's how to do it:
If the Symphony OCR database becomes corrupted, a 'Compact' operation will often repair the damage - here's how:
In some cases, a Symphony OCR database will be so damaged that a full reset will be necessary - here's how:
Your firm has more pages to process than you have processing capacity. Symphony OCR's backlog throttling system is kicking in to ensure that you have sufficient processing capacity for the new documents that you are likely to add between now and the end of the year.
Please refer to the following article for a discussion on backlog throttling: How Backlog Throttling Works. Please note that backlog throttling is a very advanced feature, and it is very unlikely that changing the parameters is the correct thing to do for your site.
You have two choices:
In some situations, you may wish to completely remove records from the Symphony OCR database. One example of this is when the user changes the selected PGs in the Worldox configuration screen. Changes to the Worldox configuration screen impact the *finder* component of Symphony OCR only. Any documents that are already in the database will continue to be processed, even if they reside in profile groups that have been de-selected.
Important: this procedure uses the Advanced/Debug interface for Symphony OCR. Be careful!
This option works well if you wish to delete records for a handful of documents (up to about 150)
If you wish to remove all document records from Symphony OCR, you can reset the Symphony OCR internal database. When Symphony OCR is restarted, it will rebuild the database based on the documents that it finds. This will result in a loss of all processing history, but any documents that have already been OCRed will not be affected, and records for those documents will be recreated and placed in the Processed list without consuming additional processing capacity.
In some situations, you may wish to explicitly tell Symphony OCR to process a set of documents with higher priority than others. This can also be used to process documents that wouldn't normally be discoverable by Symphony OCR (e.g. because of profile group selection).
This functionality was added in version 5.2.46
Important: this procedure uses the Advanced/Debug interface for Symphony OCR. Be careful!
After updating Symphony OCR to version 8.0.7 or higher, some files appear in the 'Needs Attention' list with the reason being "Adding text to image failed — null". When the files are reanalyzed, they are not processed and they return to the 'Needs Attention' list.
We have an update available that resolves the error. Update Symphony OCR to version 8.1.3 or higher.
Symphony will automatically reanalyze any files in the Needs Attention list after the update. If you have Ignored files related to this error you can manually tell Symphony to reanalyze them.
Important: There was a bug in an earlier version of Symphony OCR (prior to 5.2.77) that could cause a large number of files to accumulate in the SymphonyOCR\Work\processor1\ocr folder. If this folder contains lots of sub-folders or files, please use the Check for Updates link to install the latest version of Symphony OCR. It is safe to manually purge the SymphonyOCR\Work\processor1\ocr folder (the DOS command rmdir /s "C:\Program files\Trumpet\SymphonyOCR\Work\processor1\ocr" will do this if you are comfortable using the command prompt). The rest of this article applies if you are running versions higher than 5.2.77.
By default, Symphony OCR will retain versions of the documents it OCRs for a period of 7 days. If your firm is processing a large backlog, this can represent a significant number of files.
The retained versioning system is merely a belts-and-suspenders feature - just in case something goes wrong.
These past versions are stored in the Symphony OCR\Work\backupfiles sub-folder.
You can adjust the retention period for the Symphony OCR backups in the Processor Configuration Screen or you can disable the retention entirely.
If you wish to purge old files immediately, make the adjustments to the retention policy, then use Debug->Purge old backups (Advanced->Purge old backups, on older versions.)
When Symphony OCR launches, it displays the Performing Maintenance screen. This screen never refreshes, and the "progress" graphic does not spin (i.e. is not animated).
Clicking on the Symphony OCR icon in the upper left corner of each page should refresh that page - when this issue is happening, the refresh does not occur.
This behavior can be caused by restrictive policies in Internet Explorer.
Add the Symphony PC to the list of Trusted Sites in Internet Explorer. Here's how:
Symphony OCR converts image only PDF files into text searchable PDF files. Once the PDF file contains text, users can immediately search *within* the PDF. However, the Worldox text indexer must update before the text contents of the PDF become available for text-in-file searching (i.e. searching *for* the document).
We recently found an interaction problem between Symphony OCR and the Worldox indexer (WDINDEX) that prevents some PDF files from being added to the full text index.
WDINDEX maintains a local cache of the text it extracts from documents. This allows Worldox to use the cached text extraction during text database inits - instead of re-parsing text from every file on the network. This local text cache dramatically improves text database rebuild times. WDINDEX determines whether it should use the text cache for a given file by checking the network file's modified date. Symphony OCR preserves the file's modified date when it performs OCR. This can result in WDINDEX using cached text extraction (consisting of no text) instead of the updated version of the file on the network.
In version 5.2.67, Symphony OCR now modifies the date of files it OCRs by a single minute. This will ensure that WDINDEX will not use cached text extraction results for files OCRed after S-OCR was updated to 5.2.67.
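To illustrate the idea of nudging a file's modified date so that a date-based cache treats it as changed, here is a minimal sketch. It is not Symphony OCR's own code; it assumes the date is moved forward by 60 seconds (the text above does not state the direction), and the function name is hypothetical.

```python
import os
import tempfile


def bump_modified_time(path, seconds=60):
    """Shift a file's modified time by `seconds`, keeping its access time.

    Returns the new modified time as reported by the file system.
    """
    info = os.stat(path)
    os.utime(path, (info.st_atime, info.st_mtime + seconds))
    return os.stat(path).st_mtime


# Demonstration on a throwaway temporary file.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name
before = os.stat(demo_path).st_mtime
after = bump_modified_time(demo_path)
print(round(after - before))  # 60
os.remove(demo_path)
```

Any tool that decides whether to reuse cached results by comparing modified dates will see the bumped file as different from the version it cached.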
For files that were OCRed before the update to 5.2.67, the user will need to purge the WDINDEX text cache - this is easy to do, but can take a bit of time, depending on the size of the document repository. Here are instructions:
The next time WDINDEX rebuilds the text databases, it will rebuild the local text cache (this could cause the first rebuild to take longer than normal). After that, everything should work just like before - except that all of your OCRed files will be text searchable.
User notices that Symphony OCR has a system condition of Red and the finder screen displays an error of "java.io.IOException: StringLong keys can't be longer than 300K - database must be corrupted."
There is a bug in Symphony OCR prior to version 5.2.63 that can result in corruption of the Symphony OCR database indexes.
Documents appear in the Inaccessible list, and have history with the following note in it:
01-30-2013 03:17:45 PM : Security prevents manipulation of document
In Worldox: Searching for the document in Worldox shows an icon next to the document with a yellow or red lock icon.
Symphony honors your DMS's security model. If a document is hidden, or secured in a way that the Symphony User cannot modify it then it will not be processed. In Worldox this means: if a document appears with a red or yellow lock icon, it cannot be processed.
If the issue is related to Worldox security classifications:
If the issue is caused by the file being marked read-only:
Documents appear in the Inaccessible list, and have history with the following note in it:
File system or DMS security settings block processing
Searching for the document in Worldox shows a special icon in the VER# column.
This icon indicates that the documents have been F-Locked, which is a special read-only setting in Worldox. Once a file has been locked using this method you cannot unlock it and it becomes read-only. The document will be identified as a locked file and a hash value ensures that file contents will not change. If you want to edit the file after locking it, you must check it out, edit it, and check it back in. The only option should be to create a new version, therefore, Symphony OCR cannot process a document that has been locked using this method.
Documents appear in the Inaccessible list, and have history with the following note in it:
File system or DMS security settings block processing
Searching for the document in Worldox shows no security which should prevent processing.
This will be the result if the server doesn't have 8.3 filenames populated for the file. 8.3 filenames are required by Worldox, but we see this sometimes when older legacy cabinets are processed. To determine if 8.3 filenames are populated, open a 'cmd' window and do a 'dir /x' on one of those file locations (see below for more instructions) - you should see the long filename *and* the 8.3 filename:
If the 8.3 filenames have not been populated, Trumpet has a tool to take care of that - 8.3 Filename Tool (Purchase required. Contact operations@trumpetinc.com for more information).
Note: For those that may not be super familiar with the command window, here's an example of what to type in the cmd line:
If the file in question lives in a path like this W:\DocVault\CLIENT\CLARCA1\BILLING\CHECK, then you'll type this in the command window:
dir /x W:\DocVault\CLIENT\CLARCA1\BILLING\CHECK
Hit enter and the window should generate a list of all of the files within that location. The farthest right column will be the document's name (doc ID) and the column to the left of that will show the 8.3 name, if available.
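If you want to sanity-check the names you see in the 'dir /x' listing, the sketch below tests whether a name fits the classic 8.3 pattern (up to 8 characters, an optional dot, and up to 3 extension characters, with no spaces). The function name is hypothetical and the character set is an assumption based on the usual short-name rules; it is not a Trumpet or Worldox tool.

```python
import re

# Characters commonly allowed in 8.3 short names (assumption: standard DOS rules).
_CHARS = r"[A-Za-z0-9!#$%&'()@^_`{}~-]"
_SHORT_NAME = re.compile(_CHARS + r"{1,8}(\." + _CHARS + r"{1,3})?")


def is_short_name(name):
    """Return True when `name` fits the 8.3 pattern."""
    return _SHORT_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["00001A~1.PDF", "CHECK", "Quarterly Report.pdf"]:
        print(candidate, is_short_name(candidate))
```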
You see documents in the Inaccessible list, with the "Inaccessible Reason" showing: "File system or DMS security settings block processing".
Symphony honors the Worldox security model. If a document appears with a red or yellow lock icon, it cannot be processed. If these files are secured see: Documents in Inaccessible list are not OCRed
If these files are not secured, (do not have the red or yellow lock icon next to them) this may be caused by having an invalid base path defined for the profile group. Worldox requires that each part of the Profile Group base path be less than 8 characters and contain no spaces.
Change the Worldox Profile Group base paths (all directories and filename) to contain no spaces and be less than 8 characters, rename the folders on disk, and adjust the indexing rules accordingly.
Another strategy is to configure a new UNC share pointing at the base of the profile, and ensure that the UNC share name is 8 characters, no spaces. Then use ?:\ notation to configure the base path (your Worldox reseller will probably need to help you do this).
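To spot the offending parts of a base path before renaming anything, a check along these lines may help. It is only a sketch: the function name is hypothetical, and it assumes the limit is 8 characters per component with no spaces (lower max_len if your Worldox reseller advises a stricter limit).

```python
from pathlib import PureWindowsPath


def bad_components(base_path, max_len=8):
    """List the folder names in `base_path` that contain spaces or are too long."""
    path = PureWindowsPath(base_path)
    # Skip the drive or UNC share portion; only folder names are checked here.
    parts = path.parts[1:] if path.anchor else path.parts
    return [part for part in parts if " " in part or len(part) > max_len]


if __name__ == "__main__":
    print(bad_components(r"W:\DocVault\CLIENT"))           # nothing to fix
    print(bad_components(r"W:\Doc Vault\CLIENTFILES\A"))   # two names to rename
```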
Symphony OCR uses the Worldox API to interact with your Worldox document repository. This API consists of a DLL called WDAPI32.DLL and an executable called WBAPI.EXE. If these libraries are loaded from the server (instead of local to the Symphony workstation), any network interruption can cause the Worldox API to crash. This will not completely bring Symphony OCR down, but it does cause problems, and it is generally best to load those libraries from the local C drive of the Symphony machine.
Symphony OCR uses the workstation's system path to determine the location it should load libraries from. Worldox sets the system path appropriately whenever Worldox, WDMirror or WDIndex are launched. If you launch Worldox or WDIndex directly from your file server, then the system path will be configured pointing at the file server. If you use WDMirror, then the system path will be configured to point at the local C:\Worldox folder of the workstation. The latter is what you want.
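One way to see which folder would win is to walk the system path in order and report the first entry that contains WDAPI32.DLL. The sketch below does only that; it is a simplification that ignores the other places Windows searches for libraries, and the function names are hypothetical.

```python
import os
from pathlib import PureWindowsPath


def first_dir_with(filename, path_value, exists=os.path.exists):
    """Return the first entry of a ';'-separated path that contains `filename`."""
    for entry in path_value.split(";"):
        entry = entry.strip().strip('"')
        if entry and exists(str(PureWindowsPath(entry) / filename)):
            return entry
    return None


def is_local_c_drive(directory):
    """Return True when `directory` lives on the local C: drive."""
    return PureWindowsPath(directory).drive.upper() == "C:"


if __name__ == "__main__":
    hit = first_dir_with("WDAPI32.DLL", os.environ.get("PATH", ""))
    if hit is None:
        print("WDAPI32.DLL was not found on the system path")
    else:
        print(hit, "(local)" if is_local_c_drive(hit) else "(NOT on local C: drive)")
```

If the reported folder is a UNC path or a mapped network drive, the shortcuts on that machine are probably not launching through wdmirror.exe.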
On the Symphony workstation, check all of the shortcuts you use to launch Worldox or WDIndex and confirm that they are using WDMIRROR.EXE to launch. If you find any shortcuts that are launching worldox.exe, wdindex.exe, wbindex.exe (or any other executable besides wdmirror.exe), replace them with equivalent wdmirror.exe commands.
For example, here is the correct way to launch Worldox:
<network path to Worldox>\wdmirror.exe
or
C:\Worldox\wdmirror.exe
And here is the correct way to launch WDINDEX:
<network path to Worldox>\wdmirror.exe /wdindex
After you fix the shortcuts (remember to check the Start->Startup shortcuts also!!):
You may have a document in your document repository that is several thousand pages long. While Symphony OCR will patiently process this document, it may take several days to complete, which prevents other documents from being processed in the meantime.
You can set a maximum page count for Symphony OCR to process. This will keep Symphony OCR from getting tied up processing these very large documents. To enable this, you can do the following:
If you do opt to make this setting change, please let us know by emailing support@trumpetinc.com. If we have several requests to do this, we may opt to add this to the User Interface.
Symphony OCR is in either a warning or error state with the error message indicating that ABBYY is not installed.
Occasionally, as a part of a normal update, the OCR engine will need to be updated. We've seen very rare cases where this engine will not automatically install. This error is a result of the OCR engine not being installed properly.
Manually install the OCR engine. To do this, download www.trumpetinc.com/getresource/symphonyenginesetup and save it to the desktop. Run this engine installer, and install to the default directory:
After this, re-run the normal Symphony OCR installer and launch.
When processing any MSG file, an error is thrown stating one of the following:
Unable to process msg request [0xfffffffd]
or
Outlook has not been launched one time on this machine or the Outlook installation needs to be Repaired. Error [0xfffffffd]
This error is ultimately caused by Symphony being unable to work with the Outlook MAPI sub-system. We have seen a few causes of this:
1. The Outlook installation is corrupted (in which case, use Programs & Features to do a Repair on the Microsoft Office installation)
2. Outlook is in the middle of an automated update and is waiting for the machine to be rebooted (in which case, reboot the machine and see if that takes care of things)
3. Outlook hasn't been launched and set up on the machine (you don't have to set up an actual mail account - but you do need to go through the Outlook Welcome Wizard)
4. There are multiple instances of Outlook installed on the same machine (e.g. Outlook 2010 and Outlook 365 on the same machine), MAPI is configured to use one of the versions, but the user is launching and using a different version (in which case either run a Repair on the install that the user does use, or launch the other Outlook install one time and go through the Outlook Welcome Wizard)
NOTE: SOCR is not compatible with Outlook 64bit. To resolve, install Outlook 32bit and launch one time.
Unable to open Symphony OCR - behavior is like a bad URL:
First of all, if Symphony OCR is installed as a service, please make sure the Symphony OCR service is running. Open up Windows Services to check for the service and make sure it is running.
Otherwise, we've seen some sites that don't handle the fully-qualified domain names properly. Consequently, the 'Machine.Domain.local' URL fails. To confirm this, change the URL to omit the ".Domain.local" portion.
For example, my default URL may be: Indexer.Trumpet.local:14722/maestro/do/showWelcomeScreen
If this failed, I would then try: Indexer:14722/maestro/do/showWelcomeScreen
(Omitting the ".Trumpet.local" portion.)
If this works, then the fully-qualified domain name is causing the problem.
Alternatively, if even that is not working, you could try inputting the machine's IP address:
192.178.1.###:14722/maestro/do/showWelcomeScreen
And if that continues to fail then you may have an issue with rerouting entirely, at which point you can input "localhost":
localhost:14722/maestro/do/showWelcomeScreen
*Please note that with this solution (using localhost), the Symphony OCR interface will NOT be able to open from another machine.
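The fallback order described above (fully-qualified name, short machine name, IP address, then localhost) can be written out as a short helper that builds the URLs to try in turn. The function name is hypothetical; the port and page are the ones shown in the examples above.

```python
def candidate_urls(fqdn, ip_address=None, port=14722,
                   page="/maestro/do/showWelcomeScreen"):
    """Build the URLs to try, from most to least specific host name."""
    hosts = [fqdn]
    short_name = fqdn.split(".", 1)[0]
    if short_name != fqdn:
        hosts.append(short_name)      # e.g. "Indexer" from "Indexer.Trumpet.local"
    if ip_address:
        hosts.append(ip_address)
    hosts.append("localhost")         # only works on the Symphony machine itself
    return [f"{host}:{port}{page}" for host in hosts]


if __name__ == "__main__":
    for url in candidate_urls("Indexer.Trumpet.local"):
        print(url)
```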
Once you're able to input a working URL, you may want to adjust the settings file so that it uses the new URL every time, instead of the default. To do this, edit the default URL in the settings config file. These are the steps:
If you have the NetDocuments Echoing feature turned on, a checked-out document is downloaded to the Echo folder to improve performance and redundancy. When you open that document from the NetDocuments interface, NetDocuments checks whether the document is in the Echo folder and, if so, opens the copy from the Echo folder.
If your document has been OCRed by Symphony OCR, the OCRed copy will be in the document repository, but the local copy in your Echo folder may not contain the OCR results. If other users can see the OCR results within the file but you cannot, it might be because you are using the Echoing feature.
The NetDocuments knowledge base article How does the Check In List and Echo Folder Work describes how to disable Echoing and how to reset the Check In List, either of which may resolve the issue.
Symphony OCR does not automatically redirect to the home page upon launching, and user is required to click the "Click Here" button to proceed.
Control Panel > Internet Options > Security > Internet > Custom Level > Scripting > Enable
Control Panel > Internet Options > Security > Trusted Sites > Sites > Enter current website for Symphony OCR > Add
Symphony displays the following error:
Unable to send emails - Communication error - sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
This is caused by a change in new Java libraries.
Update Java:
Stop Symphony (or stop service if configured to run as a service)
Go to Control Panel and uninstall all instances of Java
Go to Java.com and select 'Download' in the main title bar:
Select 'See all Java downloads':
Select 'Windows Offline':
Install that version and then re-launch Symphony (or start the service if configured to run as a service)
Symphony OCR crashes unexpectedly with the user interface showing java errors. If you open the "C:\Program Files (x86)\Trumpet\SymphonyOCR\logs\maestro.log" log file, you'll notice an entry that looks like this:
2020-06-01 06:43:15,637 [Analyzer-2] ERROR com.trumpetinc.maestro.processormanager.ProcessorThreadPoolExecutor - Processing task failed with an error: Java heap space
java.lang.OutOfMemoryError: Java heap space
Symphony OCR requires a substantial amount of memory to be available for analyzing and processing documents. Some documents require more memory than what Symphony OCR is able to allocate to analyzing and processing the document.
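If the log is large, a short script can pull out the heap-space errors for you. The sketch below assumes the log lines follow the format of the sample entry above (timestamp, thread name in brackets, then ERROR); the function name is hypothetical.

```python
import re

# Matches lines like the sample maestro.log entry shown above.
_HEAP_ERROR = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \[([^\]]+)\] ERROR .*Java heap space"
)


def find_heap_errors(lines):
    """Return (timestamp, thread) pairs for every heap-space ERROR line."""
    hits = []
    for line in lines:
        match = _HEAP_ERROR.match(line)
        if match:
            hits.append((match.group(1), match.group(2)))
    return hits


if __name__ == "__main__":
    log_path = r"C:\Program Files (x86)\Trumpet\SymphonyOCR\logs\maestro.log"
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for timestamp, thread in find_heap_errors(log):
            print(timestamp, thread)
```

The timestamps can then be compared against document history to narrow down which file was being analyzed or processed when memory ran out.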
If you are unable to find the offending file(s), feel free to reach out to us at support@trumpetinc.com and we'll be happy to help.
If you do find an offending file is causing the problem, we'd love to get a representative sample to investigate in detail. If possible, you can upload the file here: https://extranet.trumpetinc.com/upload/clientdata
IMPORTANT: Under no circumstances should you email sensitive data to us. This upload site is secure; email is not a secure transport medium.
Handwritten documents are upside down after having been processed by Symphony OCR.
Symphony OCR has the setting to Automatically Rotate documents turned on by default. It makes every effort to determine the proper orientation for handwritten documents, however, handwriting makes the page orientation often hard to determine.
Unfortunately, this algorithm is not something that can be tuned, so you have two options:
Needs Attention: Documents in the 'Needs Attention' list are those that appear to be eligible for OCR, but encountered problems during processing. Files in this list could be corrupted or contain invalid images (try opening them in an image viewer to be sure), or they may be images that Symphony OCR does not handle yet.
Occasionally, a document can fall into the 'Needs Attention' list because of bad timing - Symphony OCR trying to process the document when it isn't fully available. So we always recommend clicking the "Show Bulk Operations" button and then "Re-Analyze All", just to ensure this isn't the case.
If the document is corrupted, you can either remove the document from Worldox, or manually tell Symphony OCR to "ignore" it, which will put it on the 'Ignored' list. If the 'Needs Attention' list contains any documents, the overall system condition will show as "Warn." Ignoring a document that you have already checked is a good way to change the system condition back to "OK".
If the document does not appear corrupted, the next step would be to allow us to see a copy of the file. Because PDFs can be generated in countless different ways, we occasionally run into a specific sub-type of PDF that we've not encountered before. If we can get a copy of the file that is falling into the 'Needs Attention' list, we can in almost all cases, add support for the file. Please contact us at support@trumpetinc.com for instructions to upload documents to our secure site.
New: Documents in the New list are those that have been found by the finder tool, but not yet allocated to another document list (documents are only in the New state for a very short period of time).
Not Safe: Documents in the 'Not Safe' list are those that appear to be eligible for OCR, but due to the PDF library that created those documents, they are not safe to process. If Symphony OCR processes those documents, they become corrupted.
Too Old: Documents in the Too Old list are those that have a file modified date older than the cut off age defined in the Processor configuration.
Inaccessible: Documents in the Inaccessible list are those that could not be processed because of file system security, Worldox security, read-only attributes or other conditions that prevent the document from being accessed and worked on. In addition, if the profile group in which the documents reside contains an invalid base path (containing a space, for example), or if the file has a space immediately prior to the document extension, they will be shown in the Inaccessible list.
Corrupted Documents - Documents in the corrupted list are those that Symphony OCR does not recognize as valid files. The most common reason is that the file is an invalid or corrupted PDF (try opening in Adobe to be sure). Another possibility is that there is some characteristic of the PDF that the Symphony OCR parsing algorithm isn't handling properly. Trumpet does periodically update the PDF parsing algorithms to address corner cases that have not been encountered before.
What to do?
Try opening the file in Acrobat, then hit Save (Acrobat will try to open and auto-repair corrupted files - when you save the document, it will save uncorrupted). After saving and closing the document, click the Re-Analyze button on the document record in Symphony OCR. This will only work if the file is only lightly corrupted, but is worth a shot.
If that doesn't help, next check to see if the file is already text searchable (i.e. can you search for text inside the PDF already?). If you can, then the document isn't a candidate for OCR anyway, and you can just move the document to the Ignore list.
If the document does need to be OCRed, and the Adobe repair doesn't help, then you may want to submit the document to us for analysis. Open a support ticket by emailing support@trumpetinc.com and we will send information on how to securely upload the document to us. If we find a problem in our parsing algorithms, we'll fix the issue and get you a patch.
If there are a large number of files that have the same corruption reason, and the files don't appear to actually be corrupted, please open a support ticket by emailing support@trumpetinc.com and we will send information on how to securely upload a sample document to us. If we find a problem in our parsing algorithms, we'll fix the issue and get you a patch. Alternatively, you can use a bulk Ignore operation to move the documents to Ignore.
Encrypted / Restricted: Documents in the Encrypted/Restricted list are those that are restricted from being processed because of some characteristic of the file itself (for example, an encrypted or partially restricted PDF file will not be processed).
Ignored: Documents in the Ignored list are documents that a Symphony OCR administrator has explicitly told Symphony OCR not to process. Any document on this list was explicitly placed there by human intervention.
Wrong Type: Documents in the Wrong Type list are TIF documents that were found while TIFF processing is not enabled.
Moved / Unavailable: Documents in the Moved / Unavailable list are no longer available in the Document Management System (DMS). This could mean that the DMS has gone "offline" or the DMS settings have been adjusted so that the documents would not have been found for processing (e.g., if a user selects a profile group to analyze and OCR, and then chooses to un-check that profile group or no longer process it). Document records in the Moved/Unavailable list will be deleted from the database after 15 days. Documents can also appear in the Moved / Unavailable list if they are no longer at that current location.
Digitally Signed: Documents that are digitally signed will not be processed by Symphony OCR because adding OCR information to these documents would invalidate the digital signature. If you wish to have these documents OCRed anyway (and are OK with invalidating the digital signature), please send an email to support@trumpetinc.com and request that functionality be added.
Too Big (8.0.0 and higher)
If a document falls into this list, it does NOT mean the document contains too many pages. Symphony OCR processes files one page at a time. So if a document falls into this list, it means the document contains one or more pages with pixel dimensions larger than a specified value. In this version of Symphony OCR that value is 32512 x 32512 pixels.
This is a hard limit and cannot be overridden.
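As a rough illustration of the rule, the check is per page and about pixel dimensions, not page count. The sketch below assumes a page is rejected when either its width or its height exceeds 32512 pixels; the exact comparison Symphony OCR performs is not documented here, and the function names are hypothetical.

```python
MAX_PAGE_PIXELS = 32512  # limit quoted above for 8.0.0 and higher


def page_too_big(width_px, height_px, limit=MAX_PAGE_PIXELS):
    """Return True when either pixel dimension of a page exceeds the limit."""
    return width_px > limit or height_px > limit


def document_too_big(page_sizes, limit=MAX_PAGE_PIXELS):
    """A document lands in 'Too Big' if any single page is over the limit."""
    return any(page_too_big(w, h, limit) for w, h in page_sizes)


if __name__ == "__main__":
    # A 5,000 page document of letter-size scans is fine; one huge drawing is not.
    print(document_too_big([(2550, 3300)] * 5000))
    print(document_too_big([(2550, 3300), (40000, 12000)]))
```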
Too Big (Prior to 8.0.0)
If a document falls into this list, it does NOT mean the document is too big. Symphony OCR processes files one page at a time, so a document in this list contains one or more pages with pixel dimensions larger than a specified value (i.e. the page couldn't be loaded into memory). We usually see this in documents like blueprints or schematic drawings. If you find that such a document needs to be processed, there are some things we can do to try to get it processed.
Clicking on the document in the 'Too Big' list will tell you the size of the offending page.
1) Click on the 'Too big' list.
2) Click on the individual document in question.
3) The offending size of the document is available in the document details.
If you have a series of the same type of documents, it's usually the same page size that is exceeding the limit. You can attempt to process these documents by modifying the value(s) declared in the setting.xml file. (Defaults differ depending on the version you're running.)
Unprocessed Email: This number indicates the number of email messages (.msg files) found in your repository. Of those .msg files, there may be attachments that would benefit from being OCR'd. The number you see under "Not Processed" does not indicate the number of eligible .msg attachments because those documents have not yet been analyzed.
Symphony OCR has a setting that allows you to analyze those .msg files and OCR any eligible attachments. To enable that setting review this article: Enable the Processing of Email Attachments.
Note: 32bit Outlook will need to be installed and launched once on the workstation that SOCR is installed to. If you have Worldox, you will also want to ensure that Text Indexing of Email Attachments is enabled.
For the most part documents that do not contain an image or text don't need to be processed. One example of this might be your company's PDF letterhead template. By default these PDF documents will not be processed and will be placed in the "No Image or Text" Document List. How can a PDF have no image and no text?
If you would like Symphony OCR to process a specific document (or set of documents), you may force processing. Here's how:
You can confirm this setting has been applied by viewing the Analyzer and Processor pages:
License
This is where your Symphony OCR license is set. To change your Symphony OCR license, simply click "Licensing" from the Configuration side bar, enter your new license, and select "Save Changes."
License Details
Provides you details of the license.
Features Allowed by your License
This area tells you which features are allowed by your license.
Updating your License
Starting with version 6.4.96, Symphony OCR will have an 'Automatic License Update' feature. Basically, after you've paid your renewal invoice with Trumpet, a new license is automatically generated. So if your installation has access to the Trumpet servers, Symphony OCR will automatically see this new license, download it and install it.
Note: Symphony will check for a new license once every 3 days under normal circumstances, and once per day when your license is within 30 days of expiring.
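The check schedule in the note above amounts to a simple rule, sketched here with hypothetical names. It assumes "within 30 days of expiring" means 30 days or fewer remain on the license.

```python
from datetime import date, timedelta


def next_license_check(last_check, expires):
    """Return the date of the next automatic license check."""
    days_left = (expires - last_check).days
    interval_days = 1 if days_left <= 30 else 3   # daily when close to expiry
    return last_check + timedelta(days=interval_days)


if __name__ == "__main__":
    print(next_license_check(date(2024, 1, 1), date(2024, 12, 31)))
    print(next_license_check(date(2024, 12, 15), date(2024, 12, 31)))
```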
If you've paid your invoice (and received notification of a new license) and don't want to wait for the automatic update to kick in, you can click the "Check for Updated License" link on this page. This will manually trigger Symphony OCR to retrieve the updated license from Trumpet's servers. As mentioned, all of this assumes your installation has access to Trumpet's servers. If a connection cannot be established, you can always copy/paste your new license into this screen.
When you receive notification from Trumpet that your new license is generated, it is still highly recommended that you A) update your installation to the latest version of the software, and B) verify your license has been updated.
Notifications allow users to be emailed nightly based on the status of Symphony OCR.
Each email address may be configured with one of four types:
Never - nightly emails will never be sent to this recipient (instead, after entering an email address you can select "Send Now" and deliver an email to the recipient on demand).
When there are errors - the nightly email will only be sent to the recipient if the overall system condition is Error. This is useful for recipients who only need to know when the system is not processing documents because of some major error (licensing issues are the most common major error).
When there are warnings or errors - the nightly email will only be sent to the recipient if the overall system condition is Warning or Error. The warning condition is triggered by documents in the Needs Attention list, configuration problems or other system level issues that should be looked at, even though they haven't completely stopped processing from occurring.
Always (aka Daily) - the nightly email will be sent to the recipient every night regardless of system status. This is useful for firms who want to monitor the 'Not Processed' lists to ensure that every document that couldn't be OCRed (e.g., because of security or corruption) has been reviewed. Users can review documents in the various 'Not Processed' lists and either correct the underlying issue, or move the documents to the Ignore list using Bulk Operations >Ignore.
If you have a user leave the firm or you no longer wish for a particular user to be notified, you can change the Notification Type to "Never" or remove the user entirely by selecting "Remove" to the right of the address.
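The four notification types boil down to a small decision table. The sketch below restates it in code; the enum and function names are hypothetical, and the condition strings ("OK", "Warn"/"Warning", "Error") follow the wording used in this guide.

```python
from enum import Enum


class NotificationType(Enum):
    NEVER = "Never"
    ERRORS = "When there are errors"
    WARNINGS_OR_ERRORS = "When there are warnings or errors"
    ALWAYS = "Always"


def should_send_nightly(kind, system_condition):
    """Decide whether the nightly email goes out for one recipient."""
    condition = system_condition.strip().lower()
    if kind is NotificationType.ALWAYS:
        return True
    if kind is NotificationType.WARNINGS_OR_ERRORS:
        return condition in {"warn", "warning", "error"}
    if kind is NotificationType.ERRORS:
        return condition == "error"
    return False  # NEVER: only "Send Now" delivers an email


if __name__ == "__main__":
    for kind in NotificationType:
        print(kind.value, should_send_nightly(kind, "Warn"))
```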
Worldox User Code - This is where the Worldox user is specified. This is the user that Symphony OCR should search for documents as (note that Symphony OCR does not actually use a Worldox license). Symphony OCR will have access to all profile groups that the specified Worldox user has access to. This user should have Worldox Manager Rights. We recommend using the 000000 user.
Worldox Network Folder - This is the network folder in which Worldox is installed. It can be identified as a UNC path or a mapped network drive (e.g. \\server1\DMS\Worldox, or X:\Worldox) unless you are running Symphony OCR as a service, in which case it must be identified as a UNC path.
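A quick way to tell the two path styles apart is sketched below, with hypothetical function names. It only inspects the text of the path; it does not confirm that the share exists or that the service account can reach it.

```python
from pathlib import PureWindowsPath


def is_unc_path(path):
    """Return True for paths of the form \\\\server\\share\\..."""
    return PureWindowsPath(path).drive.startswith("\\\\")


def network_folder_ok(path, runs_as_service):
    """Mapped drives are acceptable only when not running as a service."""
    return is_unc_path(path) or not runs_as_service


if __name__ == "__main__":
    for folder in [r"\\server1\DMS\Worldox", r"X:\Worldox"]:
        print(folder, network_folder_ok(folder, runs_as_service=True))
```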
Profile Groups to Monitor
This is the list of profile groups the user specified has access to. If a profile group does not appear in the list, this user does not have access to those profile groups (or the profile group has not been properly configured in Worldox). You can select the checkbox in the header area to automatically select and process all documents in all profile groups. If you wish to only process certain profile groups, you can simply select the applicable ones. Be sure to select "Save Changes" at the bottom of the screen.
Default Priority - There are 6 processing priorities, which range from Very Low to Very High and include "Analyzer Only". By default all profile groups will be processed with a "Normal" priority. If you wish to change the priority for a particular profile group, select the appropriate item from the drop-down list. If you wish to re-prioritize documents that have already been found in that particular profile group as well as new documents that are in that profile group, select the "Reprioritize existing documents" checkbox. For more information on Processing Priorities see: Processing Priorities
Refresh - Allows you to refresh the list of available profile groups. For example, if you have added a new profile group to Worldox, and wish to process that group, you can select this which will provide you with the newly added profile groups.
View Detailed Progress - Selecting this will take you to the Progress Details page. This will provide you with a list of profile groups, the number of documents and pages that have been processed / not processed per profile group.
Advanced settings
Process Read Only Files - If you wish to process read-only files, you should check this checkbox.
Indexed Search Frequency - By default Symphony OCR will search for documents in selected profile groups once every 15 minutes using Indexed Searches. This should be sufficient for your needs, however you can change this to search more or less frequently.
Non-indexed Search Frequency - By default Symphony OCR will search for documents in selected profile groups once every 12 hours without using Worldox indexes. Because it takes a significant amount of time to crawl through the directory structure to find files, once every 12 hours should be sufficient.
Debugging
Reset Worldox Session - Selecting "Reset Worldox Session" will reset the Worldox session for the user defined in the Basic Settings above.
Database login credentials
Login to database with username: Enter the username for the database
Login to database with password: Enter the database password
Database computer name: Enter the database computer name
Database server instance name: This is optional and required only if there is more than one database on the server. In that case, enter the instance name you wish to process.
Database name: enter the name of the database
Advanced settings:
New document search frequency: By default Symphony OCR will search for documents once every 15 minutes. This should be sufficient for your needs, however you can change this to search more or less frequently
Legacy document search frequency: By default, Symphony OCR will perform a search for legacy documents (documents existing prior to installing Symphony OCR) every 7 days.
Basic settings
PracticeMaster network folder/current working directory - This is where the Practice Master network folder is identified. Copy and paste the path to the network folder into the field.
Documents folder - This is the root of where the documents reside within Practice Master. Copy and paste the path to the folder into the field.
Advanced settings
Process Read Only Files - if you wish to process read-only files, you should check this check box
Finder Scan Frequency - by default Symphony OCR will search for documents once every 120 minutes. This should be sufficient for your needs, however you can change this to search more or less frequently
Connect to Sharefile
Basic Settings
ShareFile Account - Displays the user that Symphony OCR connects to ShareFile as
Folders to Monitor
This is the list of Folders the user specified has access to. If a folder does not appear in the list, this user does not have access to those folders.
View detailed progress - Selecting this link will take you to the Progress Details page. This will provide you with a list of Cabinets, the number of documents and pages that have been processed / not processed per cabinet.
Advanced Settings
Search frequency - By default, Symphony OCR will perform a search for new documents every 60 minutes. The value may be adjusted if you require searching for documents less frequently.
Folders to Monitor
This is the list of folders that Symphony OCR is monitoring.
Search Frequency - The frequency at which the Finder will query this directory tree for new pdf & tif documents.
Default Priority - The priority level at which this directory will be processed. For more information on setting document priorities see: Processing Priorities
Add a folder
To add a folder or directory tree to the list of folders that should be monitored by Symphony OCR, add the path to the field and select the "Add" button on the right. Symphony OCR will process the entire directory tree of the path you provide (e.g. X:\Clients will process all documents in the subfolders beneath X:\Clients, like X:\Clients\Anderson, Matthew and X:\Clients\Anderson, Matthew\Agreements). This will add the directory tree to the list of folders that Symphony OCR is monitoring.
Note: If you wish to process files in a hidden folder, you must explicitly indicate that folder. For example, if you have a root folder like X:\Clients and under that a hidden folder called "Inactive" (e.g. X:\Clients\Inactive), you must explicitly add that folder to the Monitored folders.
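To get a feel for how many files a monitored folder would contribute, you can walk the directory tree yourself. The sketch below is not the Symphony OCR Finder; the function name is hypothetical, it assumes the extensions of interest are .pdf, .tif and .tiff, and it does not model the hidden-folder rule from the note above.

```python
import os
import tempfile

OCR_EXTENSIONS = {".pdf", ".tif", ".tiff"}


def find_candidates(root):
    """Return every PDF/TIF file under `root`, including all subfolders."""
    found = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            if os.path.splitext(name)[1].lower() in OCR_EXTENSIONS:
                found.append(os.path.join(folder, name))
    return sorted(found)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        nested = os.path.join(root, "Anderson, Matthew", "Agreements")
        os.makedirs(nested)
        for name in ["lease.PDF", "notes.txt"]:
            open(os.path.join(nested, name), "w").close()
        print(find_candidates(root))
```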
Advanced Settings
Process Read Only Files - if you wish to process read-only files, you should check this check box
Folders to Monitor
This is the list of folders that Symphony OCR is monitoring.
Search Frequency - The frequency in which the Finder will query this directory tree for new pdf & tif documents.
Default Priority - The priority level in which this directory will be processed. For more information on setting document priorities see: Processing Priorities
Add a folder
To add a folder or directory tree to the list of folders that should be monitored by Symphony OCR, add the path to the field and select "Add". Symphony OCR will process the entire directory tree of the path you provide. (e.g. X:\Clients will process all documents in the subfolders beneath X:\Clients, like X:\Clients\Anderson, Matthew and X:\Clients\Anderson, Matthew\Agreements, then select the Add button on the right. This will add the directory tree to the list of folders that Symphony OCR is monitoring.
Note: If you wish to process files in a hidden folder, you must explicitly indicate that folder. For example, if you have a root folder like X:\Clients and under that a hidden folder called "Inactive" (e.g. X:\Client\Inactive), you must explicitly add that folder to the Monitored folders.
Advanced Settings
Process Read Only Files - Check this check box if you wish to process read-only files.
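The folder rules above (whole directory tree processed, hidden folders only when added explicitly) can be pictured with a short Python sketch. This is an illustration only, not how Symphony OCR is implemented: the function name find_candidates, the is_hidden callback, and the extension set are assumptions made for the example.

```python
import os
from pathlib import Path

# Assumed extension set for the illustration (PDF and TIF/TIFF images).
EXTENSIONS = {".pdf", ".tif", ".tiff"}


def find_candidates(monitored, is_hidden):
    """Yield PDF/TIF files under each monitored root folder.

    Hidden subfolders are skipped unless they were added explicitly
    as a monitored folder of their own.
    """
    roots = {Path(p).resolve() for p in monitored}
    for root in sorted(roots):
        for current, dirs, files in os.walk(root):
            # Prune hidden subfolders in place so os.walk does not descend.
            dirs[:] = [d for d in dirs if not is_hidden(Path(current) / d)]
            for name in files:
                if Path(name).suffix.lower() in EXTENSIONS:
                    yield Path(current) / name
```

With only X:\Clients in the list, files under a hidden X:\Clients\Inactive folder are never yielded. Adding X:\Clients\Inactive as its own entry makes it a root, so its files are found.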
Connect to SharePoint
For a quick video showing the installation and configuration of SharePoint visit: https://youtu.be/UNGbJiaRn9A
The Scheduler determines when and how frequently Symphony OCR performs specific tasks, such as when to send a heartbeat, when to search for new documents, when to purge backup files, etc.
To adjust a setting select "Edit" to the left of the specific setting you would like to adjust.
To delete a specific Scheduler entry, select "Delete" on the right of the particular setting.
Most users will not need to change these items, but there are special cases in which you may wish to. For example, if the firm runs its indexer software and Symphony OCR on a user's workstation, you may wish to process items only overnight.
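As a rough illustration of the overnight case, the check below decides whether a given time of day falls inside a processing window that wraps past midnight. It is a sketch only: the function name in_window and the 20:00 to 06:00 hours are assumptions for the example, not Scheduler defaults.

```python
from datetime import time


def in_window(now, start=time(20, 0), end=time(6, 0)):
    """Return True if `now` falls inside the processing window.

    A window whose start is later than its end wraps past midnight.
    """
    if start <= end:
        return start <= now < end
    return now >= start or now < end
```

For example, in_window(time(23, 0)) and in_window(time(3, 0)) are both True, while in_window(time(12, 0)) is False.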
The Finder is responsible for locating the documents in the document repository.
Control
In the control area, you can choose to refresh the Finder or stop the Finder:
Refresh - Selecting Refresh will refresh the status of the Finder.
Stop Finder - Selecting this option will stop the Finder from finding documents in the document repository.
Status
There can be multiple tasks in the status depending on the firm's licensing and applicable document repository:
Worldox
Status
Worldox Indexed Search performs an indexed search to find documents that have been created or modified *today* that are eligible for OCR. By default, it performs the query every 15 minutes. This can be adjusted by selecting "Manage". This will take you to the Worldox page where you can adjust the search frequency under Advanced Settings.
Worldox Non-Indexed Finder performs a non-indexed search to find all documents in Worldox that are eligible for OCR, regardless of how recently the document has been created or modified. By default, it performs this search once every 12 hours. This can be adjusted by selecting "Manage". This will take you to the Worldox page where you can adjust the search frequency under Advanced Settings.
NetDocuments
Status
NetDocuments Recent Documents Search - performs a search to find documents that have been created or modified *today* that are eligible for OCR. By default it performs the query every 15 minutes. This can be adjusted by selecting "Manage". This will take you to the NetDocuments page where you can adjust the search frequency under Advanced Settings.
NetDocuments Legacy Documents Search - performs a search to find legacy documents that are eligible for OCR. By default it performs the query every 7 days. This can be adjusted by selecting "Manage". This will take you to the NetDocuments page where you can adjust the search frequency under Advanced Settings.
Folders
Status
Folder Search - performs a search in the monitored folder structure to find all documents that are eligible for OCR regardless of how recently the document has been created or modified. By default it performs this search once every 120 minutes. This can be adjusted by selecting "Manage". This will take you to the Folders page where you can adjust the search frequency for each folder.
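The default search frequencies listed above can be summarized in a small Python sketch that computes when each Finder task would next be due. The dictionary and the next_run helper are hypothetical, and whether the real Finder measures the interval from the start or the end of the previous search is not stated here, so treat the arithmetic as an approximation.

```python
from datetime import datetime, timedelta

# Default intervals as described in this section.
DEFAULT_INTERVALS = {
    "Worldox Indexed Search": timedelta(minutes=15),
    "Worldox Non-Indexed Finder": timedelta(hours=12),
    "NetDocuments Recent Documents Search": timedelta(minutes=15),
    "NetDocuments Legacy Documents Search": timedelta(days=7),
    "Folder Search": timedelta(minutes=120),
}


def next_run(task, last_run, intervals=DEFAULT_INTERVALS):
    """Return the time at which `task` is next due after `last_run`."""
    return last_run + intervals[task]
```

For a Folder Search that last ran at 08:00, next_run returns 10:00 on the same day.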
The Analyzer is responsible for looking at each document and determining if it is eligible for OCR. If a document is eligible it is placed in the Processing list. If a document is not eligible, it is placed in the appropriate list (for more information on why a document might not be eligible for OCR, refer to the section, Not Processed List).
Control
In the control area, you can choose to refresh the Analyzer or stop the Analyzer:
Refresh - Selecting Refresh will refresh the Status of the Analyzer page.
Stop Analyzer - Selecting this option will stop the Analyzer from analyzing documents in the document repository.
Status
Displays the status of the Analyzer.
Information
Machine Processors - Indicates how many logical processors the workstation running Symphony OCR contains.
Licensed parallel processing - Indicates how many documents will be analyzed at a time based on your license features.
Recent Performance (since last restart)
Provides performance statistics such as the total number of documents and pages that Symphony OCR has found eligible for OCR, and the average speed of analysis per document since the last restart of Symphony OCR.
Overall Performance (since last restart)
Provides performance statistics such as the total number of documents and pages that Symphony OCR has found eligible for OCR, and the average speed of analysis per document since the last restart of Symphony OCR.
Settings
Do not analyze documents younger than - The default setting is 30 seconds. If you wish to have the Analyzer wait longer before analyzing documents, change the value in the field. Trumpet recommends not decreasing this value below 30 seconds, to ensure that documents are fully written to disk before processing.
To change this setting, type in the number of seconds, and then select "Save Changes".
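The idea behind the minimum age can be sketched in Python as follows. This is an assumption-laden illustration: the helper name is_old_enough is made up, and it measures age from the file's last-modified time, which may differ from how Symphony OCR measures it.

```python
import os
import time


def is_old_enough(path, min_age_seconds=30, now=None):
    """Return True once the file has been untouched for the minimum age."""
    now = time.time() if now is None else now
    # Age is measured from the last-modified time (an assumption).
    return now - os.path.getmtime(path) >= min_age_seconds
```

A file that a scanner is still writing keeps getting a fresh modified time, so it stays "too young" until writing stops and the waiting period passes.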
Select Processor in the navigation panel:
The Processor manages the actual OCR processes. Once a document has been identified as eligible for OCR by the Analyzer, the Processor confirms that the file is still eligible for OCR, and then OCRs the file. If a document is successfully OCRed, it is moved to the Processed list (for more information about the flow of documents throughout Symphony OCR, refer to the section Symphony Workflow, Tools & Document Lists).
Control
In the control area, you can choose to refresh the Processor or stop the Processor:
Refresh - Selecting Refresh will refresh the status of the Processor page.
Stop Processor - Selecting this option will stop the Processor from processing documents in the document repository.
Status
The status of the Processor (what it is currently processing).
Information
Processing Capacity Remaining - If you have a license that limits the number of pages you can process per year, the number of pages remaining will appear here.
Machine Processors - Indicates how many logical processors the workstation running Symphony OCR contains.
Licensed parallel processing - Indicates the number of documents that will be processed by the processor simultaneously.
Recent Performance
Provides performance statistics such as the number of documents and pages that Symphony OCR has processed in a smaller sample size and the average speed of processing per page, the average number of pages per document and the effective throughput of the documents.
Overall Performance
Provides performance statistics such as the total number of documents and pages that Symphony OCR has processed and the average speed of processing per page, the average number of pages per document and the effective throughput of the documents.
Basic Settings
Process TIFFs (OCR and convert to PDF) - Symphony OCR can process TIFF files and convert them to image + text PDF files. This is an optional setting. If you wish to process TIFF documents, simply check this checkbox.
Note: If the firm opts to process TIFF documents, this will change the file extension from .tif to .pdf. This will "break" any relationships or projects that include this file.
Process MSG (email) attachments - Symphony OCR can process email message attachments. This is an optional setting. If you wish to process email message attachments, check this checkbox.
<Big fat scary warning:
Due to a limitation in newer versions of Office, Microsoft prevents us from accessing the DLLs that allow us to read/process emails under the following conditions:
> Symphony OCR is configured to run as a service
> 'Process MSG (email) attachments' is checked
> Outlook 2013 (or possibly Outlook 2016) is open
In these circumstances, you're likely to see the following error:
Therefore, if Symphony OCR is being installed to run as a service *and* will be configured to process email attachments, it is our recommendation to install it on a machine that will not normally have Outlook 2013 (or possibly 2016) open. On the bright side, our testing has shown that in these situations, Symphony is still processing normal documents and WILL eventually recover and process emails after Office is closed. But if you can, we recommend avoiding this situation. If your experience is different, we'd like to hear about it.
End of big fat scary warning>
Do not process documents younger than - The default setting is 30 seconds. If you wish to have the Processor wait longer before processing documents, change the value in the field. Trumpet recommends not decreasing this value below 30 seconds, to ensure that documents are fully written to disk before processing.
Do not process documents older than - If you have older documents that you do not want Symphony OCR to process, enter the maximum document age, in days, that the software should process as backlog.
Automatically rotate pages to proper orientation - If selected, the pages will rotate either landscape or portrait according to the text on the page.
Original retention settings
Retain originals of processed files - If selected, Symphony OCR retains copies of the documents that it has processed. These copies appear as versions (if Symphony OCR processes a document 3 times, it will maintain copies of all 3 versions of the document). The user can restore previous versions of a document from the Symphony OCR backup using the document Details screen.
Purge originals of processed files after - The default setting is to retain the originals of processed files for 7 days after which they will be purged. If you wish to change this setting, you can change the value to the appropriate number of days for your firm.
Backlog throttling settings (only needed when your license does NOT have unlimited pages for processing)
Default processing capacity reserved for new documents (based on the actual number of new pages added each day) - This is calculated from the number of pages that were added to the site in the past year.
Override the default processing capacity reserve - This sets the number of pages you would like to reserve for new documents, evenly spreading the page count capacity across the entire year. To determine a reasonable reserve, allow the Symphony OCR Analyzer module to run, then look at the timeline for the Processing Queue. Adding the number of pages in the first 52 weeks and dividing by 365 gives the average number of pages added to the system per day. Trumpet recommends adding an additional 10% to accommodate future growth or above-average filing. This value should be a reasonable overclocking reserve.
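The reserve arithmetic described above can be worked through in a few lines of Python. The function name daily_reserve and the sample weekly figures are made up for the example. Only the method (sum 52 weeks, divide by 365, add 10%) comes from this guide.

```python
def daily_reserve(weekly_pages, growth=0.10):
    """Estimate the pages per day to reserve for new documents.

    weekly_pages: page counts for 52 weeks, read from the
    Processing Queue timeline.
    """
    if len(weekly_pages) != 52:
        raise ValueError("expected 52 weekly page counts")
    average_per_day = sum(weekly_pages) / 365
    return average_per_day * (1 + growth)
```

For a site that adds a steady 7,300 pages per week, the yearly total is 379,600 pages, the daily average is 1,040 pages, and the suggested reserve with 10% headroom is about 1,144 pages per day.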
Advanced Settings:
Create thumbnails (if not already present) - Checking this checkbox will create thumbnails if they are not already present.
Enable OCR debug logging - This enables debug logging to help our support team address issues if necessary. To conserve disk space, we recommend not enabling this unless requested by our support team.
Limit parallel processing to X documents - This allows you to limit the number of cores that Symphony OCR will utilize. It uses 1 core per document. For example, if you enter 3, Symphony OCR will use only 3 cores and will process 3 documents simultaneously. See: How does Symphony OCR impact the performance of the server or indexer PC
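The effect of a parallel processing limit can be pictured with a bounded worker pool. This Python sketch is illustrative only: process_all and the ocr callback are hypothetical names, Python threads merely stand in for "one document per core", and license-based limits are not modeled.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def process_all(documents, ocr, limit=None):
    """Run `ocr` over the documents, at most `limit` at a time."""
    cores = os.cpu_count() or 1
    # Never run more workers than the machine has logical processors.
    workers = min(limit, cores) if limit else cores
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Results come back in the same order as the input documents.
        return list(pool.map(ocr, documents))
```

With limit=3, no more than three documents are in flight at once, which leaves the remaining cores free for other work on the same machine, such as an indexer.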
Symphony OCR is administered via a web browser interface. If Windows Firewall (or some other software firewall) prevents Symphony OCR from accepting inbound connections from the web browser, administration will not be possible.
Here's how to configure Windows Firewall to allow Symphony OCR to accept inbound connections:
Trumpet products are versioned with a three-part number (e.g. 2.7.3, 2.18.4). All versions of the product are released serially (i.e. version 2.7.3 contains all of the changes from 2.7.2, 2.7.1, 2.6.18, 2.6.17, etc.). New versions are created frequently (often once or twice per week), and each version change consists of a very small amount of changed or additional functionality (e.g. one bug fix or one new feature).
It is not at all unusual for Trumpet to produce 2 or 3 versions of a given product in a single week.
Trumpet also has reporting on which customers have which versions – and whether those installations are in an OK, WARN, or ERROR status level. We also track support requests (issues) by which version of the software was installed at the time of the request. This allows us to make quantitative assessment of the risk of a given version of each product. If a given version is in use at many sites, all of which are OK, and there have been no reported issues for that version, then we can say with confidence that the build is stable and safe to deploy broadly.
Because each version bump incorporates a very small number of changes, it is very, very easy to identify any regression issues that arise.
Trumpet has 3 phases that a given software version might go through:
DevRelease - Software is highly unstable. No testing has been performed. May contain known and unknown huge glaring bugs and problems. This is not available to download through your software, and should not (and could not) be deployed to any system unless Development is involved.
PreRelease - The software is considered stable, and has passed internal QA, but we don’t have exhaustive experience at tons of sites – it may still contain unknown bugs, but regressions are highly unlikely.
Production Release - Software has been proven to be robust at a large number of sites – any bugs remaining are small.
This means that the latest Production release will always have a version number equal to or lower than that of the latest PreRelease. For example, 2.3.7 might be the latest Production release, and 2.3.23 might be the latest PreRelease. The PreRelease would contain 16 small changes since the Production release was made.
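When comparing version numbers like these in a script, compare the parts as numbers rather than as text. A minimal Python sketch, with parse_version as a hypothetical helper name:

```python
def parse_version(text):
    """Turn a version string such as '2.3.23' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


production = parse_version("2.3.7")
prerelease = parse_version("2.3.23")

# Tuples compare part by part, so 2.3.7 correctly sorts before 2.3.23.
assert production <= prerelease
```

Comparing the raw strings would give the wrong answer here, because "2.3.7" sorts after "2.3.23" as text.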
Periodically (usually around every 3 months), a review determines the latest PreRelease that is considered to be ready for Production. This review consists of looking at how many sites are running the PreRelease, whether there have been any support requests made for versions between the current Production release and the PreRelease that may indicate an issue with the underlying code, and whether sites currently using the PreRelease are in a warning or error state.
Installers for DevRelease are named ‘DevRelease-ProductName-x.y.z.exe’. Installers for PreRelease are named ‘PreRelease-ProductName-x.y.z.exe’. Installers for Production are named without a prefix (‘ProductName-x.y.z.exe’).
Once a PreRelease version has been declared ready for production, a Production release is created with the same version number as the PreRelease (this is the exact same installer – we literally just rename the installer exe).
Once a Production version is identified, we generally bump the second number of the version for the next PreRelease we create. For example, if a 2.3.16 PreRelease is marked as a Production release, the Production release will be 2.3.16, and the next change we make to the product will be under version 2.4.1.
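The naming and numbering conventions above can be expressed as a short Python sketch. The helper names and the product name used in the example filename are assumptions. The "bump the second number" rule is described here as what generally happens, so next_prerelease shows the usual case rather than a guarantee.

```python
import re

# Optional phase prefix, then product name, then x.y.z and ".exe".
INSTALLER = re.compile(
    r"^(?:(DevRelease|PreRelease)-)?(.+)-(\d+)\.(\d+)\.(\d+)\.exe$"
)


def parse_installer(filename):
    """Split an installer name into (phase, product, version tuple)."""
    match = INSTALLER.match(filename)
    if not match:
        raise ValueError(f"unrecognized installer name: {filename}")
    phase, product, major, minor, patch = match.groups()
    # Installers without a prefix are Production releases.
    return phase or "Production", product, (int(major), int(minor), int(patch))


def next_prerelease(promoted):
    """Usual first PreRelease version after `promoted` goes to Production."""
    major, minor, _ = promoted
    return (major, minor + 1, 1)
```

For example, a file named PreRelease-SymphonyOCR-2.3.16.exe parses as a PreRelease of version (2, 3, 16), and next_prerelease((2, 3, 16)) gives (2, 4, 1), matching the example in the text.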
The reason we have these phases is to minimize the risk of exposing a given problem to a large number of users. PreReleases tend to roll out gradually to a small handful of sites as we work with firms who actually need functionality, or to those sites who choose to install it proactively. PreRelease and Production releases can be installed by anyone at any time by doing a Help->Check for Updates (or we may send a blast email announcing a new version’s availability).
This approach results in customers being able to install the latest ‘Known Good’ version on a regular basis (3 or 4 times per year), while still enabling customers who need changes or fixes to get rapid updates at extremely low risk.
PreRelease versions are very stable. If we are fixing a bug or adding new functionality, it is extremely rare that the work could cause problems for the useful functionality of earlier versions (we refer to this sort of issue as a 'regression', and our release management cycle is designed to prevent it). There is a small chance that a given PreRelease might not completely fix the bug it was intended to fix, or that there is a subtle issue with new functionality that was added, but it is very rare that a given PreRelease would actually break the application in a meaningful way.
If you have an issue or need that the latest PreRelease fixes, it is generally a good idea to update, unless the issue is truly not important to your organization.
PreReleases (if available) can be obtained using the Check for Updates functionality available in all of Trumpet’s applications.
Production versions are not only very stable, but have had a good number of sites using the version without issue.
We recommend that you install all Production updates as they become available (though it’s perfectly fine to schedule this into your regular maintenance schedule).
Updating your Symphony OCR software and license is a two-step process: first update your Symphony OCR software, then update your Symphony OCR license. The following are instructions for each of these operations:
These steps assume that you have received an email instructing you to update your Symphony installation. Depending on the update notification, that email may contain your client code and/or license number.
That's all there is to it!
Starting with version 6.4.96, Symphony OCR has an 'Automatic License Update' feature. After you've paid your yearly invoice with Trumpet, a new license is automatically generated. If your installation has access to the Trumpet servers, Symphony OCR will see this new license, download it, and install it.
Note: Symphony will check for a new license once every 3 days under normal circumstances, and once per day when your license is within 30 days of expiring.
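The check schedule in the note above amounts to a simple rule, sketched here in Python. The function name is hypothetical, and treating exactly 30 days remaining as "within 30 days" is an assumption about the boundary.

```python
from datetime import date, timedelta


def license_check_interval(today, expires):
    """Return how long to wait between license update checks."""
    days_left = (expires - today).days
    # Daily checks once the license is within 30 days of expiring.
    return timedelta(days=1) if days_left <= 30 else timedelta(days=3)
```

With a license expiring on 20 January, a check made on 1 January falls in the daily window. With a license expiring in June, the same check falls in the every-3-days window.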
If you've paid your invoice (and received notification of a new license) and don't want to wait for the automatic update to kick in, you can perform the following steps:
As mentioned, all of this assumes your installation has access to Trumpet's servers. If a connection cannot be established, you can always copy/paste your new license into this screen.
When you receive notification from Trumpet that your new license is generated, it is still highly recommended that you A) update your installation to the latest version of the software, and B) verify your license has been updated.
Updating your Symphony Suite license / software is a four step process:
The following are instructions for doing each of these operations:
Updating your Symphony Profiler software and license is a two-step process: first update your Symphony Profiler software, then update your Symphony Profiler license. The following are instructions for each of these operations:
These steps assume that you have received an email instructing you to update your Symphony installation. Depending on the update notification, that email may contain your client code and/or license number.
That's all there is to it!
Note for sites where users have the local workstation component installed on their computers (the component is not technically required, but many still have it and prefer it): Once the back-end is updated, the workstations will receive an update notification the next time Symphony Profiler Workstation is launched (normally when users log in). To get the Workstation update sooner, you can close Symphony Profiler Workstation and re-launch it, then follow the update prompts.
Starting with version 1.7.28, Symphony Profiler has an 'Automatic License Update' feature. After you've paid your yearly invoice with Trumpet, a new license is automatically generated. If your installation has access to the Trumpet servers, Symphony Profiler will see this new license, download it, and install it.
Note: Symphony will check for a new license once every 3 days under normal circumstances, and once per day when your license is within 30 days of expiring.
If you've paid your invoice (and received notification of a new license) and don't want to wait for the automatic update to kick in, you can perform the following steps:
This will manually trigger Symphony Profiler to retrieve the updated license from Trumpet's servers. As mentioned, all of this assumes your installation has access to Trumpet's servers. If a connection cannot be established, you can always copy/paste your new license into this screen.
When you receive notification from Trumpet that your new license is generated, it is still highly recommended that you A) update your installation to the latest version of the software, and B) verify your license has been updated.
Symphony Suite consists of two software components that need to be updated individually. Here are instructions for each:
Because Symphony Suite Cloud (including both Symphony OCR and Symphony Profiler Cloud products) is updated on the cloud servers, no software updates are required. In addition, the Symphony Suite products' licensing is automatically pushed to the cloud servers. The license you received can therefore be saved for your records, and no action is required on your part.
Summary
- Updated to the latest 64-bit version of the ABBYY FRE Engine
- Symphony OCR versions 8.0 and higher require a 64-bit Operating System
- Documents that are larger than 32512x32512 pixels will be moved to the Too Big list. These will not be processed regardless of the checkForLargePages setting
- Updated SharePoint integration
- Resolved issues with null pointer exceptions
- Out of an abundance of caution, Symphony OCR was updated to ensure there are no log4j dependencies that could be vulnerable to log4shell
To see a complete list of changes, visit: Change Log
8.1.67
- No functional changes
8.1.66
- Explicitly disable the following SSL algorithms:
SSLv3, TLSv1, TLSv1.1, RC4, DES, MD5withRSA, DH keySize < 1024, EC keySize < 224, 3DES_EDE_CBC, K_NULL, C_NULL, M_NULL, DHE_DSS_EXPORT, DHE_RSA_EXPORT, DH_anon_EXPORT, DH_DSS_EXPORT, DH_RSA_EXPORT, RSA_EXPORT, DH_anon, ECDH_anon, RC4_128, RC4_40, DES_CBC, DES40_CBC, DESede, TLS_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_128_CBC_SHA256, TLS_RSA_WITH_AES_128_GCM_SHA256, TLS_RSA_WITH_AES_256_CBC_SHA, TLS_RSA_WITH_AES_256_CBC_SHA256, TLS_RSA_WITH_AES_256_GCM_SHA384, TLS_DHE_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_RSA_WITH_AES_128_CBC_SHA256, TLS_DHE_RSA_WITH_AES_128_GCM_SHA256, TLS_DHE_RSA_WITH_AES_256_CBC_SHA, TLS_DHE_RSA_WITH_AES_256_CBC_SHA256, TLS_DHE_RSA_WITH_AES_256_GCM_SHA384
8.1.64
- Fix for error 0xc0000135 on some Windows 11 machines
8.1.61 -
- Add temp file delete retry loop during storing of some .dat files (workaround for antivirus holding locks on files when it shouldn't be)
8.1.57 -
- Improved performance in how we copy files during database maintenance
8.1.55 -
- Bug fix - Worldox Index Finder not working for sites with Locales that use a date format other than m/dd/yyyy. Symphony will now use the machine locale to determine the date format used in Worldox index search query strings.
8.1.54 -
- Java install download will now use https://resources.trumpetinc.com instead of software.trumpetinc.com
8.1.53 -
- Added futureModifiedDateDateCutoff config setting (default is 24 hours in the future - 24L*60L*60L*1000L) - any document with a modified date in the future by more than this cutoff will be eligible for immediate processing
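The default for the setting above is expressed in milliseconds: 24 x 60 x 60 x 1000 = 86,400,000 ms, or 24 hours. A minimal Python sketch of the comparison it describes follows. The function name is hypothetical, and the real check inside Symphony OCR may differ in detail.

```python
# 24 hours expressed in milliseconds (24 * 60 * 60 * 1000 = 86,400,000).
FUTURE_CUTOFF_MS = 24 * 60 * 60 * 1000


def is_far_future(modified_ms, now_ms, cutoff_ms=FUTURE_CUTOFF_MS):
    """True if the modified time is beyond the cutoff into the future."""
    return modified_ms - now_ms > cutoff_ms
```

A document stamped 25 hours ahead of the current time exceeds the default cutoff. One stamped 2 hours ahead does not.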
8.1.52 -
- Added support for specifying folder locations in an optional folders.properties file (in the root folder of the application)
- appHome
- work
8.1.50 -
- Added File modified time to document detail display
8.1.48 -
- License update checks now go to https://lms.trumpetinc.com instead of https://partners.trumpetinc.com
8.1.46 -
- Bug fix - some TIFFs could result in mis-sized PDF files after OCR
8.1.44 -
- Change OAuth bounce server from https://extranet.trumpetinc.com to https://oauth.trumpetinc.com
- Change notifications URL to host https://notifications.trumpetinc.com
8.1.42 -
- Improved handling when low level database failures happen (auto rebuild on restart)
8.1.39 -
- Bug fix - Status screen would list "XXXX integration has 0 warnings" when there actually was a warning
8.1.38 -
- Bug fix - if SharePoint connection dropped mid-run, the Connect button didn't show in the SharePoint config screen (user was forced to restart SOCR to get the Connect button in this scenario)
8.1.37 -
- Bug fix - Welcome wizard was showing even after the license was entered
8.1.36 -
- Bug Fix - "RPC Server" error messages during OCR on some Azure servers
8.1.34 -
- Change URL used for sending notification emails to use https://extranet.trumpetinc.com instead of https://webservices.symphonysuite.com
8.1.33 -
- Bug fix - Application crash with Heap Space error when handling certain types of pages with huge rendered content streams
8.1.32 -
- SOCR will now use the Windows SSL trust store instead of requiring private certificate registration
8.1.31 -
- Issue fix - SOCR consumes excessive disk space (many logs/*.hprof files) when out of memory errors result in service restarting frequently
- We now purge all but the latest logs/*.hprof files when we startup
8.1.29 -
- Improvements to recovery when the application crashes due to a memory problem. Documents that were actively being analyzed are now moved to a special "Memory Error Suspect" list when the application restarts, instead of attempting to analyze them again.
- Documents that are actively being analyzed are now tracked in a new 'Analysis In Process' list instead of the regular 'In Process' list
8.1.27 -
- Added localhostOnly setting to <webServerConfiguration .... /> config - when true, the user interface will only be visible via 'localhost' or '127.0.0.1' URLs.
8.1.26 -
- Switch heartbeats to send over HTTPS instead of HTTP
8.1.25 -
- Improvement in handling connection timeout issues with SharePoint (documents will now be marked for reprocessing instead of going to the error list)
8.1.23 -
- Include a few more details in the overall system status (if there is only a single problem, surface it at the top level, and in the heartbeats)
8.1.20 -
- Added File Size and Page Count to CSV list export
8.1.17 -
- Improve status messages during backup purge (we now give a count of the files that have been purged)
8.1.16 -
- Backup purge algorithm has been changed to purge based on the backup file's creation date instead of the database document last modified date
8.1.13 -
- Reduce database contention and improve performance during backup purge (we no longer grab database mutator locks unless the document actually has backups to purge)
8.1.11 -
- NetDocuments Finder enhancement - if there is a failure during finding, the Finder now enters a 60 second recovery loop before continuing with the next cabinet
- NetDocuments Finder enhancement - if the legacy finder had a failure, the error message from that failure stays in the Issues list until the legacy finder completes all cabinets. For sites with really big legacy backlogs, this can result in the error appearing in the finder Issues list for a long time, even though the Finder is continuing to run.
20220128
8.1.10 -
- Bug fix - log output wasn't being written to files (regression introduced in 8.1.9)
8.1.9 -
- Update to ensure that there are no log4j dependencies that could be vulnerable to log4shell (abundance of caution, SOCR isn't public facing)
8.1.8 -
- Bug fix - NullPointerException in SharePoint finding under some situations
- SharePoint finders now report on throughput (docs/hr) while they are finding documents
- Add support for throttling SharePoint searches to a maximum number of documents per hour. When throttling is active, the Finder status will show a message that throttling is happening.
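A rough sketch of how a documents-per-hour cap and the docs/hr throughput figure could be computed is shown below. The function names and the rule that a non-positive cap means "unthrottled" are assumptions made for illustration; the actual Finder logic is not described in these notes.

```python
def throughput_docs_per_hour(docs_found, elapsed_seconds):
    """Observed throughput in documents per hour."""
    if elapsed_seconds <= 0:
        return 0.0
    return docs_found * 3600.0 / elapsed_seconds

def throttle_delay_seconds(docs_found, elapsed_seconds, max_docs_per_hour):
    """Seconds to pause so the running rate stays at or below the cap.

    A non-positive cap is treated as 'no throttling' (assumption).
    """
    if max_docs_per_hour <= 0:
        return 0.0
    # Minimum elapsed time that docs_found documents should have taken
    # if the finder were running exactly at the capped rate.
    required_seconds = docs_found * 3600.0 / max_docs_per_hour
    return max(0.0, required_seconds - elapsed_seconds)

if __name__ == "__main__":
    # 100 documents found in the first minute, cap of 1000 docs/hr
    print(throttle_delay_seconds(100, 60, 1000))
```

When the returned delay is greater than zero, a finder built this way would sleep for that long and report that throttling is in effect.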
8.1.7 -
- Enhancement: Automatically create document record for NetDocuments documents when the user searches for them (i.e. 'Lookup Document', then pasting the ND doc id)
20211217
8.1.6 -
- Adjust 'Reanalyze Not Processed' task so it does NOT do anything with INACCESSIBLE documents
8.1.4 -
- Better error message if Worldox indexed search fails because of too many index hits (error message will now include the text: "Search xxxxxxxxxxxxxx resulted in more results than WDAPI can return. This limit is set by worldox.ini > [Debug] > FindSilentMax=xxxx where xxxx is the limit" )
8.1.3 -
- Bug fix - 'Adding Text to Image Failed - Null' error when processing some PDFs (this is a different issue from what was fixed in 8.1.2)
8.1.2 -
- Bug fix - 'Adding text to image failed - null' error on some malformed PDF documents
20211001
8.1.0 -
- Notification emails in cloud installs no longer include hyperlinks
- Document Details screen now has an Orientation Errors Detected value (to help with diagnosing the regression bug described in the next item) - this is only visible after orientation detection is needed (i.e. document processed by 8.0.0 -> 8.0.11). It will display Yes, No or Unknown (unknown means that orientation analysis still needs to be done - these had better be in the To Analyze queue!)
- Regression bug fix - introduced in 8.0.0 - Pages that required re-orientation prior to OCR did not get re-oriented, resulting in junk OCR results. This build automatically detects these problem documents, rolls them back to their pre-OCR state, and re-OCRs them with auto-orientation. No user interaction is required for this to work - users may see a sudden large increase in the number of documents being analyzed, then OCRed
8.0.11 -
- NetDocuments legacy finder can now restart if it is interrupted (this should address situations where the backlog is very large and networking or ND errors cause the legacy search to fail at some point)
8.0.10 -
- Improvements to MICR processing (some MICR text wasn't being extracted in some documents)
8.0.9 -
- Write enableMicrProcessing to settings.xml even if it is false (makes it a little easier for users to turn MICR processing on)
- Re-enabled support for MICR processing (was disabled as part of the initial 8.0.0 release) - controlled by enableMicrProcessing setting in settings.xml
8.0.8 -
- Re-enable OCR of barcodes (was turned off starting in 8.0.0)
20210331
8.0.5 -
- Bug fix - SharePoint integration was completely broken since Feb 17, 2021 (change on Microsoft's end broke existing integration)
8.0.4 -
- Added explicit check for whether the page is greater than the maximum image size supported by the OCR engine - 32512x32512 - if it is, the document is moved to the Too Big list. This happens regardless of the checkForLargePages setting.
8.0.3 -
- updated SharePoint token so it does not expire
8.0.2 -
- Large page check is now disabled by default. To enable, set <documentPreProcessor checkForLargePages="true" /> in settings.xml
8.0.0 -
- OCR engine now requires 64 bit operating system
20201119
7.3.2 -
- Bug fix - large pages were causing SOCR to crash with out of memory errors
- Improve detection of pages that are too large to fit in memory

20200922
7.3.0
- Bump version
7.2.57
- Updated license agreement to refer to Trumpet, LLC instead of Trumpet, Inc.
7.2.56
7.2.55
- Bug fix - Aspose and Nuance files that were already marked as not-safe, continued to be marked as not-safe, even during re-processing
7.2.54
- Bug fix - introduced in 7.2.51 - really big documents were failing with errors about the total content stream being too large. We now limit only the size of a single content stream - not the total across all pages
7.2.53
- Documents in the NotSafe list will be marked for reprocessing when SOCR launches following an install update
7.2.52 -
- If page content requires more than 1M of RAM to extract, we will mark the page as not needing OCR. This will make the file NOT appear on the corrupted list, and will allow SOCR to process other pages that might need OCR.
7.2.51 -
- If page content requires more than 1M of RAM to extract, the document is marked as corrupted with a note that reads ""Page " + page + " can not be read into memory - the page is not really corrupted, but cannot be analyzed because of it's size""
- Enhance Aspose.PDF detection so it is not case sensitive
7.2.50 - dev release
- Documents with producer "Aspose.PDF" are no longer marked as 'not safe' (reversal of change made in 7.2.19) - these documents can now be safely processed by SOCR
- Documents with producer "Nuance PDF Creator" are no longer marked as 'not safe' (reversal of change made in 7.2.29) - these documents can now be safely processed by SOCR
7.2.49
- Add Begin Analysis and Analysis complete entries to document history
- Added document history for documents found in INPROCESS during application startup
- Added file size to documents that are in INPROCESS during application startup
7.2.38 - dev release
- Bug fix - inaccurate 'Document is accessible - can't query for reason' error message
7.2.35 -
- For normal installs, a default launch.ini will be installed (if there isn't one already) that sets the max heap space to 1024m
7.2.34 -
- Update learn more hyperlink for NOT_SAFE lists to point at new kbook article
7.2.33 -
- Make Nuance PDF rollback debug screen skip documents that have unknown document sources (previously, the entire operation failed)
7.2.32 -
- Bug fix - Nuance PDF rollback debug screen (added in 7.2.29) could have NullPointerException if the Symphony database has entries added by really old versions of Symphony
7.2.31 -
- Bug fix - Nuance PDF rollback debug screen (added in 7.2.29) could have NullPointerException if the Symphony database has entries added by really old versions of Symphony
7.2.29 -
- Added special handling for Nuance PDF Creator documents (they are moved to Not Processable for the time being while we work through a bug)
- Added Debug screen operation to roll back Nuance PDF Creator documents that Symphony OCRed during affected versions (7.1.0 through 7.2.29)
20200313
7.2.28 -
- Some SharePoint search results had 'null' file sizes - this was causing Finder to fail. We now handle these oddities gracefully.
7.2.27 -
- More informative error logging if SharePoint API fails
7.2.24 -
- Bug fix - corner case - if MSG record is deleted from the SOCR database, analysis of attachments for that MSG fails with NullPointerException
7.2.23 -
- Bug fix - We now check if a document was checked out by a user during the OCR process. If so, the OCR results are thrown away and the document will be reprocessed instead of saving a new version.
7.2.22 -
- Bug fix - reanalyzing MSG files was resulting in error java.lang.IllegalStateException: initialFileModifiedTime can only be set once
7.2.21 -
- Added Debug Screen command ("Search for and roll back non-unity CTM problem documents") to identify and roll back documents impacted by the regression that was fixed in 7.2.20
7.2.20 -
- Regression fix - introduced in 7.1.0 - some source PDFs result in OCR invisible text not being placed properly on the page
7.2.19 -
- Special handling for PDFs with producer string containing "Aspose.PDF" - these files can't be safely processed by SOCR yet - files with this producer are placed on the Not Safe list
- Added Not Safe list
- Adjusted the document details screen so it includes the corrupted or not-safe reason (if one is present)
- Added 'Search for and roll back Aspose problem documents' to the debug screen - when pressed, it will iterate through all documents that were modified by SOCR between 7.1.0 and 7.2.19 and check their PDF Producer string. Any documents with producer containing "Aspose.PDF" will be marked for roll-back.
7.2.15 -
- Bug fix - if the underlying DMS fails when copying a read-only copy of a document, the Document was being put directly into the Reprocessing list. It now goes into the Inaccessible list.
7.2.14 - dev release
- Bug fix - null pointer exception in exception handler in getDocumentsByStateReverse()
7.2.13 -
- Bug fix (minor) - MSG attachments weren't capturing initial modified time properly - this resulted in a lot of log chatter when the weekly summary routine was running
7.2.12 -
- Bug fix - SharePoint user count was including external users
7.2.11 -
- Bug fix - digitally signed documents were not being OCRed when settings.xml documentPreProcessor setting allowDigitallySigned was set to true
7.2.10 -
- Regression Bug fix (introduced in 7.2.9) - MetaJure feature resulted in Folder feature not working properly. This is now fixed.
20190923
7.2.9 -
- MetaJure integration licensing (J feature code) now enables Folder DMS integration
7.2.8 -
- Labeling consistency change - Change instances of 'Invisible Words' to read 'Hidden Words'
7.2.7 -
- Remove extraneous apostrophe from the 'Symphony OCR is already running' dialog
7.2.6 -
- Add additional handling for 0x80030109 responses during MSG editing (STG_E_DOCFILECORRUPT - The doc file has been corrupted) - when this happens during OCR, we now mark the document as corrupted instead of reprocess
7.2.5 -
- Add handling for 0x80030109 responses during MSG editing (STG_E_DOCFILECORRUPT - The doc file has been corrupted)
7.2.4 -
- Added 'System Memory Information' and 'Disk Information' to log output when application launches
7.2.0 -
- Moved to JRE 11 (and Trumpet private Java runtime, non-Oracle)
7.1.8 -
- Work around for PDFs with very deep xref versions (tpt86114)
7.1.7 -
- Bug fix - SOCR hangs when interacting with UNC based very long filenames (>260 characters)
7.1.5 -
- Added Needs Attention document count to warning message in UI and heartbeats
7.1.3 -
- Workaround for invalid PDFs that don't have bounding boxes defined for all pages (bounding box defaults to standard portrait letter in these cases) - note that these PDFs are not compliant with the PDF spec (MediaBox is required), so there is no guarantee that the resulting text placement will be correct, but testing on the few problem files we've seen has been successful.
7.1.2 - dev release only
- Mark documents with "CCITT codec error" messages as corrupted instead of needs attention
7.0.30 -
- Added corrupt_db.log file (new log4j.properties) that contains only log errors related to corrupted databases (should allow us to narrow down time window of when corruption occurs)
20190403
7.0.29 -
- Bug fix - Worldox API re-initialization could cause Symphony to crash under Worldox WDU14 (race condition exacerbated by WDU14 changes)
20190225
7.0.28 -
- Bug fix - SharePoint files with single quotes in their names resulted in errors
- Bug fix - timeout/429 errors caused some SharePoint integration calls to fail
20190104
7.0.27 -
- Support for firms using proxy servers that use private certificate authorities to proxy HTTPS traffic
- Custom certificate authorities can now be registered in a Java Key Store file stored at /config/cacerts.private - see document 141530 for instructions on obtaining a certificate and loading it into a private trust store
7.0.26 -
- Some corrupted MSG files were being put on the Reprocessing list (error code 0x8004010f now sends the document to Corrupted)
7.0.24 -
- Process Read Only setting in Folder settings wasn't being honored (resulted in 'Access Is Denied' error after OCR completed for documents with read only attribute set)
7.0.22 -
- Added initial support for multiple libraries with OpenText
7.0.19 -
- Bug fix - MSG attachments were being moved to the priority of the parent MSG document record when the attachment was reprocessed (if the MSG was set to Analysis Only priority, but the attachment was set to High, if the attachment was re-analyzed for any reason, the priority of the attachment was switched to Analysis Only)
7.0.17 -
- Workaround for non-compliant PDFs that don't properly specify page size for all pages (MEDIABOX missing)
7.0.16 -
- Enhancement - special analysis handling for pages that have invisible text in the margins (i.e. invisible text stamp)
7.0.15 -
- Adjust OpenText/eDocs integration to handle documents that are in-use and documents that have been deleted
7.0.14 -
- Adjusting OpenText/eDocs integration to work properly with live site
20181017
7.0.13 -
- Bug fix - Worldox integration fails at sites that had UNC paths containing spaces and wdcommon\wdmirror.ini files referencing drive letter based CPs
7.0.12 -
- Bug fix - ND OAuth token wasn't being saved for brand new sites
7.0.11 -
- Changes to NetDocuments OAuth token handling to support upcoming NetDocuments OAuth changes
20180706
7.0.10 -
- Bug fix - 'Process Read Only' setting in Folder configuration screen didn't stick
7.0.8 -
- Bug fix - Occasional 'java.lang.ArrayIndexOutOfBoundsException: 100' error when processing under load - could result in 'Database is corrupted' error message in user interface.
7.0.5 -
- Bug fix - uninstaller wasn't being registered for "run as logged in user" installations
6.6.98 -
- Bug fix - Refresh button in NetDocuments settings screen took user to NetDocuments US Vault login screen. The refresh button now just refreshes the cabinet list.
- Added Renew Connection button to NetDocuments setting screen
6.6.97 - dev release
- Bug fix - 'New pages added in past year' on summary screen could show incorrect values for up to 12 hours when documents are re-analyzed
- Bug fix - 'New pages added in past year' label always showed '(this year)' instead of '(pages/year)' when we had a full year's worth of data available
20180511
6.6.92 -
- Bug fix - NetDocuments made an API change on 4/20/2018 that caused our "Open" links to not take the user to the document in ND
6.6.90 -
- Bug fix - SharePoint sites that had spaces in their name were resulting in "Illegal character in path" Finder errors
6.6.89 -
- Added ability to reprocess documents in the Processing (TOPROCESS) list (just in case they need to be re-analyzed manually)
6.6.88 -
- Statistics screen now displays estimated time to process backlog based on 4 cores (1.2 seconds/page) if there is no OCR performance data to baseline against (i.e. analysis only licenses). The label on the estimate will show "(assuming 4 CPU cores)" in this case.
6.6.87 -
- Bug fix - Fixed OutOfMemoryError under high analysis or OCR load
6.6.85 -
- Bug fix (kinda) - Legacy files in Worldox with ~ at the beginning were appearing in the corrupted list. Worldox indexed searches were sometimes returning files with ~ at the beginning (these are generally temp files that shouldn't have been part of the indexes and certainly shouldn't be OCRed)
6.6.84 -
- Added accessibility message for Worldox integration informing the user that WD versions prior to 20180412 do not support paths longer than 255 characters
- Added accessibility message for Worldox integration for sites with WD versions later than 20180412 that WD does not support paths longer than 380 characters
6.6.82 -
- Added conditional logic to Worldox integration to allow spaces after filename and before the file extension (WD code running after 20170601 allows spaces)
6.6.81 -
- Improved detection of pages that should be OCRed even though they have excessive text in margins
- Improved detection of pages that should not be OCRed even if they have full image on the page and rendered text beneath the image
- Added additional columns to Document Details screen, page analysis results
6.6.76 -
- Bug fix - setting for limiting maximum number of cores used during OCR was not being honored
- Changed Processor configuration UI for clarity around maximum allowed parallel processing settings
6.6.73 -
- Bug fix - SOCR doesn't shut down at NetDocuments sites that were actively searching for documents
6.6.71 -
- Bug fix - NetDocuments configuration screen not showing error/warning details for Analyzer-only licenses
- Bug fix - NetDocuments integration was having preserveModifiedInfo set to false after setup with Analyzer-only licenses
6.6.69 -
- Added separate NetDocuments connection buttons for US, EU and AU vaults
6.6.68
- Bug fix - Welcome Wizard didn't work properly if there were features in the license that required DMS configuration to work properly (# of ND or SP users, for example)
- Removed the 'Manage' button from Issues list in welcome wizard
- Improved error message for invalid tenant URLs
6.6.64 -
- Bug fix - blank pages without any content stream were marked as corrupted instead of blank
6.6.59 -
- Bug fix - the user interface prevented setting the SharePoint legacy search frequency to 0. This is now allowed.
6.6.58 -
- If SharePoint legacy search frequency is set to 0, the legacy search will be skipped
6.6.57 -
- Bug fix - Fixed a bug with Rollback so it would work with the first button click (from the document detail page)
- Added the bulk operation "Rollback" to the Processed documents page. When clicked, all documents in the current search will be rolled back to their non-OCRed version
- Added the bulk operation "Reanalyze" to the Reprocessing documents page. When clicked, all documents in the current search will be moved to the Analyzing bucket
6.6.55 -
- Reworked the processing metrics on the Analyzer, Processor and Summary pages to show more useful data in a more user friendly fashion
- Removed all support for Bonus Page tracking
6.6.54 -
- Moved to jWDAPI 1.0.22 to have the WorldoxSession fast fail if the user is invalid
- Backed out the previous Worldox invalid user fast fail code
6.6.53 -
- Fixed bug where wrong page tracker was being used by the Analyzer
6.6.52 -
- Modified processor core algorithm to not consider physical cores on the machine. Solely determined by the license now
6.6.51 -
- Fixed bug in the ProcessorManager config that was causing maxThreads to be persisted and thus override calculated maxThreads
- Updated the version check url to be https to work with the new version update https redirection
6.6.50 -
- Tracked WorldoxConnection failure due to invalid user, and quick fail on repeated calls to the connection until a valid user is provided.
6.6.48 -
- Moved to jlicensing 1.0.10 to support new license expiration warning logic
6.6.46 -
- Created a ProcessorManager that will now create Processor tasks that do the work. These tasks will be added to an executor service so we can run multiples in parallel. A ProcessorManager will be created for each document processing type (analysis, ocr, rollback)
- Split the processor mgmt (stop, start) config into a new section, and provided migration for it
- Renamed the processors to AnalyzerProcessor, OCRProcessor and RollbackProcessor. Supporting classes followed suit
- Added a WorkingFolderProvider to track and manage working folders for the managers
- Created ProcessorThreadPoolExecutor for use by the ProcessorManager. It has the ability to block task addition until a thread is available
- Refactored the OCRProvider to remove the generic parts, since we only support a single OCR engine
- Refactored the page count handling
- Implemented dripMode in the ProcessorManager
- Deleted OldPageCountFeature
- Made processor factory config classes immutable
- Added more support for Processor status
- Added a maxDripsBeforeHalting setting to ProcessorManager, to allow dripMode to halt after X docs are processed, rather than just 1
- Fixed bug in the Analyzer config screen that wasn't persisting the isAllowMsgAttachments setting
- Added custom message support to the ErrorTracker so we could get better error messages in the UI
- Updated Processor and Analyzer web pages to show a list of documents being processed
- Updated statistic verbiage per changes
- Modified page statistics to divide results by the number of running threads
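The 6.6.46 notes describe two ideas that are easy to picture in code: an executor that blocks task submission until a worker thread is free, and a drip mode that halts after a fixed number of documents. The sketch below is a Python analogue for illustration only. SOCR itself is a Java application, and the class and function names here (BlockingExecutor, run_drip) are hypothetical, not the product's ProcessorThreadPoolExecutor or ProcessorManager.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class BlockingExecutor:
    """Thread pool whose submit() blocks until a worker slot is available."""

    def __init__(self, max_threads):
        self._pool = ThreadPoolExecutor(max_workers=max_threads)
        self._slots = threading.Semaphore(max_threads)

    def submit(self, fn, *args):
        self._slots.acquire()  # blocks the caller while all workers are busy
        try:
            future = self._pool.submit(fn, *args)
        except Exception:
            self._slots.release()
            raise
        # Free the slot when the task finishes, whether it succeeded or failed.
        future.add_done_callback(lambda _f: self._slots.release())
        return future

    def shutdown(self):
        self._pool.shutdown(wait=True)

def run_drip(documents, process, max_drips_before_halting=1):
    """Process documents in order, halting after the configured number."""
    results = []
    for doc in documents:
        if len(results) >= max_drips_before_halting:
            break  # drip limit reached; remaining documents are left queued
        results.append(process(doc))
    return results

if __name__ == "__main__":
    executor = BlockingExecutor(2)
    futures = [executor.submit(lambda x: x * x, i) for i in range(5)]
    print([f.result() for f in futures])
    executor.shutdown()
    print(run_drip(["a.pdf", "b.pdf", "c.pdf"], str.upper, 2))
```

Blocking at submission keeps the queue of pending work short, so a stop request only has to wait for the tasks that are already running.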
6.6.45e -
- Bug fix - SOCR was checking to make sure 8.3 filename information was available for all versions of Worldox integration. We now only check if the version is prior to the WDU10 release (which fixed 8.3 related issues in WDAPI)
6.6.45d -
- Modified calls to NDAPI for creating new versions to ensure we don't modify lastModified info
6.6.45c -
- Performance improvement when retrieving number of active users in NetDocuments integration
6.6.45b -
- Add support for unlimited user count licensing for NetDocuments integration
6.6.44 -
- Bug fix - some malformed PDFs (huge page catalogs, large number of pages) could result in out of memory exceptions
20171006
6.6.24 -
- Updated to NDAPI 0.0.53 (upgraded document getSize() methods)
- Modified NetDocuments processing to use the new NDAPI getSizeBytes() method for consistent file size checking
- Moved configuration loading of the preprocessor and processor to the end of the config load, to prevent feature initialization issues
6.6.22
- Made SystemStatusProvider get the NetDocs status from the ND source, not connection manager
- Modified NetdocsDocumentSource to store an ignore warnings flag for the new warning
- Modified NetdocsDocumentSource to be Status aware and provide the overall status for NetDocuments
- Modified the Netdocs web page handler to use the new NetdocsDocumentSource getStatus() call rather than the ND conn manager call
- Fixed a bug in NetDocuments processing that was using different file sizes during file searching, resulting in unnecessary downloading of files for reprocessing
6.6.19 -
- Regression bug - Log output wasn't being written to the maestro.log or error.log files. Introduced in 6.6.12
6.6.18 -
- Added belts and suspenders to OpenText implementation
- Added belts and suspenders to LSSe64 database methods
- Added dripMode to Processor, allowing for the processor to be stopped after a single document is processed
- When OpenText feature is enabled, set the Processor and PreProcessor to not autostart, and to be in dripMode
- Modified OpenText search for files SQL to use enhanced SQL and never return docs that don't have allowed extensions
- Made the OpenText modified flag value customizable, to allow for easier beta testing
- Prevented actual file changes to OpenText files, until beta testing shows us the correct way to make changes
- Modified OpenText lastModified time for files to use the database value rather than the file value
6.6.16 -
- Regression Bug fix - introduced in 6.6.10 - installer creating 'work' folder in the installer exe folder
6.6.13 -
- Fixed null pointer exception in PDFFileAnalyzer
- Fixed null pointer exception in PurgeUnavailableTask
6.6.12 -
- Updated to NetDocumentsAPI v0.0.51 to support document content change without changing modification info for file extension changed (tiff->pdf)
6.6.11 -
- Updated to NetDocumentsAPI v0.0.50 to support document content change without changing modification info
6.6.7 -
- Added a description to the SymphonyOCR Windows service
6.6.5 -
- White labeling is currently only available for SOCRCLOUD licenses. If enabled, the "D" Worldox feature must be disabled to prevent conflicts
- White labeling reports the number of used Worldox seats in the heartbeat, but the license does not rely on seats for validation
- Added a WhiteLabelingFeature, which extends WorldoxFeature, ignores seat checking for validation, but checks the user domain for a valid cloud domain.
- Added a WHITELABEL user count strategy to WorldoxFeature
6.6.3 - DevRelease
- The license code for OpenText is "O", and licensing is based on active people in the system (seats)
20170707
6.6.2 -
- Bug fix - some encrypted PDF files weren't being flagged as encrypted during analysis (they were failing during OCR and landing in the Needs Attention list)
6.5.71 -
- Better handling for quasi-invalid PDF files (PDFs that have null AcroForms objects now are handled cleanly) - ClassCastException after processing
6.5.70
- Added a new global "alwaysAnalyzeAndProcessNoImageNoTextPages" setting to the config file, under a new "heuristicComputerProvider" setting.
- The new setting will default to false (existing behavior) but can be manually set in the config file.
- The new setting will provide a way to allow no image no text pages to be forcibly processed across the board, instead of manually per document.
20170614
6.5.69
No Changes
6.5.68 -
- Fixed UI bugs - Folder configuration 'Add' button wasn't rendering properly. Some buttons didn't display 'Hand' cursor to indicate they are clickable.
6.5.66 -
- Modified the buttons on the NetDocuments approval page to use the same styles as the other pages
6.5.65 -
- Removed the reprocessing of TOO_OLD docs on startup
6.5.64 -
- Modified Analyzer installer to elevate level to admin
- Created a new "createInstaller.cmd" script that creates the Analyzer installer and digitally signs it
- "createInstaller.cmd" relies on two new settings in deployment.properties
deployment.resources.signature.location=[signature location]
deployment.resources.signature.secret=[signature password]
6.5.63 -
- Modified the NetdocsFinder to set documents in the Too Old folder to REPROCESS, when the Process legacy documents setting is enabled.
- Added TOO_OLD as a new value in DocumentAccessibility enum
- Reprocess TOO_OLD docs on startup
6.5.62 -
- Added styling for disabled buttons
- Fixed issues with paging buttons on Document List page
6.5.61 -
- Removed legacy and deprecated css links, which were overriding our settings
- Added new reset.css to undo button settings set by the browser
- Modified the button style to force a hand pointer when the mouse is over the button
- Removed [] from the NetDocs Log In button
- Returned the Learn More links on the Document List pages to be hyperlinks rather than buttons
6.5.60 -
- Fixed more buttons that missed the style upgrade.
- Modified button styles again to make shadows less invasive
6.5.59 -
- Modified style for Scheduler config Delete buttons to remove the border
20170310
6.5.56 -
- Modified the Analyzer installer script to create a file with "c" version, and to point to the SOCR Pre-release installer
6.5.54 -
- Added ability to override the host name used when displaying the user interface. Settings.xml, webServerConfiguration, hostname (this is not added to the config file by default - you'll have to set it explicitly)
6.5.53 -
- The email of the logged in NetDocuments user now appears after the login name on the NetDocuments configuration screen
6.5.51 -
- Modified Analyzer configuration page to allow for the setting of MSG (email) attachment processing. This setting will sync with the same setting on the Processor configuration page, so that both are either on or off.
6.5.50 - dev release
- Bug fix - Symphony OCR hangs when processing some large, complex PDF files. java.lang.OutOfMemoryError: Java heap space message appears in logs
6.5.49 - dev release
- Added the Windows user name and fully qualified hostname to heartbeats
6.5.48 - dev release
- Fixed potential issue with supporting MacRoman character sets (used by PDFs generated on Mac computers) under newer versions of Java
6.5.43
- Updated verbiage of SharePoint configuration page, and added ability to set frequency for a legacy file finder
6.5.40
- Added SharePoint URL to the SharePoint configuration screen
6.5.37
- Updated SOCR to work with NetDocumentsAPI v0.0.41
- An audit entry will be added to NetDocuments documents when OCR is completed, rolled back, etc.
- The audit entry for Worldox documents when OCR is completed was modified to be more descriptive. Audit entries were also added when documents are rolled back, etc.
6.5.36 -
- Bug fix - [View Timeline] links in Simple View allowed users access to the non-simple UI
6.5.33 -
- Regression bug fix - introduced in 6.5.31 - Installation on machines without Java 8 resulted in 'UnsupportedClassVersionError' popup dialog on launch
6.5.32 -
- Bug fix - OCR and Analyzer working directories had garbage files left behind if SOCR was shut down in the middle of OCR or analysis
- Adjusted default settings for determining maximum page size that will be processed. This is now specified in total pixels (instead of maxHeightPixels and maxWidthPixels). The new setting is maxPixels, the default is 36000000, which is a little larger than a C sized sheet of paper. For backwards compatibility, if the existing configuration has maxHeightPixels and maxWidthPixels set to 10000 or 12000, the default is used, otherwise maxPixels is set to maxHeightPixels x maxWidthPixels
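The backwards-compatibility rule for maxPixels can be sketched as below. This is one reading of the note, not the shipped code: it assumes "set to 10000 or 12000" means each legacy value is one of those two numbers, that a missing legacy value falls back to the default, and that a page is too big when its pixel count is strictly greater than maxPixels. The function names are hypothetical.

```python
DEFAULT_MAX_PIXELS = 36_000_000      # default stated in the 6.5.32 note
LEGACY_DEFAULTS = {10000, 12000}     # legacy values that map to the new default

def migrate_max_pixels(max_height_pixels=None, max_width_pixels=None):
    """Derive maxPixels from legacy maxHeightPixels / maxWidthPixels."""
    if max_height_pixels is None or max_width_pixels is None:
        return DEFAULT_MAX_PIXELS  # nothing to migrate (assumption)
    if max_height_pixels in LEGACY_DEFAULTS and max_width_pixels in LEGACY_DEFAULTS:
        return DEFAULT_MAX_PIXELS  # site never customized the old settings
    return max_height_pixels * max_width_pixels

def page_too_big(width_px, height_px, max_pixels=DEFAULT_MAX_PIXELS):
    """True when the page's total pixel count exceeds the limit (assumed strict)."""
    return width_px * height_px > max_pixels

if __name__ == "__main__":
    print(migrate_max_pixels(10000, 10000))
    print(migrate_max_pixels(5000, 4000))
    print(page_too_big(6001, 6000))
```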
6.5.30
- Regression bug fix, introduced in 6.5.22 - OCR engine fails to run on Windows XP workstations - error message refers to ADVAPI32.dll procedure entry point RegSetKeyValueA
* Move to SymphonyOCRProcess.exe 6.5.0.30
6.5.28
- Bug fix - out of memory errors when analyzing PDFs that contain an excessive number of embedded fonts
* Move to 5.5.11-SNAPSHOT.jar (disables font caching in PdfContentStreamProcessor if the file has more than 10 fonts)
6.5.25 -
- Bug fix - ShareFile integration was only searching 'Shared Folders' (not 'My Files & Folders' or 'Favorite Folders')
6.5.22
- Added support for MICR processing (magnetic ink characters typically found on bank checks). When enabled, only a single processor core will be used for OCR, so this should only be enabled for sites that truly need MICR processing.
- If MICR processing is enabled, that will be indicated in the Advanced Settings section of the Processor Config screen
6.5.21 -
- Improved task status message when manually initiating Compact Database command in Debug screen (it used to say 'maintenance completed' until the maintenance got underway)
20161014
6.5.20 -
- Refinement to 6.5.19 bug fix - if a mutation failed during processing, the document was being put on Needs Attention. It now gets put onto Reprocess.
6.5.19 -
- Bug fix - If processing was actively working on a document while nightly database maintenance happened, database corruption (or forced shutdown of SOCR) would occur
6.5.17 -
- Bug fix - files marked as read-only weren't being processed, even though 'Allow processing of read-only files' was enabled
6.5.16 -
- When rolling back, we now reset the previous analysis results
- Additional handling for invalid PDFs that have page rotations other than 0, 90, 180 and 270. SOCR now handles these pages just like Acrobat (which means that it ignores the angle specification entirely)
20160909
6.5.13 -
- No change
6.5.12 -
- If SOCR encounters an out of memory error, it now kills the application (want to prevent database corruption in the event of a memory problem)
6.5.11 -
- Corrupted file detection enhancement - some rare corrupted PDFs caused documents to appear on the Needs Attention list instead of Corrupted
- Bug fix - post-OCR PDF failures could cause the working directory to become locked - after that, all future processing would wind up in the Reprocessing list
6.5.10 -
- Adjustment to NDApi calls to specify NDVaultLocation
- Bug fix 'Engine not initialized — Cannot run program' and 'CreateProcess error=19' and '%1 is not a valid Win32 application' errors on Windows XP machines
6.5.9 -
- Folder document source now adjusts the modified time of the file by 1 minute (this is to make sure that indexing services will see that the file has changed)
6.5.8 -
- Regression bug fix - 'null' error when processing partial documents (only some pages in the PDF needed to be OCRed)
6.5.7 -
- Adjusted Processor error handling behavior - we now pause processing (or analysis) if we encounter more than 10 errors in a 15 minute window (prior behavior was 5 errors in a 60 minute window)
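The "more than 10 errors in a 15 minute window" rule reads like a sliding-window counter. The sketch below shows one way to express it; the ErrorWindow class is hypothetical and the sliding (rather than fixed) window is an assumption, since the note does not say how the window is measured.

```python
from collections import deque

class ErrorWindow:
    """Tracks recent error times and reports when processing should pause."""

    def __init__(self, max_errors=10, window_seconds=15 * 60):
        self.max_errors = max_errors
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record_error(self, now):
        """Record an error at time `now` (seconds); return True to pause."""
        self._timestamps.append(now)
        cutoff = now - self.window_seconds
        # Drop errors that have aged out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        # "More than 10 errors" means the 11th error inside the window trips it.
        return len(self._timestamps) > self.max_errors

if __name__ == "__main__":
    window = ErrorWindow()
    decisions = [window.record_error(t) for t in range(11)]
    print(decisions[-2], decisions[-1])
```

The earlier behavior described in the same note (5 errors in a 60 minute window) would correspond to ErrorWindow(max_errors=5, window_seconds=3600) in this sketch.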
6.5.6 -
- Regression bug fix - "Execution of parallel task failed: Not enough memory! Failed - 0x80004005" error when processing documents that have low image quality.
6.5.5 -
- Bug fix - If Processor was restarted at exactly the wrong time, Processor would display error message "Unable to add pages to page tracker - null. Restart Symphony to clear this error."
6.5.4 -
- Bug fix - PDFs that were missing MediaBox field on a page definition resulted in the file being marked as corrupted. Technically, these are not valid PDFs, but we can still analyze them by assuming a default page size of 8.5x11"
6.5.3 -
- Bug fix - foreign characters didn't display properly in the SOCR web interface
- Bug fix - some files with foreign characters in filename would fail to OCR and appear in the Needs Attention list with error messages about the file not existing (and the file name shows many question mark symbols)
- Bug fix - files that a user had opened in Folder and Worldox document sources were placed on Inaccessible list, and didn't reprocess until the next day. Now these files will be placed on the Reprocessing list, and will be processed when they are found again (assuming the user has closed the document by then)
6.5.2 -
- Improved error message if Outlook is unavailable to indicate that 32 bit Outlook is required
- Added note about 32 bit Outlook to the Processor config screen
6.5.1 -
- No change
6.4.125 -
- Added support for non-western characters in OCR results
20160707
6.4.124 -
- Bug fix - Some documents wound up in the Reprocessing list repeatedly with "Unable to read for analysis - null" in the history. The cause was a Null Pointer Exception when scanning some PDFs for digital signatures.
6.4.123 -
- Bug fix - MSG attachments for MSG files in deep folder paths resulted in the document being continuously placed on the Reprocess list
6.4.122 -
- Re-enable flush of content pages every 100 pages (this was originally enabled in 6.4.23, but inadvertently disabled in 6.4.30)
6.4.121 -
- Certain "page too big" errors were resulting in document being placed in No Image/No Text list instead of the Too Big list
6.4.115 -
- Added support for additional languages (language dictionary files are not being deployed yet, so that needs to be done before we really use this - currently only English, Spanish, Brazil dictionaries are part of the Engine installation - more can be added as needed)
Specifying languages is currently done in settings.xml in the <ocrHandlerProvider languages=""> element. Values should be comma separated, with no spaces. Default is "English".
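As a rough illustration of the value format described above (comma separated, no spaces, defaulting to "English"), the following sketch builds the attribute value and applies it to a stand-alone element. The helper name is hypothetical, the XML fragment is a simplified stand-in for the real settings.xml, and language names other than "English" are assumed for the example.

```python
import xml.etree.ElementTree as ET


def format_languages(languages):
    """Join language names with commas and no spaces; fall back to English."""
    cleaned = [name.strip() for name in languages if name.strip()]
    return ",".join(cleaned) if cleaned else "English"


# Hypothetical fragment: only the element and attribute names come from the release note.
element = ET.fromstring('<ocrHandlerProvider languages="" />')
element.set("languages", format_languages(["English", " Spanish "]))
print(ET.tostring(element, encoding="unicode"))
```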
6.4.110 -
- Add support for custom NetDocuments OAuth connection parameters (this will allow the firm to request that NetDocuments preserve modified by and modified on values during OCR)
6.4.104 -
- Bug fix - ND documents that were on legal hold, archived, signed or approved were repeatedly processed. They will now be placed in the appropriate Not Processed queue.
6.4.103 -
- Bug fix - detection of non-8.3 paths in Worldox document repositories wasn't working properly on certain NTFS volumes
6.4.102 -
- Improve error handling and reporting in PracticeMaster integration
6.4.100 -
- Bug fix - Memory leak in NetDocuments integration (eventually resulting in OutOfMemoryException errors and a heap dump in the logs folder)
- This bug would have also impacted ShareFile integration, so that has been fixed as well
20160201
6.4.99 -
- Added ability to process digitally signed documents (note that this *will* invalidate the digital signature) - this is enabled by adding allowDigitallySigned="true" to the documentPreProcessor element in settings.xml
6.4.97 -
- If manual license update check fails, we now display the error message (before, it was showing an exception trace)
6.4.96 -
- Bug fix - grace period wasn't working properly
- Added new scheduler entry for auto-checking for license updates from Trumpet license server. These updates are auto-scheduled for a random time on a weekday. The update check won't actually happen unless at least 3 days have passed since the last update check, or if the license status is in a warning or error state (in which case it'll check once per day at the scheduled time)
- Bug fix - ND and SF finders weren't starting up on brand new sites (had to stop and restart SOCR after entering the license number)
- Updated help hyperlinks for Not Processed lists
- Bug fix - heartbeats were including pages left, even for unlimited page licenses
- Notification configuration now says "When there are warnings or errors" instead of "When there are warnings"
- Added Check for Updated License button on Licensing page
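The license update rule introduced in 6.4.96 (check only after at least 3 days, or daily while the license is in a warning or error state) can be sketched as a small decision function. This is an interpretation of the release note, not the shipped logic; the function name and the "ok" / "warning" / "error" status labels are assumptions, and the function is assumed to be called once at each scheduled time.

```python
from datetime import datetime, timedelta


def should_check_license(now, last_check, status):
    """Decide whether the scheduled run should actually contact the license server.

    `status` is a hypothetical label: "ok", "warning" or "error".
    """
    if status in ("warning", "error"):
        # In a warning or error state, every scheduled run performs the check.
        return True
    # Otherwise skip the check unless at least 3 days have passed.
    return now - last_check >= timedelta(days=3)


print(should_check_license(datetime(2016, 1, 6), datetime(2016, 1, 4), "ok"))
```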
20151214
6.4.94 -
- Bug fix - Analyzer and Processor failed to start - error message about 'The application has failed to start because its side-by-side configuration is incorrect'
6.4.92 -
- Better handling if the processor or preprocessor throws an uncaught error - we now shut the processor down and report an error
6.4.91 -
- Bug fix - if backupFileRoot was pointing at an old SOCR installation directory (e.g. SOCR folders in C:\Program Files\ instead of C:\Program Files (x86)), backups and processing would fail. We now detect this situation and change the setting to the relative .\work\backupfiles value.
6.4.90 - dev release
- Adding debug lines to troubleshoot failed backup copy
6.4.89 -
- Bug fix - NetDocuments support for EU data centers wasn't allowing login to the EU site
- Added support for AU data center (still have to manually adjust in the settings.xml file)
6.4.88 -
- Bug fix - NetDocuments support for EU data centers wasn't directing to EU login page
6.4.87 -
- Database compaction algorithm will now purge any document records that were damaged by small database corruptions
6.4.86 -
- Bug fix - on machines running on unreliable networks, network hiccups in the middle of analysis could cause SOCR to crash (error message EXCEPTION_IN_PAGE_ERROR (0xc0000006))
- Bug fix - backup results were being written to the <install directory>\work\backupfiles folder instead of the app working folder override
- Made GeneratePerformanceSummary task do nothing (it's really not needed anymore), removed GENERATE_PERFORMANCESUMMARY_TOPROCESS and GENERATE_PERFORMANCESUMMARY_PROCESSED from any existing scheduler configuration
- Change default schedule time for "Re-analyze Re-Process lists" to be 11:30pm every day, instead of 12:00 every day
- Change default schedule time for "Purge backups" task to be 4:00am every day, instead of 12:00 every day
6.4.85 -
- Added Processing Time (ms) to CSV export of document list
6.4.84 -
- Bug fix (Case 35047) - When starting or restarting SOCR with LSSe64, the LSSe64 finder would often fail to start. The LSSe64 finder now starts reliably.
- Bug fix (Case 35024) - SOCR was processing documents with extensions that weren't PDF, TIF or MSG and adding these to the Corrupted list. Such documents are now ignored.
6.4.83 -
- Bug fix - PDFs with a small number of pages but really big image content (i.e. high resolution photographs embedded in the PDF) could cause OCR to fail with an out of memory error
- Bug fix - When entering text into the LSSe64 DB password field, it was not masked with ****. This is fixed by using a HTML "password" input type rather than "text".
- Bug fix - SOCR for LSSe64 was attempting to process non-image documents (e.g. .DOCX files) and this led to them appearing in the corrupted documents list. SOCR for LSSe64 has been modified to now only process documents with a PDF, TIFF or MSG extension.
6.4.82 -
- Adding support for NetDocuments EU data center (just added to settings.xml at this point - not in UI yet)
6.4.81 -
- Added ability to filter found documents by explicit dates (this is done by editing the settings.xml file, finderHandler section, cutoffTimeHigh and cutoffTimeLow values)
6.4.80 -
- We now send heartbeat when the user changes their license
6.4.79 -
- Analyzer now ignores stroke and fill color operators in PDF (rg and RG) (we were seeing some PDF files where these operators weren't being properly used, but that failure isn't going to prevent us from determining whether the file is safe to process, so we'll just ignore those types of errors)
6.4.78 -
- Document processor is now stopped and started during backup purges
6.4.77 -
- Bug fix - the document source type for PracticeMaster was wrong because it inherited the folder finder task's characteristics. Added a simple override to address this.
- Bug fix - the check for whether a filename is within a folder pathname was incorrect; the logic needed to be reversed.
6.4.76 -
- Better error handling for NetDocuments API timeouts
6.4.75 -
- Better logging on 0xfffffffd msgedit errors
6.4.74 -
- Replaced 'Advanced' side menu with 'Debug' link in upper right corner of Welcome screen only
- Adjust build script so it reads from ${user.home}/.m2/deployment.properties instead of properties being defined explicitly in settings.xml
6.4.73 -
- Better error message for Outlook installation issues (msgedit error 0xfffffffd)
6.4.72 -
- Cosmetic fixes throughout (PracticeMaster instead of Practice Master, Advanced heading in PM config screen appeared twice)
6.4.71 -
- Added whether we are running as a service or not ('Service: true') to heartbeat status
6.4.70 -
- Refactored Practice Master code to reduce complexity.
- Improved validation and error reporting when user makes erroneous configuration changes to Practice Master.
- Simplified the generation and processing of the HTML delivered to user's browser for Practice Master.
6.4.69 -
- Improved error handling if a database commit fails (database is marked as corrupted and is stopped so further damage can't occur)
6.4.68 -
- Added Advanced Processor configuration setting to control the maximum number of cores that will be used during OCR. If left empty, we will use all available cores (up to 4). Right now, the setting must be empty or a number between 1 and 4.
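The rules for the 6.4.68 core-count setting can be summarized in a few lines. The function below is a sketch under stated assumptions: its name and the `available` parameter are hypothetical, and the release note does not say what happens when the configured number exceeds the cores actually present, so that case is left untouched here.

```python
import os


def resolve_max_cores(setting, available=None):
    """Return the number of cores to use for OCR, given the raw setting text."""
    if available is None:
        available = os.cpu_count() or 1
    text = setting.strip()
    if not text:
        # Empty setting: use every available core, capped at 4.
        return min(available, 4)
    value = int(text)
    if not 1 <= value <= 4:
        raise ValueError("setting must be empty or a number between 1 and 4")
    return value
```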
6.4.67 -
- Switch resource server to http://resources.trumpetinc.com
6.4.66 -
- Bug fix - OutOfMemory errors when processing really large NetDocuments documents
- Move to NetDocsAPI-0.0.14.jar
6.4.65 -
- Switch heartbeat server to http://heartbeat.trumpetinc.com/heartbeat/sendheartbeat.jsp
6.4.64 -
- Added Worldox validation message for version WDAPI.20150624.1852 indicating that version of WD has a bug when processing legacy documents
6.4.63 -
- Improvement to orientation correction code to avoid analyzer lockups when hitting huge pages
6.4.62 -
- Bug fix - divide by zero exception in Processor configuration screen when all pages of a trial license are consumed
6.4.61 -
- Moved to sswebservices-0.0.3.jar (logging enhancement)
6.4.60 -
- Bug fix - deleting a document record from the Document Detail view resulted in ClassCastException instead of returning the user to the Welcome screen
6.4.59 -
- Improvements to Active User count strategy (we will now fail if the WD license isn't valid or if the version of WD doesn't support active user count determination)
20150923
6.4.58 -
- Bug fix: Some sites running Server 2008 R2 in rare configurations (SMB1 with loopback mapped drives) would kernel fault with a Blue Screen of Death (operating system bug triggered by behavior in one of SOCR's analysis modules). This is now fixed.
6.4.49 -
- Added specific error message if a Worldox document couldn't be processed because of missing 8.3 filename information
6.4.48 -
- Added a message to the end of the maintenance screen indicating that maintenance is complete
6.4.44 -
- New feature: Users can now roll back to the un-ocred version of the document as long as the document hasn't been modified since it was OCRed (this is available even if the short term retained versions have been purged)
- Enhancement when processing huge files (thousands of pages) - disk space usage is now limited to around 12 GB during processing (prior, it could expand indefinitely - approximately 12MB per page)
6.4.43 -
- Bumped the maxWidthPixels and maxHeightPixels values from 10,000 to 12,000 - there were a lot of engineering drawings that were just barely above the 10K limit (i.e. 10804x7212)
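To make the effect of the 6.4.43 change concrete, here is a minimal size check using the drawing dimensions quoted above. The function and constant names are illustrative only, and treating a page exactly at the limit as acceptable is an assumption.

```python
MAX_WIDTH_PIXELS = 12_000
MAX_HEIGHT_PIXELS = 12_000


def is_too_big(width_px, height_px, max_width=MAX_WIDTH_PIXELS, max_height=MAX_HEIGHT_PIXELS):
    """Return True when either page dimension exceeds its limit."""
    return width_px > max_width or height_px > max_height


# The 10804x7212 drawing from the note: rejected under the old limits, accepted under the new ones.
print(is_too_big(10804, 7212, max_width=10_000, max_height=10_000))
print(is_too_big(10804, 7212))
```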
6.4.42 -
- Minor bug fix - LSSe64 integration was trying to connect to the SQL database, even if the license wasn't activated for LSSe64 (move to lazy loading of connection)
6.4.41 -
- Changed logging when profile group isn't available via WDAPI to be debug logging (logs were getting flooded when a PG was removed from processing)
6.4.40 -
- Added debug lines troubleshooting FOLDER document source issue
- REGRESSION - Bug fix - all documents at Folder Tree sites were being sent to Unavailable list.
6.4.38 -
- Bug fix - Null Pointer Exception in some corner cases when saving changes to NetDocuments integration configuration
6.4.37 -
- Added support for Practice Master DMS which uses a simple file system folder based document storage strategy.
- Added support for validating SOCR max licensed users against Practice Master's own configuration file's max licensed users.
- Small number of internal code refactorings with no functional visibility.
6.4.36 -
- Bug fix - clicking on Open links for Worldox documents resulted in an empty tab opening in the web browser
6.4.35 -
- Ensure we validate the SOCR licensed user count against the LSSE64 licensed user count.
6.4.34 -
- Fix bug in which null was returned for a document's priority level.
6.4.33 -
- Increase lengths of input fields for DB credentials and restart the LSSe64 finder when we change credentials.
6.4.32 -
- Disabled one test and adjusted the logic of another in a suspect area, added native paths, removed an old import and bumped the overall version number.
6.4.31 -
- Added support for flagging certain versions of Worldox as having problems (see WorldoxConnection#getStatusResult() )
20150721
6.4.30 -
- Changed installer so Run as a Service message indicates that it won't work with Worldox sites
6.4.29 -
- NetDocuments compatibility fix - sites that didn't have legacy processing enabled weren't finding documents to process since the most recent ND update
6.4.28 -
- Bug fix - introduced in 6.4.11 - Files with mismatched extensions (e.g. a PDF with a TIF extension) wound up in an infinite 'analyzing' loop
- add ability to specify location of dev mode abbyy home
- adjusted how development mode path determination is made (com.trumpetinc.development.abbyybinfolder system property)
6.4.26 -
- Bug fix - the backupFileRoot was being stored as an absolute path instead of a relative path. End result is that copying configuration from a 32 bit machine to a 64 bit machine resulted in the default backup location incorrectly pointing at C:\Program Files\ instead of C:\Program Files (x86)\
6.4.25 -
- Added a setting to settings.xml to set a maximum page count limit on what documents will be processed. Documents with more than the limit of pages will be put onto the Too Big list. Setting is not configurable through the UI - you must edit settings.xml directly and add the following to the existing <documentProcessor ..... /> element: maximumPageCount="300" (or whatever page count limit you wish to set)
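For sites that prefer to script the 6.4.25 edit rather than make it by hand, the sketch below adds the attribute with Python's standard XML library. It is an illustration only: the function name is hypothetical and the sample XML is a simplified stand-in, since the real settings.xml layout is not shown in these notes. Back up settings.xml before trying anything like this.

```python
import xml.etree.ElementTree as ET


def set_maximum_page_count(settings_xml, limit):
    """Return the XML text with maximumPageCount set on the documentProcessor element."""
    root = ET.fromstring(settings_xml)
    # The element may be the root itself or nested somewhere below it.
    element = root if root.tag == "documentProcessor" else root.find(".//documentProcessor")
    if element is None:
        raise ValueError("no <documentProcessor> element found")
    element.set("maximumPageCount", str(limit))
    return ET.tostring(root, encoding="unicode")


# Hypothetical stand-in for settings.xml; only the element and attribute names come from the note.
sample = '<settings><documentProcessor enabled="true" /></settings>'
updated = set_maximum_page_count(sample, 300)
print(updated)
```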
6.4.24 -
- Bug fix - if MSG handling was enabled on a machine without Outlook installed, it was not possible to disable MSG processing (though it looked like it was disabled in the Processor config screen). End result was a permanent warning message about the MSG sub-system not working
6.4.23 -
- Accuracy improvements in OCR engine
- Bug fix - out of memory errors when processing really big PDF files (thousands of pages)
6.4.22 -
- Improve OCR accuracy in some corner case scenarios
6.4.21 -
- Increased accuracy of OCR engine by making it slightly slower
6.4.20 -
- Added better error message if NetDocuments login doesn't have permissions to query user count
- Add note to NetDocuments Login button indicating that the user must be a NetDocuments Admin
6.4.19 -
- Added caching to progress graph display - this writes the graph data to disk every 5 minutes, and reads that data on launch. This should allow us to display the graph right away, without there being a refresh period (which could be quite long at sites with lots of documents)
6.4.17 - Dev release
- Experimental functionality for emitting OCRed PDF page content incrementally - should fix problems with running out of memory when processing really big files
- Right now, this flushes every single page - before moving to pre-release we probably should change that so it flushes every 50 or 100 pages
6.4.14 -
- Bug fix - issues when reading MSG files could result in user interface displaying RuntimeException stack trace
6.4.13 -
- Added ability to override maximum heap that Symphony OCR will use. This is done in a launch.ini file that must be stored in the SOCR application directory. The setting is controlled in the [JVM] MaxHeap=512m setting (this is the default - 512 MB heap). It can be increased - so for example, MaxHeap=1024m would increase it to 1GB.
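The launch.ini format described in 6.4.13 is a plain INI file, so its value can be inspected with any INI parser. The following sketch reads the [JVM] MaxHeap setting and converts it to bytes, falling back to the documented 512m default. The function name is an assumption, as is the handling of k and g suffixes (the note only shows values ending in m).

```python
import configparser


def read_max_heap_bytes(ini_text, default="512m"):
    """Parse launch.ini text and return the configured maximum heap in bytes."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names such as MaxHeap case-sensitive
    parser.read_string(ini_text)
    raw = parser.get("JVM", "MaxHeap", fallback=default).strip().lower()
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    if raw and raw[-1] in units:
        return int(raw[:-1]) * units[raw[-1]]
    return int(raw)


print(read_max_heap_bytes("[JVM]\nMaxHeap=1024m\n"))
```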
6.4.12 -
- Fix issue where integration with old Worldox GX2 sites failed with 0xfffffff error
6.4.11 -
- Issue fix - web based DMSes can lose connectivity. When this happens, documents pending analysis and processing wind up being moved to the Unavailable list. When they become available again, SOCR was re-analyzing the files. This caused a lot of unnecessary downloading.
- If a document hasn't been changed since last analysis, it will not be re-analyzed unless the user explicitly clicked Re-Analyze in the Document Detail screen or the Document List screen
6.4.9 -
- Suppress error log entry that is logged if graph generation is interrupted (not adding any value - this is normal behavior)
6.4.8 -
- SOCR now only complains about spaces in a profile group's base path if the profile is defined on a mapped network drive (i.e. doesn't start with \\)
6.4.7 -
- Added Ignore capability to all documents in the Backlog list (in case users want to ignore documents that haven't been analyzed and/or processed yet)
6.4.6 -
- Removed warning message added in 6.4.5 - it looks like WDAPI won't actually block if the PG is set to read-only
6.4.5 -
- Added warning message if a selected Worldox profile group is marked read only
6.4.4 -
- Add history when file has too many pages to process with the current license
6.4.3 -
- Bug fix - huge PDF files (>10K pages) processed during free trials would cause SOCR Processor to stop. These documents will now be moved to REPROCESS
6.4.2 -
- Make Statistics panel on main page show number of OCR backlog documents as well as pages
Here's an overview of the major changes:
1/20/2015
6.3.65 -
- Bug fix - in rare situations, SOCR could fail to launch with 'ConcurrentModificationException' stack trace
6.3.63 -
- Bug fix - install as a service always resulted in 0x421 error
- Adjusted label on username field to indicate domain\user
- Bug fix - installer wasn't always remembering the correct installation path
6.3.62 -
- Bug fix - scheduler task wasn't working properly if tasks were scheduled for later the same day
- Bug fix - changes to scheduler configuration weren't taking effect until SOCR was restarted
- Display issue - label on Stop OCR Processor task was missing 'OCR'
- Small tweaks to layout of scheduler interface (moved the activity, time and days fields around so they are more intuitive, changed the order that activities appear in the drop down list so they are in order most likely to be used)
6.3.60 -
- Installer - display appropriate header text in screen that prompts for the user to run as
6.3.59 -
- NetDocuments Create Versions setting was turned off by default - it is now turned on by default
- Fixed documentation link on option to enable legacy processing
6.3.58 -
- Workaround for NetDocuments Invalid Hashable error when trying to get the display path of a document (ND changed their API)
6.3.57 -
- Made the display of Ignore, Reprocess, Delete and Adjust Priority consistent between the document lists and document details views
6.3.56 -
- Fix 'communication timeout' errors when sending nightly notifications
6.3.53 -
- Installer bug fix - cloud installs were launching SOCRTray after the installer finished (now it correctly launches SOCR.exe)
6.3.52 -
- Bug fix - ShareFile integration would say that it wasn't connected when it clearly was
6.3.51 -
- Removed finding of TIF files (changing file extension of an existing document causes duplicate files to be created in ShareFile) - we can bring this back in when we make version creation optional
6.3.50 -
- First iteration of ShareFile integration
- Added ShareFileAPI 0.0.2-SNAPSHOT
6.3.49 -
- Added a log (logs/autorotate.log) to capture the number of pages that were auto-rotated during processing - this is disabled by default - to enable, edit log4j.properties and change the "log4j.logger.autorotatelog=ERROR, AUTOROTATEFILE" line to "log4j.logger.autorotatelog=INFO, AUTOROTATEFILE"
6.3.47 -
- NetDocuments configuration screen had two Basic Settings sections - these have been merged
6.3.46 -
- NetDocuments configuration screen now has an option to "Look for legacy documents". Disabled by default. When enabled, the ND integration will find all documents. Otherwise only documents modified in the past 7 days will be included in the Find phase.
6.3.45 -
- Notification emails now include the name of the SOCR machine in the subject
6.3.44 -
- SOCR now tracks the original file modified date of each document it finds. This is the date used in reporting backlog metrics. End result is that if the DMS forces the modified date to change during processing, the backlog progress graphs will still display the number of pages added over time properly
6.3.43 -
- Changed NetDocuments 'Create versions' option to default to 'true'
6.3.42 -
- SOCR uninstaller will now shut down existing running instances of SOCR (both running as user AND running as service)
- SOCR will no longer give the "run as user/run as service" dialog for cloud installs (default will be "run as user")
- Startup shortcut wasn't being removed during uninstall
6.3.41 -
- Adding support for running SOCR as a windows service
- When running as a windows service, it is not possible to shut down SOCR from inside the user interface - shutdown must be performed using Windows Services
- Added SymphonyOCRTray.exe - this is an applet that runs and puts the SOCR icon in the system tray when SOCR is running as a service
6.3.40 -
- Bug fix - HTML in Needs Attention document lists could be misrendered if the reason contained <<snip>>
6.3.38 -
- Special handling for ND files that were emailed directly into ND (ND forces us to create a version of these types of files)
6.3.37 -
- Installer wasn't adjusting the modified date on the sample images
- Improvements in error/warnings condition reporting during typical ND initial configuration use-case
6.3.36 -
- Ignore button missing in Details screen for Too Big documents
- Add knowledgebook hyperlink for ND config screen
- Added 'Processor > Basic Settings > Automatically rotate pages to proper orientation' option. Enabled by default. If turned off, SOCR will not adjust the page orientation in the output PDF.
6.3.35 -
- Display the repository name along with the cabinet name in the NetDocuments configuration screen
- Tweak system status display so we can click into the license screen if there are licensing issues
6.3.34 -
- Bug fix - if NetDocs document meta data wasn't available for computing workspace path, an IllegalStateException was being thrown
6.3.33 -
- Bug fix - NullPointerException if unable to get path from workspace information for ND document
- Enhancement - Changed Lookup By Path to 'Lookup Document'. Users can now type in the doc ID of the document (as it appears in the source document management system). SOCR will query the DMS to locate the actual path of the document and display the details.
6.3.32 -
- Bug fix - ND integration wasn't searching for MSG files
6.3.31 -
- Cloud installer now auto-detects that we are on a WD Cloud terminal server and sets the default install location to "<path to CID Folder>\blah"
6.3.30 -
- Bug fix - files in FOLDER finder were still being processed, even if the folder was marked as inactive
- Added Create Versions option to NetDocuments integration
- MSG files from NetDocuments will now always create a version if the current file only has a single version (this is a special requirement from ND)
- If a file in ND is detected with an incorrect extension, a new version will be created with the correct extension
6.3.29 -
- Installer is now 'cloud aware' (safe to install into the WD Cloud environment)
6.3.28 -
- Summary: Don't display 'Current OCR throughput' data unless OCR has actually happened
- Bug fix - SSCLOUD and SOCRCLOUD licenses weren't working
6.3.27 -
- Added Powered By Abbyy and Trumpet logos to maintenance screen
- Removed msg files from NetDocs finder - NetDocs doesn't do full text indexing of MSG files, so there's no point in processing them
6.3.26 -
- Added spacing between table cells in document list display (prevent path and page number from being too close together)
- Improved error message if ND workspace configuration isn't set up properly
- We now track when the ND refresh token is valid through (1 year) and display a warning to the user 15 days prior to them needing to manually re-authenticate the SOCR -> ND connection
- We now track when the ND authentication token is valid through (24 hours, or 45 minutes of inactivity, whichever comes first) and auto-reset the connection instead of waiting for it to fail on an actual call
- Changed icon to indicate OCR (differentiate between S-Pro Workstation sys tray icon)
- Changed the UI for setting frequency for backlog searches so it works in days instead of hours
- Changed default search frequency for backlog searches to be 7 days
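The two NetDocuments token rules in 6.3.26 (a refresh token valid for 1 year with a warning 15 days ahead, and an authentication token valid for 24 hours or 45 minutes of inactivity) can be expressed as simple time comparisons. The sketch below is an interpretation, not the real integration code: all names are hypothetical, and "1 year" is approximated as 365 days.

```python
from datetime import datetime, timedelta

REFRESH_TOKEN_LIFETIME = timedelta(days=365)  # assumption: "1 year" taken as 365 days
REFRESH_WARNING_LEAD = timedelta(days=15)
AUTH_TOKEN_LIFETIME = timedelta(hours=24)
AUTH_TOKEN_IDLE_LIMIT = timedelta(minutes=45)


def refresh_warning_due(issued_at, now):
    """True once we are within 15 days of the refresh token expiring."""
    return now >= issued_at + REFRESH_TOKEN_LIFETIME - REFRESH_WARNING_LEAD


def auth_token_expired(issued_at, last_used_at, now):
    """True after 24 hours, or after 45 minutes of inactivity, whichever comes first."""
    too_old = now - issued_at >= AUTH_TOKEN_LIFETIME
    too_idle = now - last_used_at >= AUTH_TOKEN_IDLE_LIMIT
    return too_old or too_idle
```

Checking these conditions before each call is what allows the connection to be reset proactively instead of waiting for a request to fail.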
6.3.25 -
- NetDocuments configuration screen now hides the settings unless the connection to ND is established
- If the connection to ND is not established a button appears for the user to explicitly connect Symphony to ND
6.3.24 -
- Bug fix - NetDocuments Finder wasn't auto-starting after connecting to ND
- Bug fix - heartbeats were being initialized and sent before everything was configured - this caused the 000000 Worldox user to be used for the first several minutes of the application running, even if a different user was configured
- Worldox 'open' links will now use wdox:// hyperlinks instead of generating wdl files if WD is newer than 8/15/2014
6.3.23 -
- Added Open link for NetDocuments document records
6.3.22 -
- Added support for NetDocuments DMS
- Added Progress Details screen (detail hyperlink next to system summary progress bar on Welcome page)
6.3.19 -
- Added display path in addition to the canonical path for each document. Filtering from the document lists will be performed against the display path. If the display path is different from the canonical path, the canonical path will be displayed as an additional attribute in the Detail screen of the document record
- Made email attachments so their display path is the name of the attachment instead of the awkward 0000000001 number
6.3.18 -
- Changed sample scan and msg files to be more fun (drink recipes)
- Bug fix - problems with MSG handling support weren't being detected immediately at launch (they only appeared after several minutes)
6.3.17 -
- Notification warning message will now display the full message in the Welcome screen instead of just 'Notifications have problems'
6.3.16 -
- Tweaked wording on No Image/No Text document list 'What's this' description
6.3.15 -
- Added ability for user to force processing of No Image/No Text documents. This can be initiated using the 'Enable Processing' button on the Document Detail screen, or using the 'Enable Processing' bulk action button on the Document List screen. These buttons only appear for documents that are in the No Image/No Text list.
6.3.14 -
- Added e-mail notifications (see new Notifications screen)
- Added overall progress bar to Summary (welcome) screen
- Removed the License Info section of the Summary screen for unlimited page count sites
- Added 'pages processed in past year' to the Statistics area of the Summary page
- Bug fix - SOCR was loading its configuration twice during launch
- On startup, SOCR will now kill any lingering WBAPI.EXE instances that were started by other instances of SOCR
- Add Simple View links to Summary (welcome) screen - this will display a view of the summary page that doesn't contain any links to other areas of SOCR
- Changed the 'Basic View' link on the Search Summary screens to be 'Simple View', and moved it to the upper right corner of the Search Summary page
6.3.13 -
- Added a 10 day grace period if the Worldox user count goes above the licensed Symphony user count. During this grace period, the license issue will display as an Error, but processing will continue. After the 10 days, the issue displays as an Error and processing will stop.
6.3.11 -
- Improved note behavior from 6.3.10 to encourage users to actually pay attention
6.3.10 -
- Added a note reminding users to enable email attachment indexing in their DMS next to the 'Process MSG (email) attachments' setting on the Processor configuration screen. This note hides/shows depending on whether MSG processing is disabled or enabled
6.3.9 -
- Regression bug fix - the hidden file bug fix from 6.3.6 got reintroduced in 6.3.7
6.3.6 -
- Bug fix - files that were marked as Hidden would be OCRed, but the conversion results couldn't be returned to the file system
- Bug fix - in some extremely rare instances, SOCR could generate a corrupted file (charset encoding issue)
6.3.5 -
- Added additional error trapping for corrupted MSG files (0x8004011b error code)
- If internals of MSG cause attachments to be inaccessible, the document record will now be put in the Inaccessible list (old behavior was to put it in the Reprocessing list)
6.3.4 -
- Added low disk space warning and error to backup manager. By default, these are set to 1.5 GB for warnings, and 1GB for error
- If disk space for backups drops below 'error' disk space level, documents will be moved to the Reprocessing list
- Levels can be adjusted by manually editing settings.xml and adding errorUsableSpace and warnUsableSpace parameters to the <backupManager .... /> element
- This check only occurs if backups are enabled
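The threshold behavior described for 6.3.4 can be sketched as follows. This is not the backup manager's code: the function names and the "ok" / "warning" / "error" labels are hypothetical, and interpreting GB as 1024^3 bytes is an assumption. The parameter defaults mirror the documented 1.5 GB and 1 GB levels that warnUsableSpace and errorUsableSpace override.

```python
import shutil

GIB = 1024 ** 3  # assumption: the note's "GB" is treated as 1024^3 bytes


def classify_backup_space(usable_bytes, warn_bytes=int(1.5 * GIB), error_bytes=1 * GIB):
    """Map free space on the backup volume to "ok", "warning" or "error"."""
    if usable_bytes < error_bytes:
        return "error"  # at this level documents would be moved to the Reprocessing list
    if usable_bytes < warn_bytes:
        return "warning"
    return "ok"


def check_backup_folder(path):
    """Classify the free space of the volume that holds `path`."""
    return classify_backup_space(shutil.disk_usage(path).free)


print(check_backup_folder("."))
```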
Here's an overview of the major changes:
6.3.2 -
- Add support for unlimited page count Trumpet licenses (P0 feature in the Trumpet license)
6.3.1 -
- UI improvement - MSG analysis was showing page X of Y of the previous PDF analysis status, even though there weren't any pages being analyzed
6.1.62 -
- SOCR installer was creating empty config, data, logs and work folders in the directory containing the setup executable
6.1.56 -
- Bug fix - some database states weren't reporting in the system status
- If database fails to open, its state is switched back to Closed before throwing an exception
- Graph generation failed with exception trace if there was no data
6.1.55 -
- Bug fix - SOCR was hard coding the full path of the backup folder, instead of using relative paths
- Bug fix - Backup manager would fail to make backups if the user migrated the configuration from a 32 bit machine to a 64 bit machine - this now gets automatically corrected
6.1.53 -
- If we fail to open database on launch or on scheduled rebuild, we now do a hard fail - present an error dialog on screen, then kill SOCR (with exit code 999)
- If we fail to re-open the database after scheduled maintenance, we now do a hard fail
6.1.52 -
- When OCR results are returned to Worldox, post an audit trail entry (Save)
6.1.51 -
- If documents.lg file is corrupted, we now attempt to delete it
6.1.50 -
- Backlog throttling algorithm adjustments (undoing some of the 6.1.48 changes)
- SOCR now bases its default reserve on an assumed license duration that is 13/12 of the actual license duration (for a 380 day license, this equates to an additional 42 days). This will cover cases where licenses are entered prior to the license start date, at the price of slightly higher initial backlog processing before throttling kicks in.
6.1.49 -
- Changes to document priority in Worldox and Folders screens now run as a background task with progress displayed in the standard background task frame
- Changes to the Processor Config screen that result in bulk changes to documents (moving Wrong Type to Analyzing, or moving unprocessed email message to Analyzing, etc...) now run as background tasks
6.1.48 -
- Backlog throttling algorithm adjustments
- If there is more than 30 days worth of data, SOCR will now dynamically compute the reserve capacity (130% of the average number of OCRable pages added over the past year).
- If there is insufficient data, SOCR will now reserve 3/4 of future processing capacity for new pages
- The minimum default reserve capacity is 50 pages/day (this can be overridden using overclocking)
- Not a change, just a reminder: In all cases, if the reserve capacity isn't used on a given day, those pages become available for backlog processing
- Split up the Processor setting blocks (Backup retention and backlog throttling settings are now grouped separately)
- Changed wording on backlog throttling / processing capacity reserve settings (plus put the checkbox and pages/day input field on the same line)
- The counts for the pages added in the past year are now stored to disk so we don't have to compute them immediately on startup (they are refreshed once per day)
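The reserve rules listed for 6.1.48 can be sketched roughly as follows. This is an illustration only, not SOCR's actual code: the function and parameter names are hypothetical, and it assumes the yearly average is expressed in pages per day (the same unit as the 50 pages/day minimum).

```python
MIN_RESERVE_PAGES_PER_DAY = 50   # minimum default reserve from the notes
MIN_DAYS_OF_DATA = 30            # "more than 30 days worth of data"


def reserve_pages_per_day(days_of_data, avg_new_pages_per_day, future_capacity_per_day):
    """Return the daily page reserve held back for newly added documents.

    All three parameters are hypothetical names used for illustration.
    """
    if days_of_data > MIN_DAYS_OF_DATA:
        # Enough history: reserve 130% of the average new pages per day.
        reserve = avg_new_pages_per_day * 130 / 100
    else:
        # Not enough history: reserve 3/4 of future processing capacity.
        reserve = future_capacity_per_day * 3 / 4
    # The reserve never drops below the 50 pages/day minimum.
    return max(MIN_RESERVE_PAGES_PER_DAY, reserve)


print(reserve_pages_per_day(200, 100, 400))  # enough history
print(reserve_pages_per_day(10, 100, 400))   # insufficient history
```

Any reserve left unused on a given day would then be handed to backlog processing, as the reminder in the list above says.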
6.1.46 -
- If MsgEdit.exe doesn't return results within 60 seconds, we now destroy the sub-process, abandon the call and throw an error
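A minimal sketch of the timeout pattern described above: run a helper process, and if it has not returned within the time limit, kill it and raise an error. The function name is hypothetical and the command is supplied by the caller; this is not how SOCR invokes MsgEdit.exe internally.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds=60):
    """Run a helper process and return its stdout, or raise TimeoutError."""
    process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        stdout, _ = process.communicate(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # Destroy the sub-process and abandon the call.
        process.kill()
        process.communicate()  # reap the killed process and close its pipes
        raise TimeoutError(
            f"{command[0]} did not return results within {timeout_seconds} seconds"
        )
    return stdout


if __name__ == "__main__":
    # Stand-in for the real helper: a short Python one-liner.
    print(run_with_timeout([sys.executable, "-c", "print('ok')"], 30))
```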
6.1.45 -
- Bug fix - bulk operation buttons were showing by default, they are now hidden until the user clicks the Show Bulk Operations button
- Bug fix - Email messages that contained MSG attachments that had double quotes and periods near the end of their names, and in turn contained attachments, resulted in errors during processing
6.1.44 -
- Added new process priority level: Analysis Only (no OCR) - these documents will be analyzed and will stay in the Processing list, but will not be processed
- Changed labels on Very High and High processing priority to have "(no throttling)" at the end
6.1.43 -
- Bug fix - background tasks would disappear from UI before they finished running (only on some browsers)
6.1.42 -
- Added new setting to Worldox config (only in settings.xml, not UI): autoMapDisconnectedDrivesEnabled if true (the default), any disconnected drives are mapped. If false, no drive mapping will be attempted.
6.1.41 -
- Bug fix - Worldox connection was getting reset immediately after launch
- Tweaking display of pages left in Processor feature on License screen (displayed inaccurate data for Jan 1 resetting licenses)
6.1.38 -
- Bug fix - backlog throttling wasn't working properly. In some cases, it would allow runaway, unthrottled processing of backlog
- License detail screen now displays additional information about the license
- License detail screen Features list now gives info about how many pages have been used and how many are remaining
- Added columns to the CSV document export for priority, last modified
6.1.37 -
- Bug fix - sites that had UNC mapped profile groups where the UNC share was no longer valid would wind up with no PGs being found at all
6.1.36 -
- Added pages left readout to Processor config screen
6.1.35 -
- Documents with pages larger than a threshold (10,000x10,000 pixels by default) are now placed in a 'Too big' document list. The size limits are configured in the PreProcessor section of the settings.xml file - maxWidthPixels, maxHeightPixels
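As an illustration of the size check, the sketch below flags a document when any page exceeds either limit. The function name is hypothetical, and treating "larger than" as "either dimension strictly greater than its limit" is an assumption; only the setting names maxWidthPixels and maxHeightPixels and the 10,000 pixel defaults come from the note above.

```python
DEFAULT_MAX_WIDTH_PIXELS = 10_000   # maxWidthPixels default
DEFAULT_MAX_HEIGHT_PIXELS = 10_000  # maxHeightPixels default


def is_too_big(page_sizes,
               max_width_pixels=DEFAULT_MAX_WIDTH_PIXELS,
               max_height_pixels=DEFAULT_MAX_HEIGHT_PIXELS):
    """page_sizes is a list of (width, height) tuples in pixels, one per page."""
    # Assumption: one oversized page is enough to flag the whole document.
    return any(
        width > max_width_pixels or height > max_height_pixels
        for width, height in page_sizes
    )


print(is_too_big([(2550, 3300), (12000, 3300)]))
```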
6.1.34 -
- Bug fix - processing files with huge numbers of pages (>1000) could result in 'CreateProcess error=206, The filename or extension is too long' error
- Processor and Analyzer will now allow up to 5 errors in a one hour period before pausing processing
- Handle 0x80030050 errors (STG_E_FILEALREADYEXISTS) - these files are now flagged as corrupted
- Handle 0x80030005 errors (STG_E_ACCESSDENIED) - these files are now flagged as restricted
- Handle 0x80004005, 0x800300fa errors - these are flagged as corrupted now
- Added 'Reason' to Needs Attention list
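The "up to 5 errors in a one hour period" rule in this release can be pictured as a sliding window over error timestamps. The class below is a hypothetical sketch, not SOCR's implementation; it assumes the pause triggers on the sixth error inside the window.

```python
from collections import deque


class ErrorWindow:
    """Tracks recent errors and reports when processing should pause."""

    def __init__(self, max_errors=5, window_seconds=3600):
        self.max_errors = max_errors
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record_error(self, now):
        """Record an error at time `now` (seconds); return True to pause."""
        self._timestamps.append(now)
        # Forget errors that have aged out of the one hour window.
        while now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        # Up to max_errors are tolerated; one more pauses processing.
        return len(self._timestamps) > self.max_errors


window = ErrorWindow()
for minute in range(6):
    print(minute, window.record_error(minute * 60))
```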
6.1.33 -
- the order that PGs are searched is now driven by the default priority assigned to that group (high priority searched before low priority)
- the order that folder searches are added to the finder task is driven by the default priority assigned to each root folder
6.1.32 -
- documents in CORRUPTED list will no longer be auto-reanalyzed after every update
- Corrupted MSG files were being put into the Email Messages list instead of the Corrupted list
6.1.31 -
- Document List views now support Background Tasks display for bulk operations (Delete, Change State, Change Priority)
6.1.31 -
- Bug fix - attachments with tiff extensions (i.e. 4 characters) were not being handled properly (continuously put back on REPROCESS list)
6.1.29 -
- Backup purge now removes files from work\backupfiles folder tree if those files aren't referenced by a Document record
6.1.28 -
- MsgEdit failed to write output file if attachment names contained unicode characters
- New Background Task sub-system has been added - currently integrated into Processor Config (and parts of Advanced and Scheduler Config)
- Backup purge is now implemented as a Background Task
6.1.27 -
- Bug fix - MsgEdit.exe crashes sometimes
- Sub-attachment names are now prefixed with the sub-email message name that they came from
- Msg working folder is now flushed each time we invoke MsgEdit.exe
- SOCR will attempt to connect network drives for Worldox profiles that have been disconnected
- WDAPI32.DLL will now be completely unloaded when we reset the Worldox connection
6.1.24 -
- Fix for 'Premature end of file' and 'Content is not allowed in trailing section' errors when processing MSG files (MsgEdit wasn't closing results.txt properly). MsgEdit 1.0.0.7
6.1.23 -
- If PDF was corrupted, but could be rebuilt during analysis, we now still mark the PDF as corrupted (error message is 'PDF is partially corrupted - but it can probably be repaired in Acrobat then resubmitted for processing'). These types of PDFs can't be modified in 'append mode' to place the invisible text layer, so it makes no sense to continue processing them (even though technically we can OCR them)
- Restricted documents now only go to the Encrypted/Restricted list if they would have otherwise been processed
- Digitally signed documents now go to the new Digitally Signed list if they would have otherwise been processed
- Digitally signed and encrypted state is now displayed in Document detail screen (if the document is signed and/or encrypted)
- Added list descriptions for a few document lists that were missing descriptions
- If document modified date is more than 1 day in the future, we process the document (instead of putting it in the re-process list) - we had some sites that had whacked modified dates (like 5 years in the future) on documents and SOCR was thinking that they were modified recently so kept putting them in the re-process list
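One way to picture the future-date guard described in the last item: a modified date is only trusted as evidence of a recent change if it is not more than one day ahead of the clock. The function name and parameters below are hypothetical; the real check may be structured differently.

```python
from datetime import datetime, timedelta

FUTURE_TOLERANCE = timedelta(days=1)


def looks_modified_since(modified, last_processed, now):
    """Return True if the file appears to have changed since it was processed."""
    if modified - now > FUTURE_TOLERANCE:
        # Implausible timestamp (e.g. 5 years in the future): do not treat it
        # as a recent modification, so the document is not re-queued forever.
        return False
    return modified > last_processed


now = datetime(2014, 1, 1, 12, 0)
print(looks_modified_since(datetime(2019, 1, 1), now - timedelta(hours=2), now))
```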
6.1.19 -
- Better error message if something goes wrong during PDF analysis (include the filename and pagenumber)
6.1.17 -
- Add document name to warning logging when inline image parsing of pages in a PDF fails
- Added better description to the email related document lists (there was no description before)
- Better error handling if Outlook wasn't installed on the workstation (or isn't working for some other reason) - warning now appears in system status if there is a problem, AND MSG handling has been enabled
- We now have a new list for unprocessed email messages (MSG files go into here if MSG processing isn't enabled, or if Outlook isn't installed properly)
- When MSG handling is changed from disabled to enabled, SOCR will now mark unprocessed email messages and unprocessed email attachments for re-processing
- On launch, if MSG handling is enabled, SOCR will now mark unprocessed email messages and unprocessed email attachments for re-processing
6.1.14 -
- change installer - the Java bundle id download link is now BundleId=81819 (Java 7_u45)
- change installer - the Java installation now completely runs in silent mode - the user doesn't have to click through Java installation screens, and they aren't taken to a web site to test the Java install after it completes
- change installer - the Java installation is configured to NOT integrate with the web browser on the machine
6.1.13 -
- Message dialog when user launches SOCR multiple times is friendlier - and clicking OK on it displays the UI of the already running instance.
6.1.12 -
- First pre-release with MSG attachment handling
6.1.10 -
- Bug fix - sometimes after saving processor changes, analyzer would wind up not running
6.1.8 -
- Change labels on Processor config screen
- Adjusted 'Change state to' history messages to display the 'pretty name' instead of the CONTAINER, ERROR, etc... enum name
- Display attachment name in document detail screen
- Bug fix - when processing TIFF attachments, the attachment record in the parent document was still referring to the .TIF file extension - we now update the analysis results (which is where the attachment information lives) whenever we rename an attachment
6.1.7 -
- Bug fix - TIFF email attachments were staying with tif file extension even after being converted to PDF
- Bug fix - TIFF email attachments were being named 000000 instead of retaining the original name (with new extension)
6.1.5 -
- MSG files weren't being found in Worldox finders
- Eliminated 'allowedExtensions' setting in Worldox and Folder finder configuration - replaced with 'disallowedExtensions' - this isn't surfaced in the UI, but if there are firms that don't want us finding MSG files or what-not, we can set this
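The switch from an allow list to a deny list can be sketched like this. Only the setting name 'disallowedExtensions' comes from the note above; the function name, the case-insensitive match and the way the list is passed in are assumptions for illustration.

```python
from pathlib import PureWindowsPath


def is_candidate(path, disallowed_extensions=()):
    """Return True unless the file's extension is on the deny list."""
    extension = PureWindowsPath(path).suffix.lower().lstrip(".")
    blocked = {ext.lower().lstrip(".") for ext in disallowed_extensions}
    return extension not in blocked


print(is_candidate(r"X:\docs\mail.MSG", ["msg"]))
print(is_candidate(r"X:\docs\scan.pdf", ["msg"]))
```

With an empty deny list every file is a candidate, which matches the idea that the finders pick up MSG files unless a firm opts out.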
6.1.3 -
- Added 'Refresh' button next to Worldox PG list (allow users to see changes to the PG lists made since SOCR launched)
6.1.2 -
- Added 'Allow processing of email attachments' setting to Processor config
- Cleaned up Processor config UI a bit
6.1.1 -
- First build with support for MSG handling
6.0.17 -
- Bug fix - if the file in the underlying file system has no modified date set, database corruption could result when the document is added to the database. See ticket 20124 for details.
6.0.15 -
- Bug fix - if a database corruption occurs, and the user does a database reset (renamed documents.db and documents.lg) before a rebuild can happen, all future launches fail
6.0.14 -
- Bug fix - profile groups with base paths containing spaces could prevent Finder from working, even if the profile group wasn't selected for processing
6.0.13 -
- Changed Folder feature description to "Windows folder tree integration" (old description referred to 'processing' which could cause confusion)
6.0.12 -
- Bug fix - Warning wasn't showing if no Worldox PGs were selected
6.0.10 -
- Licensing no longer warns about unknown feature codes in new license format
6.0.8 -
- Bug - null pointer exception during page count calculations for old license type if Abbyy engine fails to initialize
6.0.7 -
- Move to 10.5.0.58b engine installer
6.0.6 -
- Added support for OCR of Spanish and Brazilian Portuguese documents
6.0.5 -
- Bug fix - older sites that were using the Auto Select PGs checkbox wound up with no PGs selected after upgrading. We removed the Auto Select PGs option when we moved to version 6, so a compatibility shim was needed for those sites
6.0.4 -
- Bug fix - heartbeats were reporting the OCR backlog size incorrectly (often times not reporting it at all)
6.0.3 -
- Bug fix - long running finder tasks were showing 'Waiting for other tasks to complete' when they were actually running
6.0.2 -
- Simple view of search summary screen (the bar graph progress dialog) no longer has hyperlinks on the cabinet paths (this was allowing users to easily get to a non-simple view of things)
Symphony 6.0 brings a major change to the workstation user interface. Here's an overview of the major changes:
6.0.1 -
- Moved all changes from versions 5.3.13 to 5.4.51 to version 6.
5.4.51 -
- Bug fix - page count and document count usage displayed in the heartbeats was flipping (pages,documents then documents,pages) every time a new document was processed.
5.4.50 -
- Added debug output to track when processor and pre-processor are started and stopped
5.4.49 -
- Tweak to Unavailable list text - changed to Moved/Unavailable
5.4.48 -
- Heartbeats now include the actual status (WARN, ERROR) before the details
5.4.47 -
- Added download URL for engine installer if auto-download fails
- Bug fix - in Worldox and Folder config screens, the 'apply to existing documents' checkboxes were displaying initially, even in modern browsers
- Bug fix - in Worldox and Folder config screens, clicking the 'select all' checkbox had no effect
- Worldox API session IDs are now generated using current clock time - avoid issues with accidentally reconnecting to old (corrupted) WDAPI
5.4.46 -
- Bug fix - Worldox configuration screen could fail to apply changes with NullPointerException in rare situations
5.4.45 -
- Bug fix - setting changes could not be saved if config.xml file didn't exist
5.4.44 -
- Bug fix - Heartbeat sender was crashing SOCR on launch if license wasn't populated
5.4.43 -
- Updated to jWDAPI 20130508 - adding ability to detect when WDAPI doesn't load (vs has errors)
- Much better error message when we aren't able to load the Worldox API: "Unable to load Worldox libraries - please close Symphony, launch and close Worldox, then re-launch Symphony. Error details: Worldox API not initialized. Worldox must be launched and closed at least once for a given Windows login (this registers the Worldox programming interface). If you have recently updated Worldox, you may need to launch and close Worldox one time to get the update downloaded."
5.4.42 -
- Bug fix - heartbeat wasn't sending appid
- Switched to using new heartbeat post type (no client or partner ID)
- Added link to WD config screen from license error message (ticket 18458)
- Added 'Send Heartbeat Now' link to the License page (under Advanced section)
5.4.41 -
- Fix typo in search summary screen
5.4.40 -
- Bug fix - analysis of some PDF files was showing incorrect image ratios (crop and media box extents issue)
- Bug fix - visible text outside the crop box was being included in the visible text counts - this text is now being excluded from the visible text counts
- Added ability for SOCR to generate thumbnail images of pages that don't have one already (disabled by default)
- Added 'Generate Thumbnails' setting to Processor configuration
5.4.39 -
- S-OCR now tracks page counts based on the license renewal date (will take effect at the next license renewal)
- When the new page count system is active, the display of remaining processing capacity reflects the renewal date in MMM, yyyy format (or MMMM d, yyyy format if the license is for less than a year)
- Added New pages per year calculation to the welcome screen
- Added 'Recommended license capacity' to welcome screen if the current license isn't big enough to handle the backlog and one year's new documents
- Added explicit link for displaying All Weeks of the Backlog Summary screen
- Fixes a number of display issues with backlog throttling warnings and other error and warning messages
- Heartbeat now includes page usage summary for each year (this only takes effect when the new page count tracker is active, so it'll be a while until we have good data on this)
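The two renewal date formats mentioned in this release (MMM, yyyy and MMMM d, yyyy) would render as in the sketch below. The function name is hypothetical, "less than a year" is assumed to mean fewer than 365 days, and month names assume the default English (C) locale.

```python
from datetime import date


def format_renewal(renewal, license_days):
    """Format the renewal date the way the capacity display describes."""
    if license_days < 365:
        # Short license: full month name, day and year (MMMM d, yyyy).
        return f"{renewal.strftime('%B')} {renewal.day}, {renewal.year}"
    # Full year or longer: abbreviated month and year (MMM, yyyy).
    return f"{renewal.strftime('%b')}, {renewal.year}"


print(format_renewal(date(2014, 3, 5), 365))
print(format_renewal(date(2014, 3, 5), 30))
```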
5.4.38 -
- Add debug lines to troubleshoot issue with PG's not being found
- Remove config\TRE_settings.reg from installer
- Bug fix - an extra heartbeat sender was being created, and it was putting error messages in the log files
5.4.37 -
- SOCR now saves a backup of the previous settings.xml file (in the config\bak folder) every time the user changes settings. These backups are retained for 90 days.
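A retention purge along these lines could look like the sketch below. It assumes backup age is judged by file modification time and that every file in the backup folder is a settings backup; the function name is hypothetical, and the folder path is whatever the caller passes in.

```python
import time
from pathlib import Path

RETENTION_DAYS = 90


def purge_old_backups(backup_dir, now=None, retention_days=RETENTION_DAYS):
    """Delete backup files older than the retention period; return their names."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400
    removed = []
    for path in Path(backup_dir).iterdir():
        # Assumption: age is measured from the file's modification time.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```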
5.4.36 -
- Change headings on document detail screen to have 'Control' separate from 'Details'
5.4.35 -
- Change label on Processor Config 'Performance' section to be 'Performance (since last restart)'
5.4.34 -
- Make summary screen progress bar show percent complete instead of percent remaining
5.4.33 -
- UI tweaks on summary screen
5.4.32 -
- Legacy schedule entries "RUNFINDER_QUERY", "RUNFINDER_SPIDER" are now discarded (previously, they would appear as "Operation (RUNFINDER_QUERY) not available"). Because these particular tasks will never be available, we discard them
- Improved look and feel of summary screen
5.4.31 -
- Bug fix - Heartbeats weren't sending on a regular basis (they only sent when the application launched). Heartbeats should now be sent once per hour.
5.4.30 -
- Added SYMPHONYANALYZER license type
5.4.29 -
- Bug fix - documents in very low and low priority weren't being processed at all (even if backlog throttling wasn't active)
5.4.28 -
- Bug fix - queue analysis was messing things up if file modified date was greater than when the queue analyzer was first created
- Bug fix - Show/hide bulk operations wasn't working in Internet Explorer
5.4.27 -
- Bug fix - huge PDF files caused out of memory exceptions during processing
5.4.26 -
- Bug fix - invalid document modified times could cause queue analyzer to mis-calculate
- Bug fix - errors during document mutator notifications could cause DB to actually be corrupted
5.4.25 -
- Ignore All is now available in the Processing list bulk operations
5.4.24 -
- bug fix - missing files could cause process summary graph to be improperly computed ( java.lang.ArrayIndexOutOfBoundsException: -1 error )
5.4.23 -
- Enabled heartbeat sending using the old heartbeat format (otherwise heartbeats aren't showing up)
5.4.22 -
- Search Summary screen now displays a progress bar for each profile group (or folder)
5.4.21 -
- Bug fix - if two web requests were hitting at exactly the same time, they could conflict and result in ClassCastException errors. This may also address problems where sometimes clicking a link didn't seem to always work. This problem has been in the code since forever - I'm glad we got it fixed
- Bug fix - document in NEW list on Tia's VM testing - we now reprocess anything in the NEW list for good measure
- Fixed two knowledge article links (they were pointing at admin.php instead of index.php)
5.4.20 -
- Bug fix - log file had error messages related to Scheduler during initialization
- Bug fix - Summary screen was only showing results for first 50 documents
- Added display of which list the Summary is for (added 'in Processing list' to the end of the search criteria)
5.4.19 -
- Bug fix /maestro/do/status wasn't showing the status level
- We now re-analyze any document in the DELETED list when SOCR launches (documents should never be in the DELETED list on launch, this is a cleanup from an earlier bug) - these document records will almost certainly wind up being on the UNAVAILABLE list.
5.4.18 -
- Show Bulk Operations button now changes its text to 'Hide Bulk Operations' if appropriate
- Bug fix - the bulk operations were always showing in certain versions of Internet Explorer (even with Javascript turned on)
5.4.17 -
- Bug fix - installer was forcing Cleaner Recovery every time SOCR was updated (yuck)
5.4.16 -
- Bug fix - in document list, clicking Reanalyze on a document, then applying a filter to the next screen resulted in errors
- If installer is unable to clear out existing files, it now gives the user Retry and Cancel buttons (instead of just aborting the installation)
5.4.15 -
- Bug fix - loading pre-5.4 databases could result in a failed launch with error in logs: com.trumpetinc.maestro.MaestroApp - Problem during initialization - will attempt database maintenence when we launch again - -1757588944
5.4.14 -
- Bug fix - Reanalyze button on individual files was resulting in stack trace error
- If filter is specified as empty, or ending with a backslash, we now add an * to the end
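The filter rule in the last item amounts to a small normalization step, sketched here with a hypothetical function name:

```python
def normalize_filter(filter_text):
    """Append a wildcard when the filter is empty or ends with a backslash."""
    if filter_text == "" or filter_text.endswith("\\"):
        return filter_text + "*"
    return filter_text


print(normalize_filter(""))
print(normalize_filter("X:\\Clients\\"))
print(normalize_filter("X:\\Clients\\*.pdf"))
```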
5.4.13 -
- Removed Detail link from document list - details are now obtained by clicking on the filename itself
- Added 'Show Bulk Operations' button - clicking this displays a panel with the bulk operations on it
- Added 'Bulk Operations' pane with individual buttons for performing bulk operations on the filtered list results
- Changed 'View' to 'Open' for Worldox documents (will open the document in Worldox)
- Re-arranged per-document operation buttons so they lay out nicer for Folder and Worldox sourced documents ('Open' is now at the end of the list)
5.4.12 -
- Redesign Folder configuration screen - individual items can be enabled/disabled, added default priority setting, simplified adding new folders
- Added Default Priority to Worldox configuration screen
- Added 'Reprioritize existing documents' checkbox to Folder and Worldox configuration screens (this appears when the priority is changed)
- Added View Summary link to Folder configuration screen
- Added What's this section to Lookup by Path screen
5.4.11 -
- Advanced config option: New webServerConfigration -> listenPort setting - controls which port the internal SOCR web server will listen on. Default is 14722.
5.4.10 -
- All links in the "What's this" sections of each page now open in separate tabs
- Fix capitalization of Processor, Analyzer and Finder in scheduler task names
5.4.9 -
- Lookup By Path - added Tip line
- Scheduler - removed Run Now button (doing that feature properly would require a lot of work - not worth it right now)
- Analyzer config screen, changed 'Settings' to 'Advanced Settings'
- In Licensing screen, Change Analyzer feature to just read 'Analyzer' and Processor feature to just read 'Processor'
5.4.8 -
- Scheduler will no longer schedule tasks that the license doesn't allow (prior, if a task was configured, then the license changed, the task would still be scheduled for execution)
- Scheduler Configuration screen only displays schedule entries that are for tasks that are allowed by the license
- Scheduler Entry Edit screen now only displays task types that are allowed by the license
- Added bulk priority change buttons to document list screens (not sure if this is the best UI for this...)
- Bug fix - Search Criteria in Search Summary page weren't showing the search description
- Search Criteria in Search Summary page can be clicked to get a list of documents meeting that criteria (i.e. all documents in a particular PG) - this will eventually allow users to adjust priority on all documents in a PG, for example
5.4.7 -
- Change 'read only' to 'read-only' in WD config screen
- Added Save button to Worldox basic settings section
- Save button in Processor and Analyzer screens are now below the Settings areas (consistent with other screens)
- Scheduler list now has a 'Delete' button for removing schedule entries. The Edit Schedule Entry screen no longer has a 'delete' button
- Bug fix - "Client ID not set" error on clean install
- Configuration file settings:
- documentProcessor autostart="true" - if 'false', the processor will not start when S-OCR starts (useful for troubleshooting)
- documentPreProcessor autostart="true" - if 'false', the analyzer will not start when S-OCR starts (useful for troubleshooting)
- MaestroConfig checkDatabaseIntegrityOnLaunch="false" - if 'true', SOCR will check the database integrity as it launches (problems are logged as fatal errors to the maestro.log file, an error dialog will appear on screen and the launch will fail) - this slows the launch down, and shouldn't be used in production, but it would be very good to have this turned on in our test environments
- Bug fix - caption on Search Summary screen said 'Search Summary for Search Summary' - it now properly says the name of the search summary (e.g. Worldox profile groups)
- Ability to change individual document priority from the document list
- Bug fix - backlog calculation wasn't looking back 5 days (for all intents and purposes, anything older than today was considered to be part of backlog)
- "Backlog throttling active" warning message in Processor configuration screen now displays the actual date of the backlog cutoff
- Processing and Analyzing lists now have headings for processing priority groupings
5.4.6 -
- Bug fix - Scheduler is coming up without default entries on clean installs
- Date column in Processing and To Analyzing lists displayed the last time the file was processed. For these two lists, this column will now display the modified date of the file.
- Added mechanism for controlling processing priority (Very Low, Low, Normal, High and Very High) - documents are grouped by their processing priority (and ordered by file modified date inside each priority group). Groups Very Low and Low are always considered to be part of the backlog. Group Normal documents are part of the backlog if they are older than 5 days.
- Detail: Database rebuild is required by processing priority implementation - this will happen automatically the first time the DB is opened
- Worldox PG analysis results can now be obtained by clicking the new View Summary link in the PG selection list
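The priority grouping and backlog rules introduced in 5.4.6 can be illustrated as below. The names are hypothetical. Two points are assumptions: that High and Very High documents are never counted as backlog, and that documents inside a priority group are taken newest first (the note only says they are ordered by file modified date).

```python
from datetime import datetime, timedelta
from enum import IntEnum


class Priority(IntEnum):
    VERY_LOW = 0
    LOW = 1
    NORMAL = 2
    HIGH = 3
    VERY_HIGH = 4


BACKLOG_AGE = timedelta(days=5)


def is_backlog(priority, modified, now):
    """Classify a document as backlog according to the 5.4.6 rules."""
    if priority in (Priority.VERY_LOW, Priority.LOW):
        return True  # always part of the backlog
    if priority == Priority.NORMAL:
        return now - modified > BACKLOG_AGE  # backlog once older than 5 days
    return False  # assumption: High and Very High are never backlog


def processing_order(documents):
    """documents: list of (name, priority, modified) tuples.

    Highest priority first; within a group, newest first (an assumption).
    """
    return sorted(documents, key=lambda doc: (doc[1], doc[2]), reverse=True)
```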
5.4.5 -
- Bug fix - adding an empty string as a folder gave a null pointer exception
- Added hyperlink for support docs to Folder config screen
- UI improvements to monitored folder add screen
5.4.4 -
- Bug fix - clicking on View on a document that didn't originate in Worldox caused Worldox to try to view that document. The View button is no longer displayed unless the document originated in WD.
5.4.3 -
- Bug fix - warning messages related to folder processing were showing up in the Finder screen, even though Folder processing wasn't enabled by the license.
- Bug fix - If license disallowed Folder Finder, then a new license was entered that allowed Folder Finder, the Folder Finder did not get turned on
5.4.2 -
- Bug fix - old Worldox selected PG configuration wasn't being carried over into the new system
- Bug fix - "0 is 0 or negative" error when analyzing zero sized PDF files - these now get properly flagged as being corrupted
- Bug fix - the warning "No Worldox profile groups are selected" error shouldn't display "Configure Worldox" link when the user is on the Configure Worldox screen. Same for folder screen.
- Bug fix - during installation, errors pop up about not being able to configure firewall exclusions - for now, we are going to remove this from the installer - it's not worth the hassle if it's not going to work robustly
5.4.1 -
- Bug fix - backlog throttling was activating improperly on short trial licenses because it was using December 31 of the current year as the license reset point in the calculations - Now, if the license duration is less than 90 days, the backlog throttling algorithm will use the expiration date of the license itself (instead of Dec 31 of the current year)
- Installer will now configure Symphony OCR rule in windows firewall
- Breaking change: maximum file modified date cutoff in the Finder is no longer honored - any sites that have this value set will start 'finding' older files. This does NOT impact the maximum file age setting in the processor. Note that this setting was removed from the UI a while back (unless the user had it actually set)
- Breaking change: Auto select PGs has been removed - we should probably have users check the selected PGs after they apply this update
- Launcher has additional error messages (Should provide more feedback in some cases when the Java runtime is corrupted)
- Finer grained history notes if document is inaccessible
- Breaking change: It is no longer possible to force a document record to be created by typing its path into the Search By Path dialog - if the document doesn't exist in the database already, it will no longer be created. See http://forum.trumpetinc.local/viewtopic.php?f=2&t=905&p=7271#p7271
- Major new feature: Ability to page through list results
- Major new feature: Ability to filter list results. All operations like Reanalyze All, Ignore All will apply to the filtered list
- Added a new document list - Unavailable for documents that have their document source (worldox, folder, etc...) become unavailable (this can happen b/c of licensing, b/c the source is offline, or if the configuration of the source no longer includes the document - i.e. if the user changes the selected PGs in the Worldox DMS source)
- Major new feature: Support for new Folder DMS types
- Complete overhaul of Finder implementation to support pluggable DMS types (these are called Document Sources)
- Finder will now check existing document records for reprocessing, regardless of which list they are in (this check used to only be made against the document modified date - now it is made against the modified date AND file size)
- Uncommon document lists are now shown under a 'Other Lists' heading, if those lists have any documents
- If the analyzer or processor finds that a document is unavailable to the DMS (i.e. file was deleted, DMS is offline, etc...), those documents are moved to UNAVAILABLE (they used to be moved to DELETED) - see
- New feature: Ability to run a scheduler task by clicking 'Run now' in the scheduler configuration screen
- Complete overhaul of how the scheduler works - we now run tasks at a specific time, instead of running at a certain frequency between a start and stop time
- Breaking change: some existing scheduler entries will show as "Operation (XXXXXX) not available" - this will definitely happen for RUNFINDER_SPIDER and RUNFINDER_QUERY, which are no longer part of the scheduler
- Errors when unable to connect to Worldox are more readable
- Moved to new license feature scheme (based on letter codes - see http://forum.trumpetinc.local/viewtopic.php?f=2&t=885&p=7154&hilit=symphony+ocr+license#p7192 )
- Bug fix - documents could wind up in a DELETED list
- Bug fix - retained backups weren't being purged for all documents (if documents weren't on the PROCESSED list, they wouldn't have backups purged). Now the purge considers all document lists.
- The version of S-OCR that actually processed a document is now embedded in the PDF. If we re-analyze the PDF, that version will be displayed in the document details under the 'Marked' heading
- S-OCR now uses 'append mode' when adding invisible text to PDF files. In append mode, the changes to the PDF are added to the end of the original PDF file. This makes it possible to completely recover the original PDF by just deleting the last XXX bytes off the file. Testing shows that the file size is not adversely impacted by this (in many cases, the resulting file is actually smaller)
- Bug fix - SOCR was OCRing documents that had been configured to allow Adobe Reader Form Filling (in the process of doing this, the form is no longer savable by Reader). The Analyzer will now detect this and move the file to encrypted/restricted.
- Move to itext-5.4.1-20130310.jar - adds ability to see if a PDF has usage rights (Reader form filling enabled)
- Complete overhaul of the web pages that make up the UI - trying to make them much more consistent
- Change in behavior when processing TIF files - we now preserve the document record (we just change the document path in the record). In the past, we created a new Document record for the resulting PDF, and left the record for the TIF file hanging around. This meant a lot of bookkeeping with keeping track of which document got converted to which other document. This has all been stripped clean.
- Document backup restore has been rewritten so the restored document record is preserved if the file extension is different (we used to create a new document record)
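Going back to the first 5.4.1 item (backlog throttling on short trial licenses), the choice of reset point can be sketched as follows. The function and parameter names are hypothetical, and measuring license duration as expiration minus start is an assumption.

```python
from datetime import date


def throttling_reset_date(license_start, license_expiration, today):
    """Pick the date the throttling calculation treats as the license reset."""
    duration_days = (license_expiration - license_start).days
    if duration_days < 90:
        # Short trial license: use the license's own expiration date.
        return license_expiration
    # Otherwise keep using December 31 of the current year.
    return date(today.year, 12, 31)


print(throttling_reset_date(date(2013, 3, 1), date(2013, 3, 31), date(2013, 3, 10)))
print(throttling_reset_date(date(2013, 1, 1), date(2014, 1, 1), date(2013, 3, 10)))
```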
5.3.13 -
- Bug fix - cleaner was not cleaning files that had been split by Symphony Profiler. This has now been fixed - any documents in the Processed queue will be re-analyzed and re-cleaned again
5.3.12 -
Added logs/cleaned_files.log output with details of every file that is cleaned
5.3.11 -
Critical bug fix - Symphony OCR was misplacing OCRed text on pages that had to be rotated during processing. In most cases, the text was placed completely off the page, making it possible to search *for* a document, but not search within the page (or copy text from the page). This issue was introduced by a change in a 3rd party library on 6/20/2012 in S-OCR version 5.2.58. This issue is now fixed.
After applying this update, any site currently running 5.2.58 through 5.3.8 will automatically enter a special recovery mode. In this mode, all documents processed since 6/20/2012 will be investigated to see if any pages have text that lies outside the visual page boundaries. If so, the invisible text on those pages will be removed, and the page will be re-OCRed. This operation will not count against the site's annual page processing count.
The cleaning operation can be triggered manually by re-analyzing any document in the Processed list. If the document is identified as having the problem, it will be moved to a new Backlog list named 'Cleaning'. A special module ('Cleaner') will process this list. The Cleaner module can be manually stopped and started from the Advanced screen.
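The check for text that lies outside the visual page boundaries can be pictured with a minimal sketch. This is not the Analyzer's actual code: the `Rect` type, the function names, and the 1-point `tolerance` are all assumptions made for illustration. The tolerance is there so that text sitting right on the page edge is not flagged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned box in page units (hypothetical type for this sketch)."""
    left: float
    bottom: float
    right: float
    top: float


def text_outside_page(text_box: Rect, page_box: Rect, tolerance: float = 1.0) -> bool:
    """Return True if the text box extends past the page box by more than tolerance."""
    return (
        text_box.left < page_box.left - tolerance
        or text_box.bottom < page_box.bottom - tolerance
        or text_box.right > page_box.right + tolerance
        or text_box.top > page_box.top + tolerance
    )


def pages_needing_cleaning(pages):
    """pages: iterable of (page_number, page_box, list_of_text_boxes).

    Returns the page numbers that have at least one misplaced text box.
    """
    return [
        number
        for number, page_box, text_boxes in pages
        if any(text_outside_page(box, page_box) for box in text_boxes)
    ]
```

In this sketch, a page is a cleaning candidate as soon as one text box falls outside its boundaries; text that merely touches the edge is left alone.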
5.3.10 -
Bug fix - Analyzer was identifying pages that had text right on the edge as being candidates for cleaning
Automatically give the client "bonus" processing capacity for any pages that we wind up cleaning
Heartbeat now includes # of bonus pages for the current year (if > 0)
5.3.9 -
Critical bug fix - Symphony OCR was misplacing OCRed text on pages that had to be rotated during processing. In most cases, the text was placed completely off the page, making it possible to search *for* a document, but not search within the page (or copy text from the page). This issue was introduced by a change in a 3rd party library on 6/20/2012 in S-OCR version 5.2.58. This issue is now fixed.
Any site currently running 5.2.58 through 5.3.8 will automatically enter a special recovery mode. In this mode, all documents processed since 6/20/2012 will be investigated to see if any pages have text that lies outside the visual page boundaries. If so, the invisible text on those pages will be removed, and the page will be re-OCRed.
The cleaning operation can be triggered manually by re-analyzing any document in the Processed list. If the document is identified as having the problem, it will be moved to a new Backlog list named 'Cleaning'. A special module ('Cleaner') will process this list. The Cleaner module can be manually stopped and started from the Advanced screen.
For most sites, the automatic recovery mode should take care of everything without user involvement.
Added automatic detection of files that had misplaced OCR text.
5.3.8 -
Bug fix - If pages had to be rotated (CW, CCW or upside down scans) during processing, the resulting invisible text was not being placed properly on the page (in many cases, the text was entirely off the page). Introduced in version 5.2.58 (when we moved to Abbyy 10). This fix does not address documents that have already been processed - we are working on that.
5.3.7 -
Bug fix - Analyze Worldox Profile Groups was presenting totals for all documents, not just the OCR backlog
5.3.6 -
Advanced screen now has Analyze Worldox Profile Groups command - presents a summary table with the total number of documents and pages in each profile group
5.3.5 -
Added support for files with really long filenames (>260 characters) (due to limitations in Worldox, files must still have 8.3 filename equivalents)
5.3.3 -
Added /wait=X command line argument (wait X seconds) - /wait=5 will wait 5 seconds before launching the application
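As an illustration of how a /wait=X argument can be handled, here is a minimal sketch. It is not S-OCR's launcher code; the function names and the injectable `sleep` parameter are assumptions for the example.

```python
import re
import time

# Matches arguments such as /wait=5 (case-insensitive).
WAIT_PATTERN = re.compile(r"^/wait=(\d+)$", re.IGNORECASE)


def parse_wait_seconds(args):
    """Return the delay from the first /wait=X argument, or 0 if none is present."""
    for arg in args:
        match = WAIT_PATTERN.match(arg)
        if match:
            return int(match.group(1))
    return 0


def launch_after_wait(args, launch, sleep=time.sleep):
    """Sleep for the requested number of seconds, then call launch()."""
    delay = parse_wait_seconds(args)
    if delay > 0:
        sleep(delay)
    return launch()
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without actually pausing.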
5.3.2 -
Bug fix - small corruption in library caused S-OCR to crash in some rare cases (legacy profile groups)
Updated jWDAPI.jar and jWDAPI.dll - 20121005
5.2.77 -
Added additional 80004005 message checks: "This image file format is not supported", "Unknown error while opening"
Bug fix: Files were being left behind after processing, causing the Symphony PC's disks to fill up. This was introduced in 5.2.58 - we recommend that any site on 5.2.58 or higher update
5.2.76 -
Error 0x80004005 with error output of "Invalid PDF file" or "PDF data is corrupted" now causes the document to get moved to the Corrupted list, instead of the Needs Attention list
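The routing rule above can be sketched as follows. This is a simplified illustration, not the product's code: the function name is hypothetical, and every error that does not match is sent to Needs Attention here, which ignores the other lists S-OCR maintains.

```python
# Error output fragments that indicate a corrupted PDF (taken from the entry above).
CORRUPTED_MARKERS = ("Invalid PDF file", "PDF data is corrupted")


def target_list(error_code: int, error_output: str) -> str:
    """Pick the document list for a failed document (simplified sketch)."""
    if error_code == 0x80004005 and any(
        marker in error_output for marker in CORRUPTED_MARKERS
    ):
        return "Corrupted"
    # Assumption for this sketch: everything else lands in Needs Attention.
    return "Needs Attention"
```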
5.2.75 -
Bug fix: Crashes and documents winding up in Needs Attention list when processing really big files (> 1000 pages)
5.2.74 -
Bug fix: Some errors during OCR could result in an on-screen crash dialog (MaestroOCRProcess has encountered a problem and needs to close). This dialog prevents further processing until Close is clicked.
5.2.73 -
/maestro/do/status screen now includes a line that says whether the backlog is "large" or not (>2000 pages)
5.2.72 -
Bug fix: some documents that were prevented from being modified by NTFS security weren't being moved to the INACCESSIBLE list (they were being placed on the REPROCESS list in a continuous loop)
5.2.71 -
If we fail to open a file for analysis (e.g. because of file system security), we now discard previous analysis results. This should trap the case where document security was changed AFTER we did analysis: previously, we used cached analysis results, so we did not see that the security had changed
Bug fix: In some older sites, the processing priority for a document was set to 0 - this caused an infinite loop that prevented those documents from ever getting processed
5.2.70 -
Bug fix: estimated time to process backlog calculation displayed incorrectly if the time was less than a day, but more than an hour
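For reference, a time estimate can be formatted with separate branches for the day, hour, and minute ranges. The sketch below is an assumption about how such a display might be built, not the actual S-OCR calculation; the function name and the output format are hypothetical.

```python
def format_estimate(total_seconds: int) -> str:
    """Format a backlog estimate, with a distinct branch for each time range."""
    minutes = total_seconds // 60
    days, remainder = divmod(minutes, 24 * 60)
    hours, mins = divmod(remainder, 60)
    if days > 0:
        return f"{days}d {hours}h"
    if hours > 0:
        # Less than a day, but at least an hour.
        return f"{hours}h {mins}m"
    return f"{mins}m"
```

For example, `format_estimate(5 * 3600 + 30 * 60)` falls in the middle branch and returns `"5h 30m"`.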
5.2.69 -
Bug fix: TIF files were being placed in the Needs Attention list with error "Adding text to image failed - ASDFHKWSEEWQI\Trumpet\Symphony.....\image_1.tif not found". Also, the Processor could show "TifImagePath is empty" errors.
5.2.67 -
Bug fix: Text content of files OCRed *after* they were picked up by the text indexer wasn't being added to the text indexes during nightly rebuilds
5.2.66 -
License status will now show an error if the license doesn't specify the allowed number of pages
5.2.65 -
Fix null pointer exception when displaying Processor config in some corner case scenarios
5.2.64 -
Bug fix - null pointer exception if the document factory was closed due to an error and Symphony was then quit (this could prevent Symphony from quitting)
Bug fix - Compact database now purges document records that have the same path
5.2.63 -
Bug fix - StringLong key would have exceeded 300K error messages in logs
New /maintenance command line argument - forces a full compact of the database as S-OCR launches (may be useful in cases where database corruption prevents S-OCR from launching)
5.2.61 -
Improved OCR status messages - they will now display the page that is being worked on (if a page number is available)
5.2.60 -
Bug fix - [0x80020005] - ERROR: Type mismatch error when processing documents that contain barcodes
5.2.58 -
Bug fix: "Comparison method violates its general contract!" error when analyzing some PDF files
Changed MaestroOCRProcess.exe to be SymphonyOCRProcess.exe
5.2.56 -
Bug fix - when compacting database, if finder process was running, the system would lock up after the compact completed
5.2.55 -
Added Compact Database scheduled task to scheduler - by default it runs at midnight on Wednesday
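A weekly schedule like this one can be computed as in the sketch below. It assumes that "midnight on Wednesday" means 00:00 at the start of Wednesday, and the function name is hypothetical; the real scheduler may work differently.

```python
from datetime import datetime, timedelta

WEDNESDAY = 2  # datetime.weekday(): Monday is 0


def next_weekly_run(now: datetime, weekday: int = WEDNESDAY) -> datetime:
    """Return the next 00:00 on the given weekday that is strictly after now."""
    midnight_today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    days_ahead = (weekday - now.weekday()) % 7
    candidate = midnight_today + timedelta(days=days_ahead)
    if candidate <= now:
        # Today's slot has already passed, so use next week's.
        candidate += timedelta(days=7)
    return candidate
```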
5.2.54 -
Added database status to heartbeat
Added database rebuild progress messages to the advanced screen
5.2.53 -
Heartbeats will be sent even if license is invalid
Heartbeat includes information about license expiration, etc...
Changed license configuration screen so error and warning messages are more prominent
Added a license warning (to the user interface and the heartbeats) if the license will expire in the next 30 days
Bug fix: If no profile groups (PGs) are selected, system status now shows ERROR
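The 30-day license expiration warning in this release can be sketched as a simple date comparison. The status names ("OK", "WARNING", "ERROR") and the function name are assumptions for illustration, as is treating the expiration date itself as still valid.

```python
from datetime import date, timedelta


def license_status(expires_on: date, today: date, warn_days: int = 30) -> str:
    """Classify a license by how close it is to its expiration date."""
    if expires_on < today:
        return "ERROR"  # already expired
    if expires_on - today <= timedelta(days=warn_days):
        return "WARNING"  # expires within the warning window
    return "OK"
```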
5.2.51 -
Fix for "TIFF files winding up in Needs Attention" list - special handling for old-style TIFF files that use an unsupported JPEG compression format
5.2.50 -
Make sample image have modified date of 'today'
During install, we now log the version of installer to Trumpet-UpdateHistory.txt
Admin Guide link now points to Trumpet knowledge book instead of PDF
If the Windows user doesn't have sufficient permissions for the OCR engine to work, we now produce meaningful error messages
5.2.48 -
Better handling of partially corrupted database indexes during rebuilds (should fix Java memory/heap space errors on rebuilds in some situations)
5.2.47 -
Added 'Ignore Specified' button to Advanced screen (allows bulk setting of Ignore items)
5.2.46 -
REPROCESS queue is automatically re-processed on launch
Advanced screen addition - you can now put in a bunch of file paths and click 'Process Specified Documents Immediately' to get them all to process at higher priority than other documents. If the document is NEW or REPROCESS, it'll get moved to the PREPROCESS (analysis) queue, otherwise it stays in the queue it's in already (but with adjusted priority). If the document is in a Post-Process document list, nothing will happen to it.
To Analyze queue is now ordered by 'Process Priority' (instead of when the document was found)
Improvements to user interface responsiveness (trying to eliminate problems where the user clicks a link, and nothing happens)
Adjusted labels on welcome screen to indicate that times to process backlog are *estimates*
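The queue rules for 'Process Specified Documents Immediately' in this release can be summarized in a short sketch. The `Doc` type, the `HIGH_PRIORITY` value, and the set of Post-Process list names passed in by the caller are all hypothetical; only the NEW, REPROCESS, and PREPROCESS names come from the entry above.

```python
from dataclasses import dataclass

HIGH_PRIORITY = 100  # hypothetical value; the real priority scale is not documented here


@dataclass
class Doc:
    path: str
    queue: str
    priority: int = 0


def process_immediately(doc: Doc, post_process_lists: set) -> Doc:
    """Apply the 'process immediately' rules to one document record."""
    if doc.queue in post_process_lists:
        # Documents in a Post-Process list are left untouched.
        return doc
    if doc.queue in ("NEW", "REPROCESS"):
        # NEW and REPROCESS documents move to the analysis queue.
        doc.queue = "PREPROCESS"
    # Every other document stays in its queue, with a raised priority.
    doc.priority = HIGH_PRIORITY
    return doc
```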
5.2.45 -
Corrupted documents will now be reprocessed immediately after applying a code update
Added a new /maestro/do/status page that gives a plain text status of the system (useful for parsing by monitoring software)
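Monitoring software could poll the status page along the lines of the sketch below. The host name and port in the usage note are placeholders, and the layout of the status text is not documented here, so the sketch only fetches the page and splits it into non-empty lines; it does not interpret them.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

STATUS_PATH = "/maestro/do/status"


def status_url(base_url: str) -> str:
    """Build the status page URL from the Symphony OCR base URL."""
    return urljoin(base_url, STATUS_PATH)


def non_empty_lines(text: str) -> list:
    """Split plain-text status output into stripped, non-empty lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def fetch_status_lines(base_url: str, timeout: float = 10.0) -> list:
    """Fetch the plain-text status page and return its non-empty lines."""
    with urlopen(status_url(base_url), timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
    return non_empty_lines(body)
```

A monitoring script might call `fetch_status_lines("http://symphony-pc:8080")`, where the host and port are hypothetical, and alert on whatever lines it cares about.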
5.2.44 -
Fix for "Colorspace not supported" and "Color depth not supported" corrupted PDF warnings
5.2.43 -
Fix for database index corruption issues (this update will trigger a rebuild of the queue metric tracker)
Optimize performance when doing bulk document state changes
Fix for 'Dictionary key is not a name' corrupted file issue
Adjusted HTML layout of UI screen so it renders properly on old versions of IE
5.2.42 -
Clicking on the backlog summary graph gives a larger image with all weeks (instead of just 104 weeks)
Fixed 'Unsupported color space' PDF corruption errors
5.2.40 -
Maintenance screen now refreshes once per second
Maintenance screen now has better progress messages
Added an animation to the maintenance screen
5.2.39 -
Adjusted graph y-axis so it automatically displays a more pleasing scale
Display a blank graph if no data is available
5.2.38 -
Fixed issue with 0 sized files showing ArrayIndexOutOfBoundsException: 0 error
5.2.35 -
Adjustments to Check for Updates to allow user to get latest production or pre-release updates
5.2.34 -
Fix 'Page N is corrupted - NNN' message on files in the corrupted document list
5.2.30 -
High performance support for huge files (>2GB)
5.2.29 -
Bug fix - Interruptions to pre-processing were causing the document to show as corrupted - the document will now be moved to Reprocess
Bug fix - exceptions during processing resulted in file being left in In Process queue
5.2.28 -
We have found a regression issue introduced in version 5.2.26 that can (in some rare cases) result in corrupted S-OCR database indexes. This corruption can result in S-OCR refusing to launch. We recommend that anyone currently running 5.2.26 or 5.2.27 update to 5.2.28 immediately.
Summary
- Resolved issue with the "Uninstaller" not being available when installed as a logged-in user
- Changed NetDocuments OAuth token handling to support NetDocuments OAuth changes
- Added support for firms using proxy servers that use private certificate authorities to proxy HTTPS traffic. Custom certificate authorities can now be registered in a Java Key Store file stored at /config/cacerts.private.
- Updated to private Java Runtime, based on the open source Liberica JDK
- Resolved issues with SharePoint integration including:
1) Improved user count detection
2) Added more informative error logging if SharePoint API fails
3) Better handling for search results with 'null' file sizes
- Resolved issues with Aspose and Nuance PDF files that were not able to be safely processed by Symphony OCR
- Resolved issues with Java Heap Space crashing Symphony OCR
To see a complete list of changes, visit: Change Log
Summary 6.6:
Summary 6.4
Summary 6.3
Symphony OCR 6.0 brings a major change to the workstation user interface. Here's an overview of the major changes: