
19.17. OCRed documents not text searchable

Background

Symphony OCR converts image-only PDF files into text-searchable PDF files.  Once the PDF file contains text, users can immediately search *within* the PDF.  However, the Worldox text indexer must update before the text contents of the PDF become available for text-in-file searching (that is, searching *for* the document).

We recently found an interaction problem between Symphony OCR and the Worldox indexer (WDINDEX) that prevents some PDF files from being added to the full text index.

Cause

WDINDEX maintains a local cache of the text it extracts from documents.  This allows Worldox to reuse the cached text extraction during text database inits instead of re-parsing text from every file on the network, which dramatically improves text database rebuild times.  WDINDEX determines whether it should use the text cache for a given file by checking the network file's modified date.  Symphony OCR preserves the file's modified date when it performs OCR.  As a result, WDINDEX can keep using the text it cached before OCR (which contains no text) instead of re-reading the updated version of the file on the network.
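The following Python sketch illustrates the general failure mode described above. It is not WDINDEX's actual implementation: the `TextCache` class, the `get_text` method, the file name, and the timestamps are all hypothetical, and the only assumption taken from this article is that a cached entry is reused whenever the file's modified date is unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class TextCache:
    """Toy text cache: an entry is reused while the file's modified time matches."""
    entries: dict = field(default_factory=dict)  # path -> (mtime, extracted_text)

    def get_text(self, path, mtime, extract):
        cached = self.entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]      # cache hit: the file is not parsed again
        text = extract()          # cache miss: parse the file and store the result
        self.entries[path] = (mtime, text)
        return text


def demo():
    cache = TextCache()
    mtime = 1_000_000  # hypothetical modified time, in seconds

    # First indexing pass: the PDF is image-only, so no text is extracted.
    before = cache.get_text("scan.pdf", mtime, lambda: "")

    # OCR adds text but keeps the modified time, so the empty entry is reused.
    stale = cache.get_text("scan.pdf", mtime, lambda: "invoice 1234")

    # A different modified time forces the file to be parsed again.
    fresh = cache.get_text("scan.pdf", mtime + 60, lambda: "invoice 1234")
    return before, stale, fresh
```

In this sketch, the second lookup still returns an empty string even though the file now contains text, because the modified time is the only signal the cache checks.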

Resolution

Starting with version 5.2.67, Symphony OCR changes the modified date of each file it OCRs by a single minute.  This ensures that WDINDEX will not use cached text extraction results for files OCRed after S-OCR was updated to 5.2.67.
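As a rough illustration of what changing a file's modified date by one minute looks like, here is a minimal Python sketch using the standard `os.stat` and `os.utime` calls. The function name `bump_modified_time` is hypothetical, and moving the date forward (rather than backward) is an assumption; this article does not say which direction Symphony OCR uses or how it applies the change.

```python
import os


def bump_modified_time(path, seconds=60):
    """Shift a file's modified time by `seconds`, leaving its access time unchanged."""
    st = os.stat(path)
    new_mtime = st.st_mtime + seconds  # assumption: the date is moved forward
    os.utime(path, (st.st_atime, new_mtime))
    return new_mtime
```

Any cache that compares modified dates will then treat the file as changed and extract its text again.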

For files that were OCRed before the update to 5.2.67, the user will need to purge the WDINDEX text cache.  This is easy to do, but it can take some time, depending on the size of the document repository.  Here are the instructions:

Purging the WDINDEX local text cache

  1. Log on to the Indexer PC
  2. Make sure you have updated to Symphony OCR 5.2.67 or higher (see the update instructions if you need them)
  3. If WDINDEX is counting down, click Close Server to return to the main WDINDEX configuration screen
  4. Select the first drive in the Active Drives list
  5. Click Drive->Purge Local Cache
  6. When prompted to delete the local text cache data, click Yes
  7. After the purge finishes (this could take 5 to 20 minutes), select the next drive in the Active Drives list and repeat the above procedure

 

The next time WDINDEX rebuilds the text databases, it will rebuild the local text cache (this could cause the first rebuild to take longer than normal).  After that, everything should work just like before, except that all of your OCRed files will be text searchable.


© 2012 Trumpet, Inc., All Rights Reserved