OCRed documents not text searchable
Symphony OCR converts image-only PDF files into text-searchable PDF files. Once the PDF file contains text, users can immediately search *within* the PDF. However, the Worldox text indexer must update before the text contents of the PDF become available for text-in-file searching (i.e. searching *for* the document).
We recently found an interaction problem between Symphony OCR and the Worldox indexer (WDINDEX) that prevents some PDF files from being added to the full text index.
WDINDEX maintains a local cache of the text it extracts from documents. During text database inits, Worldox uses this cached text instead of re-parsing text from every file on the network, which dramatically improves text database rebuild times. WDINDEX decides whether to use the cached text for a given file by checking the network file's modified date. Symphony OCR preserves the file's modified date when it performs OCR. As a result, WDINDEX can end up using the cached text extraction (made while the file was still image-only, and therefore containing no text) instead of the updated version of the file on the network.
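The stale-cache behavior described above can be illustrated with a small simulation. This is only a sketch of a cache validated by modified date; the names `CacheEntry` and `get_text`, the file name, and the timestamps are hypothetical and do not reflect how WDINDEX is actually implemented.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    modified: float  # modified date seen when the text was extracted
    text: str        # text extracted at that time


def get_text(cache, path, modified, extract):
    """Return cached text if the modified date is unchanged, else re-extract."""
    entry = cache.get(path)
    if entry is not None and entry.modified == modified:
        return entry.text  # cache hit: the file is assumed unchanged
    text = extract(path)
    cache[path] = CacheEntry(modified, text)
    return text


cache = {}

# First indexing pass: the PDF is image-only, so no text is extracted.
before = get_text(cache, "scan.pdf", 1000.0, lambda p: "")

# OCR adds text but keeps the modified date, so the empty entry is reused.
stale = get_text(cache, "scan.pdf", 1000.0, lambda p: "invoice 42")

# With the date shifted by 60 seconds, the cache entry no longer matches.
fresh = get_text(cache, "scan.pdf", 1060.0, lambda p: "invoice 42")
```

In this sketch `stale` is still empty even though the file now contains text, while `fresh` picks up the OCRed text because the modified date differs from the cached one.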
As of version 5.2.67, Symphony OCR shifts the modified date of each file it OCRs by a single minute. This ensures that WDINDEX does not use cached text extraction results for files OCRed after Symphony OCR (S-OCR) was updated to 5.2.67.
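For readers curious what a one-minute date shift looks like at the file-system level, here is a minimal sketch using Python's standard library. It is not Symphony OCR's code: the helper name `bump_modified_time` is hypothetical, and moving the date forward (rather than backward) is an assumption, since the article only says the date changes by one minute.

```python
import os
import tempfile


def bump_modified_time(path, seconds=60):
    """Shift a file's modified time by `seconds`, leaving its access time alone."""
    info = os.stat(path)
    new_mtime = info.st_mtime + seconds  # assumption: shift forward in time
    os.utime(path, (info.st_atime, new_mtime))
    return new_mtime


# Demo on a throwaway file standing in for a freshly OCRed PDF.
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as handle:
    demo_path = handle.name

os.utime(demo_path, (1_600_000_000, 1_600_000_000))  # pin a known starting time
bump_modified_time(demo_path)
shifted = int(os.stat(demo_path).st_mtime)
os.remove(demo_path)
```

Any indexer that compares modified dates would treat the shifted file as changed and extract its text again instead of relying on a cached result.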
For files that were OCRed before the update to 5.2.67, the user will need to purge the WDINDEX text cache. This is easy to do, but it can take some time, depending on the size of the document repository. Here are instructions:
The next time WDINDEX rebuilds the text databases, it will rebuild the local text cache (this could cause the first rebuild to take longer than normal). After that, everything should work just like before, except that all of your OCRed files will be text searchable.