Corrupted Documents - Documents in the Corrupted list are those that Symphony OCR does not recognize as valid files. The most common reason is that the file is an invalid or corrupted PDF (try opening it in Adobe Acrobat to confirm). Another possibility is that some characteristic of the PDF is not being handled properly by the Symphony OCR parsing algorithm. Trumpet periodically updates the PDF parsing algorithms to address corner cases that have not been encountered before.
What to do?
Try opening the file in Acrobat, then click Save. Acrobat attempts to auto-repair corrupted files when it opens them, so saving the document writes out the repaired version. After saving and closing the document, click the Re-Analyze button on the document record in Symphony OCR. This only works if the file is lightly corrupted, but it is worth a shot.
If that doesn't help, check whether the file is already text searchable (i.e., can you already search for text inside the PDF?). If you can, the document isn't a candidate for OCR anyway, and you can just move it to the Ignore list.
If the document does need to be OCRed and the Acrobat repair doesn't help, you may want to submit the document to us for analysis. Open a support ticket by emailing support@trumpetinc.com and we will send information on how to securely upload the document to us. If we find a problem in our parsing algorithms, we'll fix the issue and get you a patch.
If a large number of files share the same corruption reason and the files don't appear to actually be corrupted, please open a support ticket by emailing support@trumpetinc.com and we will send information on how to securely upload a sample document to us. If we find a problem in our parsing algorithms, we'll fix the issue and get you a patch. Alternatively, you can use a bulk Ignore operation to move the documents to Ignore.
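When many files land in the Corrupted list at once, a quick structural check can help separate files that are obviously damaged from ones worth sending in as a sample. The sketch below is not part of Symphony OCR; it is a standalone Python script, and the function names `looks_like_pdf` and `triage` are made up for illustration. It assumes only that a well-formed PDF has a `%PDF-` header near the start of the file and a `%%EOF` marker near the end. Passing this check does not prove a file is valid, but failing it strongly suggests the file is truncated or is not a PDF at all.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if the file has a PDF header and an end-of-file marker.

    This is only a rough sanity check (an assumption of this sketch, not
    how Symphony OCR decides that a file is corrupted).
    """
    data = Path(path).read_bytes()
    # The header is normally at byte 0; allow some leading junk in the
    # first 1024 bytes to be lenient.
    has_header = b"%PDF-" in data[:1024]
    # The end-of-file marker should appear in the last part of the file.
    has_eof = b"%%EOF" in data[-1024:]
    return has_header and has_eof


def triage(folder):
    """Map each *.pdf file name in `folder` to its check result."""
    results = {}
    for pdf in sorted(Path(folder).glob("*.pdf")):
        results[pdf.name] = looks_like_pdf(pdf)
    return results


if __name__ == "__main__":
    import sys

    # Usage: python triage.py <folder containing the PDFs>
    for name, ok in triage(sys.argv[1]).items():
        print(("looks OK        " if ok else "likely damaged  ") + name)
```

Files reported as "looks OK" that Symphony OCR still lists as corrupted are the best candidates to submit as a sample; files reported as "likely damaged" are better handled with the Acrobat repair step or a bulk Ignore.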