How can 7 of 8 phrases appearing result in a confidence level of only 55.86? The OCR confidence is .29; does that factor into the overall confidence? And considering this document was OCR’d so accurately, why is the OCR confidence only .29?
I thought the issue might be that the PDF actually contains two images, the front and back of a check, and the back is often very low quality, with large areas of black due to poor scanning.
So I tested with just the front of the check as a JPG. The OCR confidence is now .83, but Classify Document Scope isn’t returning any classification result. This makes no sense, since the documentation says Classify Document Scope supports JPG files.
A keyword-based classifier works by matching the items in its keyword set (which you provide) against the document text. I have run into confidence issues with keyword-based classifiers on multiple occasions.
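To make the basic idea concrete, here is a toy sketch of plain keyword matching: it just reports the fraction of keyword phrases found anywhere in the OCR text. This is an illustrative assumption, not UiPath’s actual scoring, and the function name and sample check phrases are hypothetical.

```python
def keyword_match_ratio(text: str, keywords: list[str]) -> float:
    """Toy model (hypothetical, not UiPath's algorithm): fraction of
    keyword phrases that appear anywhere in the text, case-insensitively."""
    if not keywords:
        return 0.0
    lowered = text.lower()
    hits = sum(1 for kw in keywords if kw.lower() in lowered)
    return hits / len(keywords)


# Hypothetical keyword set for a check; "routing number" is absent from the text.
keywords = [
    "pay to the order of", "dollars", "memo", "date",
    "void after 90 days", "authorized signature",
    "account number", "routing number",
]
text = ("Date 01/05/2024 Pay to the order of ACME Corp  Dollars  "
        "Memo invoice 1234  Void after 90 days  Authorized signature  "
        "Account number 000123")
print(keyword_match_ratio(text, keywords))  # 7 of 8 phrases found
```

Under this naive model, 7 of 8 matches would score 0.875, which is why a reported 55.86 suggests the real classifier weighs more than simple presence.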
I recommend using an intelligent keyword classifier instead; in my experience it performs better, and it extracts keywords automatically from the sample documents.
We can’t use the intelligent keyword classifier (yet). I found that the issue is that the keyword-based classifier takes into account how close to the top of the document the keywords are. So you shouldn’t use it to look for keywords throughout the document, only for text (preferably a consistent header) at the top of the document.
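The position effect described above can be illustrated with a toy model in which each matched keyword’s contribution decays linearly with where it first appears in the text. This is only a hypothetical sketch of the observed behavior; the linear decay and the function name are assumptions, not UiPath’s documented algorithm.

```python
def position_weighted_score(text: str, keywords: list[str]) -> float:
    """Toy model (hypothetical): each keyword found contributes a weight
    that falls linearly from 1.0 at the start of the text to 0.0 at the end.
    The result is the average weight across all keywords (missing = 0)."""
    lowered = text.lower()
    n = len(lowered)
    if not keywords or n == 0:
        return 0.0
    total = 0.0
    for kw in keywords:
        idx = lowered.find(kw.lower())  # position of first occurrence
        if idx == -1:
            continue  # keyword not found: contributes nothing
        total += 1.0 - idx / n
    return total / len(keywords)


# Hypothetical header phrases; the same text placed at the top vs. the bottom.
keywords = ["first national bank", "pay to the order of"]
filler = "x" * 200
header_at_top = "First National Bank Pay to the order of " + filler
header_at_bottom = filler + " First National Bank Pay to the order of"

print(position_weighted_score(header_at_top, keywords))
print(position_weighted_score(header_at_bottom, keywords))
```

In this model, identical keywords score high when they sit in a header at the top and low when they appear near the end, which matches the advice to key on a consistent header rather than on text scattered through the document.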