Keyword Based Classifier - how is confidence calculated?

postwick · March 9, 2025, 7:34pm

I have keywords for a particular document type set to:

“general ledger debit”, “account title”, “date prepared”, “offsetting acct”, “originating center”, “investment number”, “prepared by”, “approved by”

When I look at the document text as generated by Digitize Document, it contains all those phrases except “account title”.

Yet it gives a confidence of 55.86.

How does matching 7 of 8 phrases yield a confidence of only 55.86? The OCR confidence is 0.29; does that figure into the overall confidence? And considering how accurately this document was OCR’d, why is the OCR confidence only 0.29?

postwick · March 9, 2025, 7:58pm

I thought maybe the issue is because the PDF actually contains two images - the front and back of a check, and the back is often very low quality containing large areas of black because of poor scanning.

So I tested with just the check front as a JPG. The OCR confidence is now 0.83, but Classify Document Scope isn’t giving me any classification result at all. This makes no sense. The documentation says Classify Document Scope supports JPG files.

sagarmaruti.mohalkar · March 27, 2025, 5:15pm

Hi @postwick

A keyword-based classifier functions by matching the items in its keyword set (provided by you) with the document text. I have encountered confidence issues on multiple occasions when using keyword-based classifiers.
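As a rough illustration of that matching step, here is a minimal Python sketch that checks which keywords from the first post occur in a piece of text. It is not the UiPath implementation: the function name `keyword_coverage` and the sample text are hypothetical, and plain case-insensitive substring matching is an assumption. It only shows why one might expect roughly 87.5% when 7 of 8 phrases are present.

```python
def keyword_coverage(text, keywords):
    """Return (fraction of keywords found, list of missing keywords).

    Assumption: plain case-insensitive substring matching after
    collapsing whitespace. The real classifier may match differently.
    """
    normalized = " ".join(text.lower().split())
    missing = [k for k in keywords if k.lower() not in normalized]
    found = len(keywords) - len(missing)
    return found / len(keywords), missing


# Keyword set quoted in the original question.
keywords = [
    "general ledger debit", "account title", "date prepared",
    "offsetting acct", "originating center", "investment number",
    "prepared by", "approved by",
]

# Hypothetical digitized text: contains every phrase except "account title".
sample_text = (
    "General Ledger Debit   Date Prepared 01/02/2025\n"
    "Offsetting Acct 1234  Originating Center 56\n"
    "Investment Number 789  Prepared By A. Smith  Approved By B. Jones"
)

coverage, missing = keyword_coverage(sample_text, keywords)
print(coverage, missing)
```

A naive count like this ignores everything else the classifier may take into account, so a reported confidence well below the raw coverage suggests other factors are involved.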

I recommend using an intelligent keyword classifier, as it performs optimally and automatically extracts keywords from the sample documents.

Happy Automation!!!

postwick · March 28, 2025, 9:36pm

We can’t use the intelligent keyword classifier (yet). I found the cause of the issue: the keyword based classifier cares how close to the top of the document the keywords appear. So you shouldn’t use it to look for keywords scattered throughout the document, only for text (preferably a consistent header) at the top of the document.
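To make the idea concrete, below is a minimal Python sketch of a position-sensitive score. The actual formula used by the keyword based classifier is not given in this thread, so the linear weighting, the function name `position_weighted_score`, and the sample strings are all assumptions chosen purely for illustration.

```python
def position_weighted_score(text, keywords):
    """Hypothetical score in [0, 1]: each keyword found contributes more
    the closer its first occurrence is to the start of the text.

    Assumption: linear decay from 1.0 (very top) toward 0.0 (very end).
    This is NOT the classifier's documented formula.
    """
    normalized = " ".join(text.lower().split())
    if not normalized or not keywords:
        return 0.0
    total = 0.0
    for keyword in keywords:
        index = normalized.find(keyword.lower())
        if index == -1:
            continue  # missing keywords contribute nothing
        total += 1.0 - index / len(normalized)
    return total / len(keywords)


header_keywords = ["prepared by", "approved by"]

# Same keywords, once at the top and once buried after filler text.
top_heavy = "prepared by approved by " + "filler " * 50
bottom_heavy = "filler " * 50 + "prepared by approved by"

top_score = position_weighted_score(top_heavy, header_keywords)
bottom_score = position_weighted_score(bottom_heavy, header_keywords)
print(top_score, bottom_score)
```

Under a weighting of this kind, a document containing every keyword can still score low when the keywords sit far from the top, which would be consistent with the behavior described above.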
