OCR Confidence Level -1 (Microsoft OCR engine)

I have been testing the OCR engines using the IntelligentOCR’s Data Extraction Scope, which returns ExtractionResults. As part of that object, I can see each field’s confidence level by looping through extractedData.ResultsDocument.Fields; item.Values(0).OcrConfidence then returns the confidence that the OCR engine extracted the value correctly. This works perfectly fine with GoogleOCR, but when I started testing MicrosoftOCR, it found the field but kept returning -1 as the OcrConfidence.
Using GoogleOCR

Using MicrosoftOCR

Can someone please explain why this happens and how I can work around it, in case it is an open issue? Thanks :slight_smile:

Hi @cherose

This is interesting. Any chance you could provide a zip of the project that reproduced the issue?

Sorry, but I can’t… but I can provide the flow :slight_smile:

  1. A document (.pdf) is processed in “Digitize Document” with Microsoft OCR (Properties: Language = “da”; Profile = “Scan”; Scale = 2) → this returns the DOM, which already has an ocrConfidence of -1…

  2. Then, for further processing, “Load Taxonomy” → “Data Extraction Scope with Simple Document Data Extraction Activity” returns the ExtractionResult, which of course still has -1 since it is derived from the DOM.

I hope you can still help me understand it with those details. It is pretty much the same flow that I used for Google OCR, which actually returned reasonable OCR confidence levels.

Thank you.

Hi @cherose,

The Microsoft OCR engine that we are using under the hood of our activity does not return any confidence information for us to pass on, as opposed to the Google OCR. Consequently, we set the confidence to -1, meaning “Unknown”.
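
As a possible workaround on the workflow side, the -1 value can be treated as "no confidence available" rather than as a very low score. The following is only a minimal Python sketch of that logic, not UiPath code: the function names, the 0.8 threshold, the 0 to 1 confidence scale, and the sample field values are all assumptions made up for illustration.

```python
from typing import Optional

# Sentinel described above: the engine reported no confidence at all.
UNKNOWN_CONFIDENCE = -1


def normalize_confidence(raw: float) -> Optional[float]:
    """Map the -1 sentinel to None so it is never compared against a threshold."""
    return None if raw == UNKNOWN_CONFIDENCE else raw


def needs_review(raw: float, threshold: float = 0.8) -> bool:
    """Flag a field for manual review when confidence is unknown or below threshold.

    The 0.8 threshold and the 0..1 scale are assumptions for this sketch.
    """
    confidence = normalize_confidence(raw)
    return confidence is None or confidence < threshold


# Hypothetical field name -> OcrConfidence pairs, not real extraction output.
fields = {"InvoiceNumber": 0.93, "Total": -1, "Date": 0.42}
flagged = [name for name, raw in fields.items() if needs_review(raw)]
print(flagged)  # ['Total', 'Date']
```

Sending unknown-confidence fields to review is the cautious choice here; a workflow that trusts the Microsoft OCR output could instead skip the check when the value is -1.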


Thank you for your reply :slight_smile:
That is all I needed: a clarification on why Microsoft OCR behaves differently from the others despite the same processing.

