I have been testing the OCR engines using the IntelligentOCR package's Data Extraction Scope, which returns ExtractionResults. As part of that object, I can see each field's confidence level by looping through extractedData.ResultsDocument.Fields; item.Values(0).OcrConfidence then returns the confidence that the OCR engine read the value correctly. This works perfectly fine with GoogleOCR, but when I started testing MicrosoftOCR, it finds the field but keeps returning -1 as the OcrConfidence.
Can someone please explain why this happens and how I can work around it, in case it is an open issue? Thanks
A document (.pdf) is processed in “Digitize Document” with Microsoft OCR (Properties: Language = “da”; Profile = “Scan”; Scale = 2) → this returns the DOM, which already has an ocrConfidence of -1.
Then, for further processing, “Load Taxonomy” → “Data Extraction Scope with Simple Document Data Extraction Activity” returns the ExtractionResult, which is still -1, of course, since it is derived from the DOM.
I hope these details help you explain what is going on. It is pretty much the same flow that I used for Google OCR, which did return reasonable OCR confidence levels.
The Microsoft OCR engine that we are using under the hood of our activity does not return any confidence information for us to pass on, as opposed to the Google OCR. Consequently, we set the confidence to -1, meaning “Unknown”.
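Since -1 is a sentinel for “Unknown” rather than a real score, one possible workaround is to treat it separately from genuinely low confidence before applying any threshold. The sketch below is written in Python purely as an illustration and does not use the UiPath API. The `FieldValue` class, the `needs_review` helper, and the 0.8 threshold are hypothetical, and it assumes confidence values are otherwise reported on a 0 to 1 scale.

```python
from dataclasses import dataclass
from typing import Optional

# Sentinel described in the answer above: -1 means "Unknown".
UNKNOWN_CONFIDENCE = -1


@dataclass
class FieldValue:
    """Hypothetical stand-in for one extracted field value."""
    name: str
    value: str
    ocr_confidence: float  # raw value as reported; -1 means unknown


def normalize_confidence(raw: float) -> Optional[float]:
    """Map the -1 sentinel (or any negative value) to None."""
    return None if raw < 0 else raw


def needs_review(field: FieldValue, threshold: float = 0.8) -> bool:
    """Flag a field for manual validation.

    An unknown confidence is flagged for review instead of being
    compared against the threshold as if it were a real score.
    """
    confidence = normalize_confidence(field.ocr_confidence)
    return confidence is None or confidence < threshold
```

With this approach, every field digitized by an engine that reports no confidence would be sent to manual validation; whether that is preferable to accepting such fields unchecked depends on how much the downstream process relies on the extracted values.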