Accuracy in OCR

ocr
studio

#1

Hi,

I am using Microsoft OCR to read some names from an application running in a Citrix environment. I can read the names, but the accuracy is not as expected. For example, the name Balchandran is read as Balehandra, and Diiaya as Duava.

We have the scale set to 1. Is there any way to improve the accuracy?

Thanks in advance.


#2

Use Google OCR and check. @DineshManivannan


#3

I tried it, and Microsoft seems to be a little better than Google.


#4

You will need to play with the zoom of the text and the scale. Since the ‘n’ and ‘y’ were cut off, your scale is probably slightly too small or the text is too large for that scale. I could be wrong, but in general, the OCR tries to fit each character in a box (identified by scale) so if the letter doesn’t fit in the box correctly, it chops it up and sees a portion of certain characters.


#5

I’ve also noticed that OCR tends to struggle with names. It usually does much better with dictionary words.

As ClaytonM mentioned, you’ll have to keep playing with the settings over and over again until you get results you want.
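If the set of expected names is known in advance, one option is to snap each OCR result to the closest known name instead of trusting the raw text. The following is only a minimal sketch in Python using the standard library's `difflib`. The name list, the `correct_name` helper and the 0.5 cutoff are illustrative assumptions, not part of any OCR engine or of UiPath.

```python
import difflib

# Hypothetical list of names the application is expected to show.
KNOWN_NAMES = ["Balchandran", "Diiaya", "Dinesh", "Clayton"]


def correct_name(ocr_text, known_names=KNOWN_NAMES, cutoff=0.5):
    """Return the known name most similar to ocr_text, or None if nothing is close.

    cutoff is a similarity ratio between 0 and 1; 0.5 is an assumed value
    that would need tuning against real data.
    """
    matches = difflib.get_close_matches(ocr_text, known_names, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(correct_name("Balehandra"))
print(correct_name("Duava"))
```

A low cutoff recovers badly misread names but also raises the risk of matching the wrong person, so anything near the threshold is probably worth flagging for manual review.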


#6

In addition to the comments above, some other ideas … For the purposes of automation, it is obviously not ideal to keep subjectively playing with the settings over and over again on a case-by-case basis.

I’ve found better results running the Google/Microsoft OCR after first running the image through graphics processing software to sharpen the text (IrfanView and XnView are free), which is easily automated if you’re willing to do some additional legwork. Ultimately, I’ve had good results using Adobe Acrobat Pro to correct scanned docs and OCR them to selectable text. It will never be 100% accurate, but it is a sufficient intermediary for UiPath to identify the pages and the keywords or text hooks needed to solve the problem (and to build an automated solution accordingly).
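To give a rough idea of what such a pre-processing step does, here is a minimal sketch in plain Python that enlarges a grayscale image and applies a basic 3x3 sharpening kernel. It assumes the image is already loaded as a list of rows of 0-255 values; `upscale` and `sharpen` are hypothetical helper names. In practice a tool like IrfanView or an image library would do this work, so treat it as an illustration rather than a recommended implementation.

```python
def upscale(pixels, factor):
    """Nearest-neighbour enlargement: each pixel becomes a factor x factor block."""
    out = []
    for row in pixels:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def sharpen(pixels):
    """Apply the kernel [[0,-1,0],[-1,5,-1],[0,-1,0]]; border pixels are left unchanged."""
    height, width = len(pixels), len(pixels[0])
    out = [list(row) for row in pixels]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            value = (5 * pixels[y][x]
                     - pixels[y - 1][x] - pixels[y + 1][x]
                     - pixels[y][x - 1] - pixels[y][x + 1])
            out[y][x] = max(0, min(255, value))  # clamp to the valid gray range
    return out


# Tiny made-up image: one mid-gray pixel on a black background.
image = [[0, 0, 0],
         [0, 100, 0],
         [0, 0, 0]]
prepared = sharpen(upscale(image, 2))
```

The enlarged, sharpened image would then be saved and passed to the OCR engine in place of the original screenshot.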

I would never overstate the accuracy of OCR technology; it is an aid, not an all-in-one solution. If you have an example (like a few screenshots of the Citrix text), we may be able to offer more.