OCR Question

I am just starting to work with UiPath and I need your help! I have many unstructured documents (.tif files) from which I need to obtain 4 values: a date (day - month - year, e.g. 05 - July - 2019), a number (e.g. US 1,200,350), the same number in words (e.g. one million two hundred thousand three hundred and fifty) and a name (e.g. Edward Rusell Federer). I solved the problem for some documents by running OCR on the full page and using regular expressions to get these values, but other documents contain graphic elements (lines, tables) that sometimes cause the OCR to fail and the robot to stop.
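For readers following along, the regular-expression step described above could look roughly like the sketch below. It is only an illustration in Python, not the poster's actual workflow: the patterns are assumptions derived from the two example values in the post (`05 - July - 2019` and `US 1,200,350`), and the function name `extract_date_and_amount` is hypothetical.

```python
import re

# Assumed formats, taken from the examples in the post:
#   date   -> "05 - July - 2019" (day - month name - year)
#   amount -> "US 1,200,350"     (prefix "US", comma-grouped digits)
DATE_RE = re.compile(r"\b(\d{1,2})\s*-\s*([A-Za-z]+)\s*-\s*(\d{4})\b")
AMOUNT_RE = re.compile(r"\bUS\s+(\d{1,3}(?:,\d{3})*)\b")


def extract_date_and_amount(text):
    """Return the first date and amount found in OCR text, or None for each miss."""
    date = DATE_RE.search(text)
    amount = AMOUNT_RE.search(text)
    return {
        # Normalise the spacing around the dashes: "05-July-2019".
        "date": "-".join(date.groups()) if date else None,
        # Drop the thousands separators so the value can be used as a number.
        "amount": int(amount.group(1).replace(",", "")) if amount else None,
    }


print(extract_date_and_amount("Signed on 05 - July - 2019 for US 1,200,350 only"))
```

Returning `None` for a missing field, rather than raising, lets a workflow flag the document for manual review instead of stopping.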
Is it possible to select a specific rectangle/area in the image and run OCR only on that area? I did that, but the problem now is that the rectangle is not always in the same position. The area can shift from document to document (image) depending on the number of words and extra lines included, but there are two tags/labels (each unique in the document) that define this rectangle/area. My question: is it possible to find both tags, get their coordinates (x1, y1) and (x2, y2), then select the area using both coordinates and apply OCR to it? I know that the 4 values I need are inside that rectangle.

I think I could use Find OCR Text Position, a clipping region, or relative scraping, but I do not know how to do it.
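The geometry behind the idea in this question (find two unique labels, then OCR only the rectangle they span) can be sketched independently of UiPath. The Python below is a minimal illustration under stated assumptions: the `Word` type, `find_label`, `region_between`, and the `margin` parameter are all hypothetical names, and the word boxes are assumed to come from a first, coarse OCR pass that reports each word with its left, top, width and height in pixels.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR word with its bounding box in pixels (hypothetical structure)."""
    text: str
    left: int
    top: int
    width: int
    height: int


def find_label(words, label):
    """Return the first word whose text equals the label (case-insensitive), or None."""
    for word in words:
        if word.text.strip().lower() == label.lower():
            return word
    return None


def region_between(words, start_label, end_label, margin=5):
    """Return (x1, y1, x2, y2) enclosing both labels, or None if either is missing."""
    start = find_label(words, start_label)
    end = find_label(words, end_label)
    if start is None or end is None:
        return None
    x1 = min(start.left, end.left) - margin
    y1 = min(start.top, end.top) - margin
    x2 = max(start.left + start.width, end.left + end.width) + margin
    y2 = max(start.top + start.height, end.top + end.height) + margin
    # Clamp to the image origin so the margin never produces negative coordinates.
    return (max(x1, 0), max(y1, 0), x2, y2)


# Made-up boxes standing in for a first OCR pass over one page.
words = [
    Word("BEGIN", 100, 200, 60, 20),
    Word("foo", 150, 250, 30, 20),
    Word("END", 400, 500, 40, 20),
]
print(region_between(words, "begin", "end"))
```

Because the rectangle is recomputed from the labels on every page, it follows the content when extra lines push the area down. The resulting tuple uses the same (left, upper, right, lower) order that Pillow's `Image.crop` expects, so the cropped image could then be passed to a second OCR run; how this maps onto a UiPath clipping region is not shown here and would need to be checked against the activity's documentation.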

Any suggestion will help me.

Sorry for my English. English is not my native language.

@esoccl hope this will help you!

Thanks! Do I need to purchase a license for ABBYY?

Hi @esoccl

If you have PDF files in multiple formats, I suggest using ABBYY FlexiCapture. But if you only need to obtain a few values from a simple PDF, the built-in UiPath activities are enough, and they work really well!

:smiley::smiley::smiley:

If you do not want to invest in ABBYY, you can use one of the options below.

  1. You can use Taxonomy Manager to define your template, then digitize the document and indicate your elements using Validation Station. But you have to indicate your fields manually every time.
  2. You can use the new document processing model: "Receipt and Invoice AI - Now available in Public Preview!"