OCR Question

esoccl · July 15, 2019, 3:40pm

I am starting to work in UiPath and need your help! I have many unstructured documents (.tif files) from which I need to obtain 4 values: a date (Day - Month - Year, e.g. 05 - July - 2019), a number (e.g. US 1,200,350), the number in words (e.g. one million two hundred thousand three hundred and fifty), and a name (e.g. Edward Rusell Federer). I solved the problem for only some documents by using OCR on the full page and regular expressions to get these values. The problem is that some documents have graphic elements (lines, tables) that sometimes cause the OCR to fail and stop the robot.
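For illustration, the regular-expression step could look roughly like the sketch below. It is written in Python rather than as a UiPath workflow, the patterns are hypothetical and based only on the example values in this post, and it covers only the date and the numeric amount. Returning None for a missing value, instead of raising an error, is one way to keep a failed OCR read from stopping the whole run.

```python
import re

# Hypothetical patterns, written against the example values in the post.
# Date such as "05 - July - 2019" (spaces around the dashes are optional).
DATE_RE = re.compile(r"\b(\d{1,2})\s*-\s*([A-Za-z]+)\s*-\s*(\d{4})\b")
# Amount such as "US 1,200,350" (comma-separated thousands).
AMOUNT_RE = re.compile(r"\bUS\s+(\d{1,3}(?:,\d{3})*)\b")


def extract_fields(text):
    """Return the date and amount found in OCR text, or None for each miss."""
    date = DATE_RE.search(text)
    amount = AMOUNT_RE.search(text)
    return {
        "date": "-".join(date.groups()) if date else None,
        "amount": int(amount.group(1).replace(",", "")) if amount else None,
    }


# Made-up OCR output, used only to exercise the patterns.
sample = "Date: 05 - July - 2019  Amount: US 1,200,350"
fields = extract_fields(sample)
missing = extract_fields("no usable text here")
```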
Is it possible to select a specific rectangle/area in the image and use OCR only on that area? I tried that, but the problem now is that the rectangle is not always in the same position. The area can move from document to document (image to image) depending on the number of words and extra lines in the document, but there are two tags/labels (each unique in the document) that define this rectangle/area. My question: is it possible to find both tags, get their coordinates (x1, y1) and (x2, y2), and then select the area between those coordinates and apply OCR to it? I know that the 4 values I need are inside that rectangle.
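The geometry of that idea can be sketched as follows, in Python rather than as a UiPath workflow. It assumes the positions of the two labels have already been found by some OCR step and are available as (x, y, width, height) boxes. The `Box` type, the `region_between` helper, and all coordinate values are hypothetical and only illustrate how the crop rectangle could be derived.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in pixels, origin at the top-left of the image."""
    x: int
    y: int
    width: int
    height: int


def region_between(first: Box, second: Box,
                   image_width: int, image_height: int,
                   margin: int = 0) -> Box:
    """Smallest rectangle covering both label boxes, grown by `margin`
    and clamped to the image bounds."""
    left = min(first.x, second.x) - margin
    top = min(first.y, second.y) - margin
    right = max(first.x + first.width, second.x + second.width) + margin
    bottom = max(first.y + first.height, second.y + second.height) + margin
    # Keep the rectangle inside the image.
    left, top = max(left, 0), max(top, 0)
    right, bottom = min(right, image_width), min(bottom, image_height)
    return Box(left, top, right - left, bottom - top)


# Made-up label positions, standing in for what an OCR text search returns.
start_label = Box(x=100, y=400, width=80, height=20)
end_label = Box(x=900, y=650, width=120, height=20)
crop = region_between(start_label, end_label,
                      image_width=1200, image_height=1600, margin=5)
```

The resulting box could then be used as the clipping region for a second OCR pass, so that lines and tables elsewhere on the page are never read.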

I think I could use Find OCR Text Position, a clipping region, or relative scraping, but I do not know how to do it.

Any suggestion will help me.

Sorry for my English. English is not my native language.

pattyricarte · July 15, 2019, 3:44pm

@esoccl hope this will help you!

https://go.uipath.com/component/abbyy-flexicapture-connector-for-uipath-31cc20

esoccl · July 15, 2019, 4:16pm

Thanks! Do I need to purchase a license for ABBYY?

pattyricarte · July 15, 2019, 4:21pm

Hi @esoccl

If you have PDF files in multiple formats, I suggest using ABBYY FlexiCapture. But if you just need to obtain a few values from a PDF, some of the activities in UiPath will do, and they really work!

shalini.tewary · July 15, 2019, 4:43pm

If you do not want to invest in ABBYY, you can use one of the options below.

You can use Taxonomy Manager to define your template, then digitize the document and indicate your elements using Validation Station. But you have to indicate your fields manually every time.
You can use the new document processing model. See "Receipt and Invoice AI - Now available in Public Preview!" (post #44 by mcicca).
