Unable to extract specific data from scanned pdf

Sri_Harsha · January 23, 2020, 10:35am

Hello,
I am new to UiPath and I am unable to extract specific data from a scanned PDF invoice. When I try to get the purchase order number using “Get OCR Text”, it returns a different value. Could anyone help?

Pradeep_Shiv · January 23, 2020, 10:39am

Which OCR engine are you using to extract the text?
After extracting the text, you have to use Regex to pull out the specific data.
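
The regex step suggested above can be sketched as follows. This is only an illustration in Python, not UiPath workflow code; the same pattern could be tried in UiPath's Matches activity or with .NET's Regex.Match. The sample OCR text, the label wording "Purchase Order No" and the helper name extract_po_number are assumptions, since the actual invoice was not shared.

```python
import re

# Hypothetical OCR output; the label wording and number format are assumptions.
ocr_text = """INVOICE
Purchase Order No: 4500123456
Invoice Date: 23/01/2020
"""

# Match the label loosely (OCR often varies spacing and punctuation),
# then capture the run of digits that follows it.
PO_PATTERN = re.compile(
    r"Purchase\s*Order\s*(?:No|Number)?\.?\s*[:#]?\s*(\d+)",
    re.IGNORECASE,
)

def extract_po_number(text):
    """Return the digits after the purchase order label, or None if absent."""
    match = PO_PATTERN.search(text)
    return match.group(1) if match else None

print(extract_po_number(ocr_text))
```

Anchoring on the label rather than on a fixed position tends to be more robust when the OCR engine shifts or drops whitespace.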

vilardelld · January 23, 2020, 10:42am

Hello,

Is it a digitally native PDF or a scanned one?

In the case of a digitally native PDF, you can use the “GET TEXT FROM PDF” activity and then use regex or split to extract the specific data.

Otherwise, if it is a scanned PDF, you should use a really good OCR engine (e.g. UiPath Computer Vision OCR activities, ABBYY, Google, etc.) to extract the most accurate text and then use regex or split. If you use ABBYY or the UiPath Invoice Extraction activities, you will be able to train the algorithm to extract specific data.

Thank you.

Kind regards,
Daniel
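
As a rough illustration of the "split" alternative mentioned above, here is a minimal Python sketch that takes whatever follows a known label on the same line. The sample text, the label "Purchase Order" and the helper name value_after_label are hypothetical; it assumes the label and its value sit on one line of the extracted text.

```python
def value_after_label(text, label):
    """Return the text that follows `label` on the same line, or None."""
    for line in text.splitlines():
        if label in line:
            # Split once on the label and keep the right-hand side,
            # trimming separators such as ':' or '#' and whitespace.
            value = line.split(label, 1)[1].strip(" :#\t")
            return value or None
    return None

# Hypothetical extracted text, for illustration only.
sample_text = "Invoice No: INV-001\nPurchase Order: 4500123456\n"
print(value_after_label(sample_text, "Purchase Order"))
```

Split is simpler than regex but depends on the label being read exactly; with noisy OCR output a regex is usually more forgiving.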

Sri_Harsha · January 23, 2020, 10:47am

I am using both Tesseract and Microsoft OCR. How do I use the regex activity if the invoice contains an “Item Id” whose length differs from one item to another? For example, item id 1 contains 15 digits and item id 2 contains 16 digits.
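
One possible way to handle the varying length is a quantifier range such as \d{15,16}, which accepts either 15 or 16 digits. The sketch below is in Python for illustration; the sample text and the name extract_item_ids are assumptions, and it presumes the OCR returns each item id as one unbroken run of digits.

```python
import re

# Exactly 15 or 16 digits, not embedded in a longer run of digits.
ITEM_ID_PATTERN = re.compile(r"(?<!\d)\d{15,16}(?!\d)")

def extract_item_ids(text):
    """Return every standalone 15- or 16-digit number found in the text."""
    return ITEM_ID_PATTERN.findall(text)

# Hypothetical OCR output, for illustration only.
sample = (
    "Item Id 123456789012345 Qty 2\n"
    "Item Id 1234567890123456 Qty 1\n"
    "Ref 12345"
)
print(extract_item_ids(sample))
```

The lookarounds keep the pattern from picking 15 digits out of a longer number, which matters when OCR merges neighbouring fields.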

Pradeep_Shiv · January 23, 2020, 10:49am

Can we see a sample of the data you are trying to extract?

Sri_Harsha · January 23, 2020, 11:55am

Please find attached below the data I am trying to extract.

Sri_Harsha · January 24, 2020, 6:45am

Hi Pradeep,

I have attached a screenshot of the data we are trying to extract. Sorry, I am unable to attach the PDF because I am new to the UiPath forum.
