Easiest data extraction methods from scanned PDF

maria.josephina · August 15, 2024, 5:52am

Hi Team,

I have a requirement to extract data (invoice number, shipment number, total invoice amount, credit party, and tabular data such as the charges line items) from a scanned PDF document.
This extracted information should then be compared with data in an Excel file to check for discrepancies.
Could you please recommend the most cost-effective methods for performing this data extraction using built-in UiPath functionality?
(On average, about 800 pages need to be processed per month.)
Is computer vision a good approach in this case?
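The discrepancy check described above can be sketched in Python, assuming the invoice fields have already been extracted into a dictionary of strings and the matching Excel row has been loaded the same way. The field names (`invoice_number`, `shipment_number`, `total_amount`, `credit_party`) and the normalization rules are hypothetical. The idea is to normalize both sides before comparing, so that formatting differences such as `1,250.50` vs `1250.5` are not reported as discrepancies.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical field names; adjust to the real invoice fields and Excel columns.
FIELDS = ["invoice_number", "shipment_number", "total_amount", "credit_party"]


def normalize(field: str, value: str) -> str:
    """Normalize a value so OCR vs Excel formatting differences are not flagged."""
    value = value.strip()
    if field == "total_amount":
        try:
            # Drop thousands separators and compare as a 2-decimal fixed-point number.
            return str(Decimal(value.replace(",", "")).quantize(Decimal("0.01")))
        except InvalidOperation:
            return value  # Not a number; fall back to comparing the raw text.
    return value.upper()  # Case-insensitive comparison for text fields.


def find_discrepancies(extracted: dict, expected: dict) -> dict:
    """Return {field: (extracted_value, excel_value)} for every field that differs."""
    diffs = {}
    for field in FIELDS:
        got = extracted.get(field, "")
        want = expected.get(field, "")
        if normalize(field, got) != normalize(field, want):
            diffs[field] = (got, want)
    return diffs


if __name__ == "__main__":
    extracted = {"invoice_number": "INV-1001", "shipment_number": "SHP-77",
                 "total_amount": "1,250.50", "credit_party": "Acme Ltd"}
    excel_row = {"invoice_number": "INV-1001", "shipment_number": "SHP-78",
                 "total_amount": "1250.5", "credit_party": "ACME LTD"}
    print(find_discrepancies(extracted, excel_row))
```

With the sample data only the shipment number is reported; the amount and credit party match after normalization.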

lrtetala · August 15, 2024, 5:56am

Hi @maria.josephina

Try one of these approaches:

1. Read PDF with OCR activity
2. Use Regex to extract the required data and write it to Excel

Or

1. Document Understanding
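A minimal sketch of the regex step in Python, assuming the OCR activity has already returned the page text as a single string. The labels in the patterns (`Invoice Number`, `Shipment No.`, `Total Invoice Amount`, `Credit Party`) are hypothetical and would need to match the real invoice layout.

```python
import re

# Hypothetical label patterns; the real ones depend on the invoice layout.
PATTERNS = {
    "invoice_number": r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*([A-Z0-9-]+)",
    "shipment_number": r"Shipment\s*(?:No\.?|Number)\s*[:#]?\s*([A-Z0-9-]+)",
    "total_amount": r"Total\s+Invoice\s+Amount\s*:?\s*([\d,]+\.\d{2})",
    "credit_party": r"Credit\s+Party\s*:?\s*(.+)",  # Rest of the line.
}


def extract_fields(ocr_text: str) -> dict:
    """Apply each pattern to the OCR text; fields without a match map to None."""
    result = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, ocr_text, flags=re.IGNORECASE)
        result[field] = match.group(1).strip() if match else None
    return result


if __name__ == "__main__":
    sample = """ACME LOGISTICS
Invoice Number: INV-1001
Shipment No. SHP-77
Credit Party: Acme Ltd
Total Invoice Amount: 1,250.50
"""
    print(extract_fields(sample))
```

Fields that come back as `None` signal that the OCR text or the pattern needs attention before the values are written to Excel.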

Regards,

ashokkarale · August 15, 2024, 8:57am

@maria.josephina,

If your documents will always be in the same format, it's a good idea to extract the PDF text using OCR.

Try all of the available OCR engines and pick the one that gives the best results.
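One way to make "pick the best engine" concrete is to run each engine over the same sample pages and count how many required fields the regex can still find in each output. The sketch below assumes the OCR text from each engine has already been collected into a dictionary; the engine names and patterns are hypothetical placeholders.

```python
import re

# Hypothetical patterns for fields every invoice page should yield.
REQUIRED = {
    "invoice_number": r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*([A-Z0-9-]+)",
    "total_amount": r"Total\s+Invoice\s+Amount\s*:?\s*([\d,]+\.\d{2})",
}


def score(ocr_text: str) -> int:
    """Count how many required fields can be found in one engine's output."""
    return sum(1 for p in REQUIRED.values() if re.search(p, ocr_text, re.IGNORECASE))


def best_engine(outputs: dict) -> str:
    """Return the engine whose text yields the most fields (first one wins ties)."""
    return max(outputs, key=lambda name: score(outputs[name]))


if __name__ == "__main__":
    outputs = {
        # engine_a misreads zeros as the letter O, so the amount no longer matches.
        "engine_a": "Invoice Number: INV-1001\nTotal Invoice Amount: 1,25O.5O",
        "engine_b": "Invoice Number: INV-1001\nTotal Invoice Amount: 1,250.50",
    }
    print(best_engine(outputs))
```

Scoring several representative pages per engine, rather than a single page, gives a fairer comparison.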

Thanks,
Ashok

singh_sumit · August 15, 2024, 12:04pm

Hi @maria.josephina, if you have many documents with the same structure, go with the Document Understanding method. It's fast and more reliable.

Cheers, happy automation!

maria.josephina · August 18, 2024, 8:15am

@singh_sumit My requirement covers only one document type, and I'm looking for cost-effective techniques.
Document Understanding involves a cost, right?

singh_sumit · August 20, 2024, 9:30am

@maria.josephina Yes, you could say that. So try Read PDF with OCR, then use regex or string manipulation to extract the data.
Cheers, happy automation!
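For the tabular charges section, plain string manipulation can be enough if each line item ends with its amount. The sketch below assumes exactly that layout (a hypothetical one; real OCR output may split or merge columns differently) and assumes only the text of the charges table is passed in, since other numeric lines such as totals or page numbers would otherwise be picked up as line items.

```python
from decimal import Decimal, InvalidOperation


def parse_charge_lines(table_text: str) -> list:
    """Turn lines such as 'Freight Charges 1,000.00' into (description, amount) pairs.

    Assumes (hypothetically) that each charge line ends with its amount,
    separated from the description by whitespace.
    """
    items = []
    for line in table_text.splitlines():
        parts = line.strip().rsplit(maxsplit=1)  # Split off the last token only.
        if len(parts) != 2:
            continue  # Empty line or single word, e.g. a section title.
        description, amount_text = parts
        try:
            amount = Decimal(amount_text.replace(",", ""))
        except InvalidOperation:
            continue  # Last token is not a number, e.g. a column header.
        if not amount.is_finite():
            continue  # Reject tokens such as 'NaN' or 'Infinity'.
        items.append((description, amount))
    return items


if __name__ == "__main__":
    table = """Charges
Description Amount
Freight Charges 1,000.00
Handling Fee 200.50
Customs Clearance 50.00
"""
    for description, amount in parse_charge_lines(table):
        print(description, amount)
```

Lines whose last token is not a number, such as the header row, are skipped.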

bharat.c · August 21, 2024, 7:44am

If you have a fixed document type, read the PDF data with OCR and apply regex to it to extract the necessary data.

Thanks,
Bharat
