How to extract specific data using RegEx

Sri_Harsha · January 30, 2020, 8:39am

Hi,

I am trying to extract specific data from a PDF (18 pages) to Excel. I need to extract all the data from the PDF.
Could anyone help me with how to extract the data using RegEx or any other solution?

Please find attached the invoice PDF for your reference.

ATLANTIS THE PALM LPO-19955.pdf (1018.9 KB)

MartianxSpace · January 30, 2020, 8:43am

@Sri_Harsha Could you please let me know what kind of data you want from the pdf?

Sri_Harsha · January 30, 2020, 9:09am

I want to extract all the data except S.No and due date.

MartianxSpace · January 30, 2020, 9:30am

@Sri_Harsha As it's a scanned copy and the image quality is very low, it would be difficult to extract all the information with the UiPath default OCR engine or the Computer Vision AI module. The default OCR engine does return a result, but it's not accurate. You would need a third-party OCR engine such as ABBYY for this.

Sri_Harsha · January 30, 2020, 9:41am

@MartianxSpace I have a trial version of ABBYY OCR and am trying to extract the data with the attached flow.
Can you help me with any alternative process?

Main.xaml (242.0 KB)

Mayyur · January 30, 2020, 9:43am

Hello Harsh,
Can you paste your extracted string or file?

Sri_Harsha · January 30, 2020, 9:48am

@Mayyur please find the attached file.

LPO Barakat(2).xls (27.5 KB)

Mayyur · January 30, 2020, 9:51am

Hi Harsh,
I tried Microsoft OCR and Tesseract OCR, but neither gives the expected result for the scanned document.
Please try with ABBYY. If all the data is captured, then we can process it using RegEx.

Thank you

Sri_Harsha · January 30, 2020, 9:58am

@Mayyur, can you explain how to extract the item code using RegEx? My invoice has almost 70 item codes.

Mayyur · January 30, 2020, 10:02am

I guess we need to write a regex for each of the fields.
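
To illustrate the one-regex-per-field idea, here is a minimal Python sketch, assuming the OCR text is already available as a string. The sample line, the field names, and the patterns are all hypothetical, since the real invoice layout is not shown in this thread. In UiPath the same patterns could be used in the Matches activity.

```python
import re

# Hypothetical OCR output for a single invoice line; the real layout may differ.
ocr_line = "1 ITM-10023 Fresh Orange Juice 12 4.50 54.00"

# One pattern per field, keyed by a made-up field name.
FIELD_PATTERNS = {
    "item_code": r"\b[A-Z]{3}-\d{5}\b",
    # Digits that are followed by exactly two decimal amounts at the end of the line.
    "quantity": r"\b\d+(?=\s+\d+\.\d{2}\s+\d+\.\d{2}$)",
    # The last decimal amount on the line.
    "line_total": r"\d+\.\d{2}$",
}


def extract_fields(text):
    """Return the first match for each field, or None when nothing matches."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text)
        result[name] = match.group(0) if match else None
    return result


fields = extract_fields(ocr_line)
print(fields)
```

Keeping the patterns in one dictionary makes it easy to add or adjust a field once the actual OCR output is known.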

Sri_Harsha · January 30, 2020, 11:19am

@Mayyur, I tried using RegEx, but I am getting the error message “System.Linq.Enumerable+d__94`1[System.Text.RegularExpressions.Match]”. I have attached the flow.

regex.xaml (25.3 KB)

Pradeep_Shiv · January 30, 2020, 11:23am

You have to use it like this: regexOutputVariable(0)

Pradeep_Shiv · January 30, 2020, 11:25am

The output is of a “Collection” type. You can loop through it to get every matching value, or, if you only want the first one, use the index (0) as shown above.
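
The collection-versus-element distinction can be sketched outside UiPath as well. The following is a Python analogue, not UiPath code: converting the whole match collection to text shows only a description of the object, much like the message quoted earlier, while indexing or looping gives the matched text. The sample string and pattern are hypothetical.

```python
import re

# Hypothetical text and pattern, for illustration only.
text = "ITM-10023 ITM-10024 ITM-10025"
pattern = r"[A-Z]{3}-\d{5}"

# A lazy collection of match objects, comparable to the regex output variable.
matches = re.finditer(pattern, text)
# Converting the collection itself to text describes the object, not the matches.
collection_repr = str(matches)

match_list = list(re.finditer(pattern, text))
# First match only: roughly what indexing with (0) and reading the value does.
first_value = match_list[0].group(0)
# Every match: roughly what looping over the collection does.
all_values = [m.group(0) for m in match_list]

print(collection_repr)
print(first_value)
print(all_values)
```

In the workflow, the equivalent of `group(0)` would presumably be the `Value` property of each `Match`; that detail is an assumption, as the thread only shows the `(0)` index.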
