How to extract specific data using RegEx


I am trying to extract specific data from a PDF (18 pages) to Excel; I need to extract all the data from the PDF.
Could anyone help me with how to extract the data using RegEx or any other solution?

Please find attached the invoice PDF for your reference.

ATLANTIS THE PALM LPO-19955.pdf (1018.9 KB)

@Sri_Harsha Could you please let me know what kind of data you want from the PDF?

I want to extract the complete data, except the due date.

@Sri_Harsha As it’s a scanned copy and the image quality is very low, it would be difficult to extract all the information with the default UiPath OCR engine or the Computer Vision AI module. The default OCR engine does return a result, but it’s not accurate. You will need a third-party OCR engine such as ABBYY to do this job.

@MartianxSpace I have a trial version of ABBYY OCR and I am trying to extract the data with the attached flow.
Can you help me with an alternative process?

Main.xaml (242.0 KB)

Hello Harsh,
Can you paste your extracted string or file?

@Mayyur, please find the attached file: LPO Barakat(2).xls (27.5 KB)

Hi Harsh,
I tried Microsoft OCR and Tesseract OCR, but neither gives the expected result for this scan.
Please try ABBYY. If all the data is captured, we can then process it using RegEx.

Thank you

@Mayyur, can you explain how to extract the item codes using RegEx? My invoice has almost 70 item codes.

I guess we need to write a regex for each of the fields.
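To make the "one regex per field" idea concrete, here is a minimal sketch in Python. The sample text, the field names, and the `ITM-` item-code format are all made up for illustration, since the real OCR output of the invoice is not shown in this thread. The patterns would need to be adapted to whatever the OCR engine actually returns.

```python
import re

# Hypothetical OCR output; the real invoice layout and labels are unknown.
sample_text = """LPO No: 12345
Supplier: Example Trading LLC
ITM-1001 Tomatoes 10 KG
ITM-1002 Onions 5 KG
ITM-1003 Lettuce 12 PCS
"""

# One pattern per single-valued field (patterns are assumptions for this sample).
field_patterns = {
    "lpo_number": r"LPO No:\s*(\d+)",
    "supplier": r"Supplier:\s*(.+)",
}

fields = {}
for name, pattern in field_patterns.items():
    match = re.search(pattern, sample_text)
    # Store None when a field is missing instead of raising an error.
    fields[name] = match.group(1).strip() if match else None

# A repeating value such as an item code needs only one pattern:
# findall returns every occurrence, so 70 codes do not need 70 patterns.
item_codes = re.findall(r"^(ITM-\d+)\b", sample_text, flags=re.MULTILINE)

print(fields)
print(item_codes)
```

The same split should carry over to a UiPath flow: one pattern for each header field that appears once, and a single pattern applied to the whole text for values that repeat on every line.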

@Mayyur, I tried using RegEx, but I am getting the error message “System.Linq.Enumerable+d__94`1[System.Text.RegularExpressions.Match]”. The flow is attached.

regex.xaml (25.3 KB)

You have to use it like this: regexOutputVariable(0)

The output is of a “Collection” type, which is why printing it directly shows the type name instead of the matched text. You can loop through it to get every value that matches, or, if you only want the first match, index it like this: (0)
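The same behaviour can be reproduced outside UiPath. The sketch below is a Python analogy and not the UiPath/.NET code itself; the `ITM-` codes are hypothetical. It shows that converting the match collection to text gives a type description, while indexing or looping gives the actual values.

```python
import re

text = "ITM-1001 ITM-1002 ITM-1003"  # hypothetical item codes

# finditer returns a lazy collection of match objects, not strings.
matches = re.finditer(r"ITM-\d+", text)
as_text = str(matches)  # a type description, not the matched values

# Materialise the collection, then index it or loop over it.
all_matches = list(re.finditer(r"ITM-\d+", text))
first_value = all_matches[0].group()            # first match only
all_values = [m.group() for m in all_matches]   # every match

print(as_text)
print(first_value)
print(all_values)
```

In .NET, each `Match` object exposes its text through the `Value` property, so after indexing or looping in the workflow, the string to write to Excel would come from that property on each match.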