Reading data from PDFs in different formats

Hi everyone,

I have around 30 different PDF invoices, each in its own format. I need to extract data from them nearly every day; each day a vendor sends a different PDF with different data, and I compare it with the respective PO. I want to automate this, but I can't work out how to extract fields such as item name, quantity and price when the format is different each time. I can't even run this in a loop.
Any ideas would be highly appreciated.

Can you attach sample PDF files?

Hi,

Sure, I will attach a few dummy PDF files, but these change every day, and sometimes there can be as many as 40 PDFs.
pdf1.pdf (146.0 KB)
pdf2.pdf (249.5 KB)
pdf3.pdf (338.7 KB)

I have been looking for a solution to the same problem. Have you found one?

Hello Aamir, can you tell me whether you are using the Community version or the paid version?
If you are using the Community version, we can't read multiple PDFs. If you are using the paid version, we have ABBYY FlexiCapture to read multiple PDF formats.

Hi Yogi,

I am using the Community version. In that case, how many PDFs can I read at a time in a loop with the Community version?

Only one PDF format can be read by one UiPath program. If you want to read multiple PDF formats with a single UiPath program, you have to use the paid version. In the paid version, a third-party AI product, ABBYY FlexiCapture, is available for reading multiple PDF formats.

OK, thanks a lot.

You do not need to own the paid version.
Solution 1:
There is a workaround, but it is UI-intensive.
Open each PDF and read it with OCR, using ABBYY as the OCR engine.
You could use an anchor activity, or scrape the data and find the text you need.
Disadvantage: this opens each PDF in the foreground, whereas Solution 2 runs in the background but is slightly more complex unless you're familiar with RegEx.

Solution 2:
You'd need to loop through all the PDFs in a folder, open each one, and search for the strings you require by identifying the different patterns with RegEx. This can be done with the 'Matches' activity or the 'Is Match' activity.
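To make the RegEx idea in Solution 2 a bit more concrete, here is a rough sketch in Python (used only for illustration, not UiPath itself). It assumes the text of each PDF has already been extracted by an earlier step, and the format names "vendor_a" and "vendor_b", the field labels and the patterns are all made up; real invoices will need their own patterns. A similar set of patterns could be tried in the Matches activity.

```python
import re

# Hypothetical patterns, one set per invoice layout.
# These labels ("Qty", "Unit Price", "Rate") are assumptions for the example.
FORMATS = {
    "vendor_a": {
        "quantity": re.compile(r"Qty[:\s]+(\d+)", re.IGNORECASE),
        "price": re.compile(r"Unit Price[:\s]+\$?([\d,]+\.\d{2})", re.IGNORECASE),
    },
    "vendor_b": {
        "quantity": re.compile(r"Quantity\s*=\s*(\d+)", re.IGNORECASE),
        "price": re.compile(r"Rate\s*=\s*([\d,]+\.\d{2})", re.IGNORECASE),
    },
}


def extract_fields(text):
    """Return (format_name, fields) for the first layout whose patterns all match.

    Returns (None, {}) when no known layout fits the text.
    """
    for name, patterns in FORMATS.items():
        fields = {}
        for field, pattern in patterns.items():
            match = pattern.search(text)
            if match is None:
                break  # this layout does not fit; try the next one
            fields[field] = match.group(1)
        else:
            # Every pattern of this layout matched.
            return name, fields
    return None, {}


def process_documents(documents):
    """Loop over a mapping of file name -> already-extracted text."""
    return {filename: extract_fields(text) for filename, text in documents.items()}
```

Each file is tried against every known layout in turn, so the loop does not need to know in advance which vendor a PDF came from. Files that match no layout come back as (None, {}) and can be set aside for manual review.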
