Invoice PDF extraction

I have invoices from multiple vendors, and I need to extract certain fields and write them to Excel. What is the best and easiest way to do this? All the PDFs are system generated.


If the PDF contains images, use the Read PDF With OCR activity; otherwise use the Read PDF Text activity to read the data and store it in a string variable. Then apply regular expressions or string manipulation functions to pull out the required fields and write them to an Excel file.
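To illustrate the regular expression step, here is a minimal sketch in Python rather than a UiPath workflow. The sample text, the label wording, and the names `PATTERNS` and `extract_fields` are all assumptions made for illustration; real invoices will need patterns matched to their own layout.

```python
import re

# Hypothetical text, standing in for the string returned by the PDF read step.
invoice_text = """ACME Supplies Ltd.
Invoice No: INV-2041
Invoice Date: 12/03/2019
Total Due: 1,250.00
"""

# Illustrative patterns only; adjust the labels to match the actual invoices.
PATTERNS = {
    "invoice_number": r"Invoice No:\s*(\S+)",
    "invoice_date": r"Invoice Date:\s*(\d{2}/\d{2}/\d{4})",
    "total": r"Total Due:\s*([\d,]+\.\d{2})",
}


def extract_fields(text, patterns):
    """Return a dict of field name -> captured value (None when not found)."""
    result = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        result[name] = match.group(1) if match else None
    return result


fields = extract_fields(invoice_text, PATTERNS)
print(fields)
```

Each entry of `fields` can then be written to one column of the output row. A field that is not found comes back as `None`, which makes missing data easy to spot.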


Hi @hgupta99

I guess that if the PDFs are system generated, you should be able to select the text.

Create a config Excel file that lists the vendors and the field names to refer to in the code. This way you can easily extend it to other vendors without changing the code.

Using that data, create a rule for each vendor that says which operation to perform.
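As a rough sketch of the config-driven idea (in Python, not a UiPath workflow): the rows below stand in for a config sheet with Vendor, Field, and Pattern columns. The vendor names, the patterns, and the helpers `build_rules`, `detect_vendor`, and `extract` are hypothetical, and detecting the vendor by name is just one possible rule.

```python
import re
from collections import defaultdict

# Hypothetical rows, standing in for a config sheet: Vendor | Field | Pattern.
CONFIG_ROWS = [
    ("ACME", "invoice_number", r"Invoice No:\s*(\S+)"),
    ("ACME", "total", r"Total Due:\s*([\d,]+\.\d{2})"),
    ("Globex", "invoice_number", r"Ref #\s*(\S+)"),
    ("Globex", "total", r"Amount Payable\s+([\d,]+\.\d{2})"),
]


def build_rules(rows):
    """Group the config rows into {vendor: {field: compiled pattern}}."""
    rules = defaultdict(dict)
    for vendor, field, pattern in rows:
        rules[vendor][field] = re.compile(pattern)
    return rules


def detect_vendor(text, rules):
    """Pick the first configured vendor whose name appears in the text."""
    lowered = text.lower()
    for vendor in rules:
        if vendor.lower() in lowered:
            return vendor
    return None


def extract(text, rules):
    """Apply the matching vendor's patterns; return (vendor, field values)."""
    vendor = detect_vendor(text, rules)
    if vendor is None:
        return None, {}
    values = {}
    for field, regex in rules[vendor].items():
        match = regex.search(text)
        values[field] = match.group(1) if match else None
    return vendor, values


rules = build_rules(CONFIG_ROWS)
sample = "Globex Corporation\nRef # GX-77\nAmount Payable 980.50\n"
vendor, values = extract(sample, rules)
print(vendor, values)
```

Adding a new vendor then means adding rows to the config, with no change to the extraction logic.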

Use the Read PDF Text activity, which gives you a single string. Then try string operations such as Substring, Left, and Right to get the desired output, or use a regular expression if the pattern is similar for all vendors.
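For the plain string-operation route, a small sketch in Python (the equivalent of IndexOf plus Substring) could look like this. The helper `text_between` and the sample labels are assumptions, not part of any UiPath package.

```python
def text_between(text, start_label, end_label="\n"):
    """Return the trimmed text between start_label and the next end_label, or None."""
    start = text.find(start_label)
    if start == -1:
        return None  # label not present in this invoice
    start += len(start_label)
    end = text.find(end_label, start)
    if end == -1:
        end = len(text)  # label is on the last line
    return text[start:end].strip()


# Hypothetical text, standing in for the string read from the PDF.
sample = "Invoice No: INV-2041\nVendor: ACME Supplies\nTotal Due: 1,250.00"
invoice_number = text_between(sample, "Invoice No:")
total = text_between(sample, "Total Due:")
print(invoice_number, total)
```

This works well when each value sits on the same line as a fixed label. When the layout varies more, a regular expression is usually easier to maintain.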

One question I have:
===> Are you able to get a selector for the field that you are trying to extract?

If yes, use the Anchor Base activity to locate the field relative to its label, and use Get Text to read the desired data.

Hope this helps!

Vijay Kumar C.