PDF handling for various PDF formats (e.g. invoices)

Hello everybody,

I’m currently working on a very interesting process implementation.

I would like to automate our supplier confirmations. To start, we would like to concentrate on our 10 highest-volume suppliers, from which we receive 50+ confirmations per day.

For each confirmation, the robot has to extract several pieces of information from the confirmation, which is a PDF. The PDF layout is the same within each supplier, but it varies from supplier to supplier.

Do you have any experience or best practices for extracting data from PDFs and making the implementation flexible enough to handle the different PDF layouts?

Best regards

Marcel

Hi,
You can try converting the PDF to text using the Read PDF Text activity and then writing regular expressions to match the corresponding fields. To extract them, use the Matches activity: pass the output of Read PDF Text as its input string, along with the regular expression for the field you want to extract from the PDF.
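
To illustrate the idea outside UiPath, here is a minimal Python sketch of regex-based extraction with one set of patterns per supplier, so a new layout only needs a new entry in the table. The supplier keys, field names, and patterns are all made up for the example; your real confirmations will need their own expressions.

```python
import re

# Hypothetical per-supplier patterns (assumption: each supplier labels
# its fields differently, but consistently across its own PDFs).
SUPPLIER_PATTERNS = {
    "supplier_a": {
        "order_number": r"Order\s*No\.?:\s*(\S+)",
        "delivery_date": r"Delivery\s*date:\s*(\d{2}\.\d{2}\.\d{4})",
    },
    "supplier_b": {
        "order_number": r"PO\s*#\s*(\S+)",
        "delivery_date": r"Ship\s*by\s*(\d{4}-\d{2}-\d{2})",
    },
}


def extract_fields(supplier, text):
    """Return a dict of field name -> first captured value (None if no match)."""
    result = {}
    for field, pattern in SUPPLIER_PATTERNS[supplier].items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        result[field] = match.group(1) if match else None
    return result


# 'sample' stands in for the text that a PDF-to-text step would return.
sample = "Order No.: 4500012345\nDelivery date: 14.03.2024\n"
fields = extract_fields("supplier_a", sample)
print(fields)
```

In a workflow, the same lookup table could live in a config file, with the supplier key taken from the sender address or the file name.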

If the PDF is a scanned document, you can try the CV activities (CV Scope -> Get Text).

Let us know if this helps.
Regards,
Pavan H

Hi,

Thanks for your answer. I’m pretty sure your solution would work, too.
I figured out that you can use Read PDF With OCR and convert the output to an array. Basically, you receive a long string, which you split into individual lines. This works for my use case: I just count the lines, so the line number is the index in my array where the information I need from the PDF is stored.
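
Roughly, the line-index idea looks like this as a Python sketch. The supplier key, the field names, and the index values are invented for illustration, and the indices are zero-based here.

```python
# Hypothetical mapping: supplier -> field -> zero-based line index
# (assumption: each supplier's layout always puts a field on the same line).
LINE_INDEX = {
    "supplier_a": {"order_number": 2, "delivery_date": 4},
}


def extract_by_line(supplier, text):
    """Split the text into lines and pick each field by its line index."""
    lines = text.splitlines()
    result = {}
    for field, idx in LINE_INDEX[supplier].items():
        # Guard against documents that are shorter than expected.
        result[field] = lines[idx].strip() if idx < len(lines) else None
    return result


# 'ocr_text' stands in for the string returned by the OCR step.
ocr_text = "ACME Ltd\nOrder confirmation\n4500012345\nDelivery date\n14.03.2024\n"
values = extract_by_line("supplier_a", ocr_text)
print(values)
```

One caveat: if the OCR adds or drops a line, every index after it shifts, so it is probably worth validating each extracted value (for example, checking that a date looks like a date) before using it.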

Regards
