Dynamic PDF data extraction

Hello All,
I am trying to extract data from scanned PDF documents, specifically the address on each document. There is no label such as "To" or "From" to mark the address, and the PDF template keeps changing. I have tried Read PDF with OCR, screen scraping, anchor-based activities and so on. They work sometimes, but at other times the extraction fails to start. It usually fails if I close all my PDFs and then start the bot.
Any help will be appreciated.

If the template changes for every document, then it is going to be tough, bro.

Hi @Rahul_Dochak,
Welcome to this nice community!

So, based on your message, I can suggest 2 approaches.

  1. Keep all the PDF files in one folder.
    Use a For Each to loop over the files in that folder.
    If you do PDF scraping, put an activity at the beginning of the workflow that opens each file, and do the scraping only after that. The scraping has no window to attach to when the PDF is closed, which would explain the failures you see after closing all your PDFs.

  2. Another option is to use Read PDF Text / Read PDF With OCR, which read the file directly without opening it.
    Then use Substring or RegEx to get the text that you need from the output.
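
To illustrate the RegEx idea in option 2, here is a minimal sketch in Python rather than a UiPath workflow. It assumes the OCR output is already available as a plain string and that the address ends with a US-style state and ZIP code; both the pattern and the helper name `find_address` are assumptions for illustration, so adjust them to the postal format in your documents.

```python
import re

# Assumption: the last address line ends with a two-letter state and a
# US-style ZIP code (5 digits, optional +4). Change this for other countries.
ZIP_LINE = re.compile(r"\b[A-Z]{2}\s+\d{5}(?:-\d{4})?\s*$")

def find_address(text, lines_before=2):
    """Return the ZIP line plus the lines just above it, or None if no match."""
    # Drop empty lines so the window above the ZIP line holds real content.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for i, line in enumerate(lines):
        if ZIP_LINE.search(line):
            start = max(0, i - lines_before)
            return "\n".join(lines[start:i + 1])
    return None

# Made-up sample standing in for OCR output.
sample = """Invoice 0042
Jane Doe
12 Main Street
Springfield, IL 62704
Total due: 100.00"""

print(find_address(sample))
```

Anchoring on the postal code instead of a "To"/"From" label is what lets this work when the label is missing, but OCR errors in the digits will still break the match.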

Well, if there are multiple templates, unfortunately you need to create a separate workflow for each template.
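
The "one workflow per template" idea can be sketched as a small dispatcher, again in Python for illustration only. The marker strings ("ACME Corp", "Globex") and the two layout rules are hypothetical; in practice you would pick some text that reliably identifies each template and write the matching extraction rule for it.

```python
def non_empty_lines(text):
    """Split text into stripped, non-empty lines."""
    return [ln.strip() for ln in text.splitlines() if ln.strip()]

def extract_template_a(text):
    # Hypothetical layout A: the address is the last non-empty line.
    lines = non_empty_lines(text)
    return lines[-1] if lines else None

def extract_template_b(text):
    # Hypothetical layout B: the address is the second non-empty line.
    lines = non_empty_lines(text)
    return lines[1] if len(lines) > 1 else None

# (marker text, extractor) pairs; the markers are made-up examples.
TEMPLATES = [
    ("ACME Corp", extract_template_a),
    ("Globex", extract_template_b),
]

def extract_address(text):
    """Run the extractor of the first template whose marker appears in the text."""
    for marker, extractor in TEMPLATES:
        if marker in text:
            return extractor(text)
    return None  # unknown template: route the document to manual review
```

Returning `None` for an unrecognised document keeps the bot from guessing, so new templates show up as exceptions to handle rather than as wrong addresses.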

If they are invoices, the new features may help you:

Vasile.

Thanks, I will try it and let you know if it works.