Regarding extracting content from a PDF

Hello,

I am working on PDF automation and have run into challenges scraping content from a particular field called ‘Item desc’. The position of this field varies from PDF to PDF, and OCR scraping didn't help. Can someone suggest how I can scrape it efficiently?

Thank You,
Anusha

You can search for the text within the image and use anchors to get the information from the field next to it.

Hello,
Thank you for the quick reply.
Yes, I tried that as well.
But as I said earlier, the position of the parameter keeps changing across PDFs, so it wasn't suitable.

Thank you

  1. Use the Find Image activity to locate the image of ‘Item desc’ in the anchor part of the Anchor Base activity.
  2. In the action part of Anchor Base, use OCR scraping to read the value of ‘Item desc’.

Hello,

I tried that too. It's taking the position of the ‘Item Desc’, so it doesn't work.

Thank you

@Anusha_Makam Can you share a dummy PDF file? That will make it easier to suggest a solution.

Hello,

invoice.pdf (159.4 KB)

Sometimes there is only one item and sometimes there are many.
Thank you!

@Anusha_Makam Here you go. It works using a regular expression.

PdfExtraction.zip (154.3 KB)
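
For readers who cannot open the attachment, the general idea of a regex-based approach can be sketched roughly as follows. This is not the content of PdfExtraction.zip, only a minimal Python illustration. It assumes the PDF text has already been read into a string and that each value follows the ‘Item desc’ label on the same line. The sample text, the pattern, and the function name `extract_item_descs` are all hypothetical and would need adjusting to the real invoice layout.

```python
import re

# Hypothetical text, standing in for whatever a PDF-to-text step returns.
# The real invoice layout is not known here.
SAMPLE_TEXT = """Invoice No: 1001
Item desc: Wireless mouse
Qty: 2
Item desc: USB-C cable
Qty: 1
"""

# Match the label case-insensitively, allow an optional ':' or '-' separator,
# then capture the rest of the line as the value.
ITEM_DESC = re.compile(r"item\s*desc\s*[:\-]?\s*(.+)", re.IGNORECASE)


def extract_item_descs(text):
    """Return every value that follows an 'Item desc' label, in order."""
    return [m.group(1).strip() for m in ITEM_DESC.finditer(text)]


items = extract_item_descs(SAMPLE_TEXT)
print(items)
```

Because the match is keyed on the label text rather than on screen coordinates, it should not matter where the field sits on the page or how many items the invoice lists.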

Hi Indra,

Your method helped me a lot, but my only concern is scraping the amount and grand total from that PDF. Is there any way to get that data using your method?

Please refer to the attached screenshot.

Regards,
Kanthesh