Extract certain data from PDF POD (proof of delivery) invoices

Hi team

I have a lot of PDF POD invoices, and I need to extract the data into an Excel spreadsheet :pray:

Hi @hassannizamie, welcome to the Community.

You can use several methods to extract data from a PDF. Refer to this list and choose the approach that fits your case:

  1. Read the PDF file using the Read PDF Text activity and use RegEx to extract the data from the output.
  2. Use the Get Text activity and indicate the element from which you need to extract the data.
  3. Use the Screen Scraping method to extract the data, depending on your requirement.
  4. Implement Document Understanding and ML packages to extract data from unstructured files.
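
To illustrate the idea behind option 1 outside of UiPath, here is a minimal Python sketch. It assumes the PDF text has already been read into a string (the equivalent of the Read PDF Text output). The field labels, the patterns in `FIELD_PATTERNS`, and the sample text are hypothetical and would need to be adapted to the real POD layout. It writes CSV, which Excel can open directly.

```python
import csv
import io
import re

# Hypothetical field labels and patterns; adjust them to the real POD layout.
FIELD_PATTERNS = {
    "invoice_no": re.compile(
        r"Invoice\s*(?:Number|No\.?)\s*[:#]?\s*(\S+)", re.IGNORECASE
    ),
    "delivery_date": re.compile(
        r"Delivery\s*Date\s*:?\s*(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})", re.IGNORECASE
    ),
    "received_by": re.compile(r"Received\s*By\s*:?\s*(.+)", re.IGNORECASE),
}


def extract_fields(text):
    """Return one row (dict) with the first match for each field, or "" if absent."""
    row = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        row[name] = match.group(1).strip() if match else ""
    return row


def rows_to_csv(rows):
    """Serialize the extracted rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(FIELD_PATTERNS), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample standing in for the text read from one POD PDF.
sample_text = """Proof of Delivery
Invoice No: INV-1001
Delivery Date: 12/03/2023
Received By: J. Smith
"""

rows = [extract_fields(sample_text)]
csv_text = rows_to_csv(rows)
print(csv_text)
```

In a UiPath workflow the same pattern-per-field idea would apply to the string returned by the PDF activity, with one row appended to the spreadsheet per document.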

Hope this helps,
Best Regards.

Hi @hassannizamie ,

We would need more details about your documents to assess which method is the right approach for your case. To start, could you answer the questions below:

  1. Is your document a digital document or a scanned document?
  2. Is the document data in a structured format (that is, will all the documents have a similar structure)?
  3. Is it semi-structured, meaning there could be different types of invoice templates?

Let us know your answers to these questions, and we will be able to pinpoint which of the methods listed by @arjunshenoy to suggest.
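
As a rough way to answer the first question yourself, one common heuristic is to try plain text extraction and check whether anything comes back. This is a minimal Python sketch of that check. The function name `looks_scanned` and the `min_chars` threshold are assumptions for illustration, and the input is assumed to be whatever text your PDF reader returned for the file.

```python
def looks_scanned(extracted_text, min_chars=20):
    """Guess whether a PDF is scanned (image-only) from its extracted text.

    Heuristic only: a digital PDF normally yields readable text, while a
    scanned one yields little or nothing without OCR.
    """
    # Drop all whitespace so blank pages and stray newlines do not count.
    visible = "".join(extracted_text.split())
    return len(visible) < min_chars


# Made-up inputs: an empty extraction result and a normal one.
print(looks_scanned(""))
print(looks_scanned("Invoice No: INV-1001 Delivery Date: 12/03/2023"))
```

Files flagged this way would be candidates for the OCR-based approach, while the rest can usually be handled with plain text extraction.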

@hassannizamie, if the PDF documents are scanned, you would likely need an intelligent solution that uses OCR and Document Understanding to extract the data. We recommend first going through the UiPath Academy courses on AI Center and Document Understanding. They will help you get started on building a solution with the Document Understanding approach.

Check the website below for the UiPath Document Understanding courses: