Extract data from different invoices

Hello,

I have invoices from different vendors.

  1. I extract fields such as Invoice No., PO No., Invoice Date, and Invoice Amount, and update them in the application.

  2. The application also has two line items, which are updated based on the given PDFs:

image

I have attached dummy invoices.

Artist-invoice-template.pdf (420.7 KB)
Lawn Care Invoice Template - Jotform PDF Editor.pdf (110.5 KB)
wordpress-pdf-invoice-plugin-sample.pdf (23.3 KB)

  • In the 1st PDF, the Line Total and Freight values are used.
  • In the 2nd PDF, the Subtotal and Shipping charges values are used.
  • In the 3rd PDF, the Total items and Sales tax values are used.

How can I extract data from different vendor invoices?

Thanks
Minal Patil

Hi @minal.patil,

Approach 1: Use the Read PDF Text activity to convert all the data to a string, then use regex operations to extract the values you want.

Approach 2: Use the Document Understanding framework (more reliable and consistent).
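
For Approach 1, here is a minimal sketch of the regex step, written in Python rather than as a UiPath workflow. It assumes the PDF text has already been read into a string; the patterns, the `extract_fields` helper, and the sample text are all hypothetical, and real invoices will likely need more label variants.

```python
import re

# Hypothetical patterns for the header fields named in the question.
# Each pattern captures the value that follows the label.
FIELD_PATTERNS = {
    "invoice_no": r"Invoice\s*(?:No\.?|Number|#)\s*[:\-]?\s*([A-Za-z0-9\-/]+)",
    "po_no": r"(?:PO|P\.O\.|Purchase Order)\s*(?:No\.?|Number|#)\s*[:\-]?\s*([A-Za-z0-9\-/]+)",
    "invoice_date": r"(?:Invoice\s*)?Date\s*[:\-]?\s*(\d{1,2}[/\-.]\d{1,2}[/\-.]\d{2,4})",
}


def extract_fields(text):
    """Return the first match for each field, or None when the label is absent."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        result[name] = match.group(1) if match else None
    return result


# Made-up text standing in for the output of a PDF-to-text step.
sample_text = """Invoice No: INV-1001
PO No: PO-778
Invoice Date: 12/03/2023
"""

print(extract_fields(sample_text))
```

Returning None for a missing label makes it easy to spot which vendor layouts still need their own pattern.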

Let us know your findings and difficulties.

@minal.patil

You can use Document Understanding and train the models, labelling the fields appropriately.

Apart from that, you can also use regex with an alternative for each label you need.

Something like this: (?<=Line Total|Sub Total).*

This extracts the value after Line Total or Sub Total. You can add more labels as well.
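
As a rough illustration of the multiple-label idea, the sketch below maps the label variants from the three sample PDFs onto the two line items. It is Python, not a UiPath workflow, and it uses a capture group instead of a lookbehind because Python's `re` module only accepts fixed-width lookbehinds. The `LINE_ITEM_LABELS` table, the `extract_line_items` helper, and the amount format are assumptions for illustration only.

```python
import re

# Label variants per line item, taken from the three layouts in the question.
# Longer labels come first so "Shipping charges" is tried before "Shipping".
LINE_ITEM_LABELS = {
    "line_item_1": ["Line Total", "Sub Total", "Subtotal", "Total items"],
    "line_item_2": ["Freight", "Shipping charges", "Shipping", "Sales tax"],
}

# Assumed amount format: optional separator and currency sign, then digits
# with optional thousands commas and decimals.
AMOUNT = r"\s*[:\-]?\s*[$€£]?\s*(\d[\d,]*(?:\.\d+)?)"


def extract_line_items(text):
    """Return each line item as a float, or None when no label variant matches."""
    result = {}
    for field, labels in LINE_ITEM_LABELS.items():
        alternation = "|".join(re.escape(label) for label in labels)
        match = re.search("(?:" + alternation + ")" + AMOUNT, text, flags=re.IGNORECASE)
        result[field] = float(match.group(1).replace(",", "")) if match else None
    return result


# Made-up snippets imitating the three layouts.
print(extract_line_items("Line Total 1,250.00\nFreight 40.00"))
print(extract_line_items("Subtotal: $300.00\nShipping charges: $15.50"))
print(extract_line_items("Total items 89.90\nSales tax 7.19"))
```

Adding a new vendor layout then only means adding its label to the matching list.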

Cheers