How to extract a dynamic number of line items from invoices of the same format

I am using the Form Extractor to extract data from an invoice that has 3 line items, and it works properly for that invoice. But when I run it on a new invoice of the same format that has more than 3 line items, the extra lines are not extracted.
How do I make this work for a dynamic number of lines in the same format?
Kindly help me out.

Hi, can you share a screenshot of what you’re describing?

If you are extracting dynamic data, the out-of-the-box (OOTB) Invoices ML model would be ideal.

Hi @waseem

As suggested by @sharon.palawandram, can you share some screenshots of your invoices?

Regards
Gokul

I have attached two invoices from the same vendor. Invoice 1 has 1 line item, which I extracted using the Form Extractor.


Invoice 2 has two line items, and it does not work with the Form Extractor setup I used for Invoice 1.

The same vendor may upload an invoice with multiple line items, so the extraction should work regardless of the number of line items, since the data is in a structured format.


Hi

You could try Document Understanding.

Hey @waseem

I’m assuming these are PDF files. If so, kindly try the approach below:

  1. Use Read PDF Text with the PreserveFormatting property set to True

  2. Pass the text (String) returned by the step above to the Generate Data Table From Text activity with all default properties

  3. Save the output to a DataTable variable, which should hold the final table
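
To illustrate what these three steps do conceptually, here is a minimal Python sketch. It is not UiPath code and not how the activities are implemented; it only assumes that the layout-preserved text keeps columns separated by runs of two or more spaces. The `text_to_rows` helper and the sample text are hypothetical.

```python
import re


def text_to_rows(text: str) -> list[list[str]]:
    """Split layout-preserved text into rows of cells.

    Assumption: columns are separated by two or more spaces,
    which is roughly what a layout-preserving PDF read yields.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rows.append(re.split(r"\s{2,}", line))
    return rows


# Hypothetical text, standing in for the output of step 1.
sample = """
Qty   Description        Unit Price
2     Blue widget        10.00
5     Red widget, large  7.50
"""

table = text_to_rows(sample)
for row in table:
    print(row)
```

Because the split is applied line by line, the number of rows follows the number of lines in the text, so an invoice with 2 lines and one with 10 lines go through the same code.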

If you run into any issues, please let us know.

Reference links for the activities mentioned:

  1. Activities - Read PDF Text

  2. Activities - Generate Data Table From Text

Hope this helps.

Thanks
#nK

@Gokul001, yes, I am using Document Understanding with the Form Extractor.
But the issue arises when a vendor uploads an invoice with multiple lines, like the invoice I attached above.

Read PDF Text will only give us the text of a particular invoice; after that we would need to add regex for the extraction.
I have multiple vendors with multiple lines, and I am using the Form Extractor. I need to extract data with a dynamic number of line items.
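
As a rough illustration of the "read the text, then apply regex" route for a variable number of line items, here is a minimal Python sketch. It assumes each line item sits on its own line as quantity, description, unit price, separated by at least two spaces; the pattern, the `extract_line_items` helper, and the sample text are hypothetical and would need adjusting per vendor layout.

```python
import re
from decimal import Decimal

# Assumed row layout: <qty>  <description>  <unit price>
# [ \t] is used instead of \s so a match never spans two lines.
LINE_ITEM = re.compile(
    r"^[ \t]*(?P<qty>\d+)[ \t]{2,}(?P<desc>.+?)[ \t]{2,}"
    r"(?P<price>\d+(?:\.\d{1,2})?)[ \t]*$",
    re.MULTILINE,
)


def extract_line_items(text: str) -> list[dict]:
    """Return one dict per matching line, however many lines there are."""
    return [
        {
            "quantity": int(m["qty"]),
            "description": m["desc"],
            "unit_price": Decimal(m["price"]),
        }
        for m in LINE_ITEM.finditer(text)
    ]


# Hypothetical invoice text; header and total rows do not match the pattern.
sample_text = """\
Invoice No: 1001
Qty   Description        Unit Price
2     Blue widget        10.00
5     Red widget, large  7.50
1     Shipping           12.00
Total                    69.50
"""

items = extract_line_items(sample_text)
for item in items:
    print(item)
```

The pattern is applied to every line, so nothing is tied to a fixed count of 3 lines. The trade-off is that each vendor layout needs its own pattern.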

Since you have a dynamic number of lines, you can try extracting all fields with a Machine Learning model.

Here’s a link on how to train an ML model for invoices.

This video was helpful for training on multi-line PDFs, but we need the enterprise version for AI Center.
I am also trying the trial version, but the AI Center pipeline is now failing somehow, although it was running successfully earlier.


I have also attached the log file.

Hello @waseem,

Did you try data scraping, so that you get a DataTable of the invoice data?

No, I didn’t use data scraping. I used AI Center.

Hello @waseem,

Can you share the sample PDFs for both invoices?

BR,
Shahabuddin N

These are the invoices from the same vendor from which I need to extract line items, i.e. quantity, description, and unit price.
The number of line items is dynamic: some invoices have 2 lines, some have 10 lines, and so on.

The log says the train-test split is incorrectly set in split.csv. You are running a training pipeline, right?
Can you check that you’re training on the correct dataset? You can re-export your data to a new file and add it to the training pipeline.