I am using the Form Extractor to extract data from an invoice that has 3 line items, and it works properly for this invoice. But when I run a new invoice of the same format that has more than 3 lines, the extra lines are not extracted.
How do I make this work for a dynamic number of lines in the same format?
Kindly help me out.
Hi, can I see a visual of what you’re describing?
If you are extracting dynamic data, the out-of-the-box (OOTB) Invoices ML model would be ideal.
Hi @waseem
As suggested by @sharon.palawandram, can you share screenshots of your invoices?
Regards
Gokul
I have attached two invoices from the same vendor. Invoice 1 has one line item, which I extracted using the Form Extractor.
Invoice 2 has two line items, and the Form Extractor setup I used for Invoice 1 does not work on it.
The same vendor may upload an invoice with any number of line items, and extraction should work regardless of how many there are, since the data is in a structured format.
Hi
You could try Document Understanding.
Hey @waseem
I’m assuming these are PDF files, so hopefully
Kindly try the approach below:
- Use Read PDF Text with the PreserveFormatting property set to True.
- Pass the output of the above step to the Generate Data Table activity with all default properties.
- Save the output to a DataTable, which should contain the final table.
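To illustrate the idea behind these steps outside of UiPath, here is a minimal Python sketch. It assumes the PDF text keeps its column layout, with columns separated by runs of two or more spaces. The sample text and the `text_to_table` helper are hypothetical and are not part of any UiPath activity.

```python
import re

# Hypothetical text, as it might look after a layout-preserving PDF read.
SAMPLE_TEXT = """\
Qty   Description        Unit Price
2     Widget A           10.00
5     Widget B, large    3.50
1     Service fee        25.00
"""


def text_to_table(text):
    """Split each non-empty line into cells on runs of two or more spaces."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            rows.append(re.split(r"\s{2,}", line.strip()))
    return rows


table = text_to_table(SAMPLE_TEXT)
header, body = table[0], table[1:]

# The number of body rows follows the input, so 1 line or 10 lines both work.
for row in body:
    print(dict(zip(header, row)))
```

Because the rows are derived from the text itself, nothing in the sketch is tied to a fixed number of line items.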
If you run into any issues, please let us know.
Reference link for activities mentioned
Hope this helps.
Thanks
#nK
@Gokul001, yes, I am using Document Understanding with the Form Extractor.
But the issue arises when a vendor uploads an invoice with multiple line items, like the one I attached above.
Read PDF Text will give us the text of a particular invoice; after that we need to add regex for extraction.
I have multiple vendors with multiple lines and I am using the Form Extractor; I need to extract data with dynamic line items.
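For reference, the regex step on text returned by Read PDF Text could be sketched roughly as below, in Python. The invoice layout (quantity, then description, then unit price on one line) and the names `LINE_ITEM` and `extract_line_items` are assumptions for illustration only; each vendor's real layout would need its own pattern.

```python
import re

# Assumed line layout: <quantity> <description> <unit price with 2 decimals>.
# [ \t] is used instead of \s so a match can never span two lines.
LINE_ITEM = re.compile(
    r"^[ \t]*(?P<quantity>\d+)[ \t]+(?P<description>.+?)[ \t]+"
    r"(?P<unit_price>\d+\.\d{2})[ \t]*$",
    re.MULTILINE,
)


def extract_line_items(text):
    """Return one dict per line that matches the assumed line-item layout."""
    return [m.groupdict() for m in LINE_ITEM.finditer(text)]


# Hypothetical invoice text; header and total lines do not start with a
# quantity, so the pattern skips them.
invoice_text = """\
Invoice No: 1001
Qty  Description      Unit Price
2    Widget A         10.00
5    Widget B large   3.50
Total                 37.50
"""

items = extract_line_items(invoice_text)
for item in items:
    print(item)
```

`finditer` returns every matching line, so the same pattern handles 2 lines or 10 lines without changes.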
Since you have dynamic lines, you can try extracting all fields with a machine learning model.
Here’s a link on how to train an ML model for invoices.
This video was helpful for training on multi-line PDFs, but we need the Enterprise version for AI Center.
Also, I am on the trial version, but the AI Center pipeline is failing somehow; earlier it was running successfully.
I have also attached the log file.
No, I didn’t use data scraping. I used AI Center.
These are invoices from the same vendor from which I need to extract line items, i.e. quantity, description, and unit price.
Line items are dynamic: some invoices have 2 lines, some have 10, and so on.
Here it says the train-test split is incorrectly set in split.csv. You are running a training pipeline, right?
Can you check that you’re training on the proper dataset? You can re-export your data to a new file and add it to the training pipeline.