Hi,
I have 1 pdf file with multiple invoices. Is it possible to extract information from all invoices using ML?
Hello @toffi.poffi ,
This is an interesting scenario.
First, if you have a lot of invoices, maybe you can use an activity to split the PDF and create one PDF file per invoice.
Then you can use a For Each (file) activity to treat each file as a transaction (REFramework template).
Then you might be able to use Document Understanding, regex, or string operations to get the data you want.
I haven't tested the ML approach yet, but it might be possible.
I hope it helps.
Vasile.
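To illustrate the regex route mentioned above, here is a minimal sketch in Python (not a UiPath workflow). It assumes the text of one invoice has already been read into a string. The sample text, field labels, and patterns are all hypothetical, so real invoices will need their own patterns.

```python
import re

# Hypothetical invoice text, as it might look after a PDF text-reading step.
SAMPLE_TEXT = """Invoice Number: INV-1001
Invoice Date: 2021-03-15
Total: 1,250.00
"""

# Hypothetical field patterns; adjust the labels and formats to your invoices.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice Number:\s*(\S+)"),
    "invoice_date": re.compile(r"Invoice Date:\s*(\d{4}-\d{2}-\d{2})"),
    "total": re.compile(r"Total:\s*([\d,]+\.\d{2})"),
}


def extract_fields(text):
    """Return a dict of field name -> captured value (None if not found)."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


print(extract_fields(SAMPLE_TEXT))
```

This kind of approach tends to work only when every invoice follows the same layout; for mixed layouts the Document Understanding route is probably the better fit.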
It's simple. Just put all the PDFs in one folder, then:
Step 1: Use an Assign activity.
Step 2: Create a variable of type Array of String and assign it the following:
Directory.GetFiles("C:\Users\*\*\UiPath\Form Question 2\Chethan","*pdf")
Use a For Each activity with its type argument set to String.
Pass the files variable into the For Each.
Now you can read the multiple PDF files.
Note: when you give the path, you have to give the full path of the folder, starting from C:\user…
If this solves it, please mark it as solved.
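For comparison, the same "list every PDF in a folder, then loop over the files" idea can be sketched in Python. This is only an analogue of the Directory.GetFiles call above, not UiPath code, and the function name `list_pdf_files` is made up for the example.

```python
from pathlib import Path


def list_pdf_files(folder):
    """Return the PDF files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    # Loop over the files in the current folder, one per iteration.
    for pdf_path in list_pdf_files("."):
        print(pdf_path.name)
```

Note that this only covers the case of many separate PDF files. It does not split a single PDF that contains several invoices.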
Hi, me too. I need to read all the information from 5 invoices in 1 PDF (5 pages). Is it possible?
You can use Intelligent Keyword Classifier to split the document. Then you can simply iterate over its results and extract from each one. After splitting you can also display Classification Station to let the user confirm whether the split is OK. A flow would look like this:
He doesn’t have multiple PDF files. He has one PDF file that contains multiple invoices, and he wants to individually process the multiple invoices that are in the one PDF.
Intelligent Keyword Classifier can split a single PDF. It returns an array of ClassificationResult, which you can iterate over, using each item to extract from a particular page range in that document. Remember that the Data Extraction Scope also accepts a “ClassificationResult” as input and will run the extraction on that specific region only. See the sample flow I posted above.
Use Document Understanding, bro. You can read all 5 pages.
What happens if one invoice in that PDF has 2 pages? How do I handle that?
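The iteration described above can be sketched roughly like this in Python. The `SplitResult` class is a hypothetical stand-in for a classification result, not the actual UiPath ClassificationResult type, and the field names and page ranges are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class SplitResult:
    """Hypothetical stand-in for one classification result."""
    document_type: str
    start_page: int  # zero-based, inclusive
    page_count: int


def pages_for(result):
    """Return the page indices that belong to one split result."""
    return list(range(result.start_page, result.start_page + result.page_count))


def plan_extractions(results):
    """Pair each split result with the pages extraction should run on."""
    return [(r.document_type, pages_for(r)) for r in results]


# Hypothetical split of a 4-page PDF where the second invoice spans 2 pages.
results = [
    SplitResult("Invoice", start_page=0, page_count=1),
    SplitResult("Invoice", start_page=1, page_count=2),
    SplitResult("Invoice", start_page=3, page_count=1),
]

for doc_type, pages in plan_extractions(results):
    print(doc_type, pages)
```

The point is that each result carries its own page range, so an invoice that spans more than one page is just a result with a larger page count.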
Use IntelligentKeywordClassifier to try to break down and classify each invoice in your file;
Use Classification Station actions (or standalone at least for testing) to verify the classification and splitting and correct if necessary;
Use a For Each over the classification results from Classification Station and perform Data Extraction;
Create a task for each set of extracted data;
Use a Parallel For Each to wait for the tasks
(the last two apply if you are using Action Center - if you want to use attended Validation Station, go ahead directly in the For Each above).
In Intelligent Keyword Classifier, should I train it with the PDF containing multiple invoices or with the individual invoices? I have a few PDFs with 1 invoice each and I trained with those, but when I add more it's not detecting the pages correctly. Sometimes the second page comes up as the first page of another invoice that has only 1 page.
You should try training with individual invoices (so if you have multiple invoices in a single file, break them down and add them as individual files). You should also make sure you have a variety of invoices - not only one-pagers - and see if it improves.
Classification and splitting WILL sometimes fail (depending on the complexity of the use case). For such cases specifically we recommend that you use Classification Validation Station (or task in Actions) - if there is no other way to check that the break is correct.
For example, if a file has one page, it's obvious it's not going to get a bad split (it still might get a bad classification, though). You could build simple rules like this into your workflow to minimize, as far as possible, how many files reach Classification Station…
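As a rough illustration of such a rule, here is a small Python sketch that routes files by page count. The function name, the input shape (a dict of file name to page count), and the file names are hypothetical. It only encodes the one-page rule from this post and does not catch a wrong classification.

```python
def route_files(page_counts):
    """Split files into (skip_split_review, needs_split_review) lists.

    page_counts maps a file name to its number of pages.
    """
    skip_review = []
    needs_review = []
    for name, pages in page_counts.items():
        if pages <= 1:
            # A one-page file cannot be split in the wrong place.
            skip_review.append(name)
        else:
            # Multi-page files may contain several invoices, so a person
            # should confirm where the splits fall.
            needs_review.append(name)
    return skip_review, needs_review


# Hypothetical batch of files.
skip, review = route_files({"a.pdf": 1, "b.pdf": 5, "c.pdf": 2})
print(skip, review)
```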