How to read multiple invoices from 1 pdf file?

toffi.poffi · May 14, 2021, 10:25am

Hi,
I have 1 pdf file with multiple invoices. Is it possible to extract information from all invoices using ML?

wasea · May 15, 2021, 9:23pm

This is an interesting scenario.

First, if you have a lot of invoices, maybe you can use an activity to split the PDF and create one PDF file per invoice.

Then, you can use a For Each (file) activity to use the files as transactions (REFramework template).
Then, you might be able to use Document Understanding, Regex, or String operations to get the data you want.

I haven't tested the ML approach yet, but it might be possible.
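
For the Regex option, a minimal sketch could look like the following. It is written in Python purely for illustration; inside UiPath the same patterns would go through .NET's System.Text.RegularExpressions.Regex. The sample text, the field labels ("Invoice No", "Date", "Total") and the function name extract_fields are assumptions, not something taken from a real invoice.

```python
import re

# Hypothetical text of one already-split invoice; labels are assumptions.
SAMPLE_TEXT = """Invoice No: INV-2021-0042
Date: 14/05/2021
Total: 1,250.00 EUR"""

# One pattern per field; adjust these to the labels your invoices really use.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*No\.?\s*:?\s*(\S+)", re.IGNORECASE),
    "date": re.compile(r"Date\s*:?\s*(\d{1,2}/\d{1,2}/\d{4})", re.IGNORECASE),
    "total": re.compile(r"Total\s*:?\s*([\d.,]+)", re.IGNORECASE),
}


def extract_fields(text):
    """Return a dict of field name -> first captured value, or None if absent."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


print(extract_fields(SAMPLE_TEXT))
```

This only works when every invoice uses the same labels and layout. For mixed layouts, the Document Understanding route is likely the better fit.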

I hope it helps.

Vasile.

copy_writes · May 16, 2021, 3:37am

It's simple. Put all the PDFs in one folder, then use an Assign activity:
create a variable of type Array of String and assign it the following:

Directory.GetFiles("C:\Users\*\*\UiPath\Form Question 2\Chethan","*.pdf")

Use a For Each activity with its type argument set to String.

Use the files variable as the input of the For Each.

Now you can read the multiple PDF files.

Note: when you give the path, you have to give the full path of the folder, starting from C:\user…

If this solves your problem, mark it as solved.

Dikta · June 1, 2021, 11:50am

Hi, me too. I need to read all the information from 5 invoices in 1 PDF (5 pages). Is it possible?

tudor.carean · June 3, 2021, 1:38pm

You can use Intelligent Keyword Classifier to split the document. Then you can simply iterate over its results and extract from each one. After splitting, you can also display Classification Station to let the user confirm whether the split is OK. A flow would look like this:

postwick · June 3, 2021, 1:40pm

He doesn’t have multiple PDF files. He has one PDF file that contains multiple invoices, and he wants to individually process the multiple invoices that are in the one PDF.

tudor.carean · June 3, 2021, 1:45pm

Intelligent Keyword Classifier can split a single PDF. It returns an array of ClassificationResult, which you can iterate over, using each item to extract from a particular page range in that document. Remember that the Data Extraction Scope also accepts a “ClassificationResult” as input and will run the extraction on that specific region only. See the sample flow I posted above.
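
To make the "iterate over page ranges" idea concrete, here is a rough sketch in Python. It is not UiPath code: SplitResult is a hypothetical stand-in for a classification result, and its fields (document_type, start_page, page_count) are assumed names chosen for illustration. The page texts are made up as well.

```python
from dataclasses import dataclass


@dataclass
class SplitResult:
    # Hypothetical stand-in for one classification result; not the UiPath type.
    document_type: str
    start_page: int  # zero-based index of the first page of this invoice
    page_count: int  # number of consecutive pages belonging to it


def pages_for(result, pages):
    """Return only the pages that belong to one split result."""
    return pages[result.start_page:result.start_page + result.page_count]


# One 5-page file holding three invoices: two pages, one page, two pages.
pages = ["inv A p1", "inv A p2", "inv B p1", "inv C p1", "inv C p2"]
results = [
    SplitResult("Invoice", 0, 2),
    SplitResult("Invoice", 2, 1),
    SplitResult("Invoice", 3, 2),
]

for result in results:
    # Extraction would run here, restricted to this invoice's pages.
    print(result.document_type, pages_for(result, pages))
```

Because each result carries its own page range, an invoice that spans two pages is handled the same way as a one-page invoice, provided the split itself is correct.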

copy_writes · June 4, 2021, 1:16pm

Use Document Understanding, bro. You can read all 5 pages.

john_smith2 · October 11, 2023, 4:08pm

What happens if one invoice in that PDF has 2 pages? How do I handle that?

Ioana_Gligan · October 12, 2023, 2:09pm

Use Intelligent Keyword Classifier to try to break down and classify each invoice in your file;
Use Classification Station actions (or standalone, at least for testing) to verify the classification and splitting, and correct them if necessary;
Use a For Each over the classification results from Classification Station and perform Data Extraction;
Create tasks for each set of extracted data;
Use a Parallel For Each to wait for the tasks.

(last two if you are using Action Center - if you want to use attended Validation Station go ahead directly in the For Each above).

john_smith2 · October 12, 2023, 7:36pm

In Intelligent Keyword Classifier, should I train it with the PDF containing multiple invoices or with the individual invoices? I have a few PDFs with 1 invoice each and I trained with those, but when I add more it's not detecting the pages correctly. Sometimes the second page comes out as the first page of another invoice that has only 1 page.

Ioana_Gligan · October 13, 2023, 6:46am

You should try training with individual invoices (so if you have multiple invoices in a single file break them down and add them as individual files). You should also make sure you have a variety of invoices - not only one-pagers - and see if it improves.

Classification and splitting WILL sometimes fail (depending on the complexity of the use case). For such cases specifically we recommend that you use Classification Validation Station (or task in Actions) - if there is no other way to check that the break is correct.

For example, if a file has one page, it’s obvious it’s not going to get a bad split (it might still get a bad classification, though). You could build simple rules like this into your workflow to minimize, as far as possible, how many files get to Classification Station…
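
The one-page rule could be sketched like this (Python for illustration only; in a real workflow it would be an If activity on the page count). The function name route_files and the file names are hypothetical. The rule covers splitting only, so a one-page file may still be misclassified.

```python
def route_files(page_counts):
    """Sort files into (auto, review) lists using page count alone.

    page_counts maps a file name to its number of pages.
    A one-page file cannot be split wrongly, so it skips split review.
    """
    auto, review = [], []
    for name, pages in page_counts.items():
        if pages == 1:
            auto.append(name)
        else:
            review.append(name)  # multi-page: the split may need confirming
    return auto, review


# Hypothetical batch of incoming files.
auto, review = route_files({"a.pdf": 1, "b.pdf": 5, "c.pdf": 1, "d.pdf": 2})
print("skip split review:", auto)
print("send to Classification Station:", review)
```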
