How to read multiple invoices in PDF

Good night, darlings! I need to extract information from invoices and save it in a specific environment, but I am confused about how to do this. I can extract the data I want from one invoice using Get OCR Text, but how do I do this for multiple invoices without having to go through them one by one? Can you help me? Thank you!

Directory.GetFiles(inputfilepath)

Use the Read PDF With OCR activity.

Hi @KarinaFreitas
What do you mean: do you have multiple PDF invoices, or one PDF with multiple pages?

Cheers :smiley:

Happy learning :smiley:

Hey @KarinaFreitas

To extract the values from the invoices, you need to read each file. To do so, first get the file paths using the command given by @Shubham_Akole.

This will give you an array of strings containing the file paths.

Next, loop through the files using a For Each activity. Inside the loop, place the activities you have already built to read one file.

Fine, the sequence will be like this:
- Use an Assign activity:
arr_filepath = Directory.GetFiles("yourfolderpath", "*.pdf")

where arr_filepath is a variable of type String[] (array of String).

- Then use a For Each activity, pass it the above variable, and set its TypeArgument property to String.
- Inside the loop, use Read PDF With OCR and pass item.ToString as the file path. This opens each file in turn, and you can then repeat the same steps you used to read the data from a single invoice.
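For anyone who finds it easier to see the loop as code, here is a rough Python analogue of the sequence above (not UiPath code). `Path(folder).glob("*.pdf")` plays the role of `Directory.GetFiles(folder, "*.pdf")`, the `for` loop is the For Each, and `read_pdf_text` is a hypothetical stand-in for whatever per-file OCR step you use (Read PDF With OCR in UiPath).

```python
from pathlib import Path
from typing import Callable


def read_all_pdfs(folder: str, read_pdf_text: Callable[[str], str]) -> dict[str, str]:
    """Return {file path: extracted text} for every *.pdf directly inside folder.

    read_pdf_text is a hypothetical callable standing in for the OCR activity.
    """
    # Like Directory.GetFiles(folder, "*.pdf"): PDFs only, no subfolders.
    # sorted() just gives a stable processing order.
    file_paths = sorted(str(p) for p in Path(folder).glob("*.pdf"))

    results: dict[str, str] = {}
    # Like the For Each loop: the single-file steps run once per path.
    for file_path in file_paths:
        results[file_path] = read_pdf_text(file_path)
    return results
```

One difference to keep in mind: `glob` typically matches case-sensitively on POSIX systems and case-insensitively on Windows, so on Linux a file named `INVOICE.PDF` would be skipped by the `"*.pdf"` pattern.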

cheers @KarinaFreitas

I read data from multiple PDFs. How do I store this data in multiple Excel files?

Fine.
At the very beginning, create a DataTable with the Build Data Table activity and store the output in a variable named dt.

- Then, after reading the data from each PDF,
inside the loop itself use an Add Data Row activity and pass the variables holding the PDF values to its ArrayRow property, like this:
{variable1,variable2,…,variablename}
and set its DataTable property to dt.
- Then, inside the same loop, use the Write Range activity (from the Workbook activities) to create a new Excel file for each set of data obtained from each PDF.
- Finally, inside the For Each loop, right after this Write Range activity, use a Clear Data Table activity with dt as input; only then will dt hold fresh data for the next new Excel file.
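A rough Python analogue of that Build Data Table, Add Data Row, Write Range, Clear Data Table cycle, to make the order of the steps concrete. It is a sketch under assumptions: the standard `csv` module stands in for Excel (writing .xlsx would need a third-party library such as openpyxl), and the column names and output file naming are hypothetical.

```python
import csv
from pathlib import Path

# Hypothetical columns: use whatever fields you actually extract per invoice.
COLUMNS = ["FileName", "CorporateName", "InvoiceNumber"]


def write_one_file_per_pdf(extracted: dict[str, list[str]], output_folder: str) -> list[str]:
    """extracted maps each PDF path to the values read from it (in COLUMNS order after FileName)."""
    out_dir = Path(output_folder)
    out_dir.mkdir(parents=True, exist_ok=True)

    dt: list[list[str]] = []  # plays the role of the DataTable built before the loop
    written: list[str] = []
    for pdf_path, values in extracted.items():
        # Add Data Row with ArrayRow = {file name, value1, value2, ...}
        dt.append([Path(pdf_path).name, *values])

        # Write Range into a brand-new file named after the PDF
        target = out_dir / (Path(pdf_path).stem + ".csv")
        with open(target, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(COLUMNS)
            writer.writerows(dt)
        written.append(str(target))

        # Clear Data Table, so the next file contains only the next PDF's row
        dt.clear()
    return written
```

If you want a single report instead (as suggested further down in the thread), move the write after the loop and drop the clear step, so every PDF's row accumulates in one table.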

Cheers @Shubham_Akole @KarinaFreitas

I will share my team's experience, which I have already shared in another thread too. Our team has been researching invoice data extraction tools that need no templates or Regex, for huge invoice volumes. After evaluating Kofax and ABBYY, we settled on testing the performance of two AI-based tools: 1. UiPath and 2. KlearStack. We have been satisfied with UiPath for basic fields (3-4 fields), but found that it doesn't support bulk uploads (zip files). KlearStack was a little slower, but it supports multiple uploads via .zip files and extracts almost all detailed fields, such as table line items, tax details, and addresses, with a focus on high accuracy.

P.S.: This was just my team’s review for the assistance of the UiPath Community and is not influenced by any paid recommendations or affiliations.

You can sort of bulk upload, as detailed above.

Dump the files in a folder, run a digitisation robot to extract all the required data and upload it to a queue, then move the files to a processed folder once extraction has occurred.
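The folder, queue, processed-folder pattern above, sketched in Python under loudly stated assumptions: `extract` is a hypothetical per-file extraction function, and a local `queue.Queue` stands in for an Orchestrator queue that a separate process (for example, the attended validation bot) would consume. It only illustrates the ordering of the steps.

```python
import shutil
from pathlib import Path
from queue import Queue
from typing import Callable


def digitise_folder(inbox: str, processed: str,
                    extract: Callable[[str], dict], work_queue: Queue) -> int:
    """Extract every PDF in inbox, enqueue the result, then move the file to processed."""
    processed_dir = Path(processed)
    processed_dir.mkdir(parents=True, exist_ok=True)

    count = 0
    # sorted() materialises the file list first, so moving files does not disturb the loop.
    for pdf in sorted(Path(inbox).glob("*.pdf")):
        data = extract(str(pdf))  # hypothetical extraction step
        work_queue.put({"source_file": pdf.name, **data})  # hand off for validation
        # Move only after extraction and enqueueing succeeded: if extract raises,
        # the loop stops and that file stays in the inbox for a retry.
        shutil.move(str(pdf), str(processed_dir / pdf.name))
        count += 1
    return count
```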

Use an attended bot on the user's desktop to run the Validation Station.

Plus, you won't find many tools that can read multiple invoices within a single PDF without adding blank / black pages.

I've just come out of a project where we were digitizing 10k invoices a month, but the client refused to use UiPath and wanted to use another solution.

In this case, it does not have several pages, but I need to read more than one PDF invoice. I need to do this for a company that receives several invoices from different municipalities every day and needs to obtain the information from these invoices and insert it into an internal system. But I am having difficulty reading all these invoices.

OK, thank you! I followed this approach; the robot runs and finishes in a matter of seconds, without errors.
But what do I do after that to get a result? Which activity should I use to obtain, for example, the corporate name shown on the invoice?

Did we try this step, buddy?
Use a Write Range activity, which will write the data back to an Excel file that can be used as an output report.

Cheers @KarinaFreitas

OK, but just so I understand: where exactly in the steps you described is the PDF data read? Do I need to pass the path to the folder containing the PDFs that need to be read?
Another question: when you say "use write range activity for workbook activities and create new excel for each set of data obtained from each pdf", do I need to pass the path of my folder to the Write Range activity, or the variable dt?
Thank you!!! Sorry, I'm a bit lost.

Could you send me an example of how I should do it? I think that would make it easier for me to understand.

Yes, we need to mention that folder path here,

where yourfolderpath is to be replaced with the path of the folder where your PDF files are.

And

Both, buddy.
Pass the file path of your Excel file and the input DataTable that is to be written to that Excel file.

Cheers @KarinaFreitas

Thank you! That way I can get the information from all the invoices in the directory, but now I need to get only the CNPJ and the invoice number from each of them. What should I use? Would Computer Vision (CV) help in this case, or should I use Get OCR Text?
I’m doing it that way
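If the OCR step already returns the full text of each invoice, a regular expression is often enough for fixed-format fields like these. A minimal Python sketch, assuming the CNPJ is printed in the usual 00.000.000/0000-00 form (separators made optional in case OCR drops them) and assuming the invoice number follows a label such as "Número da Nota"; that label is hypothetical, so check what your municipalities actually print.

```python
import re
from typing import Optional

# CNPJ: 14 digits, usually printed as 00.000.000/0000-00.
# Separators are optional because OCR sometimes loses them.
CNPJ_PATTERN = re.compile(r"\b(\d{2})\.?(\d{3})\.?(\d{3})/?(\d{4})-?(\d{2})\b")

# Hypothetical label: adjust to the layout of your invoices.
INVOICE_NO_PATTERN = re.compile(r"N[úu]mero\s+da\s+Nota\s*:?\s*(\d+)", re.IGNORECASE)


def extract_fields(ocr_text: str) -> dict[str, Optional[str]]:
    """Pull the first CNPJ (digits only) and the labelled invoice number from OCR text."""
    cnpj = CNPJ_PATTERN.search(ocr_text)
    number = INVOICE_NO_PATTERN.search(ocr_text)
    return {
        "cnpj": "".join(cnpj.groups()) if cnpj else None,
        "invoice_number": number.group(1) if number else None,
    }
```

If an invoice shows two CNPJs (issuer and recipient), `search` returns only the first one, so you may need to anchor the pattern to a nearby label. The patterns use only basic syntax, so they should carry over to .NET's `Regex` class inside UiPath with little or no change, though that is worth testing on real OCR output.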

Hi @pattyricarte,

I have one PDF file with multiple invoices, and a single invoice can also span multiple pages because it has many line items. Can the UiPath Form Extractor handle this kind of invoice extraction? I tried selecting field combinations, but I only get the invoice details from the first page.

Thank you very much.
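One possible pre-processing approach for a PDF that holds several multi-page invoices (not a feature of the Form Extractor itself): group the pages into invoices first, then run the extraction once per group. This Python sketch assumes you already have the text of each page and that every invoice page prints its invoice number after a label like "Invoice No" (a hypothetical layout). A page with a new number starts a new invoice; a page with no number, or the same number, is treated as a continuation.

```python
import re

# Hypothetical label: adjust to the layout of your invoices.
INVOICE_START = re.compile(r"Invoice\s+No\.?\s*:?\s*(\d+)", re.IGNORECASE)


def split_into_invoices(pages: list[str]) -> list[list[int]]:
    """Group 0-based page indexes so that each group is one invoice."""
    groups: list[list[int]] = []
    current_number = None
    for index, text in enumerate(pages):
        match = INVOICE_START.search(text)
        number = match.group(1) if match else None
        if not groups or (number is not None and number != current_number):
            groups.append([index])        # a new invoice starts on this page
            current_number = number
        else:
            groups[-1].append(index)      # continuation page of the current invoice
    return groups
```

Each resulting page range could then be handed to whatever extraction step you use, so line items from page 2 onwards are read as part of the same invoice.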