Extract characters from PDF with various pages

Raymond6 · October 26, 2023, 3:15am

I am new to UiPath. After studying some videos and tutorials, I have started using UiPath for data extraction. I have a PDF file containing several invoice numbers and invoice amounts on different pages. I am facing the following problems:

1. After using “Get PDF Page Count”, I tried a For Each loop to extract every matching “Invoice Number” and “Invoice Amount”. However, the result only ever gives me the first “Invoice Number” and the first “Invoice Amount”. How can I get the rest of the data?
2. Within the For Each loop, the result repeats the same “Invoice Number” and “Invoice Amount” many times. What should I do here? Should I split the PDF, read the pages one by one, and combine the results later, or is there a way to read every “Invoice Number” and “Invoice Amount” with a single activity?

Nguyen_Van_Luong1 · October 26, 2023, 3:31am

Hi @Raymond6,
Can you share your file and tell us which values you want to get?
Regards,

Yoichi · October 26, 2023, 3:38am

Hi

Can you try the Range property of the Read PDF Text activity? The following sample gets the text of each page.

Regards,

Kartheek_Battu · October 26, 2023, 5:09am

Hello @Raymond6

For Each (page in Enumerable.Range(1, pdfPageCount))

Read PDF Text (Range: page.ToString), with its output saved to pageText

Use regular expressions or string manipulation to extract Invoice Number and Amount from pageText

Add the extracted data to a collection (e.g., DataTable or List)

After the loop, you can process or combine the collected data as needed.
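
As a rough illustration of the steps above, here is a minimal sketch in Python rather than a UiPath workflow. It assumes each page's text is already available as a string (for example, from one Read PDF Text call per page), that the number follows a label such as "Invoice No:", and that the amount sits between "USD" and "SGD". The label format, the sample pages, and the name collect_invoices are made up for illustration. Keying the results by invoice number means an invoice that appears on several pages ends up as a single entry.

```python
import re

# Hypothetical label and number formats; adjust them to the real invoice layout.
INVOICE_NO = re.compile(r"Invoice\s*No\.?\s*:?\s*(\S+)", re.IGNORECASE)
AMOUNT = re.compile(r"(?<=USD)\s*([\d,]+(?:\.\d+)?)\s*(?=SGD)")


def collect_invoices(page_texts):
    """Return {invoice_no: amount or None}, one entry per invoice."""
    invoices = {}
    for text in page_texts:
        no_match = INVOICE_NO.search(text)
        if no_match is None:
            continue  # page without a recognizable invoice number
        invoice_no = no_match.group(1)
        # Register the invoice even if this page carries no amount yet.
        invoices.setdefault(invoice_no, None)
        amount_match = AMOUNT.search(text)
        if amount_match:
            invoices[invoice_no] = amount_match.group(1)
    return invoices


# Made-up page texts standing in for the per-page output of the PDF reader.
pages = [
    "Invoice No: A001\nMaterial 111\nMaterial 112",
    "Invoice No: A001\nMaterial 113\nInvoice Amount USD 1,250.00 SGD",
    "Invoice No: B002\nMaterial 200\nInvoice Amount USD 80.50 SGD",
]
print(collect_invoices(pages))
```

The two patterns use only constructs that .NET regular expressions also support, so they should carry over to an Assign inside the loop, with a Dictionary or DataTable in place of the Python dict.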

Thanks & Cheers!!!

Raymond6 · October 26, 2023, 5:36am

Hello Nguyen:
Sorry, I cannot share the file since it contains sensitive information, but I can describe it in more detail.
What does this invoice look like?

It is a PDF invoice that lists the Invoice No, Material Numbers, and Invoice Amount. Each new Invoice No starts on a new PDF page with its own Material Numbers and Invoice Amount.
The layout is the same for every invoice; the only difference is the number of Material Numbers. With more Material Numbers, an invoice runs to 2 or 3 pages that share the same Invoice No, and there is still only one Invoice Amount per Invoice No.
I hope this gives you more insight into the PDF.

Raymond6 · October 26, 2023, 6:05am

I tried but it pops up this error.

Yoichi · October 26, 2023, 6:07am

Hi,

Can you share your workflow (xaml file or screenshot)?

Regards,

Raymond6 · October 26, 2023, 6:18am

Yoichi · October 26, 2023, 6:20am

Hi,

Can you try the Repeat Number Of Times activity instead of For Each, as in the above image?

Regards,

Raymond6 · October 26, 2023, 8:37am

Hello Yoichi:
Yes, it works. Thank you very much for this part.
But for the data extraction from the PDF, what would you suggest when using the Repeat activity?
E.g., the Invoice Amount is on a single line, and I only need the characters between “USD” and “SGD”. Should I use Assign or regex to get the result? The final goal is to read every Invoice No and Invoice Amount from one PDF file. I tested Assign on this page many times and the result was not good, so I am raising this follow-up question.

Best,

Yoichi · October 26, 2023, 8:49am

In general, I think it’s better to use regex, as follows.

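strResult = System.Text.RegularExpressions.Regex.Match(pageText, "(?<=USD).*(?=SGD)").Value

Here pageText stands for the String variable that holds the text read from the current page (the name is taken from the earlier reply; use whatever your workflow calls it).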
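
To show how a pattern like this behaves, here is a small Python sketch; the lookaround and quantifier syntax it relies on is the same in .NET. The helper name between and the sample lines are hypothetical. Two things are worth noting: the raw match keeps the spaces around the number, so it needs trimming, and a greedy `.*` runs to the last "SGD" on the line, whereas the lazy `.*?` used here stops at the first one.

```python
import re


def between(text, start="USD", end="SGD"):
    """Return the trimmed text between the first `start` and the next `end`."""
    # Lazy quantifier: stop at the first end marker after the start marker.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    match = re.search(pattern, text)
    return match.group(1).strip() if match else None


# Made-up sample lines for illustration.
print(between("Invoice Amount USD 1,250.00 SGD"))
print(between("USD 10.00 SGD 13.50 SGD"))
print(between("no currency markers here"))
```

Because `.` does not match a line break by default, the match stays on the single line that contains both markers.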

Regards,

system · October 30, 2023, 5:56am

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
