Extract characters from a PDF with multiple pages

I am new to UiPath. After studying some videos and other learning material, I started using UiPath for data extraction. I have a PDF file that contains several invoice numbers, each with an invoice amount, on different pages. I am facing the following problems:

  1. After using “Get PDF Page Count”, I tried a For Each loop to extract every matching “Invoice Number” and “Invoice Amount”. However, the result is always the 1st “Invoice Number” and the 1st “Invoice Amount”. How can I get the rest of the data?
  2. Still within the For Each loop, the result repeats the same “Invoice Number” and “Invoice Amount” many times. What should I do here? Should I split the PDF, read the pages one by one, and combine the results afterwards, or is there a way to read every “Invoice Number” and “Invoice Amount” with a single activity?

Hi @Raymond6,
Can you share your file and tell us which values you want to get?
Regards,

Hi

Can you try the Range property of the Read PDF Text activity? The following sample gets the text of each page.

Regards,

Hello @Raymond6

For Each (page in Enumerable.Range(1, pdfPageCount))

Assign
pageText = Read PDF Text (Page: page)

Use regular expressions or string manipulation to extract Invoice Number and Amount from pageText

Add the extracted data to a collection (e.g., DataTable or List)

After the loop, you can process or combine the collected data as needed.
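As a rough illustration of the loop above outside UiPath, the following Python sketch walks over the text of each page and collects one row per page. The page strings, the `Invoice No:` label and the `USD ... SGD` amount layout are assumptions made up for the example, not taken from the real file. In a workflow, each string would come from Read PDF Text restricted to the current page.

```python
import re

# Hypothetical page texts; in UiPath each string would come from
# Read PDF Text with Range set to the current page number.
pages = [
    "Invoice No: INV-001\nMaterial 1001\nTotal USD 1,250.00 SGD",
    "Invoice No: INV-002\nMaterial 2001\nTotal USD 310.50 SGD",
]

# Assumed patterns: adjust them to the real labels in the PDF.
INVOICE_NO = re.compile(r"Invoice No:\s*(\S+)")
AMOUNT = re.compile(r"(?<=USD).*(?=SGD)")


def extract_rows(page_texts):
    """Return one (page, invoice_no, amount) tuple per page that has an invoice number."""
    rows = []
    for page_number, text in enumerate(page_texts, start=1):
        no_match = INVOICE_NO.search(text)
        if no_match is None:
            continue  # no invoice number found on this page
        amount_match = AMOUNT.search(text)
        amount = amount_match.group(0).strip() if amount_match else None
        rows.append((page_number, no_match.group(1), amount))
    return rows


rows = extract_rows(pages)
```

The point of the sketch is that the text is read per page inside the loop. If the loop matches against the text of the whole document on every iteration, it will likely return the first invoice each time.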

Thanks & Cheers!!!

Hello Nguyen:
Sorry, I cannot share the file since it contains sensitive information, but I can describe it in more detail.
What does the invoice look like?

  1. It is a PDF invoice that lists the invoice number, material numbers and invoice amount. Each new invoice number starts on a new PDF page, with its own material numbers and invoice amount.
  2. The format is the same throughout. The only difference is the number of material numbers. With more material numbers, an invoice can run to 2 or 3 pages with the same invoice number, and there is still only one invoice amount per invoice number.
    I hope this gives you more insight into the PDF.
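For a layout like the one described above, one possible approach is to key the results by invoice number, so that an invoice spanning 2 or 3 pages still yields a single entry. The Python sketch below is only an illustration under assumptions: the sample page texts, the `Invoice No:` label and the amount sitting between `USD` and `SGD` on the last page of an invoice are all hypothetical.

```python
import re

# Hypothetical page texts: INV-001 spans two pages and its amount
# appears only on the second one.
pages = [
    "Invoice No: INV-001\nMaterial 1001\nMaterial 1002",
    "Invoice No: INV-001\nMaterial 1003\nTotal USD 1,250.00 SGD",
    "Invoice No: INV-002\nMaterial 2001\nTotal USD 310.50 SGD",
]

# Assumed patterns: adjust them to the real labels in the PDF.
INVOICE_NO = re.compile(r"Invoice No:\s*(\S+)")
AMOUNT = re.compile(r"(?<=USD).*(?=SGD)")


def amounts_by_invoice(page_texts):
    """Map each invoice number to its amount (None until an amount is seen)."""
    result = {}
    for text in page_texts:
        no_match = INVOICE_NO.search(text)
        if no_match is None:
            continue  # page without an invoice number
        invoice_no = no_match.group(1)
        result.setdefault(invoice_no, None)  # register the invoice once
        amount_match = AMOUNT.search(text)
        if amount_match:
            result[invoice_no] = amount_match.group(0).strip()
    return result


invoice_amounts = amounts_by_invoice(pages)
```

Because the dictionary is keyed by invoice number, repeated pages of the same invoice do not produce repeated rows.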

I tried it, but this error pops up:
[image: screenshot of the error]

Hi,

Can you share your workflow (xaml file or screenshot)?

Regards,

Hi,

Can you try the RepeatNumberOfTimes activity instead of ForEach, as in the image above?

Regards,

Hello Yoichi:
Yes, it works. Thank you very much for this part.
For the data extraction from the PDF, what would you suggest when using the Repeat activity?
E.g., the Invoice Amount is on a single line, and I only need the characters between “USD” and “SGD”. Should I use Assign or regex to get the result? The final goal is to read every Invoice No and Invoice Amount from one PDF file. I tested Assign after reading the page many times and the result was not so good, which is why I am raising this follow-up question.

Best,

In general, I think it’s better to use regex, as follows.

 strResult = System.Text.RegularExpressions.Regex.Match(pageText,"(?<=USD).*(?=SGD)").Value
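To see what this pattern captures, here is a small Python sketch using the same lookbehind and lookahead. The sample line and the helper name `between_usd_and_sgd` are hypothetical. The helper returns an empty string when nothing matches, which mirrors what `Match.Value` gives in .NET for a failed match.

```python
import re


def between_usd_and_sgd(text):
    """Return the text between "USD" and "SGD" on one line, trimmed, or ""."""
    # (?<=USD) requires "USD" just before the match, (?=SGD) requires
    # "SGD" just after it; neither marker is part of the result.
    match = re.search(r"(?<=USD).*(?=SGD)", text)
    return match.group(0).strip() if match else ""


line = "Total USD 1,250.00 SGD"  # hypothetical invoice line
amount = between_usd_and_sgd(line)
missing = between_usd_and_sgd("no currency markers here")
```

The raw match keeps the spaces around the number, so trimming the result (`.Trim` in VB.NET) is probably worth adding before converting it to a numeric value.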

Regards,
