Help with extraction of multiple data from pdf invoices


Is there a way to extract multiple descriptions from an invoice and place each one in a separate row in Excel? For example, one invoice may have only one description, whereas another has three. I am having difficulty choosing a suitable activity that handles both cases in one workflow.

Please refer to the attached PDF invoices & the Excel output I hope to achieve!

Thank you!!

INV0002.pdf (167.8 KB) INV0003.pdf (185.0 KB)

You can add each description to a List(Of String) and concatenate them into one string with String.Join("|", MyList), where MyList is the list you’ve created. This example concatenates the descriptions with the “|” symbol, but you could use other separators, such as Environment.NewLine or ",".
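
As a rough illustration of the joining idea above, here is a minimal sketch in Python rather than in a UiPath expression. The sample descriptions are made up, and the names `descriptions` and `joined` are only for illustration. In a UiPath Assign, the VB.NET counterpart would be String.Join("|", MyList).

```python
# Hypothetical descriptions collected from one invoice (made-up sample data).
descriptions = ["Consulting services", "Travel expenses", "Software licence"]

# Concatenate all descriptions into one string, separated by "|".
joined = "|".join(descriptions)
print(joined)

# Splitting on the same separator later recovers the individual descriptions,
# for example to write one description per row.
recovered = joined.split("|")
print(recovered)
```

Note that this only works cleanly if the separator never appears inside a description, so pick one that cannot occur in your invoices.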

Hi @Steven_Kho,

I suggest following these steps:

  1. First, use the Read PDF Text or Read PDF with OCR activity to get all the text into a String variable.
  2. Then use Multiple Assign to fetch the values you need with regex, as shown below:
    InvNum = (System.Text.RegularExpressions.Regex.Match(text,"(?<=iPhone 11)(\s*Rs. ).+?(\s\d*)").Value).Trim
  3. Here, InvNum is a String variable and (?<=iPhone 11)(\s*Rs. ).+?(\s\d*) is an example regex pattern.
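
The steps above pull out a single value per pattern. For invoices with a varying number of descriptions, one option is to collect every match instead of only the first (in .NET, Regex.Matches rather than Regex.Match) and write one row per match. Below is a minimal sketch of that logic in Python. The invoice layout, the `LINE_ITEM` pattern, and the `extract_rows` helper are all assumptions for illustration, since the real invoice text is not shown in this thread. The pattern would need adapting to the actual PDFs.

```python
import re

# Hypothetical text standing in for the output of Read PDF Text.
# The real invoice layout is not shown in the thread, so this is an assumption.
invoice_text = """Invoice No: INV0003
1 Consulting services 500.00
2 Travel expenses 120.50
3 Software licence 80.00
Total 700.50
"""

# Assumed line-item shape: a line number, a description, then an amount.
LINE_ITEM = re.compile(
    r"^\d+\s+(?P<description>.+?)\s+(?P<amount>\d+\.\d{2})$",
    re.MULTILINE,
)


def extract_rows(text):
    """Return one (invoice number, description, amount) tuple per line item."""
    found = re.search(r"Invoice No:\s*(\S+)", text)
    number = found.group(1) if found else ""
    return [
        (number, m.group("description"), m.group("amount"))
        for m in LINE_ITEM.finditer(text)
    ]


rows = extract_rows(invoice_text)
for row in rows:
    print(row)
```

Because the number of rows follows the number of matches, the same logic covers an invoice with one description and an invoice with three. In UiPath, the matches could be looped over with For Each and each one added to a DataTable before writing to Excel.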

Example shot:

In the screenshot above, you can see a match for the iPhone 11 price (highlighted in blue).
In the same way, you can use multiple assigns with suitable regex patterns.
This is just an example. You can learn more and build regex patterns here:


Thank you!