Seek Help for Regex to Extract from PDF

Hi all, I would like to extract the invoice number (SA-3077857) and the amount after “DDU MALAYSIA” (which is 2,686.80) from the attached PDF invoice, or from the text file produced by the Read PDF Text activity.
I would appreciate help with the regex to extract these values. I am confused because the matches differ between Regex101.com and Regex Builder (UiPath) when using the following:

  1. ^NO.: *(?[^\s]+)
  2. ^DDU MALAYSIA +(?[^\s]+)

These two expressions are not mine; I got them from #Regex, as I am new to regex.
Any help will be much appreciated.

SM-3077857-1.pdf (69.6 KB)
Invoices.txt (1.1 KB)

Hope this solves your issue.

Temp.zip (16.1 KB)

Hi Ashley,

Thank you! You have helped me get the first part, the invoice number, exactly as I needed.

  1. ^NO.: *(?[^\s]+)
  2. ^DDU MALAYSIA +(?[^\s]+)

I would appreciate any help on the second part
2. ^DDU MALAYSIA +(?[^\s]+)

to extract “2,686.80”.

@COOLBOT,
Use the below regex to get the amount value:

[0-9]+\.?[0-9,]*

Regards
Senthil V.
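
A quick way to see what this pattern captures is to run it against the amount line. The sketch below uses Python's re module rather than UiPath, and the sample line is an assumption built from the values quoted in the question, not the actual file content. On that line the pattern stops at the decimal point, so the amount comes back in two pieces; a single character class that also allows the dot keeps it whole.

```python
import re

# Hypothetical amount line, built from the values quoted in the question.
line = "DDU MALAYSIA 2,686.80"

# Pattern from the reply above: after the optional dot, only digits and
# commas are allowed, so a dot that follows a comma group ends the match.
pieces = re.findall(r"[0-9]+\.?[0-9,]*", line)
print(pieces)

# Allowing digits, commas and dots in one class keeps the amount together.
whole = re.findall(r"[0-9][0-9,.]*", line)
print(whole)
```

Because neither pattern is anchored to “DDU MALAYSIA”, both would also pick up other digit runs in the full invoice text, such as the digits inside the invoice number.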

^NO\.:\s+(?<invoice>.+?)\s*$
^DDU MALAYSIA\s+(?<amount>[\d,\.]+)

Hello,
If the PDF format is fixed, then you can also use a string split.
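
As a rough illustration of the split approach, here is a minimal Python sketch. The sample text and the helper name `value_after` are assumptions for illustration only; the real layout of Invoices.txt may differ, and in UiPath the same idea would be written with .NET string methods instead.

```python
# Hypothetical extracted text; the real layout of Invoices.txt may differ.
sample_text = "NO.: SA-3077857\nDDU MALAYSIA 2,686.80\n"


def value_after(label, text):
    """Return the first whitespace-separated token that follows `label`
    on the first line starting with it, or None if there is none."""
    for line in text.splitlines():
        if line.startswith(label):
            rest = line[len(label):].split()
            return rest[0] if rest else None
    return None


invoice = value_after("NO.:", sample_text)
amount = value_after("DDU MALAYSIA", sample_text)
print(invoice, amount)
```

This only works while the labels stay at the start of their lines, which is why it suits a fixed PDF format.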

Hi msan, the regexes you posted gave full matches when tested at regex101.com. However, when I implemented them in UiPath, no matches were found.

Ashley11 has mentioned that Regex101 allows creation, debugging and testing for PHP, PCRE, Python, Golang and JavaScript. How different are these regex flavors from the one used in UiPath (or Regex Builder)?

@COOLBOT

I use the PCRE flavor on regex101 when testing patterns for UiPath. You have to pay attention to the options, however, as the defaults are not necessarily the same. I’ll use Multiline below.

Please consider the following example assignments (all variables are Strings, except Amount, which is a Decimal), with MyText holding the invoice content.

InvoicePattern = "^NO\.:\s+(?<invoice>.+?)\s*$"
AmountPattern = "^DDU MALAYSIA\s+(?<amount>[\d,\.]+)"

Invoice = System.Text.RegularExpressions.Regex.Match(MyText, InvoicePattern, RegexOptions.Multiline).Groups("invoice").ToString

Amount = Convert.ToDecimal(System.Text.RegularExpressions.Regex.Match(MyText, AmountPattern, RegexOptions.Multiline).Groups("amount").Value)
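
To show why the Multiline option matters, here is a small Python sketch of the same two patterns. It is an illustration under assumptions, not the UiPath implementation: the sample text is hypothetical (only the invoice number and amount come from the question), and Python spells named groups as `(?P<name>...)` where .NET and PCRE accept `(?<name>...)`.

```python
import re
from decimal import Decimal

# Hypothetical extracted text; only the two values come from the question.
sample_text = "INVOICE\r\nNO.: SA-3077857\r\nDDU MALAYSIA   2,686.80\r\n"

# Same patterns as above, with Python's (?P<name>...) group syntax.
invoice_pattern = r"^NO\.:\s+(?P<invoice>.+?)\s*$"
amount_pattern = r"^DDU MALAYSIA\s+(?P<amount>[\d,.]+)"

# Without MULTILINE, ^ only matches at the very start of the text,
# so a label on the second line is never found.
no_flag = re.search(invoice_pattern, sample_text)
print(no_flag)

# With MULTILINE, ^ and $ also match at line boundaries.
invoice_match = re.search(invoice_pattern, sample_text, re.MULTILINE)
amount_match = re.search(amount_pattern, sample_text, re.MULTILINE)

invoice = invoice_match.group("invoice")
# Strip the thousands separator before converting to a decimal number.
amount = Decimal(amount_match.group("amount").replace(",", ""))
print(invoice, amount)
```

If the labels are not on the first line of the text, leaving the multiline option off would be one plausible explanation for getting no matches.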
