Get text using Regex

Hello everyone,

I have a PDF from which I extracted the text using the Read PDF Text activity. I want to extract certain values from one line. For example:

I want the values highlighted in the image.
Below is the string extracted by the PDF activity:

GRAND TOTAL (I + II + III);470,957,563;57,606,092;413,351,471;409,692,987;GRAND TOTAL (I + II + III + IV + V);413,351,471;409,692,987

I want to retrieve the numbers as different strings. How can this be done?

Appreciate your help

Hey @mounir.mohsen

The above can be achieved using the String.Split function, assuming the structure of the data is always the same as in the line you sent.

str_PdfTextLine.Split(";"c)

The statement returns an array of all the elements, from which the number values can be retrieved.
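As a rough illustration of the split approach, here is a minimal Python sketch (Python is used only for illustration; in UiPath the equivalent is the Split(";"c) expression above). The variable names and the digits-and-commas filter are assumptions, not part of the original workflow.

```python
line = ("GRAND TOTAL (I + II + III);470,957,563;57,606,092;413,351,471;"
        "409,692,987;GRAND TOTAL (I + II + III + IV + V);413,351,471;409,692,987")

# Split the line on the ";" separator, as Split(";"c) does in VB.NET.
fields = line.split(";")

# Keep only the fields made up of digits and thousands separators
# (assumed number format; the label fields are dropped).
numbers = [f for f in fields if f.replace(",", "").isdigit()]

for value in numbers:
    print(value)
```

Because the result is indexed by position, this only holds while the line keeps the same layout, which is the assumption stated above.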

Hope this helps.

Thanks
#nK

Sequence.xaml (6.8 KB)
assuming txt = "GRAND TOTAL (I + II + III);470,957,563;57,606,092;413,351,471;409,692,987;GRAND TOTAL (I + II + III + IV + V);413,351,471;409,692,987"

  1. Assign System.Text.RegularExpressions.Regex.Matches(txt, "([\d]+)") to a regexCollection variable.
  2. Loop over the regexCollection variable.
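The two steps above can be sketched in Python as follows (illustrative only; the names digit_runs and whole_numbers are made up for this example). Note that a pattern of digits alone matches each run of digits separately, so a number such as 470,957,563 comes back as three matches. The second pattern, which also allows commas, is an assumption about the desired output rather than something taken from the attached workflow.

```python
import re

txt = ("GRAND TOTAL (I + II + III);470,957,563;57,606,092;413,351,471;"
       "409,692,987;GRAND TOTAL (I + II + III + IV + V);413,351,471;409,692,987")

# Pattern from the post: every run of digits is its own match,
# so "470,957,563" is returned as "470", "957", "563".
digit_runs = re.findall(r"([\d]+)", txt)

# Variation (assumed requirement): allow commas inside a match
# so that each number is returned whole.
whole_numbers = re.findall(r"[\d,]+", txt)

# Step 2: loop over the matches.
for value in whole_numbers:
    print(value)
```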


Thank you for your fast reply.

So this method only works if I want to process every row I have. However, I want to retrieve data from this one row only, so I think Regex is a possible solution, but I’m facing an issue while doing so.

Okay @mounir.mohsen

Then let’s first isolate that line by splitting the text on new lines and matching a keyword.

I guess Grand Total can be used as the keyword to find the line? Is that correct?
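A minimal Python sketch of that idea, for illustration only: the surrounding lines in pdf_text are hypothetical stand-ins for the rest of the PDF output, and only the GRAND TOTAL line comes from this thread.

```python
# Hypothetical multi-line text standing in for the Read PDF Text output.
# Only the GRAND TOTAL line is taken from the thread.
pdf_text = "\n".join([
    "SOME OTHER ROW;1,000;2,000",
    "GRAND TOTAL (I + II + III);470,957,563;57,606,092;413,351,471;"
    "409,692,987;GRAND TOTAL (I + II + III + IV + V);413,351,471;409,692,987",
    "ANOTHER ROW;3,000;4,000",
])

keyword = "GRAND TOTAL"

# Split on line breaks and keep the first line containing the keyword
# (None if no line matches).
target_line = next((ln for ln in pdf_text.splitlines() if keyword in ln), None)

print(target_line)
```

Once the line is isolated, either the split or the regex approach can be applied to it without picking up numbers from other rows.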

Thanks
#nK

Yeah, we can do so. What I had in mind is something like (?<=GRAND TOTAL) followed by the rest of the row.

Thank you for your reply. I tried this regex, but it returns all the numbers in the file. What I want is this line only, and just the part highlighted in the image.

So if we are using Regex, it would be something like (?<=GRAND TOTAL), then skip the first two values (;470,957,563;57,606,092;) and return the next two: 413,351,471 and 409,692,987.
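One possible way to express this, sketched in Python for illustration: instead of a lookbehind, the pattern below matches the GRAND TOTAL label, steps over the first two values, and captures the next two in groups. The pattern and variable names are assumptions, not the contents of the attached workflow. It uses only basic syntax that should also be accepted by .NET's Regex class.

```python
import re

txt = ("GRAND TOTAL (I + II + III);470,957,563;57,606,092;413,351,471;"
       "409,692,987;GRAND TOTAL (I + II + III + IV + V);413,351,471;409,692,987")

# GRAND TOTAL[^;]*   the label up to the first ";"
# ;[\d,]+;[\d,]+     the first two values, matched but not captured
# ;([\d,]+);([\d,]+) the two values to return, captured as groups 1 and 2
pattern = r"GRAND TOTAL[^;]*;[\d,]+;[\d,]+;([\d,]+);([\d,]+)"

match = re.search(pattern, txt)
if match:
    third_value, fourth_value = match.group(1), match.group(2)
    print(third_value)
    print(fourth_value)
```

Because the search stops at the first match, only the first GRAND TOTAL block on the line is used.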

Try this, @mounir.mohsen
Sequence.xaml (8.1 KB)
