String manipulation issue with payroll data

I have been working on string manipulation to extract data from a Word file. I tried the methods below, but neither returns the exact data.

  1. lastname = System.Text.RegularExpressions.RegEx.Match(payroll_Data, "(?<=Last Name\s+).+").Value
  2. lastname = payroll_Data.Substring(payroll_Data.IndexOf("First Name")).Split(" "c)(2)

I want to extract these fields: First Name, Last Name, DOB, PAN number, Account number, IFSC Code.

Can anybody help me?

Hi Anand,

For the last name regex, drop the first +: (?<=Last Name\s).+
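
To try this outside UiPath, here is a minimal Python sketch (the thread's code is VB.NET, which uses the .NET regex engine). One caveat: Python's built-in re module only accepts fixed-width lookbehinds, so it rejects (?<=Last Name\s+) but accepts (?<=Last Name\s); .NET accepts both. The sample text and the helper names are hypothetical.

```python
import re

# Hypothetical sample text with "\n" line breaks (a real Word export may differ).
SAMPLE = "First Name John\nLast Name Doe\nDOB 01/01/1990"

def extract_last_name(text: str) -> str:
    # Fixed-width lookbehind: exactly one whitespace character after the label.
    # "." does not match "\n", so the match stops at the end of the line.
    match = re.search(r"(?<=Last Name\s).+", text)
    return match.group() if match else ""

def variable_lookbehind_supported() -> bool:
    # Python's re rejects variable-width lookbehinds such as \s+ inside (?<=...).
    try:
        re.compile(r"(?<=Last Name\s+).+")
        return True
    except re.error:
        return False

print(extract_last_name(SAMPLE))        # Doe
print(variable_lookbehind_supported())  # False
```

If the label is followed by more than one space, this version keeps the extra spaces at the start of the value, so a .strip() may be needed.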

I think you are on the right track with Regex. Just check on Regex101 whether your patterns work there.

Remember that \n matches a new line.

I tried it there, but I don't get the data.

Working with strings retrieved from Word often requires some cleansing. It looks like you will get a lot of "\a" in your text, the so-called BELL character.
Remove it before doing string extractions (e.g. Regex), for example with a Replace on chr(7).
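
A minimal Python sketch of this cleanup step, assuming the BELL character is the only debris to remove. Because it runs in code, the same cleanup applies to every file without manual editing. The names BELL and remove_bell and the sample text are hypothetical.

```python
BELL = chr(7)  # "\a", the BELL control character

def remove_bell(text: str) -> str:
    # Same idea as payroll_Data.Replace(chr(7), "") in VB.NET:
    # every BELL character is replaced with an empty string.
    return text.replace(BELL, "")

# Hypothetical raw text in which lines end with "\r" followed by a BELL character.
raw = "Last Name Doe\r\aDOB 01/01/1990\r\a"
print(repr(remove_bell(raw)))  # 'Last Name Doe\rDOB 01/01/1990\r'
```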


Also check the line breaks: your text may use \r instead of \n.

The following could work for line breaks made with only \r or only \n.

Just check whether the typical Windows line break \r\n occurs; if so, modify the pattern accordingly (e.g. with \s{2}).
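
Rather than handling every line-break style inside each pattern, another option (a swapped-in approach, not the one from this thread) is to normalize all line breaks to \n first, so that . stops at the end of each line. A minimal Python sketch; the function names are hypothetical, and it assumes the label and value are separated by spaces or tabs on the same line.

```python
import re

def normalize_line_breaks(text: str) -> str:
    # Convert Windows "\r\n" and lone "\r" breaks to "\n".
    # "\r\n" comes first in the alternation so it becomes one break, not two.
    return re.sub(r"\r\n|\r", "\n", text)

def field_value(text: str, label: str) -> str:
    # [ \t]+ keeps the separator on the same line; "." then stops at "\n".
    pattern = re.escape(label) + r"[ \t]+(.+)"
    match = re.search(pattern, normalize_line_breaks(text))
    return match.group(1) if match else ""

print(field_value("First Name John\rLast Name Doe\rDOB 01/01/1990", "Last Name"))  # Doe
print(field_value("First Name John\r\nLast Name Doe\r\n", "Last Name"))            # Doe
```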

@ppr I have multiple Word files in my folder, so we can't remove the BELL character manually.
I checked (?<=Last Name\s).+ on Regex101, but in UiPath it returns the whole data instead of just the last name.
How can I get a particular field using Regex?

May we ask you to share some details of the processed text exactly as it appears during processing? This would help us address it in the same way the bot receives the data.

  • Set a breakpoint at the relevant place
  • Debug and let the run pause on the breakpoint
  • Go to Locals, identify the relevant variable, and expand it by clicking the pencil/magnifier icon
  • Share the expanded content view with us
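
Outside the debugger, a quick way to make such hidden characters visible is to count the control characters in the extracted text. A minimal Python sketch; the function name and sample text are hypothetical.

```python
from collections import Counter

def control_char_counts(text: str) -> Counter:
    # Count characters below code point 32 (line breaks, tabs, BELL, ...),
    # keyed by their escaped form such as '\\r' or '\\x07' so they are visible.
    return Counter(repr(ch)[1:-1] for ch in text if ord(ch) < 32)

# Hypothetical extracted text
sample = "Last Name Doe\r\aDOB 01/01/1990\r\a"
print(control_char_counts(sample))  # shows two '\r' and two '\x07' (BELL) entries
```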

in 101 regex but in uipath its getting whole data

Ensure that the same Regex options are active in both places, especially the multiline option.
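
For reference, the multiline option changes where ^ and $ match: with it, they match at the start and end of each line instead of only the whole text (RegexOptions.Multiline in .NET, re.MULTILINE in Python). A minimal Python sketch with a hypothetical sample; note that in both engines multiline mode treats only \n as a line break, not a lone \r.

```python
import re

# Hypothetical sample with "\n" line breaks.
SAMPLE = "First Name John\nLast Name Doe\nDOB 01/01/1990"

# Anchored pattern: the value must fill the rest of the "Last Name" line.
PATTERN = r"^Last Name\s+(.+)$"

# Without multiline, ^ only matches at the very start of the whole text.
without_multiline = re.search(PATTERN, SAMPLE)
# With multiline, ^ and $ also match right after and right before each "\n".
with_multiline = re.search(PATTERN, SAMPLE, re.MULTILINE)
# Multiline mode does not treat a lone "\r" as a line break.
cr_only = re.search(PATTERN, SAMPLE.replace("\n", "\r"), re.MULTILINE)

print(without_multiline)        # None
print(with_multiline.group(1))  # Doe
print(cr_only)                  # None
```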

we can’t remove manually BELL character

When processing with Regex, a Replace similar to the one shown above in the screenshot would do it. Can you please elaborate on what you mean by "remove manually"? Thanks.

By "remove manually" I mean the BELL character in the text file.
There are multiple payroll Word files (same format) from which I have to extract data, and the output data has the same format.

Your regular expression is fine and should usually work if the input text uses \n as line breaks. Unfortunately for you, Payroll.txt seems to use \r as the only line-break character, so you need to modify the expression slightly:

(?<=Last Name\s+)[^\n\r]+
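
A minimal Python sketch of why this change matters, using a hypothetical sample where \r is the only line break. Python's re does not accept the variable-width lookbehind (?<=Last Name\s+), so the sketch uses a capture group instead; the behaviour of . and [^\n\r]+ is the same as in .NET, where . matches every character except \n.

```python
import re

# Hypothetical text where "\r" is the only line-break character.
CR_TEXT = "First Name John\rLast Name Doe\rDOB 01/01/1990"

def last_name_dot(text: str) -> str:
    # "." also matches "\r", so the match runs past the end of the line.
    m = re.search(r"Last Name\s+(.+)", text)
    return m.group(1) if m else ""

def last_name_excluding_breaks(text: str) -> str:
    # [^\n\r]+ stops at either kind of line-break character.
    m = re.search(r"Last Name\s+([^\n\r]+)", text)
    return m.group(1) if m else ""

print(repr(last_name_dot(CR_TEXT)))               # 'Doe\rDOB 01/01/1990'
print(repr(last_name_excluding_breaks(CR_TEXT)))  # 'Doe'
```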

What @ppr suggests is that you clean your input text before using it with RegEx.Match():

payroll_Data = payroll_Data.Replace(chr(7), "")
LastName = System.Text.RegularExpressions.RegEx.Match(payroll_Data, "(?<=Last Name\s+)[^\n\r]+").Value
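
Applying both steps to all the fields from the original question could look like the minimal Python sketch below. The field labels are hypothetical and must match the exact labels in the payroll files, and a capture group stands in for the .NET lookbehind because Python's re rejects (?<=...\s+). Fields that are not found come back as empty strings.

```python
import re

# Hypothetical labels; replace them with the exact labels used in the payroll files.
FIELDS = ["First Name", "Last Name", "DOB", "PAN Num", "Account Number", "IFSC Code"]

def extract_fields(raw: str) -> dict:
    # Step 1: remove BELL characters, like payroll_Data.Replace(chr(7), "").
    text = raw.replace(chr(7), "")
    result = {}
    for label in FIELDS:
        # Step 2: label, whitespace (which may include a line break),
        # then everything up to the next "\r" or "\n".
        m = re.search(re.escape(label) + r"\s+([^\n\r]+)", text)
        result[label] = m.group(1).strip() if m else ""
    return result

sample = "First Name John\rLast Name Doe\r\aDOB 01/01/1990\r\a"
print(extract_fields(sample)["Last Name"])  # Doe
```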

It's working fine, thanks @ptrobot!

