String Manipulation issue on payroll data

Hi,
I have been working on string manipulation to extract data from a Word file. The data looks like this:
nt Details

aFirst Name
aY
aLast Name
aAstha
a
a
aDate of Birth
a15-10-1993
aEmp. No.

I tried the two methods below; neither returns the exact data.

  1. lastname=System.Text.RegularExpressions.RegEx.Match(payroll_Data, "(?<=Last Name\s+).+").Value
    and
  2. lastname=payroll_Data.Substring(payroll_Data.IndexOf("First Name")).Split(" "c)(2)

I want to extract the following data: first name, last name, DOB, PAN number, account number, IFSC code.

Can anybody help me?

Payroll.txt (1.3 KB)

Hi Anand,

For the last-name regex, drop the first +: (?<=Last Name\s).+

I think you are on the right track with Regex. Just check on Regex101 whether the expressions work there.


Hello

Remember that \n is a new line.

I tried that, but I did not get the data.

@Anand_Designer
Working with strings retrieved from Word often requires some cleansing. It looks like you will get a lot of "\a" in your text, the so-called BELL character.
Remove it before doing string extraction (e.g. with Regex) using
strText.Replace(chr(7),"")

[image]

Also have a look at the line breaks, which may use \r instead of \n.

The following could work for line breaks done with only \r or only \n:
[image]

Check whether the typical Windows line break \r\n occurs, and if so modify the pattern accordingly (e.g. with \s{2}).
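
Since the screenshots are not available here, the following is only a rough sketch of one way to apply the cleansing advice, not a reconstruction of what was shown. It assumes the text has BELL-prefixed lines separated by \r (the sample string is made up to resemble the excerpt in the question), removes the BELL characters, and rewrites \r\n and lone \r as \n so that a .+ pattern stops at the end of the line. It is written as a standalone console program rather than a UiPath workflow.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module LineBreakSketch
    Sub Main()
        Dim bell As String = Chr(7)

        ' Made-up sample: BELL-prefixed lines separated by \r only.
        Dim raw As String = bell & "Last Name" & vbCr & bell & "Astha" & vbCr & bell & "Date of Birth"

        ' Step 1: remove every BELL character.
        Dim cleaned As String = raw.Replace(bell, "")

        ' Step 2: turn \r\n and lone \r into \n.
        cleaned = Regex.Replace(cleaned, "\r\n?", vbLf)

        ' "." never matches \n in .NET, so .+ now ends at the line break.
        Console.WriteLine(Regex.Match(cleaned, "(?<=Last Name\s+).+").Value)   ' prints Astha
    End Sub
End Module
```

In a UiPath workflow the same two replacements could be placed in Assign activities before the Regex.Match call.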

@ppr My folder contains multiple Word files, so we can't remove the BELL character manually.
I checked (?<=Last Name\s).+ in regex101, but in UiPath it returns the whole data, not just the last name.
How can I get a particular field using Regex?

@Anand_Designer
May we ask you to share some details of the text as it looks during processing? This would help us work with the data in the same form the bot receives it.

  • set a break point on relevant place
  • debug and get paused on breakpoint
  • go to locals, identify relevant variable, expand with click on pencil/magnifier
  • share the expanded content view with us

"in 101 regex but in uipath its getting whole data"

Ensure that the same Regex options are active, especially the multiline option.

"we can't remove manually BELL character"

When processing with regex, a replace similar to the one shown in the screenshot above would do it. Can you please elaborate on what is meant by "remove manually"? Thanks.

By "remove manually" I mean removing the BELL character in the text file by hand.
There are multiple payroll Word files, all in the same format, and I have to extract the same data from each of them.
For reference, I have shared the file below. Payroll.txt (1.3 KB)

Your regular expression is fine and should usually work if the input text is using \n as line breaks. Unfortunately for you, the Payroll.txt seems to be using \r as the only line break character. In .NET, . matches every character except \n, so with \r-only line breaks .+ runs on to the end of the text. So you need to modify the expression slightly:

(?<=Last Name\s+)[^\n\r]+

What @ppr suggests is that you clean your input text before using it with RegEx.Match():

payroll_Data = payroll_Data.Replace(chr(7), "")
LastName = System.Text.RegularExpressions.RegEx.Match(payroll_Data, "(?<=Last Name\s+)[^\n\r]+").Value

See attached file for an example. PayrollExtract.xaml (9.0 KB)
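
For the other fields in the question, the same pattern could be wrapped in a small helper. This is a minimal sketch and not the content of PayrollExtract.xaml: FieldAfter is a hypothetical name, and the sample string is made up to mirror the excerpt in the question (BELL-prefixed lines, \r line breaks). The labels for PAN number, account number and IFSC code are not shown in the thread, so they are left out.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module PayrollFieldsSketch
    ' Hypothetical helper: returns the rest of the first non-blank line
    ' that follows the given label, or "" if the label is not found.
    Function FieldAfter(source As String, label As String) As String
        Dim pattern As String = "(?<=" & Regex.Escape(label) & "\s+)[^\n\r]+"
        Return Regex.Match(source, pattern).Value.Trim()
    End Function

    Sub Main()
        Dim bell As String = Chr(7)

        ' Made-up sample shaped like the excerpt in the question.
        Dim payroll_Data As String = String.Join(vbCr, {
            bell & "First Name", bell & "Y",
            bell & "Last Name", bell & "Astha",
            bell, bell,
            bell & "Date of Birth", bell & "15-10-1993",
            bell & "Emp. No."})

        ' Clean the BELL characters first, as suggested above.
        payroll_Data = payroll_Data.Replace(bell, "")

        Console.WriteLine(FieldAfter(payroll_Data, "First Name"))     ' Y
        Console.WriteLine(FieldAfter(payroll_Data, "Last Name"))      ' Astha
        Console.WriteLine(FieldAfter(payroll_Data, "Date of Birth"))  ' 15-10-1993
    End Sub
End Module
```

One caveat of this approach: because \s+ skips blank lines, a label with an empty value would return the next label's text instead, so the result may need a sanity check per field.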

It's working fine. Thanks, @ptrobot.
