RegEx Matches Pattern Issue

Evening

An extract of the Text I am trying to parse - as identified by UiPath is as follows:-
–@"URL'S TO BLOCK


–<p class=""MsoNoSpacing"">

–<span style=""""><span style=""font-family:"Arial",sans-serif; color:black"">gsttnhsuk.nicepage[.]io


–<p class=""MsoNoSpacing""><span style=""""><span style=""font-family:"Arial",sans-serif; color:black"">jeromebrooks[.]com


–<p class=""MsoNoSpacing""><span style=""""><span style=""font-size:12.0pt; font-family:"Arial",sans-serif; color:red""> 


–<p class=""MsoNoSpacing""><span style=""""><span style=""font-size:12.0pt; font-family:"Arial",sans-serif"">"

Note the preceding – are to ensure the lines are all shown in this question

I am trying to extract the two websites
gsttnhsuk.nicepage[.]io
and
jeromebrooks[.]com

For simplicity, I am trying to extract the text after
color:black
I can manipulate any surplus characters

My Matches Pattern in UiPath has
"/(?<=color:black)(.*)(?=</span></span>)/gm"
and this basically works fine in regex101.com

Can anybody offer me a reason why UiPath does not return anything? I have an enumerable match variable as the result from the Matches activity, but it appears to be empty.

Hello @gary.cobden

Your pattern is not matching in UiPath.

Regex101.com is not 100% compatible with UiPath, which uses the .NET regex engine. It’s fine for most scenarios, but if you are using lookaheads then don’t trust Regex101 fully.

Try using the .NET Regex Tester at Regex Storm instead.

Another thing to do is to run Studio in debug mode and insert a breakpoint using F9 at the matches activity.

When the robot stops at the breakpoint open the ‘Locals’ panel (left hand side) and check the variables. Then open the ‘Immediate’ panel and use the regex syntax to test your pattern and additional patterns until you get the correct result.

Start with this:
System.Text.RegularExpressions.Regex.Match(YOURSTRING, REGEXPATTERN).ToString

Take a look at this Regex Pattern based on your sample above - preview it here. I don’t think you needed the lookahead at the end, as the closing tags were on a different line and “.*” was sufficient to capture the website. There were also some extra characters after the word black which were breaking your pattern.
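To make the two points above concrete, here is a minimal sketch. It is written in Python rather than UiPath's VB.NET purely so it can be run anywhere; the behaviour it relies on (slashes being literal characters, and "." not crossing a newline) is the same in the .NET engine. The sample text is a simplified reconstruction of the extract in the question, and the pattern `color:black[^>]*>([^<\r\n]+)` is only an illustrative assumption, not necessarily the pattern linked above.

```python
import re

# Simplified reconstruction of the text in the question (an assumption:
# the doubled quotes shown by UiPath are single quotes in the real string).
SAMPLE = (
    "URL'S TO BLOCK\n"
    '<p class="MsoNoSpacing">\n'
    '<span style=""><span style="font-family:Arial; color:black">gsttnhsuk.nicepage[.]io\n'
    '<p class="MsoNoSpacing"><span style=""><span style="color:black">jeromebrooks[.]com\n'
    '<p class="MsoNoSpacing"><span style=""><span style="color:red">\n'
)

# The pattern exactly as quoted in the question. Outside regex101, the
# leading "/" and trailing "/gm" are literal characters, not delimiters/flags.
AS_TYPED = r"/(?<=color:black)(.*)(?=</span></span>)/gm"

# Same pattern with the delimiters removed. It still finds nothing here,
# because "</span></span>" never appears on the same line as the site.
NO_DELIMITERS = r"(?<=color:black)(.*)(?=</span></span>)"

# Hypothetical replacement: skip whatever follows "black" up to the closing
# ">", then capture the rest of the line (stopping at "<", CR or LF).
SIMPLIFIED = r"color:black[^>]*>([^<\r\n]+)"


def extract_sites(text: str) -> list[str]:
    """Return the text captured after each 'color:black' span."""
    return re.findall(SIMPLIFIED, text)


if __name__ == "__main__":
    print(re.findall(AS_TYPED, SAMPLE))       # no matches
    print(re.findall(NO_DELIMITERS, SAMPLE))  # no matches
    print(extract_sites(SAMPLE))              # the two sites
```

In UiPath the equivalent would be to put only the pattern text (no surrounding slashes or flags) into the Matches activity and read each site from `match.Groups(1).Value`. Excluding `\r` in the capture matters in .NET, where "." stops at `\n` but will happily include a trailing `\r` from Windows line endings.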

Hopefully this helps :blush:

Cheers

Steve
