Regex Not Working?

Hi, I'm currently using the Find Matching Patterns activity to capture text and assign it to a variable named HeaderText.

Sample text:
2_22012056_Clearing MKSA Jul26; Jul25;

Expected output for HeaderText:
Clearing MKSA Jul26

Regex Used:
\d{2}_(.+\w);

When I try to output HeaderText in a message box, I always get “56_Clearing MKSA Jul26;” even though I already use the selected group. Why is this?

I have already assigned the variable to First Match in the Find Matching Patterns activity properties in UiPath.

Use this regex
(?<=\d{2}_).+?(?=;)

Hi @f4reast

Can you give us a few examples of text variations and requirements for the extraction, for us to better understand which part you want to extract?
Or is this a single use case/requirement?

Regards
Soren

Hi @SorenB,

It's not a single use case. I want to extract the text between the second ‘_’ and the first ‘;’, shown in bold below.

2_22012056_Clearing MKSA Jul26; Jul25;

Some text variations example:
2_22012056_Clearing MKSA Jul26; Jul25;
3_22012078_Clearing SA Aug23; Aug23;
4_22012078_Clearing MK May24; May24;

But based on the screenshot, the extracted text should only be the yellow-highlighted part, right? I already use the selected group in the regex. I’m confused about this part.

Thanks.

Wow, this works! Thanks.
Can you explain the regex a little? I'm having difficulty understanding it.

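Your regex is too greedy, and UiPath is returning the full match, not just the group.

(.+\w) is greedy, so it can run up to the last “;” on the line rather than the first. On top of that, the full match includes the leading 56_ and the trailing ;, which is why you get 56_Clearing MKSA Jul26; instead of just the group.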

Use this regex instead:

\d{2}\d+_([^;]+);

And read the value from:

Matches(0).Groups(1).Value

That will return:

Clearing MKSA Jul26
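
To illustrate the difference between the full match and the group, here is a minimal sketch in Python rather than UiPath. Python's re module stands in for the .NET engine here, the pattern \d{2}_([^;]+); is an assumed variant that keeps the original \d{2}_ prefix, and the variable names are made up for illustration. group(0) plays the role of the full match, and group(1) plays the role of Groups(1).Value.

```python
import re

text = "2_22012056_Clearing MKSA Jul26; Jul25;"

# Assumed variant: [^;]+ cannot run past the first semicolon.
pattern = re.compile(r"\d{2}_([^;]+);")

m = pattern.search(text)
full_match = m.group(0)   # the whole match, including "56_" and ";"
header_text = m.group(1)  # only the text inside the parentheses

print(full_match)
print(header_text)
```

The full match still carries the prefix and the semicolon, so reading only the first match would likely show the unwanted parts; the group holds just the header text.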

Hi @f4reast

Please check the below regular expression:

[A-Za-z]+\s*[A-Z]+(?=\s+[A-Za-z0-9]+\;)

Regards
PS Parvathy

  • (?<=\d{2}_) → Start matching after two digits and an underscore
  • .+? → Match the required text, using as few characters as possible
  • (?=;) → Stop matching before the semicolon

This uses lookbehind and lookahead, so unwanted parts like 56_ or ; are not included in the result.
That’s why it returns exactly “Clearing MKSA Jul26”, unlike the earlier regex where extra text was captured even when using groups.
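
If it helps, the lookaround pattern can be tried against the three sample lines from earlier in the thread. This is only a sketch in Python (not UiPath), assuming Python's re module behaves like the .NET engine for this pattern; the names samples and results are just for illustration.

```python
import re

samples = [
    "2_22012056_Clearing MKSA Jul26; Jul25;",
    "3_22012078_Clearing SA Aug23; Aug23;",
    "4_22012078_Clearing MK May24; May24;",
]

# Lookbehind and lookahead assert context without consuming it,
# so the match itself contains only the text in between.
pattern = re.compile(r"(?<=\d{2}_).+?(?=;)")

results = [pattern.search(s).group(0) for s in samples]
for r in results:
    print(r)
```

Because the prefix and the semicolon are only asserted, the full match is already the wanted text and no group needs to be read.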

Thank you for the explanation. Now I understand.

Hi Parvathy,

Thank you for the suggestion. It doesn't return the full output I wanted because it does not extract up to the first semicolon (;). Nevertheless, I was able to tweak it a little and got the expected output.

Thanks

Maybe to add to your understanding.

The .+? part works because the question mark makes that part of the expression non-greedy.

.+ would give you everything between the first instance of the lookbehind and the last instance of the lookahead. In regex terminology, that is called a greedy expression.

Adding the ? makes it a non-greedy expression, so it stops matching at the first instance of the lookahead instead of the last.
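
A quick side-by-side may make this concrete. The sketch below uses Python instead of UiPath, on the assumption that greedy and non-greedy quantifiers behave the same way in both engines for this case; the variable names are illustrative only.

```python
import re

text = "2_22012056_Clearing MKSA Jul26; Jul25;"

# Greedy: .+ grabs as much as it can, then backs off to the LAST ";".
greedy = re.search(r"(?<=\d{2}_).+(?=;)", text).group(0)

# Non-greedy: .+? grows one character at a time and stops at the FIRST ";".
lazy = re.search(r"(?<=\d{2}_).+?(?=;)", text).group(0)

print(greedy)
print(lazy)
```

The only difference between the two patterns is the ? after .+, yet the greedy version also swallows the second date.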
