Basics of Regular Expressions [RegEx], the string manipulation tool

How to use Regular Expressions [RegEx], the string manipulation tool?

Regular expressions are useful in extracting information from any text by searching for one or more matches of a specific search pattern.

The following are some of the basic constructs used to build a RegEx:

<table>
<thead>
<tr><th>Name</th><th>Meta character</th><th>Description</th><th>Example</th></tr>
</thead>
<tbody>
<tr><td rowspan="2">Anchors</td><td>^</td><td>Matches the starting position within the string</td><td><strong>^The</strong> - matches any string that starts with <em>The</em></td></tr>
<tr><td>$</td><td>Matches the ending position of the string or the position just before a string-ending newline</td><td><strong>end$</strong> - matches a string that ends with <em>end</em></td></tr>
<tr><td rowspan="6">Quantifiers</td><td>*</td><td>Matches the preceding element zero or more times</td><td><strong>abc*</strong> - matches a string that has ab followed by zero or more c</td></tr>
<tr><td>?</td><td>Matches the preceding element zero or one time</td><td><strong>abc?</strong> - matches a string that has ab followed by zero or one c</td></tr>
<tr><td>+</td><td>Matches the preceding element one or more times</td><td><strong>abc+</strong> - matches a string that has ab followed by one or more c</td></tr>
<tr><td>{n}</td><td>Matches the preceding element exactly n times</td><td><strong>abc{2}</strong> - matches a string that has ab followed by 2 c</td></tr>
<tr><td>{n,}</td><td>Matches the preceding element at least n times</td><td><strong>abc{2,}</strong> - matches a string that has ab followed by 2 or more c</td></tr>
<tr><td>{m,n}</td><td>Matches the preceding element at least m and not more than n times</td><td><strong>abc{2,5}</strong> - matches a string that has ab followed by 2 up to 5 c</td></tr>
<tr><td rowspan="3">Operators</td><td>[ ]</td><td>A bracket expression, which can also denote a range. Matches a single character that is contained within the brackets</td><td><strong>[abc]</strong> - matches "a", "b", or "c" and <strong>[a-z]</strong> - matches any character from "a" to "z"</td></tr>
<tr><td>[^ ]</td><td>Matches a single character that is not contained within the brackets</td><td><strong>[^abc]</strong> - matches any character other than "a", "b", or "c"</td></tr>
<tr><td>|</td><td>Matches either the expression before or the expression after the operator</td><td><strong>abc|def</strong> - matches "abc" or "def"</td></tr>
<tr><td rowspan="2">Assertions</td><td>?&lt;=</td><td>Lookbehind: requires the given expression to appear immediately before the current position</td><td><strong>(?&lt;=r)d</strong> - matches a d only if it is preceded by an r, but r will not be part of the overall regex match</td></tr>
<tr><td>?=</td><td>Lookahead: requires the given expression to appear immediately after the current position</td><td><strong>d(?=r)</strong> - matches a d only if it is followed by r, but r will not be part of the overall regex match</td></tr>
</tbody>
</table>
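The constructs in the table can be tried out quickly outside UiPath. The following is a minimal sketch in Python, whose `re` module is used here only as a stand-in for the .NET engine; for these basic constructs the two behave alike, but that equivalence is an assumption to verify for anything more advanced. The sample strings and variable names are illustrative only.

```python
import re

# Anchors: ^ pins the match to the start of the string, $ to the end.
starts_with_the = re.search(r"^The", "The end") is not None      # True
not_at_start = re.search(r"^The", "At The end") is not None      # False
ends_with_end = re.search(r"end$", "The end") is not None        # True

# Quantifiers bind to the preceding element only (the letter c here).
# fullmatch requires the whole sample string to fit the pattern.
sample = "abccc"
quantifier_results = {
    p: re.fullmatch(p, sample) is not None
    for p in (r"abc*", r"abc?", r"abc+", r"abc{2}", r"abc{2,}", r"abc{2,5}")
}

# Operators: bracket expressions, negated brackets, and alternation.
in_brackets = re.findall(r"[abc]", "a1b2c3d4")        # ['a', 'b', 'c']
not_in_brackets = re.findall(r"[^abc]", "abcd")       # ['d']
either = re.findall(r"abc|def", "abc, xyz, def")      # ['abc', 'def']

# Assertions check the neighbouring text without consuming it.
lookahead = re.search(r"d(?=r)", "dr").group()        # 'd' (r is not part of the match)
lookbehind = re.search(r"(?<=r)d", "rd").group()      # 'd' (r is not part of the match)
no_lookahead = re.search(r"d(?=r)", "dx")             # None: d is not followed by r

print(quantifier_results)
```

With three trailing c characters in the sample, `abc?` and `abc{2}` fail the whole-string check while the other quantifier patterns pass.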

Implementation

UiPath Studio provides two activities for using RegEx:

  1. IsMatch - Indicates whether the specified regular expression finds a match in the specified input string, using the specified matching options.
  2. Matches - Searches an input string for all occurrences of a regular expression and returns all the successful matches.

Pattern - The regular expression pattern to match, built from quantifiers, assertions, and special characters such as \s, \d, and \w.

Input - The string to be searched for matches.

RegexOption - A bitwise combination of the enumeration values that specify options for matching, as documented on MSDN. By default, IgnoreCase and Compiled are checked.
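To make the difference between the two activities concrete, here is a rough sketch in Python. The helper names `is_match` and `matches` are hypothetical stand-ins for the activities, not UiPath APIs, and `re.IGNORECASE` is used as an approximate counterpart of the IgnoreCase option. The Compiled option has no direct equivalent in this sketch.

```python
import re


def is_match(input_text, pattern, flags=re.IGNORECASE):
    """Rough analogue of IsMatch: True if the pattern occurs anywhere in the input."""
    return re.search(pattern, input_text, flags) is not None


def matches(input_text, pattern, flags=re.IGNORECASE):
    """Rough analogue of Matches: every non-overlapping match, left to right."""
    return [m.group(0) for m in re.finditer(pattern, input_text, flags)]


text = "Invoice INV-1042 and invoice inv-2077 are due."

found = is_match(text, r"inv-\d+")                 # True
all_ids = matches(text, r"inv-\d+")                # ['INV-1042', 'inv-2077']
case_sensitive = matches(text, r"inv-\d+", 0)      # ['inv-2077'] (no IgnoreCase)

print(found, all_ids, case_sensitive)
```

As with the RegexOption property, Python flags are combined bitwise, for example `re.IGNORECASE | re.MULTILINE`.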

Examples

  • Extract only the digits from the string

          Pattern : \d+ or [0-9]+

               Where \d denotes a digit, [] specifies a range (here 0 to 9), and

                     + denotes one or more occurrences.
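A short sketch of the digit extraction, again using Python's `re` as a stand-in for the Matches activity; the sample text is made up for illustration.

```python
import re

text = "Order 66 shipped 3 items on 2024-01-15"

# Each run of consecutive digits becomes one match.
with_escape = re.findall(r"\d+", text)
with_range = re.findall(r"[0-9]+", text)

print(with_escape)  # ['66', '3', '2024', '01', '15']
print(with_range)   # ['66', '3', '2024', '01', '15']
```

For ASCII input the two patterns give the same result. By default, \d also matches digits from other scripts in both Python and .NET, whereas [0-9] is limited to the ASCII digits.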

  • Extract a string between two words

          Input : "I am working in UiPath, which is the leading enterprise in the RPA world."

          To extract the company name from the above string, the following pattern can be used:

          Pattern : "(?<=(working in))(.*)(?=(, which is))"

               Where (?<=(working in)) → Lookbehind: the match must be preceded by "working in"

                          (.*) → Matches any run of characters other than newline

                          (?=(, which is)) → Lookahead: the match must be followed by ", which is"

               The match here is " UiPath", including the leading space. Trim the result, or end the lookbehind with a space, (?<=(working in )), to get "UiPath".
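The extraction can be sketched in Python as follows. This version drops the inner parentheses around the lookbehind and lookahead text, which does not change what is matched; it is an illustration under that assumption, not the exact configuration of the UiPath activity.

```python
import re

text = "I am working in UiPath, which is the leading enterprise in the RPA world."

# Lookbehind and lookahead frame the match but are not part of it.
pattern = r"(?<=working in)(.*)(?=, which is)"

match = re.search(pattern, text)
raw = match.group(0)       # ' UiPath' (the space after "working in" is captured)
company = raw.strip()      # 'UiPath'

# Moving the space into the lookbehind removes the need to trim.
exact = re.search(r"(?<=working in )(.*)(?=, which is)", text).group(0)

print(repr(raw), repr(company), repr(exact))
```

Because `.*` is greedy, an input that contains ", which is" more than once would match up to the last occurrence; `.*?` stops at the first one instead.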