How to extract substrings from a string using regular expressions

Hi All,

I have strings in the format below. The values, and the number of spaces between them, are dynamic. I need help extracting the values and storing them in separate variables.

For instance,

  1. In Complete 02/25/2023 23653874109 ($700.00) 13 Care and Care Plan
  2. Complete 09/05/2021 67234906513 $0.00 1034 Our-Health Schedule
  3. Pending for Review 11/12/2021 73481250986 $1253.30 28 Child Health, Health Care Plan

The lines above will be processed one at a time. For every line, the expected output is:
- variable1 = all words before the date (e.g. In Complete)
- variable2 = date (e.g. 02/25/2023)
- variable3 = ID (e.g. 23653874109)
- variable4 = amount (e.g. ($700.00))
- variable5 = number (e.g. 13)
- variable6 = all words after the number (e.g. Care and Care Plan)

The same applies to the rest of the lines.

Thanks in advance…


Hi @kavya_mamidishetti

- Variable1 = System.Text.RegularExpressions.Regex.Match(Input,"^[\w\s]+(?=\s+\d+\/\d+\/\d+)").Value
- Variable2 = System.Text.RegularExpressions.Regex.Match(Input,"(\d+\/\d+\/\d+)").Value
- Variable3 = System.Text.RegularExpressions.Regex.Match(Input,"(?<=\d+\/\d+\/\d+\s+)\d+").Value
- Variable4 = System.Text.RegularExpressions.Regex.Match(Input,"(?<=\d+\/\d+\/\d+\s+\d+\s+)\(?\$+\d+\.?\d+\)?").Value
- Variable5 = System.Text.RegularExpressions.Regex.Match(Input,"(?<=\(?\$+\d+\.?\d+\)?\s+)\d+").Value
- Variable6 = System.Text.RegularExpressions.Regex.Match(Input,"(?<=\(?\$+\d+\.?\d+\)?\s+\d+\s+).*").Value
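The lookbehinds above, such as `(?<=\d+\/\d+\/\d+\s+)`, are variable-length, which .NET regex allows. If you want to prototype the same extraction outside .NET, note that Python's `re` module only accepts fixed-width lookbehind, so the sketch below (Python 3, assumed for illustration only) captures all six fields with positional groups in one anchored pattern instead. The names `LINE_PATTERN` and `extract_fields` are hypothetical, and the amount part is an assumed pattern that also tolerates thousands separators.

```python
import re

# Positional groups replace the .NET variable-length lookbehinds:
# 1 = words before the date, 2 = date, 3 = ID, 4 = amount, 5 = number, 6 = rest.
LINE_PATTERN = re.compile(
    r"^(.*?)\s+"                             # variable1: everything before the date (lazy)
    r"(\d+/\d+/\d+)\s+"                      # variable2: date such as 02/25/2023
    r"(\d+)\s+"                              # variable3: numeric ID
    r"(\(?\$\d+(?:,\d{3})*(?:\.\d+)?\)?)\s+"  # variable4: amount, optionally in parentheses
    r"(\d+)\s+"                              # variable5: number
    r"(.*)$"                                 # variable6: all remaining words
)


def extract_fields(line):
    """Return the six fields of one line as a tuple, or None if the line does not match."""
    m = LINE_PATTERN.match(line.strip())
    return m.groups() if m else None


fields = extract_fields("In Complete 02/25/2023 23653874109 ($700.00) 13 Care and Care Plan")
print(fields)
```

The lazy `(.*?)` for variable1 stops at the first place where a date and the remaining fields can follow, so multi-word statuses such as "Pending for Review" are captured whole.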

Hope it helps!!


Hello @kavya_mamidishetti

Check the workflow below for a better understanding:
2024.xaml (21.4 KB)

Required output: (screenshot not preserved)

Variable1 to Variable6: (screenshots of each extracted value not preserved)

Hope it helps!!


I hope this solves your query. If so, please mark my post as the solution to close the loop.
Otherwise, if you have any questions or need clarification, let me know. @kavya_mamidishetti

Happy Automation!!

Hi,

Can you try the following sample using named groups (Match.Groups)?
This sample outputs not only each value but also an Excel table.

mc = System.Text.RegularExpressions.Regex.Matches(strData,"(?<=^|\n)(?<STATUS>.*?)\s+(?<DATE>\d+/\d+/\d+)\s+(?<ID>\d+)\s+(?<AMOUNT>\S+)\s+(?<NUMBER>\d+)\s+(?<NOTE>.*)")

Then, for each Match m in mc, we can access each value like m.Groups("DATE").Value
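For comparison, here is a minimal Python sketch of the same named-group idea, assuming Python 3's `re` module (this is not the attached UiPath sample). Python writes named groups as `(?P<NAME>...)` where .NET uses `(?<NAME>...)`, and `re.MULTILINE` lets `^` match at the start of every line, playing the role of `(?<=^|\n)`. The names `ROW_PATTERN`, `str_data`, and `rows` are illustrative.

```python
import re

# [ \t]+ is used between fields (instead of \s+) so that no separator can span a line break.
ROW_PATTERN = re.compile(
    r"^(?P<STATUS>.*?)[ \t]+(?P<DATE>\d+/\d+/\d+)[ \t]+(?P<ID>\d+)[ \t]+"
    r"(?P<AMOUNT>\S+)[ \t]+(?P<NUMBER>\d+)[ \t]+(?P<NOTE>.*)$",
    re.MULTILINE,
)

str_data = """In Complete 02/25/2023 23653874109 ($700.00) 13 Care and Care Plan
Complete 09/05/2021 67234906513 $0.00 1034 Our-Health Schedule
Pending for Review 11/12/2021 73481250986 $1253.30 28 Child Health, Health Care Plan"""

# One dict per matched line, keyed by group name (roughly one row of a data table).
rows = [m.groupdict() for m in ROW_PATTERN.finditer(str_data)]

for row in rows:
    print(row["STATUS"], "|", row["DATE"], "|", row["AMOUNT"], "|", row["NOTE"])
```

Because the amount is captured with `\S+`, any amount format without spaces (with or without commas or parentheses) lands in the `AMOUNT` group.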

Sample
Sample20240128-1.zip (7.9 KB)

Regards,


@mkankatala I appreciate your efforts. The expression for the amount (Variable4) fails when the amount is larger and contains a thousands separator, for example ($7,901.00) or ($12,501.00). Since the data is dynamic, I can't be sure how many digits the amount will have. Would you mind taking a look, please?

Thank you,

Hi @kavya_mamidishetti

For Variable4, the variable that extracts the amount, can you replace the expression with the one below:

- Variable4 = System.Text.RegularExpressions.Regex.Match(Input,"(?<=\d+\/\d+\/\d+\s+\d+\s+)\(?\$+\d*\.?\,?\d*\.?\d*\)?").Value
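If the amount also needs to be used as a number, a minimal Python sketch (assumed for illustration, not part of the UiPath workflow) can match an optional "(", a "$", digits with any number of `,ddd` groups, and an optional decimal part, then strip the commas and convert the result to an exact `Decimal`. `AMOUNT_PATTERN` and `parse_amount` are hypothetical names, and treating parentheses as a negative sign is an assumption based on accounting notation; the thread does not say what the parentheses mean.

```python
import re
from decimal import Decimal

# Optional "(", "$", leading digits, any number of ",ddd" groups,
# an optional decimal part, optional ")".
AMOUNT_PATTERN = re.compile(r"(\()?\$(\d+(?:,\d{3})*(?:\.\d+)?)(\))?")


def parse_amount(text):
    """Return (matched_text, value) for the first amount in text, or None."""
    m = AMOUNT_PATTERN.search(text)
    if m is None:
        return None
    # Remove thousands separators before converting to an exact decimal value.
    value = Decimal(m.group(2).replace(",", ""))
    # Assumption: "(...)" marks a negative amount, as in accounting notation.
    if m.group(1) and m.group(3):
        value = -value
    return m.group(0), value


for sample in ["($7,901.00)", "($12,501.00)", "$1253.30", "$0.00"]:
    print(parse_amount(sample))
```

Using `Decimal` instead of `float` avoids binary rounding on currency values; drop the sign handling if the parentheses mean something else in your data.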

Hope it helps!!

@mkankatala It extracts the amount now, but the number extraction is not working in some cases. I attached a screenshot for reference (not preserved): the current expression cannot extract the number 33.
Thank you,

@Yoichi Thank you for your work. I will check it and get back to you.

Sorry, the price format has changed (for example, ($7,382.00)), so I have updated the regular expressions accordingly. The expressions below now work for Variable5 and Variable6.

- Variable5 = System.Text.RegularExpressions.Regex.Match(Input,"(?<=\(?\$+\d+\.*\,*\d*\.*\d+\)?\s+)\d+").Value

- Variable6 = System.Text.RegularExpressions.Regex.Match(Input,"(?<=\(?\$+\d+\.*\,*\d*\.*\d+\)?\s+\d+\s+).*").Value
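Since both Variable5 and Variable6 depend on the shape of the amount, one option is to keep the amount in a single reusable sub-pattern so a future price-format change is fixed in one place. The Python sketch below assumes Python 3's `re`; `AMOUNT`, `AFTER_AMOUNT`, and `number_and_rest` are hypothetical names, and the sample line with the number 33 is made up for illustration (the original screenshot is not available).

```python
import re

# One sub-pattern for the amount, reused wherever the amount is referenced.
# It uses only non-capturing groups, so the groups below are numbered from 1.
AMOUNT = r"\(?\$\d+(?:,\d{3})*(?:\.\d+)?\)?"

# Number (Variable5) and trailing words (Variable6), anchored right after the amount.
AFTER_AMOUNT = re.compile(AMOUNT + r"\s+(\d+)\s+(.*)$")


def number_and_rest(line):
    """Return (number, remaining_words) following the amount, or None."""
    m = AFTER_AMOUNT.search(line)
    return (m.group(1), m.group(2)) if m else None


# Hypothetical input line (not from the thread) using the new price format.
sample = "In Complete 03/14/2023 12345678901 ($7,382.00) 33 Care and Care Plan"
print(number_and_rest(sample))
```

The amount sub-pattern accepts any number of comma groups (for example $1,234,567.00), so the number and trailing words are still found after larger amounts.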

If any further changes are needed, let me know.

Hope you understand!! @kavya_mamidishetti

