I have a regex below that extracts the data between two strings, but I also want to exclude some text in between. For example, if the text “helloworld” appears between the two strings, I want to exclude it. Any ideas? Thanks.
System.Text.RegularExpressions.Regex.Match(strInput,"(?<=In the Last 120 Days)([\S\s]*)(?= History)").Value.Trim
@jelrey - rather than making your regex more complicated, I would just do additional processing on the matches you’ve found. If I understand correctly, you want to get the first match that doesn’t contain the text “helloworld”.
To do that, use the ‘Matches’ activity, or use an Assign activity and change your existing expression from Regex.Match() to Regex.Matches(). The Matches activity gives you a variable of type IEnumerable<Match> (Regex.Matches() returns a MatchCollection, which you can loop over the same way), which I’ll call MyMatches.
Check that you got at least one match with a quick If statement: If MyMatches.Count = 0 Then (insert code here to handle this error)
Use a for each loop to iterate through the matches. Make sure to change the TypeArgument to System.Text.RegularExpressions.Match
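For each item in MyMatches
If item.Value.Contains("helloworld") Then (do nothing, move on to the next match)
Else Assign TextYouWantExtracted = item.Value, then Break out of the loop so a later match doesn’t overwrite it
Now you have a string variable called TextYouWantExtracted that contains the first value pulled by your regex that doesn’t contain the word “helloworld”.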
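As a rough sketch, the steps above could look like this as plain VB.NET outside of UiPath. The sample input and the module name are made up for illustration, and the pattern is the non-greedy one so that each block is matched separately. In a workflow, the same logic would be spread across the If, For Each and Assign activities.

```vb
Imports System
Imports System.Text.RegularExpressions

Module FirstCleanMatch
    Sub Main()
        ' Hypothetical sample input; the real document text will differ.
        Dim strInput As String =
            "In the Last 120 Days helloworld junk History " &
            "In the Last 120 Days 3 late payments History"

        ' Non-greedy pattern so each block is returned as its own match.
        Dim pattern As String = "(?<=In the Last 120 Days)[\S\s]*?(?= History)"
        Dim myMatches As MatchCollection = Regex.Matches(strInput, pattern)

        Dim textYouWantExtracted As String = Nothing

        If myMatches.Count = 0 Then
            ' Nothing matched at all: handle the error here.
            Console.WriteLine("No match found")
        Else
            For Each item As Match In myMatches
                ' Skip any match that contains the unwanted text.
                If Not item.Value.Contains("helloworld") Then
                    textYouWantExtracted = item.Value.Trim()
                    Exit For ' keep the first clean match only
                End If
            Next
        End If

        Console.WriteLine(textYouWantExtracted) ' prints: 3 late payments
    End Sub
End Module
```

If every match contains “helloworld”, textYouWantExtracted stays Nothing, so it is worth checking for that case before using the value.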
@Jelrey I apologize, I didn’t realize it was your regex that you were having issues with. Thank you for providing the sample input and expected output. That helps a ton when trying to give helpful advice.
Change your regex pattern to be this instead: (?<=In the Last 120 Days)[\S\s]*?(?= History)
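I removed the excess parentheses you had and added a ? after the * to make it non-greedy (lazy), so each match stops at the first “ History” that follows instead of running on to the last one in the text. Use this new regex pattern combined with my answer above to get your preferred text extracted.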
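To see the difference the ? makes, here is a small sketch comparing the two patterns on a made-up input that contains two blocks. The input string and the module name are assumptions for illustration only.

```vb
Imports System
Imports System.Text.RegularExpressions

Module GreedyVersusLazy
    Sub Main()
        ' Hypothetical input with two "In the Last 120 Days ... History" blocks.
        Dim strInput As String = "In the Last 120 Days A History In the Last 120 Days B History"

        ' Greedy: [\S\s]* grabs as much as possible, up to the LAST " History".
        Dim greedy As String = Regex.Match(strInput, "(?<=In the Last 120 Days)[\S\s]*(?= History)").Value

        ' Lazy: [\S\s]*? grabs as little as possible, up to the FIRST " History".
        Dim lazy As String = Regex.Match(strInput, "(?<=In the Last 120 Days)[\S\s]*?(?= History)").Value

        Console.WriteLine("[" & greedy & "]") ' prints: [ A History In the Last 120 Days B]
        Console.WriteLine("[" & lazy & "]")   ' prints: [ A]
    End Sub
End Module
```

With the greedy pattern, Regex.Matches() would return only one big match for this input, so there would be nothing to filter on in the loop.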
EDIT: You will also want to trim your match.Value to remove the excess space at the beginning that you are grabbing. Or else you can alter the positive lookbehind to include that whitespace, so it is (?<=In the Last 120 Days\s+) instead.
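A quick sketch of the two options, again with made-up input strings. One thing to be aware of: because the regex engine accepts the first position where the lookbehind succeeds, \s+ inside the lookbehind only swallows a single whitespace character when several are present, so Trim is the more forgiving choice.

```vb
Imports System
Imports System.Text.RegularExpressions

Module LeadingWhitespace
    Sub Main()
        Dim trimPattern As String = "(?<=In the Last 120 Days)[\S\s]*?(?= History)"
        Dim lookbehindPattern As String = "(?<=In the Last 120 Days\s+)[\S\s]*?(?= History)"

        ' Hypothetical inputs: one space vs. two spaces after the start marker.
        Dim oneSpace As String = "In the Last 120 Days 3 late payments History"
        Dim twoSpaces As String = "In the Last 120 Days  3 late payments History"

        ' Option 1: keep the original lookbehind and trim the result.
        Console.WriteLine("[" & Regex.Match(twoSpaces, trimPattern).Value.Trim() & "]")
        ' prints: [3 late payments]

        ' Option 2: let the lookbehind absorb the whitespace (.NET allows \s+ here).
        Console.WriteLine("[" & Regex.Match(oneSpace, lookbehindPattern).Value & "]")
        ' prints: [3 late payments]

        ' With two spaces, only the first one is absorbed by the lookbehind.
        Console.WriteLine("[" & Regex.Match(twoSpaces, lookbehindPattern).Value & "]")
        ' prints: [ 3 late payments]
    End Sub
End Module
```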