I have the regex below that extracts the data between two strings, but I also want to exclude some text in between. For example, if the text “helloworld” appears between the two strings, I want to exclude it. Any ideas? Thanks.
System.Text.RegularExpressions.Regex.Match(strInput,"(?<=In the Last 120 Days)([\S\s]*)(?= History)").Value.Trim
How do we make the regex consider only the first occurrence? For example, there are multiple “History” strings, and I only want to consider the first “History”.
@jelrey - rather than making your regex more complicated, I would just do additional processing on the matches you’ve found. If I understand correctly, you want to get the first match that doesn’t contain the text “helloworld”.
To do that, use the Matches activity, or use an Assign activity and change your existing expression from Regex.Match(...) to Regex.Matches(...). Either way you get a collection of Match objects (the Matches activity outputs IEnumerable<Match>; Regex.Matches returns a MatchCollection), which I’ll call MyMatches.
Check that you got at least one match with a quick If statement: If MyMatches.Count = 0 Then (insert code here to handle this error)
Use a For Each loop to iterate through the matches. Make sure to change the TypeArgument to System.Text.RegularExpressions.Match.
For Each item In MyMatches
    If item.Value.Contains("helloworld")
        Then Continue
        Else Assign TextYouWantExtracted = item.Value
             Break
    End If
Now you have a String variable called TextYouWantExtracted that holds the first value matched by your regex that doesn’t contain the text “helloworld”.
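For anyone who prefers to see the steps above as plain VB.NET rather than activities, here is a minimal sketch. The sample input and the module name are made up for illustration. It also assumes a lazy quantifier (`*?`) in the pattern so that the input produces more than one match; with the greedy pattern from the question, the whole span would come back as a single match and there would be nothing to loop over.

```vb
Imports System
Imports System.Text.RegularExpressions

Module ExcludeMatchExample
    Sub Main()
        ' Hypothetical input: the first block contains the unwanted text, the second does not.
        Dim strInput As String = "In the Last 120 Days helloworld appears here History, then In the Last 120 Days the part we want History"

        ' Assumption: lazy quantifier (*?) so each start/end pair is its own match.
        Dim MyMatches As MatchCollection = Regex.Matches(strInput, "(?<=In the Last 120 Days)[\S\s]*?(?= History)")

        ' Handle the "nothing found" case before looping.
        If MyMatches.Count = 0 Then
            Console.WriteLine("No match found")
            Return
        End If

        Dim TextYouWantExtracted As String = Nothing
        For Each item As Match In MyMatches
            ' Skip any match that contains the excluded text.
            If item.Value.Contains("helloworld") Then
                Continue For
            End If
            ' Keep the first acceptable match and stop looking.
            TextYouWantExtracted = item.Value.Trim()
            Exit For
        Next

        Console.WriteLine(TextYouWantExtracted) ' prints: the part we want
    End Sub
End Module
```

If every match contains “helloworld”, TextYouWantExtracted stays Nothing, so you may want to check for that after the loop as well.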
"In the Last 120 Days Something happened in History, and other tzhings were cool in history too and In the Last 120 Days and history stuf… and this is History test 2312dsad ytrrytryrt History "
The output should be “Something happened in”, since that text sits between the start string “In the Last 120 Days” and the end string “History”, taking the first “History” found; everything after that first “History” is ignored.
@Jelrey I apologize, I didn’t realize it was the regex itself that you were having issues with. Thank you for providing the sample input and expected output. That helps a ton when trying to give useful advice.
Change your regex pattern to this instead: (?<=In the Last 120 Days)[\S\s]*?(?= History)
I removed the unnecessary parentheses and added the non-greedy (lazy) modifier ? after the *, so the match stops at the first “ History” instead of running on to the last one. Use this new pattern combined with my answer above to extract the text you want.
EDIT: You will also want to trim your match.Value to remove the leading space that the pattern captures. Alternatively, you can alter the positive lookbehind to include that whitespace, so it becomes (?<=In the Last 120 Days\s+) instead (.NET regex allows a variable-length lookbehind like this).
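To see the difference between the greedy and lazy versions, here is a small sketch that runs both patterns against the sample input from this thread. The module and variable names are just for illustration.

```vb
Imports System
Imports System.Text.RegularExpressions

Module LazyQuantifierExample
    Sub Main()
        ' Sample input posted earlier in this thread.
        Dim strInput As String = "In the Last 120 Days Something happened in History, and other tzhings were cool in history too and In the Last 120 Days and history stuf… and this is History test 2312dsad ytrrytryrt History "

        ' Greedy *: consumes as much as possible, so it ends at the LAST " History".
        Dim greedyValue As String = Regex.Match(strInput, "(?<=In the Last 120 Days)[\S\s]*(?= History)").Value.Trim()

        ' Lazy *?: stops at the FIRST " History". The \s+ inside the lookbehind
        ' swallows the leading whitespace, so no Trim is needed here.
        Dim lazyValue As String = Regex.Match(strInput, "(?<=In the Last 120 Days\s+)[\S\s]*?(?= History)").Value

        Console.WriteLine(greedyValue.Length > lazyValue.Length) ' prints: True
        Console.WriteLine(lazyValue)                             ' prints: Something happened in
    End Sub
End Module
```

Note that the match is case-sensitive by default, which is why the lowercase “history” in the input is not treated as an end marker.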