Can you give us a few examples of text variations and requirements for the extraction, for us to better understand which part you want to extract?
Or is this a single use case/requirement?
It's not a single use case. I want to extract the text between the second "_" and the first ";", shown in bold below.
2_22012056_Clearing MKSA Jul26; Jul25;
Some example text variations:
2_22012056_Clearing MKSA Jul26; Jul25;
3_22012078_Clearing SA Aug23; Aug23;
4_22012078_Clearing MK May24; May24;
But based on the screenshot, the extracted text should only be the yellow-highlighted part, right? I am already using a selected group in the regex, so I'm confused about this part.
(?<=\d{2}_) → Start matching after two digits and an underscore
.+? → Capture the required text
(?=;) → Stop matching before the semicolon
This uses lookbehind and lookahead, so unwanted parts like 56_ or ; are not included in the result.
That's why it returns exactly "Clearing MKSA Jul26", unlike the earlier regex where extra text was captured even when using groups.
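To make the difference concrete, here is a minimal Python sketch that runs the lookaround pattern from this thread against the three sample strings. The `GROUPED` pattern is a hypothetical stand-in for a group-based regex, included only for contrast, since the earlier regex is not quoted in the thread. Python's `re` module is an assumption too; the tool you are using may have a different regex engine, although this pattern should behave the same in most of them.

```python
import re

# Pattern from the thread: lookbehind for two digits + "_",
# lazy body, lookahead for ";".
LOOKAROUND = re.compile(r"(?<=\d{2}_).+?(?=;)")

# Hypothetical group-based pattern, for contrast only
# (the earlier regex is not shown in the thread).
GROUPED = re.compile(r"\d+_\d+_(.+?);")

samples = [
    "2_22012056_Clearing MKSA Jul26; Jul25;",
    "3_22012078_Clearing SA Aug23; Aug23;",
    "4_22012078_Clearing MK May24; May24;",
]


def extract(text):
    """Return the text between the second "_" and the first ";", or None."""
    match = LOOKAROUND.search(text)
    return match.group(0) if match else None


results = [extract(s) for s in samples]
print(results)

# With a capturing group, the full match still contains the prefix and the ";".
# Only group 1 holds the wanted text.
m = GROUPED.search(samples[0])
full_match, group_1 = m.group(0), m.group(1)
print(full_match)  # 2_22012056_Clearing MKSA Jul26;
print(group_1)     # Clearing MKSA Jul26
```

With lookarounds, the full match is already the wanted text, so it does not matter whether the tool returns the whole match or a selected group.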
Thank you for the suggestion. It didn't give the full output I wanted, because it did not extract up to the first semicolon (;). Nevertheless, I was able to tweak it a little and got the expected output.
The .+? part works because the question mark makes that part of the expression non-greedy.
.+ on its own would give you everything between the first instance of the lookbehind and the last instance of the lookahead. In regex terminology, that is called a greedy expression.
Adding the ? makes it a non-greedy expression, so it stops matching at the first instance of the lookahead instead of the last.
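A small Python sketch of the greedy versus non-greedy behaviour, using the first sample string from this thread. Python's `re` module is assumed here purely for illustration.

```python
import re

text = "2_22012056_Clearing MKSA Jul26; Jul25;"

# Greedy: .+ runs as far as it can, then backs off to the LAST ";".
greedy = re.search(r"(?<=\d{2}_).+(?=;)", text).group(0)

# Non-greedy: .+? stops at the FIRST position followed by ";".
lazy = re.search(r"(?<=\d{2}_).+?(?=;)", text).group(0)

print(greedy)  # Clearing MKSA Jul26; Jul25
print(lazy)    # Clearing MKSA Jul26
```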