Regex match pattern up to but exclude optional

lynnsong986 · November 10, 2021, 10:31pm

Hello, I only know basic regex and need some help figuring this out:

I’m extracting the addresses in a txt file where the line starts with:
Service Address: ACB st, City, Province, Postal code
The issue is that some of these lines have this after the address itself: Page 1 / 1 (this can also be Page 2/2, Page 3/3, etc., but not all address lines have it).
I need the regex to extract only the addresses, so the match should stop before “Page”. The expression below matches everything, including the page part. Can someone please show me how to modify it to exclude the page part?
Service Location:(.+)

thanks!

Greg_Jacobson · November 10, 2021, 10:49pm

Try this as your pattern. It will only keep the address part. Make sure you add .trim at the end.
"(?<=Service Address:)[\s\S]+(?=Page)"

By the way, you wrote both Service Location and Service Address, so I'm not sure which one you need in the regex pattern.

Steven_McKeering · November 10, 2021, 10:51pm

Hey @lynnsong986

You are in the right place for support. There are many people who can assist you.

I will do my best to assist you.

If you are new to Regex please check out my Regex MegaPost.

Next time, can I recommend posting a real sample (de-identified, of course), the expected output, and information on the pattern?

I believe I have a pattern for you. You can preview it here.

To answer your question: your pattern “Service Location:(.+)” is greedy, so the full match covers the whole line, including the words “Service Location:” and “Page 1”. Even the (.+) group on its own still ends with the page part.

My Regex Pattern:
(?<=Service Address:).*(?= Page \d)

My pattern will look for an anchor, start AFTER the words “Service Address:”, and grab everything on the same line until it reaches “ Page” followed by a digit, such as “Page 1” or “Page 2”.

Start after Anchor example
Stop at word Anchor example
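
For anyone who wants to try the difference outside Studio, here is a minimal sketch in Python's `re` module (an assumption: the thread itself targets UiPath/.NET regex, and the sample line is made up from the description in the first post). It compares the greedy capture with the lookaround pattern above.

```python
import re

# Hypothetical sample line, built from the format described in the first post.
line = "Service Address: 123 Main St, Toronto, ON Page 1 / 1"

# Greedy capture: (.+) runs to the end of the line, so the page part stays in.
greedy = re.search(r"Service Address:(.+)", line)
print(greedy.group(0))  # full match, label included
print(greedy.group(1))  # capture group, still ends with "Page 1 / 1"

# Lookaround pattern: the label and " Page <digit>" must be present,
# but neither is part of the returned match.
trimmed = re.search(r"(?<=Service Address:).*(?= Page \d)", line)
address = trimmed.group(0).strip()  # strip() plays the role of .trim
print(address)
```

The leading space after the colon is still part of the match, which is why the trim step is needed.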

lynnsong986 · November 10, 2021, 10:59pm

Thank you both for your help! May I know what ?<= and ?= mean in the expression?

lynnsong986 · November 10, 2021, 11:04pm

I must have missed something in my first post. I tried both expressions, but they don’t generate the matches I expected. Here are some relevant sample lines:
Service Location: 36 BLUE JAYS WAY (SUITES), TORONTO Page 1 / 2
Service Location: 3600 HIGHWAY 7 WOODBRIDGE ON L4H 0A0
Service Location: 2118 BLOOR ST W (BULK), TORONTO Page 1 / 1
your help is highly appreciated!!

ppr · November 10, 2021, 11:04pm

Taken from regex101.com:

It's like anchoring after / before a pattern.

Greg_Jacobson · November 10, 2021, 11:06pm

Have you tried replacing Service Address with Service Location in the regex pattern? Try this:

(?<=Service Location:).*(?= Page \d)

Steven_McKeering · November 10, 2021, 11:07pm

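Essentially, they are anchors (regex calls them lookarounds). (?<=...) is a lookbehind: the text inside the brackets must come immediately before the match. (?=...) is a lookahead: the text inside the brackets must come immediately after the match. Neither is included in the result.

Regex anchors look for a word to attach to.
The < means the anchor looks behind, so the match starts after the word it's attaching to. Without the <, it looks ahead, so the match stops at that word.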

See my examples:
Start AFTER word Anchor example
Stop AT word Anchor example
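
To see that both anchors only check the surrounding text without consuming it, here is a small sketch in Python's `re` (an assumption, since the thread is about UiPath/.NET regex), using one of the sample lines posted above. It prints the match and the text on either side of it.

```python
import re

text = "Service Location: 2118 BLOOR ST W (BULK), TORONTO Page 1 / 1"

m = re.search(r"(?<=Service Location:).*(?= Page \d)", text)

# The match itself contains neither the label nor the page marker.
print(repr(m.group(0)))

# Text before the match: what the lookbehind (?<=...) checked.
before = text[:m.start()]
print(repr(before))

# Text after the match: the lookahead (?=...) checked the start of this.
after = text[m.end():]
print(repr(after))
```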

Regex101.com is a great place to learn. Check out the Quick Reference section in the bottom right.

Steven_McKeering · November 10, 2021, 11:11pm

Thank you for the samples.

Can you please highlight the expected output?

In line 2, it looks like there is no “Page” which means we need to update our pattern.

Check out this pattern. It doesn’t need the “Page” but will stop if it finds it.
Regex Pattern:
(?<=Service Location:).*(?= Page \d)|(?<=Service Location:).*(?=[\r\n])
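
The pattern above can be checked against the three sample lines. The sketch below uses Python's `re`, which is an assumption since the thread targets UiPath/.NET. The second pattern, `variant_pattern`, is a hypothetical alternative and not from the thread: a single lazy match that stops at " Page" plus a digit or at the end of the line.

```python
import re

# The three sample lines posted earlier in the thread.
text = "\n".join([
    "Service Location: 36 BLUE JAYS WAY (SUITES), TORONTO Page 1 / 2",
    "Service Location: 3600 HIGHWAY 7 WOODBRIDGE ON L4H 0A0",
    "Service Location: 2118 BLOOR ST W (BULK), TORONTO Page 1 / 1",
])

# Pattern from the post above: stop before " Page <digit>" if present,
# otherwise stop at the line break.
thread_pattern = (
    r"(?<=Service Location:).*(?= Page \d)"
    r"|(?<=Service Location:).*(?=[\r\n])"
)

# Assumed variant (not from the thread): lazy match that ends at
# " Page <digit>" or at the end of the line, whichever comes first.
variant_pattern = r"(?<=Service Location:).*?(?=[ \t]+Page \d|$)"

from_thread = [m.strip() for m in re.findall(thread_pattern, text)]
from_variant = [
    m.strip() for m in re.findall(variant_pattern, text, flags=re.MULTILINE)
]

for address in from_variant:
    print(address)
```

Both patterns return the same three addresses here. One difference worth knowing: the `[\r\n]` lookahead needs a line break after the address, so a final line with no “Page” and no trailing newline would be missed, while the `$` version still matches it. In .NET the corresponding option would be RegexOptions.Multiline.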

ppr · November 10, 2021, 11:11pm

(?<=Service Address:).*(?= Page \d)

Give the following pattern a try; it also gets text that spans multiple lines:

lynnsong986 · November 10, 2021, 11:33pm

You people are incredibly awesome! I’ve learnt so much from you!! I would mark all of your answers as the solution if I could! And sorry, I had “address” instead of “location” in my first post, which screwed up the expression…

system · November 13, 2021, 11:33pm

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
