Regex: match a pattern up to, but excluding, an optional suffix

Hello, I only know basic regex and need some help figuring this out:

I’m extracting the addresses in a txt file where the line starts with:
Service Address: ACB st, City, Province, Postal code
The issue is that some of these lines have this after the address itself: Page 1 / 1 (this can also be Page 2/2, Page 3/3, etc., and not all address lines have it).
I need the regex to extract the addresses only, so the match should stop before “Page”. The expression below matches everything, including the page part. Can someone please show me how to modify it to exclude the page part?
Service Location:(.+)

Thanks!

Try this as your pattern. It keeps only the address part. Make sure you add .Trim at the end to remove the surrounding spaces.
"(?<=Service Address:)[\s\S]+(?=Page)"
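A minimal sketch of how this pattern behaves, using Python's `re` module as a stand-in for whatever engine your tool uses (an assumption), with `.strip()` playing the role of `.Trim`. The sample lines are hypothetical, built from the format in the question. It also shows two things to watch for: `[\s\S]` matches newlines too, so on multi-line text the greedy match can run on to the last "Page", and a line with no "Page" at all produces no match.

```python
import re

pattern = r"(?<=Service Address:)[\s\S]+(?=Page)"

# Hypothetical single line in the format described in the question.
line = "Service Address: ACB st, City, Province, Postal code Page 1 / 1"
address = re.search(pattern, line).group(0).strip()  # .strip() stands in for .Trim
print(address)

# [\s\S] also matches "\n", so over several lines the greedy match
# runs on to the LAST "Page" in the text (hypothetical two-line input).
text = (
    "Service Address: 1 A ST, CITY\n"
    "Service Address: 2 B ST, CITY Page 1 / 1\n"
)
spanning = re.search(pattern, text).group(0)
print(repr(spanning))

# With no "Page" anywhere in the input, there is no match at all.
print(re.search(pattern, "Service Address: 3 C ST, CITY"))
```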

By the way, you wrote both Service Location and Service Address, so I’m not sure which one you need in the regex pattern.

Hey @lynnsong986

You are in the right place for support. There are many people who can assist you.

I will do my best to assist you.

If you are new to Regex, please check out my Regex MegaPost.

May I recommend that next time you post a real sample (de-identified, of course), the expected output, and information on the pattern?

I believe I have a pattern for you. You can preview it here.

To answer your question: your pattern “Service Location:(.+)” is greedy. The match includes the words “Service Location:”, and (.+) then “grabs” everything else on the line, including “Page 1”.
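To see this concretely, here is a small sketch with Python's `re` module (assumed as a stand-in engine) on one of the sample lines posted later in the thread. The whole match (`group(0)`) still contains the label, and the captured group (`group(1)`) still contains the page suffix.

```python
import re

# Sample line from the thread.
line = "Service Location: 36 BLUE JAYS WAY (SUITES), TORONTO Page 1 / 2"

m = re.search(r"Service Location:(.+)", line)
print(repr(m.group(0)))  # whole match: label included
print(repr(m.group(1)))  # captured group: ".+" runs to the end of the line, page part included
```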

My Regex Pattern:
(?<=Service Address:).*(?= Page \d)

My pattern uses lookarounds as anchors: it starts AFTER the words “Service Address:” and grabs everything on the same line until it reaches “ Page ” followed by a digit, such as “Page 1” or “Page 2”.
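A quick sketch of this pattern in Python's `re` module (an assumed stand-in engine), on two hypothetical lines in the question's format (the second street name is made up). Because `.` does not match a newline, each match stays on its own line, so two lines give two separate addresses; `.strip()` removes the leading space left after the colon.

```python
import re

pattern = r"(?<=Service Address:).*(?= Page \d)"

# Hypothetical lines in the format described in the question.
text = (
    "Service Address: ACB st, City, Province, Postal code Page 1 / 2\n"
    "Service Address: DEF st, City, Province, Postal code Page 2 / 2\n"
)

# "." does not match "\n", so each match ends on the line where it started.
addresses = [m.strip() for m in re.findall(pattern, text)]
print(addresses)
```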

Start after Anchor example
Stop at word Anchor example

Thank you both for your help! What do ?<= and ?= mean in the expression?

I must have missed something in my first post. I tried both expressions, but they don’t produce the matches I expected. Here are some relevant sample lines:
Service Location: 36 BLUE JAYS WAY (SUITES), TORONTO Page 1 / 2
Service Location: 3600 HIGHWAY 7 WOODBRIDGE ON L4H 0A0
Service Location: 2118 BLOOR ST W (BULK), TORONTO Page 1 / 1
Your help is highly appreciated!

Taken from regex101.com:

It’s like anchoring after/before a pattern.


Have you tried replacing Service Address with Service Location in the regex pattern? Try this:

(?<=Service Location:).*(?= Page \d)
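Here is a sketch of this pattern run over the three sample lines from the thread, using Python's `re` module as a stand-in engine (an assumption). It extracts the two addresses that are followed by a page suffix, but the line with no “Page” (the Woodbridge address) produces no match, because the lookahead requires “ Page ” plus a digit.

```python
import re

pattern = r"(?<=Service Location:).*(?= Page \d)"

# Sample lines posted in the thread.
text = (
    "Service Location: 36 BLUE JAYS WAY (SUITES), TORONTO Page 1 / 2\n"
    "Service Location: 3600 HIGHWAY 7 WOODBRIDGE ON L4H 0A0\n"
    "Service Location: 2118 BLOOR ST W (BULK), TORONTO Page 1 / 1\n"
)

addresses = [m.strip() for m in re.findall(pattern, text)]
for a in addresses:
    print(a)
```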

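Essentially, they are lookaround “anchors”: (?<=...) is a lookbehind, meaning the text inside the brackets must appear immediately before the match, and (?=...) is a lookahead, meaning it must appear immediately after the match. Neither part is included in the match itself.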
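A small sketch of the difference, using Python's `re` module (assumed as a stand-in engine) on one of the thread's sample lines. A lookbehind on its own matches a position rather than any text (an empty span right after the 17-character label). A plain pattern consumes the label and “ Page”, while the lookaround version returns only the text between them.

```python
import re

line = "Service Location: 2118 BLOOR ST W (BULK), TORONTO Page 1 / 1"

# A lookbehind alone is zero-width: it matches a position, not characters.
pos = re.search(r"(?<=Service Location:)", line)
print(pos.span())  # empty span just after "Service Location:"

# Consuming version: the label and " Page" become part of the match.
consuming = re.search(r"Service Location:.* Page", line).group(0)

# Lookaround version: only the text between the two assertions is matched.
lookaround = re.search(r"(?<=Service Location:).*(?= Page)", line).group(0)

print(repr(consuming))
print(repr(lookaround))
```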

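Regex lookaround anchors look for a word to attach to without including it in the match.
The < marks a lookbehind (the word must come before the match); without it, (?=) is a lookahead (the word must come after the match).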

See my examples:
Start AFTER word Anchor example
Stop AT word Anchor example

Regex101.com is a great place to learn. Check out the Quick Reference section in the bottom right.

Thank you for the samples.

Can you please highlight the expected output?

In line 2, it looks like there is no “Page”, which means we need to update our pattern.

Check out this pattern. It doesn’t require “Page”, but it will stop if it finds it.
Regex Pattern:
(?<=Service Location:).*(?= Page \d)|(?<=Service Location:).*(?=[\r\n])
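A sketch of this alternation in Python's `re` module (an assumed stand-in; engines such as .NET may differ in details like `\r\n` handling). One edge case: the second branch needs a line break after the address, so a page-less line at the very end of a file with no trailing newline is missed. The sample lines below are from the thread, reordered to produce that hypothetical end-of-file case. The `alternative` pattern is an assumption, not from the thread: it stops lazily at “ Page ” plus a digit or at the end of the line (`$` with `re.MULTILINE`), which handles both kinds of line.

```python
import re

# The alternation posted above, transcribed for Python's re module.
posted = re.compile(
    r"(?<=Service Location:).*(?= Page \d)|(?<=Service Location:).*(?=[\r\n])"
)

# Assumed alternative: stop lazily at " Page <digit>" OR at the end of the line.
alternative = re.compile(r"(?<=Service Location:).*?(?= Page \d|$)", re.MULTILINE)

# Sample lines from the thread, reordered so the line without "Page" comes last
# and has no trailing newline (a hypothetical end-of-file case).
text = (
    "Service Location: 36 BLUE JAYS WAY (SUITES), TORONTO Page 1 / 2\n"
    "Service Location: 2118 BLOOR ST W (BULK), TORONTO Page 1 / 1\n"
    "Service Location: 3600 HIGHWAY 7 WOODBRIDGE ON L4H 0A0"
)

posted_hits = [m.strip() for m in posted.findall(text)]
alternative_hits = [m.strip() for m in alternative.findall(text)]

print(posted_hits)       # the last, page-less address is missing
print(alternative_hits)  # all three addresses
```

If the file always ends with a newline, both patterns return the same three addresses. The lazy `.*?` matters in the alternative: it makes the match stop at the first point where either condition holds.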


Give the following pattern a try; it can also get text from multiple lines:

You are all incredibly awesome! I’ve learnt so much from you! I would mark all of your answers as the solution if I could. And sorry that I wrote “address” instead of “location” in my first post, which broke the expression…

