Regex for in between Text

Hi,
I have text like this:

SYDNEY - NEW YORK

alot of dynamic text 1

NEW YORK - SYDNEY

alot of dynamic Text

NEW YORK - LONDON

alot of dynamic Text

LONDON - SYDNEY

alot of dynamic Text

I need to extract each connection together with the dynamic text under it.

The result should look like this:
result 1:
SYDNEY - NEW YORK

alot of dynamic text 1

result 2:
NEW YORK - SYDNEY

alot of dynamic Text

result 3:
NEW YORK - LONDON

alot of dynamic Text

Any idea how to do that?

This is one of those times when my first question would be: can we get the data in a better format? Where possible (and maybe it isn’t here), analyze and improve the process rather than automating a bad one.

Hi @Robott

You can achieve what you need using a Regular Expression (Regex) :smiley: You can learn Regex from my MegaPost. I would strongly recommend looking at Section 5.

You have provided a sample and the expected output, but no information about the pattern.

To make a reliable Regex Pattern you must understand the pattern within the text.

  • What is consistent? What changes?
  • Is it always capitals and a dash?
  • How is it generated? System or OCR?
  • Will there be an opportunity to validate the result? (You could create a white-list of all the Cities in the world maybe).
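
The white-list idea from the last bullet could be sketched roughly as follows. This is only an illustration in Python (not UiPath/.NET code); `KNOWN_CITIES` and `is_valid_connection` are hypothetical names, and the three cities are simply the ones in the sample.

```python
# Hypothetical white-list; a real one would come from a trusted source.
KNOWN_CITIES = {"SYDNEY", "NEW YORK", "LONDON"}

def is_valid_connection(line: str) -> bool:
    """Return True if the line is 'CITY - CITY' and both cities are white-listed."""
    parts = line.strip().split(" - ")
    return len(parts) == 2 and all(part in KNOWN_CITIES for part in parts)

print(is_valid_connection("SYDNEY - NEW YORK"))  # True
print(is_valid_connection("NOTE - SEE BELOW"))   # False
```

A check like this would catch lines that happen to look like a connection (capitals and a dash) but are not one.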

The good news is I have created a Regex Pattern based upon your sample. However, it may not be perfect :thinking: You may want to ask a few questions about how the text is produced and see what you can find out :slight_smile:

I have deduced that the pattern is something like this:

  • MUST be the start of a line.
  • MUST finish at the end of a line.
  • MUST only contain capital letters and spaces separated by a dash.
  • If it doesn’t have ALL these things then IT WILL NOT MATCH.
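
As a rough illustration of the rules above, here is one pattern that would satisfy them. It is an assumption written for this example, not necessarily the pattern behind the preview link, and it is shown in Python; the same pattern syntax should also be accepted by .NET's Regex. It additionally assumes a single space on each side of the dash, as in the sample.

```python
import re

# Assumed rule: words in capitals, " - ", words in capitals, and nothing
# else on the line.
HEADING = re.compile(r"[A-Z]+(?: [A-Z]+)* - [A-Z]+(?: [A-Z]+)*")

def is_heading(line: str) -> bool:
    """Return True only if the whole line matches the heading rule."""
    return HEADING.fullmatch(line) is not None

lines = [
    "SYDNEY - NEW YORK",
    "alot of dynamic text 1",
    "Sydney - New York",
    "NEW YORK - LONDON via air",
]
headings = [line for line in lines if is_heading(line)]
print(headings)  # ['SYDNEY - NEW YORK']
```

`fullmatch` stands in for the start-of-line and end-of-line rules here; when searching a whole document instead of single lines, the equivalent would be `^...$` with multiline mode switched on.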

You can preview/play with the Regex pattern here.

Hopefully this helps :blush:

@Robott

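Try this:

Regex.Matches(inputString, "[\w].+(-) [\w].+")

This returns a MatchCollection, so loop over it and read each item's .Value. Calling .ToString on the collection itself only returns the type name, not the matched text.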
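
Since `.` does not cross line breaks by default, a pattern like this picks up the connection lines but not the text under them. One way to get both is to split the input at each connection line and pair every heading with what follows it. The sketch below does that in Python purely as an illustration; `split_sections` and the heading rule (capitals and spaces around " - ", alone on a line, `\n` line endings) are assumptions based on the sample, not something confirmed in the thread.

```python
import re

# Assumed heading rule: capital-letter words, " - ", capital-letter words,
# alone on a line.
HEADING = r"^[A-Z]+(?: [A-Z]+)* - [A-Z]+(?: [A-Z]+)*$"

def split_sections(text):
    """Return (connection, text_under_it) pairs in document order."""
    # The capturing group makes re.split keep each heading in the result:
    # [before_first_heading, heading1, body1, heading2, body2, ...]
    parts = re.split(f"({HEADING})", text, flags=re.MULTILINE)
    return [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts), 2)]

sample = (
    "SYDNEY - NEW YORK\n\nalot of dynamic text 1\n\n"
    "NEW YORK - SYDNEY\n\nalot of dynamic Text\n"
)

for connection, body in split_sections(sample):
    print(connection, "->", body)
```

In .NET the same idea should be expressible with `Regex.Split` and `RegexOptions.Multiline`, since `Regex.Split` also keeps captured groups in its output.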