Regex: Extract text before a blank line or a particular symbol

Hi

I have many text files with lots of text.

  1. I want to extract all the lines that start with the pattern
    \n\s{3}[A-Z]{2}\d{4}.[A-Za-z]{3}
    Basically, something like the lines below:

AB2003.Jan -097
CD2008.Jul -0.778
AR2009.Jan -0.123

  2. But sometimes there might be similar text that is preceded by “Auto”. I don’t want those lines:

Auto
GH2003.Jan -097
JL2008.Jul -0.778
HG2009.Jan -0.123

Variation 1:

AB2003.Jan -097
CD2008.Jul -0.778
AR2009.Jan -0.123

Some other text

Auto
GH2003.Jan -097
JL2008.Jul -0.778
HG2009.Jan -0.123
line with “--------------------------------------------------------------------”
Some other text

Variation 2:

AB2003.Jan -097
CD2008.Jul -0.778
AR2009.Jan -0.123
line with “---------------------------------------------------------------------”

Some other text

  3. So basically, I want to extract all the lines

AB2003.Jan -097
CD2008.Jul -0.778
AR2009.Jan -0.123
that are NOT preceded by the word “Auto”. After these lines, there can be either
a) a blank line or
b) a line of dashes: -------------------------------------------------------------------

I tried this, but it doesn’t work:

(System.Text.RegularExpressions.Regex.Match(Text1,"(?!Auto.)\n\s{3}[A-Z]{2}\d{4}.([A-Za-z]{3}|\d{1})\s{3,}(.|\n).+((?=\n.--------------------------------------------------------------------)|(?=\n)|(?=Auto.))").Value)

Basically, the problem with my regular expression is that the text sometimes contains a block like this:

Auto
GH2003.Jan -097
JL2008.Jul -0.778
HG2009.Jan -0.123
line with “--------------------------------------------------------------------”

so the match runs past “Auto” and also extracts the lines after it:

GH2003.Jan -097
JL2008.Jul -0.778
HG2009.Jan -0.123

Thank you

So, @Anonymous2, you just want these lines
GH2003.Jan -097
JL2008.Jul -0.778
HG2009.Jan -0.123
from the text above?

Try the regex below, @Anonymous2:
[A-Z]{2}[0-9.].*
Hope this helps :slight_smile:

Hi

Thank you for the reply. Could you read the first post? This is what I do NOT want:
GH2003.Jan -097
JL2008.Jul -0.778
HG2009.Jan -0.123

as it comes after “Auto”.

The regex should extract only:

AB2003.Jan -097
CD2008.Jul -0.778
AR2009.Jan -0.123

Hi @Anonymous2

I have a solution for your issue. You need to apply two different regexes.

Steps:

  1. First, apply this regex: .+(?=Auto)
    with two regex flags enabled: Global and Single Line.
    The match is all of the text that comes before “Auto”.

  2. Store this output in a variable.

  3. That variable now holds only the text that comes before “Auto”.

  4. Now apply this regex to the data from step 3: \b\w.+(-\d{3}|-0.\d{3})\b
    This returns the lines you want.

Hope these steps help you get to the solution.

Happy Automation

Best Regards
Pratik Wavhal

Try this regex:
(?<=Auto\n)([.\s\S])\n(?=-)|(?=\s\n)
It should stop before either a blank line or a line of dashes (-----).
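
As a possible single-pass alternative, here is a minimal sketch in Python (not the .NET engine from the question, so it would need porting). It matches each run of data lines together with an optional “Auto” line directly above it, then discards the runs that had one. The names DATA_LINE, BLOCK and extract_blocks, the sample text, and the assumption that “Auto” sits alone on the line immediately before its block are all illustrative.

```python
import re

# Assumed sample, adapted from "Variation 1" in the first post.
TEXT = """Variation 1:

AB2003.Jan -097
CD2008.Jul -0.778
AR2009.Jan -0.123

Some other text

Auto
GH2003.Jan -097
JL2008.Jul -0.778
HG2009.Jan -0.123
--------------------------------------------------------------------
Some other text
"""

# One data line: optional indent, two capitals, four digits, a dot, a month.
DATA_LINE = r"[ \t]*[A-Z]{2}\d{4}\.[A-Za-z]{3}[^\n]*(?:\n|\Z)"

# A block is an optional "Auto" line followed by one or more data lines.
# The run ends by itself at a blank line, a dashed line, or any other text.
BLOCK = re.compile(
    r"^(?P<auto>[ \t]*Auto[ \t]*\n)?(?P<rows>(?:" + DATA_LINE + r")+)",
    re.MULTILINE,
)

def extract_blocks(text):
    blocks = []
    for m in BLOCK.finditer(text):
        if m.group("auto") is not None:
            continue  # block sits under "Auto": skip it
        rows = m.group("rows").rstrip("\n").splitlines()
        blocks.append([row.strip() for row in rows])
    return blocks

print(extract_blocks(TEXT))
# [['AB2003.Jan -097', 'CD2008.Jul -0.778', 'AR2009.Jan -0.123']]
```

Matching the “Auto” block and throwing it away avoids the trap where a lookbehind rejects only the first line after “Auto” and the match simply restarts on the second line.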