How to write a regex to extract the data above a particular word?

Input:

Senior Audit Manager, Corporate Audit: xxx
ABC audit
January 26,2023

Table of contents

Output:

ABC audit
January 26,2023

I need to extract the two lines of data in the middle of the text. How can this be done with regex?

Hi @yashashwini2322

Use the below regex expression

Hope it helps!!

Hi

System.Text.RegularExpressions.Regex.Match("YourString","[A-Z]+\s[A-Z]*[a-z]+\s+[A-Z]+[a-z]+\s\d+,\d+").Value

Thank you

Hi @yashashwini2322

Try this

(?<=\n)(.*)(?=\n)

Hi @yashashwini2322

If you need both as a single expression

If you need both with 2 expressions separately
Use for each

Hope it helps!!

Hi @yashashwini2322 ,

We need to understand the data format and which constant words are present in it, so that we can anchor to them and get the required value.

Is there data before the sample you provided, or is that the start of the data? Do you need the values starting from the second line? Do you need the two lines above Table of contents?

Understanding these requirements will help us give you the accurate suggestion.

@yashashwini2322

If you want it on the same line

New line

Hope it helps!!

Hi @yashashwini2322
Use the below regex expression

(?<=\:\s+[a-z]+\s+).*\s+.*(?=\s+)


Regards

Yes, there will be a number of lines before it, but I want the data that is just above Table of contents.

@yashashwini2322 ,

In that case, could you check with the below Regex Expression :

(.*\r?\n){2}(?=(\r?\n)*Table of contents)
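
To see what this pattern captures, here is a minimal sketch run against the sample from the question. It uses Python's `re` module rather than the .NET `Regex` class used elsewhere in this thread, on the assumption that this particular pattern behaves the same in both for this input. The variable names are only illustrative.

```python
import re

# Sample text from the question; any number of lines may come before it.
text = (
    "Senior Audit Manager, Corporate Audit: xxx\n"
    "ABC audit\n"
    "January 26,2023\n"
    "\n"
    "Table of contents\n"
)

# Exactly two full lines, which must be followed by optional blank
# lines and then the anchor phrase "Table of contents".
pattern = r"(.*\r?\n){2}(?=(\r?\n)*Table of contents)"

match = re.search(pattern, text)
# The match includes the trailing line break, so strip it off.
result = match.group(0).strip() if match else ""
print(result)
```

The lookahead only checks for the anchor, so "Table of contents" itself is not part of the result.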

What if I need to extract the lines below a heading (the number of lines is not constant) up to the next blank line?

@yashashwini2322 ,

With some example data we can be more sure of what you are referring to.

Corporate

Feedline

Responsible executive:
Sidney dfg
Cioo, com bank
Tech & op

Audit abcdef

Repo


From the above I want to extract

Responsible executive:
Sidney dfg
Cioo, com bank
Tech & op

I need to extract everything up to the next blank line.

@yashashwini2322 ,

Could you let us know which fixed keywords we can use for anchoring here, the way we used Table of contents for the previous data sample?

Sometimes it's
Responsible executive
Sometimes it's
Unit head/Responsible executive
And sometimes it's only
unit head

But the end marker is constant: I need to extract everything up to
Audit fieldwork duration

@yashashwini2322 ,

Considering the above points, the regex pattern is defined below :

(?<=(Responsible executive|Unit head:/Responsible executive|unit head):?\r?\n)[\S\s]+?\n(?=\n)

Expression :

System.Text.RegularExpressions.Regex.Match(Strvariable,"(?<=(Responsible executive|Unit head:/Responsible executive|unit head):?\r?\n)[\S\s]+?\n(?=\n)",System.Text.RegularExpressions.RegexOptions.Multiline).Value
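
For anyone testing the same idea outside .NET, here is a rough Python sketch. Python's `re` module does not accept a variable-width lookbehind like the one above, so this version swaps in a different technique: the header is matched normally and the wanted lines are read from a capture group. The header alternatives are copied from the pattern above, and the sample text is the one posted earlier in the thread.

```python
import re

# Sample data posted earlier in the thread.
text = (
    "Corporate\n"
    "\n"
    "Feedline\n"
    "\n"
    "Responsible executive:\n"
    "Sidney dfg\n"
    "Cioo, com bank\n"
    "Tech & op\n"
    "\n"
    "Audit abcdef\n"
)

# Header line (not captured), then the shortest run of text that is
# followed by a line break and a blank line. Group 1 holds the result.
pattern = (
    r"(?:Responsible executive|Unit head:/Responsible executive|unit head):?\r?\n"
    r"([\S\s]+?)\r?\n(?=\r?\n)"
)

match = re.search(pattern, text)
block = match.group(1) if match else ""
print(block)
```

Because the quantifier is lazy, the capture stops at the first blank line after the header.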


Let us know if this is not the expected solution.

Is the input format always constant, or will it change?

Only 3 lines of data are returned between Responsible executive and Audit. I need the complete data between Responsible executive and Audit fieldwork duration.

@yashashwini2322 ,

In that case, change the Regex Expression to the below :

(?=(Responsible executive|Unit head:/Responsible executive|unit head):?)[\S\s]+?(?=Audit fieldwork)
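
A quick way to check this pattern is the Python sketch below. It assumes Python's `re` handles this pattern the same way .NET does for this input. The sample text is hypothetical: it reuses the earlier sample and adds an invented extra line plus the "Audit fieldwork duration" marker, since the real document was not posted.

```python
import re

# Hypothetical sample: the earlier data plus an invented extra line
# and the constant end marker described in the thread.
text = (
    "Feedline\n"
    "\n"
    "Responsible executive:\n"
    "Sidney dfg\n"
    "Cioo, com bank\n"
    "Tech & op\n"
    "\n"
    "Extra line after a blank line\n"
    "Audit fieldwork duration\n"
)

# Start where one of the headers begins (lookahead, so the header is
# kept in the match) and stop just before "Audit fieldwork".
pattern = (
    r"(?=(Responsible executive|Unit head:/Responsible executive|unit head):?)"
    r"[\S\s]+?(?=Audit fieldwork)"
)

match = re.search(pattern, text)
section = match.group(0).strip() if match else ""
print(section)
```

Unlike the earlier pattern, this one keeps the header line in the result and does not stop at blank lines, so everything up to the end marker is returned.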

Thank you so much, it works.
