How to extract data from txt file using regex

Hi,

I have a text file.

The data in it is:

serial
-ABC
-ABC 1
-DAC: Sub_DRV
DEF E72
etc

I want to extract each keyword after serial and before DEF E72 using regex.

Please provide the solution.

Thanks in Advance


@vinjam_likitha Follow below steps

  1. Read the text file into a String variable (say, str_Data)

  2. Replace each newline with a space: str_Data = str_Data.Replace(Environment.NewLine," ")

  3. Apply the regex: System.Text.RegularExpressions.Regex.Match(str_Data,"(?<=serial).*(?=DEF)").ToString.Trim
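For readers who want to try the three steps outside UiPath, here is a minimal sketch in Python (not the thread's VB.NET). The function name `between_markers` is made up for illustration, and the sample text is the one from the question. Note that this approach returns one flattened string, not one item per keyword.

```python
import re

def between_markers(str_data):
    # Step 2: flatten the text so "." can run across former line breaks.
    flat = str_data.replace("\r\n", " ").replace("\n", " ")
    # Step 3: everything after "serial" and before the last "DEF".
    m = re.search(r"(?<=serial).*(?=DEF)", flat)
    return m.group(0).strip() if m else ""

# Step 1 is assumed done: the file content is already in a string.
sample = "serial\n-ABC\n-ABC 1\n-DAC: Sub_DRV\nDEF E72\netc"
print(between_markers(sample))  # -ABC -ABC 1 -DAC: Sub_DRV
```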


Hi @vinjam_likitha ,

Is this what you were looking for?

If so, then you can use this snippet of code to retrieve the data →

System.Text.RegularExpressions.Regex.Matches(str_txtData,"(?<=serial\r?\n|\r?\n)(.*)(?=[\s\S]+DEF\sE72)")
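A rough Python equivalent of this per-line match, as a sketch only. Python's `re` module accepts only fixed-width lookbehinds, so the `serial\r?\n|\r?\n` alternation is reduced here to `\n`; that is my simplification, not part of the original expression. The function name is hypothetical and the sample assumes `\n` line endings.

```python
import re

def keywords_before_marker(text):
    # (?<=\n)             : the match must start right after a line break
    # (.+)                : capture the rest of that line
    # (?=[\s\S]+DEF\sE72) : only if "DEF E72" still appears later in the text
    return re.findall(r"(?<=\n)(.+)(?=[\s\S]+DEF\sE72)", text)

sample = "serial\n-ABC\n-ABC 1\n-DAC: Sub_DRV\nDEF E72\netc"
print(keywords_before_marker(sample))  # ['-ABC', '-ABC 1', '-DAC: Sub_DRV']
```

One caveat worth checking against real files: the lookbehind only requires a preceding line break, so if other lines sit above `serial`, any of them that follows a line break would also be returned.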


KeywordMatches.xaml (5.8 KB)

Kind Regards,
Ashwin A.K

Hi,

Thanks for the solution.
Could you please try this text

872XXXXXXXX : serial open request -BSRF
-BATTERIE BOITIER SUPERVISION RADIO FREQ
-BOITIER SUPERVISION RADIO FREQUENCE ENS
-SMART ANTENNA COVER
-ENJOLIVEUR BOITIER SUPERVISIO RADIO FREQ
-SMART ANTENNA BOX ASSEMBLY
DCU E72

I want each line after "serial open request" and before "DCU E72" as one keyword (i.e. -BSRF, -BATTERIE BOITIER SUPERVISION RADIO FREQ, etc.).

Sometimes -BSRF will be on the next line.

Please provide solution for this

Thanks in advance

Hi @vinjam_likitha ,

We could also do the below Operation by two Steps :

  1. First we would have to Get the Data between two Keywords, so we use the following :
keysText = System.Text.RegularExpressions.Regex.Match(textData,"(?<=serial open request)(.*\n)*(?=DCU E72)",RegexOptions.IgnoreCase).Value.ToString.Trim

where keysText is a String variable, which will have the data between the two keywords, textData will be your Input Data


  2. Next, as we need each line separately, we can use Split based on NewLine as below :
KeysList = Regex.Split(keysText,"\n")

where KeysList is a variable of Type Array of String.

This variable will contain each key as one line, i.e. as one item in the array.
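The two steps can be sketched in Python as follows, for anyone who wants to test the pattern outside UiPath. The function name `extract_keys` is hypothetical, the sample is shortened from the text in the question, and `\n` line endings are assumed.

```python
import re

def extract_keys(text_data):
    # Step 1: whole lines after the start marker, up to the end marker.
    m = re.search(r"(?<=serial open request)(.*\n)*(?=DCU E72)",
                  text_data, re.IGNORECASE)
    if m is None:
        return []
    keys_text = m.group(0).strip()
    # Step 2: one array item per line.
    return re.split(r"\n", keys_text)

same_line = ("872XXXXXXXX : serial open request -BSRF\n"
             "-SMART ANTENNA COVER\n"
             "-SMART ANTENNA BOX ASSEMBLY\n"
             "DCU E72")
print(extract_keys(same_line))
# ['-BSRF', '-SMART ANTENNA COVER', '-SMART ANTENNA BOX ASSEMBLY']
```

Because the captured text is trimmed before splitting, the result is the same whether -BSRF sits on the marker line or on the line after it.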

Hi @vinjam_likitha
[Screenshot (102): the regex expression]
use this regex expression
it will take even if BSRF is in next line

thanks
#sd

Thank you very much all

But what if the text file looks like this:

Produits Index Mots clés
BSRF
E81 ou G33
38XXXXXXXXX : closed request
39XXXXXXXXX : open request
872XXXXXXXX : serial open request
-BSRF
-BATTERIE BOITIER SUPERVISION RADIO FREQ
-BOITIER SUPERVISION RADIO FREQUENCE ENS
-SMART ANTENNA COVER
-ENJOLIVEUR BOITIER SUPERVISIO RADIO FREQ
-SMART ANTENNA BOX ASSEMBLY
DCU E72
38XXXXXXXXX : closed request 39XXXXXXXXX : open request
872XXXXXXXX : serial open request
-MODULE ELECTRONIQUE PORTE D ENS
-MODULE ELECTRONIQUE PORTE G ENS
-MODULE ELECTRONIQUE PORTE AR D ENS
-DCU: MAIN_DRV
-DCU: MAIN_PASS
-ENS RH DOOR CONTROL UNIT
-ENS RH RR DOOR CONTROL UNIT
IDB G

Here "serial open request" appears twice.

Next time I may want to extract the keywords from "-MODULE ELECTRONIQUE PORTE D ENS" up to, but not including, "IDB G".

Can we use the same regex, or do we need to modify it because "serial open request" appears twice, or possibly three times?

@vinjam_likitha We would need you to provide the expected output for the input data provided.

Since there seem to be two "serial open request" markers and two DCU entries, what is to be extracted?

And how do you want to represent the extracted data?
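Pending that clarification, here is one possible reading, sketched in Python. It assumes (my assumption, not confirmed in the thread) that every keyword line starts with "-" and that a block ends at the first line that does not. Under that assumption no end keyword such as "DCU E72" or "IDB G" is needed, and the marker may repeat any number of times. The function name and the shortened sample are illustrative only.

```python
import re

def keyword_blocks(text):
    # After each "serial open request", capture the run of consecutive
    # lines that start with "-" (assumed shape of a keyword line).
    pattern = r"serial open request\s*((?:-.*(?:\n|$))+)"
    blocks = []
    for m in re.finditer(pattern, text, re.IGNORECASE):
        blocks.append([line.strip() for line in m.group(1).splitlines()])
    return blocks

sample = ("39XXXXXXXXX : open request\n"
          "872XXXXXXXX : serial open request\n"
          "-BSRF\n"
          "-SMART ANTENNA COVER\n"
          "DCU E72\n"
          "38XXXXXXXXX : closed request 39XXXXXXXXX : open request\n"
          "872XXXXXXXX : serial open request\n"
          "-MODULE ELECTRONIQUE PORTE D ENS\n"
          "-DCU: MAIN_DRV\n"
          "IDB G\n")
blocks = keyword_blocks(sample)
print(blocks[0])  # ['-BSRF', '-SMART ANTENNA COVER']
print(blocks[1])  # ['-MODULE ELECTRONIQUE PORTE D ENS', '-DCU: MAIN_DRV']
```

Each occurrence of the marker yields its own list, so the second block can be picked by index instead of by a different end keyword.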

Hi

This is the data in input variable

serial open request
-CHARGEUR SANS
-FIL APPAREIL NOMADE
-WLC

I need to extract the data after "serial open request" and store each keyword (CHARGEUR SANS, FIL APPAREIL NOMADE, WLC) in an array of String variable.

please provide solution

Thanks in Advance.

@vinjam_likitha, we need the expected output for the large data that you provided above.

We have already provided a solution above for this case.

Hi @supermanPunch ,

I got the output, thanks a lot for that.

But in the expression below, we extract the text between the “serial open request” and “DCU E72” keywords:

System.Text.RegularExpressions.Regex.Match(TextFile,"(?<=serial open request)(.*\n)*(?=DCU E72)",RegexOptions.IgnoreCase).Value.ToString.Trim

But what if there is no end keyword?

Input:

serial open request
-CHARGEUR SANS
-FIL APPAREIL NOMADE
-WLC

output:

{CHARGEUR SANS,
-FIL APPAREIL NOMADE,
-WLC}
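One way to handle a missing end keyword, sketched in Python under stated assumptions: everything after the start marker up to the end of the text is treated as keyword lines, and the leading "-" is stripped from each (the output above shows the hyphen inconsistently, so the stripping is my guess). The function name is hypothetical.

```python
import re

def keywords_after_marker(text):
    # Everything after the start marker, through the end of the text.
    m = re.search(r"(?<=serial open request)[\s\S]*", text, re.IGNORECASE)
    if m is None:
        return []
    # One keyword per non-empty line, with the leading "-" removed.
    return [line.strip().lstrip("-").strip()
            for line in m.group(0).splitlines() if line.strip()]

sample = "serial open request\n-CHARGEUR SANS\n-FIL APPAREIL NOMADE\n-WLC"
print(keywords_after_marker(sample))
# ['CHARGEUR SANS', 'FIL APPAREIL NOMADE', 'WLC']
```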

Hi @supermanPunch,

I got the solution by using the same expression.
