Regex for a string that has 2 duplicated keywords

Hi everyone

I am stuck on how to solve this issue and would appreciate your advice and help.

I want to get the info in BOLD:

  1. “Diesel -” is, if I am not mistaken, constant: it will always be there. I want the info after this keyword, which is letters (sometimes with special characters, e.g. “MAN B&W”).
  2. The string “2-stroke” between “Wartsila” and “7RT-flex68D” is not always there, and I don't need it. The bold “2-stroke” is always there, but it can sometimes be “4-stroke”. This is for the ME.Info variable. There is always a “-” before “Wartsila”, which is the Manufacturer name, and after “7RT-flex68D”, which is the Model Name. The Model Name is a mix of numbers, letters and special characters like “-”.

I have this assignment:

ME.Info = System.Text.RegularExpressions.Regex.Match(str_Equip_Details.ToString, "(?<=MAIN ENGINE\s*).*(?=AUXILIARY)").ToString

This extracts everything between MAIN ENGINE and AUXILIARY. I will then extract the bold info from that result.
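For readers working outside .NET, the same section extraction can be sketched in Python. This is only an illustration under assumptions: the sample string is an abridged copy of the text in this post, and because Python's `re` module does not accept a variable-width lookbehind such as `(?<=MAIN ENGINE\s*)`, a capture group stands in for the two lookarounds. The name `me_info` is hypothetical.

```python
import re

# Abridged copy of str_Equip_Details (assumption: same layout as the full text).
text = (
    "MAIN ENGINE 1 x Diesel - Wartsila 2-stroke 7RT-flex68D - 2-stroke 7-cyl. "
    "680mm x2720mm bore/stroke 21,910mkW total at 95rpm.AUXILIARY 2 x Aux. Diesel Gen. "
    "- MAN Energy Solutions 9L21/31 60Hz - 4-stroke 9-cyl."
)

# Capture everything between the two section keywords.
# The lazy .*? stops at the first AUXILIARY; the original .NET pattern uses a greedy .*
m = re.search(r"MAIN ENGINE\s*(.*?)AUXILIARY", text)
me_info = m.group(1) if m else ""
print(me_info)
```

The captured group plays the role of the value assigned to ME.Info above.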

I want to use the same method for the bold info after the keyword AUXILIARY, but that section contains the same keyword twice (“AC generator(s)”).
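One possible way to cope with a repeated end keyword is a lazy quantifier, which stops at the nearest occurrence so that each generator set becomes its own match. The Python sketch below assumes every auxiliary entry has the shape "Diesel Gen. - ... driving N x AC generator(s)", as in the sample text; the names `aux_info` and `blocks` are hypothetical.

```python
import re

# The AUXILIARY section from the sample text in this post.
aux_info = (
    "2 x Aux. Diesel Gen. - MAN Energy Solutions 9L21/31 60Hz - 4-stroke 9-cyl. "
    "210mm x 310mm bore/stroke 3,960mkW total at 900rpm driving 2 x AC generator(s) "
    "at 3,742ekW total, (5,088kVA total) 440V at 60Hz. "
    "2 x Aux. Diesel Gen. - MAN Energy Solutions 8L21/31 60Hz - 4-stroke 8-cyl. "
    "210mm x 310mm bore/stroke 3,520mkW total at 900rpm driving 2 x AC generator(s) "
    "at 3,326ekW total, (4,157kVA total) 440V at 60Hz."
)

# Lazy (.*?) ends at the nearest "driving N x AC generator(s)",
# so the two entries are returned separately instead of as one long match.
pattern = r"Diesel Gen\.\s*-\s*(.*?)\s*driving \d+ x AC generator\(s\)"
blocks = re.findall(pattern, aux_info)

for block in blocks:
    print(block)
```

Each element of `blocks` can then be parsed for the manufacturer, model and stroke type on its own.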

The variable holding the whole text:

str_Equip_Details =

MAIN ENGINE 1 x Diesel - Wartsila 2-stroke 7RT-flex68D - 2-stroke 7-cyl. 680mm x2720mm bore/stroke 21,910mkW total at 95rpm.AUXILIARY 2 x Aux. Diesel Gen. - MAN Energy Solutions 9L21/31 60Hz - 4-stroke 9-cyl. 210mm x 310mm bore/stroke 3,960mkW total at 900rpm driving 2 x AC generator(s) at 3,742ekW total, (5,088kVA total) 440V at 60Hz. 2 x Aux. Diesel Gen. - MAN Energy Solutions 8L21/31 60Hz - 4-stroke 8-cyl. 210mm x 310mm bore/stroke 3,520mkW total at 900rpm driving 2 x AC generator(s) at 3,326ekW total, (4,157kVA total) 440V at 60Hz.PROPULSOR 1 x FP Propeller (Aft Centre) (mechanical), 95rpm.POS, PROPULSOR 1 x CP Pos, Tunnel Thruster (Fwd.) (electric), Kawasaki KT-130B3, 1,160rpm at 1,100ekW total, 440V ac.OTHER ENGINE EQUIPMENT 1 x Screw Shaft.CARGO EQUIPMENT 188 x Sockets, Reefer - (Hold) 440V at 60Hz, 3-phase. 348 x Sockets, Reefer - (Deck) 440V at 60Hz, 3-phase.ENVIRONMENTAL EQUIPMENT 1 x BWTS - Ballast Water Treatment System - Optimarin Optimarin at 600cu.m/hr.LIFTING EQUIPMENT 3 x Crane - MacGregor (Midships) SWL 45 tons at 30m…EMERGENCY 1 x Emergency Diesel Gen. - Agco Sisu 645 DSBAG - 4-stroke 6-cyl. 111mm x 145mm bore/stroke 220mkW total at 1,800rpm driving 1 x ac generator(s) at 60Hz.

Hi @Irfan_Musa ,

Could you let us know if only the first two matches between the Diesel keywords are supposed to be considered? Or are there many such matches to consider?

First, to group the required sections (Diesel to Diesel), we can use a pattern like the one below:

(?<=Diesel\s*).+?(?=Diesel)

So, from your text we get 3 matches that suit the pattern. If only the first two matches are to be considered, we can take just those and extract the relevant details from each one separately.
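As a rough cross-check of the match count, the Diesel-to-Diesel grouping can be sketched in Python. This is an adaptation, not the .NET pattern itself: Python's `re` does not support the variable-width lookbehind `(?<=Diesel\s*)`, so the leading keyword is consumed and the segment is captured in a group instead. The sample is an abridged copy of the posted text that keeps all four "Diesel" occurrences; `segments` and `first_two` are hypothetical names.

```python
import re

# Abridged sample: sections between PROPULSOR and EMERGENCY are left out.
text = (
    "MAIN ENGINE 1 x Diesel - Wartsila 2-stroke 7RT-flex68D - 2-stroke 7-cyl. "
    "680mm x2720mm bore/stroke 21,910mkW total at 95rpm.AUXILIARY 2 x Aux. Diesel Gen. - "
    "MAN Energy Solutions 9L21/31 60Hz - 4-stroke 9-cyl. 210mm x 310mm bore/stroke "
    "3,960mkW total at 900rpm driving 2 x AC generator(s) at 3,742ekW total, "
    "(5,088kVA total) 440V at 60Hz. 2 x Aux. Diesel Gen. - MAN Energy Solutions 8L21/31 "
    "60Hz - 4-stroke 8-cyl. 210mm x 310mm bore/stroke 3,520mkW total at 900rpm driving "
    "2 x AC generator(s) at 3,326ekW total, (4,157kVA total) 440V at 60Hz."
    "PROPULSOR 1 x FP Propeller (Aft Centre) (mechanical), 95rpm."
    "EMERGENCY 1 x Emergency Diesel Gen. - Agco Sisu 645 DSBAG - 4-stroke 6-cyl."
)

# The lookahead (?=Diesel) leaves the next keyword unconsumed,
# so it can also start the following segment.
segments = re.findall(r"Diesel\s*(.+?)(?=Diesel)", text)

# Keep only the first two segments (main engine and first auxiliary set).
first_two = segments[:2]
for seg in first_two:
    print(seg)
```

The text after the last "Diesel" is not returned, because no further "Diesel" follows it to satisfy the lookahead.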

Hi @supermanPunch, yes, the data I need is in the first 2 matches. I then need further regex to get the bolded data.

@Irfan_Musa ,

There are many variations this needs to be tested against, and more examples and conditions are needed before I can suggest a more accurate regex.

Is it possible to provide varying data samples, so that most or all of the cases can be tested with the regex?

Initial testing (individual data extraction for the first match):

(?<=Diesel\s-\s*)(\w+)\s*(.*?)?(\s*[\w\-]+)\s*-\s*(2-stroke|4-stroke)
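To see which group captures what, here is a Python sketch of the pattern applied to the main engine text only. Two adaptations are assumptions of this sketch: the lookbehind `(?<=Diesel\s-\s*)` is replaced by consuming `Diesel\s-\s*` (Python's `re` rejects variable-width lookbehind), and `(.*?)?` is simplified to `(.*?)`, since the lazy group can already match nothing. Variable names are hypothetical.

```python
import re

# Main engine section from the sample text in the question.
me_info = (
    "1 x Diesel - Wartsila 2-stroke 7RT-flex68D - 2-stroke 7-cyl. "
    "680mm x2720mm bore/stroke 21,910mkW total at 95rpm."
)

pattern = r"Diesel\s-\s*(\w+)\s*(.*?)(\s*[\w\-]+)\s*-\s*(2-stroke|4-stroke)"
m = re.search(pattern, me_info)

if m:
    manufacturer = m.group(1)      # first word after "Diesel -"
    optional_text = m.group(2)     # the unwanted "2-stroke" before the model
    model = m.group(3).strip()     # group 3 includes the leading space, so trim it
    stroke = m.group(4)            # the stroke type after the model
    print(manufacturer, model, stroke)
```

Because group 1 is `\w+`, only a single-word manufacturer name is captured here; names with spaces or characters such as "&" would need a wider character class.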

But we also see that it fails to identify the last one properly, due to the condition mentioned: