Regex to remove lines containing fixed keywords while keeping the text in between

Hi Team,

We have dynamic text that may contain data of the type shown below.
Start identifier keyword: block
End identifier keyword: #blockend

We have to remove the following -
a) The 2 lines that contain the 2 keywords above.
b) The opening and closing round brackets. The opening bracket is on the line after the keyword ‘block’, and the closing bracket is either on the same line as the keyword ‘#blockend’ or on the line before it.

Input String -

''' Block Questions
SPAB "" block fields
(

    SP1 "How old are you?<br/><span class='none' numerickeypad='true'></span><span class='mrInstruct'>(Please enter your exact age below.)</span>"
        style(
            Width = "3em"
        )
    long [18 .. 65];

    SP2 "Have you seen this ad on TV/at the cinema before today?"
        [
            metatype = "rowpicker",
            grid$all$width = 100,
            row$all$center = false,
            row$hovercolor = "#aadeee"
        ]
    categorical [1..1]
    {
        _1 "No, this is the first time",
        _2 "Once or twice",
        _3 "A few times",
        _4 "Lots of times"
    };

    SP3"You have completed the survey. Thank you for your time."
    info;

);
'#blockend

Output String -

    SP1 "How old are you?<br/><span class='none' numerickeypad='true'></span><span class='mrInstruct'>(Please enter your exact age below.)</span>"
        style(
            Width = "3em"
        )
    long [18 .. 65];

    SP2 "Have you seen this ad on TV/at the cinema before today?"
        [
            metatype = "rowpicker",
            grid$all$width = 100,
            row$all$center = false,
            row$hovercolor = "#aadeee"
        ]
    categorical [1..1]
    {
        _1 "No, this is the first time",
        _2 "Once or twice",
        _3 "A few times",
        _4 "Lots of times"
    };

    SP3"You have completed the survey. Thank you for your time."
    info;

Hi @DewanjeeS - The requirement is not clear… Could you please provide a little more detail?

I can see the blockend, but I don’t see the starting word block…

Could you please share your complete input and expected output (highlighted, if possible)?

@DewanjeeS - Please check the output below from Groups 1, 3 and 4. If you combine these outputs, you get the desired output…

Again, I came up with this Regex based on the text provided… it may or may not work for dynamic cases…

@prasath17 … block is there on the second line of the input string. It is ‘block’ in lowercase (before the keyword ‘fields’).

  1. Need to delete the entire lines containing the keywords ‘block’ and ‘#blockend’.
  2. Also need to remove the opening round bracket (it will be on the line after the keyword ‘fields’) and the closing round bracket (it will be on the same line as the keyword ‘#blockend’ or one line above it).

@DewanjeeS - why does SP1 alone carry this …<br/><span class='none' numerickeypad='true'></span><span class='mrInstruct'>, and not SP2 and SP3?

I highly doubt it will work in general; the Regex provided above will work only if the text follows the same pattern…

@prasath17 … SP1, SP2, SP3, etc. will carry different types of data. Some may contain table data, some will carry other types of data. The only fixed things are the starting keyword ‘block’ and the end keyword ‘#blockend’.

@DewanjeeS - The problem is that we also have to ignore the </span> after the )… which again creates a problem.

(Please enter your exact age below.)</span>"

So, depending on the other questions, we would have to revisit the Regex and add OR conditions… I feel this is getting complex…

BTW… did you see the Regex provided above, and did you try it?

@prasath17 … I tried the pattern you shared… Unfortunately it is not working as I expected :(

@DewanjeeS - Please find the xaml and the output here: Regex_DS.zip (36.3 KB)

@prasath17 … The pattern works fine for the input I posted here.
But the problem is that the content inside the round brackets is dynamic, so I tested it by swapping the order of SP1 and SP2, and it failed there.
Is it possible to get a pattern that removes only the 4 lines (block, opening round bracket, closing round bracket, #blockend) and keeps the dynamic content inside them intact?

@DewanjeeS - I don’t think that is possible if the pattern changes between questions… I think we have discussed this already…

I would suggest doing some string manipulation and the necessary clean-up first, to get a consistent pattern before trying Regex…
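
A minimal sketch of that string-manipulation route, for illustration only: it walks the text line by line instead of using one large Regex. Python and the function name `strip_block_lines` are assumptions, not something from this thread. It also assumes the keyword check is case-insensitive, so the `''' Block Questions` comment line is dropped as in the expected output, and that the opening bracket sits alone on its line.

```python
import re

def strip_block_lines(text: str) -> str:
    """Drop the 'block' header, both bracket lines and the '#blockend' line."""
    kept = []
    state = "outside"      # outside -> header -> inside -> outside
    body_start = 0         # index in `kept` where the current block's content begins
    for line in text.splitlines():
        if state == "outside":
            # Assumption: case-insensitive match, so a comment line such as
            # "''' Block Questions" is treated as part of the header.
            if re.search(r"\bblock\b", line, re.IGNORECASE):
                state = "header"
            else:
                kept.append(line)
        elif state == "header":
            # Header lines (e.g. a separate 'fields' line) are dropped
            # until the line that holds only the opening bracket.
            if line.strip() == "(":
                state = "inside"
                body_start = len(kept)
        elif "#blockend" in line:
            # The closing bracket is on this line or on the line just above it.
            if len(kept) > body_start and kept[-1].strip() in (")", ");"):
                kept.pop()
            state = "outside"
        else:
            kept.append(line)  # dynamic content is kept untouched
    return "\n".join(kept)

sample = "\n".join([
    "''' Block Questions",
    'SPAB "" block fields',
    "(",
    '    SP3"You have completed the survey. Thank you for your time."',
    "    info;",
    ");  '#blockend",
])
print(strip_block_lines(sample))
```

Because only whole lines are inspected, brackets inside the questions (such as `style(` … `)` or the text inside the quotes) are never touched. The trade-off is that any line outside a block that contains the word "block" would also start a header, so the keyword test may need tightening for real data.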

@prasath17 … Consider just the first part of the match. For example, from the input provided, suppose we would like to fetch -

SPAB "" block fields
(

Can we build a regex to match such a pattern? Please note that the keyword ‘fields’ can sometimes be 1 or 2 lines below the keyword ‘block’. But the opening round bracket will always be on the line after the keyword ‘fields’.
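
One way such a header-only pattern could look, as a hedged Python sketch (the name `HEADER_RE` and the two sample strings are made up for illustration). It assumes ‘block’ and ‘fields’ appear as whole lowercase words and that the opening bracket is alone on its line.

```python
import re

HEADER_RE = re.compile(
    r"^[^\n]*\bblock\b"            # line containing the keyword 'block'
    r"(?:[^\n]*\n){0,2}?"          # 'fields' may sit up to two lines further down
    r"[^\n]*\bfields\b[^\n]*\n"    # line containing the keyword 'fields'
    r"[ \t]*\([ \t]*\n",           # next line: the opening round bracket alone
    re.MULTILINE,
)

# 'fields' on the same line as 'block', and 'fields' one line below it.
same_line = 'SPAB "" block fields\n(\n\n    SP1 "How old are you?"\n'
split_lines = 'SPAB "" block\n    fields\n(\n\n    SP1 "How old are you?"\n'

for text in (same_line, split_lines):
    match = HEADER_RE.search(text)
    print(repr(match.group(0)))                    # the header that was matched
    print(repr(HEADER_RE.sub("", text, count=1)))  # the text with the header removed
```

The lazy `{0,2}?` group is what lets ‘fields’ drift one or two lines below ‘block’. `\bblock\b` does not match inside `#blockend`, because the `e` that follows is a word character.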

@DewanjeeS … sorry, I didn’t get you… can you please show me the expected output with an example…

@prasath17 … let me elaborate… for the above input, my main purpose is to -
a) Delete the line containing the keyword ‘block’.
b) Delete the 2 lines containing the opening and closing round brackets. The round brackets will always open on the line after the keyword ‘fields’ and will always close on the line containing the keyword ‘#blockend’.

So, if we consider the Input String as -

''' Block Questions
SPAB "" block fields
(

    SP1 "How old are you?<br/><span class='none' numerickeypad='true'></span><span class='mrInstruct'>(Please enter your exact age below.)</span>"
        style(
            Width = "3em"
        )
    long [18 .. 65];

    SP2 "Have you seen this ad on TV/at the cinema before today?"
        [
            metatype = "rowpicker",
            grid$all$width = 100,
            row$all$center = false,
            row$hovercolor = "#aadeee"
        ]
    categorical [1..1]
    {
        _1 "No, this is the first time",
        _2 "Once or twice",
        _3 "A few times",
        _4 "Lots of times"
    };

    SP3"You have completed the survey. Thank you for your time."
    info;

);  '#blockend

Expected Output is -

SP1 "How old are you?<br/><span class='none' numerickeypad='true'></span><span class='mrInstruct'>(Please enter your exact age below.)</span>"
    style(
        Width = "3em"
    )
long [18 .. 65];

SP2 "Have you seen this ad on TV/at the cinema before today?"
    [
        metatype = "rowpicker",
        grid$all$width = 100,
        row$all$center = false,
        row$hovercolor = "#aadeee"
    ]
categorical [1..1]
{
    _1 "No, this is the first time",
    _2 "Once or twice",
    _3 "A few times",
    _4 "Lots of times"
};

SP3"You have completed the survey. Thank you for your time."
info;
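
Note that the expected output above also loses one level of indentation and the blank lines at both ends, which removing the four lines alone does not do. A small Python sketch of that extra clean-up step, assuming the kept content is indented consistently with spaces (`tidy_block_body` is a hypothetical helper name):

```python
import textwrap

def tidy_block_body(body: str) -> str:
    """Remove the shared indentation and the blank lines at both ends."""
    return textwrap.dedent(body).strip("\n") + "\n"

# Content as it would look right after the four wrapper lines are removed.
body = (
    "\n"
    '    SP1 "How old are you?"\n'
    "        style(\n"
    '            Width = "3em"\n'
    "        )\n"
    "    long [18 .. 65];\n"
    "\n"
)
print(tidy_block_body(body))
```

`textwrap.dedent` strips only the whitespace common to every non-blank line, so the relative indentation of `style(` and `Width` is preserved. Mixed tabs and spaces would defeat it.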

@DewanjeeS - Check this Regex pattern…

The idea is to ignore whatever comes before the questions… Since all the questions start with SP* followed by #, I started my pattern with that, which automatically deleted the first 3 lines…

Let me know if this works for you…

@prasath17 … the issue here is that questions won’t always start with SP*. There may be any pattern between those round brackets.

The only constants here are -

a) the keyword ‘block’
b) the line containing the opening round bracket (the line after ‘fields’)
c) the line containing the closing round bracket (the same line as the keyword #blockend)
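
A pattern anchored only on those three constants could be sketched as follows. This is an illustration in Python, not a tested solution for every input: `BLOCK_RE` and `remove_block_wrapper` are hypothetical names, the keyword match is assumed to be case-insensitive so that the `''' Block Questions` comment line goes too, and the closing bracket is allowed on the ‘#blockend’ line or on the line just above it.

```python
import re

BLOCK_RE = re.compile(
    r"(?:^[^\n]*\bblock\b[^\n]*\n)+"   # a) line(s) containing the keyword 'block'
    r"(?:[^\n]*\n){0,2}?"              #    room for 'fields' 1-2 lines lower
    r"[ \t]*\([ \t]*\n"                # b) line holding only the opening '('
    r"(?P<body>.*?)"                   #    dynamic content, captured untouched
    r"^[ \t]*\)[ \t]*;?[ \t]*\n?"      # c) closing ')' or ');' ...
    r"[ \t]*'?#blockend[^\n]*\n?",     #    ... on the '#blockend' line or just above it
    re.MULTILINE | re.DOTALL | re.IGNORECASE,
)

def remove_block_wrapper(text: str) -> str:
    """Replace every block with just the content between its round brackets."""
    return BLOCK_RE.sub(lambda m: m.group("body"), text)

sample = (
    "''' Block Questions\n"
    'SPAB "" block fields\n'
    "(\n"
    "\n"
    '    SP1 "How old are you?"\n'
    "        style(\n"
    '            Width = "3em"\n'
    "        )\n"
    "    long [18 .. 65];\n"
    "\n"
    ");  '#blockend\n"
)
print(remove_block_wrapper(sample))
```

Nothing in the pattern depends on what the questions look like, so reordering SP1 and SP2 or using other question names should not matter. The inner `)` of `style(` … `)` is skipped because it is not followed by `#blockend`.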

@DewanjeeS – I know you are looking for a keyword before the question… But my point is, how many different types of questions do you have? I saw qQ; there should be only a few like this, right? Here you go…