How to remove some XML tags using Regex

Hello,

I have a string that has to be deserialized as XML.
Between some tags, there are “forbidden” characters like “<” and “>”.
I cannot simply replace those characters because that would break the other XML tags.
So, my idea is to remove the “problem” tags that contain the “forbidden” characters.

Here is an example:
[image: example input string]

My idea: System.Text.RegularExpressions.Regex.Replace(test_str, "(?=<a>)[\S\s]*(?:</a>)", "")

This results in:
[image: actual result]

And not in the following, which is what I want to obtain:

[image: desired result]

Another idea is to split the string after each “</tag>” and then apply the method above to every part.

But is there a way to do this without splitting, directly using Regex?

Thanks a lot,
Vlad

Hi,

Can you try the following?

System.Text.RegularExpressions.Regex.Replace(test_str,"<a>[\S\s]*?</a>\s*", "")
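
To see the effect outside of .NET, here is a minimal Python sketch of the same replacement. It assumes that `re.sub` stands in for `Regex.Replace` (the pattern syntax used here is the same in both engines). The sample string and the function name `remove_a_elements` are made up for illustration, since the original example was posted as an image.

```python
import re


def remove_a_elements(text: str) -> str:
    """Remove every <a>...</a> element plus any whitespace that follows it."""
    # [\S\s] matches any character, including newlines.
    # *? is lazy, so each match stops at the nearest closing </a>.
    return re.sub(r"<a>[\S\s]*?</a>\s*", "", text)


# Hypothetical input: two <a> elements contain "forbidden" characters.
test_str = "<root><a>1 < 2</a>\n<b>keep</b><a>x > y</a>\n<c>also keep</c></root>"

print(remove_a_elements(test_str))
# <root><b>keep</b><c>also keep</c></root>
```

The trailing `\s*` also consumes the line break after each removed element, so no empty lines are left behind.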

Regards,

Thanks a lot, it works.
Can you please explain the meaning of “?” in your expression?

I know that “?” means 0 or 1 occurrences of the previous character, but in this example I cannot figure out what it does.

Thanks again,
Vlad

Hi,

When “?” follows a quantifier such as “*”, it means lazy matching (shortest matching): the quantifier matches as few characters as possible. Can you check the following document?
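
The difference can be checked with a small Python sketch (Python's `re` module treats `*` and `*?` the same way as .NET for this case; the sample string is only an illustration). With the greedy `*`, as in the expression from the question, the match runs from the first `<a>` to the last `</a>`. With the lazy `*?`, each `<a>...</a>` pair is matched on its own.

```python
import re

# Illustrative string: two <a> elements with another element between them.
sample = "<a>one</a><b>keep</b><a>two</a>"

# Greedy: [\S\s]* grabs as much as possible, then backtracks to the LAST </a>.
greedy = re.findall(r"<a>[\S\s]*</a>", sample)

# Lazy: [\S\s]*? grows one character at a time and stops at the FIRST </a>.
lazy = re.findall(r"<a>[\S\s]*?</a>", sample)

print(greedy)  # ['<a>one</a><b>keep</b><a>two</a>']
print(lazy)    # ['<a>one</a>', '<a>two</a>']
```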

Regards,
