Replace text in a string using wildcard operators

I have a string of text where I need to remove certain text and character elements (HTML elements specifically). I am able to use Regex.Replace for items that are consistent, but I have a need to use the asterisk* wildcard.

Here is a sample string of the text input:

As mentioned, I can use Regex.Replace to remove the li, i, /span, etc., but I would also like to remove all the span style… items. Since they are all different I need to use some kind of wildcard but I have been unable to find a way to use the asterisk* wildcard.

Here is my current code that gets rid of the consistent elements:

Regex.Replace(txtDescription,"<ol>|<li>|<i>|</i>|</li>|</span>","")

Ultimately I want the output to be “This is the only text I actually want to appear after cleanup”. Is it possible to use the asterisk* wildcard with Regex.Replace?

In the majority of cases it is advisable to use the XML API approach when working with HTML code. Let us know if you need more help with this. If possible, please also provide a text sample. Thanks
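To give a rough idea of what a parser-based approach looks like, here is a minimal sketch. It uses Python's standard `html.parser` module as a stand-in, not the .NET XML API itself, and the names `TextCollector` and `strip_tags` are made up for illustration. The idea is the same either way: let a parser walk the markup and keep only the text nodes.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: collects text nodes and ignores every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for the text between tags; tags themselves are skipped.
        self.parts.append(data)


def strip_tags(markup):
    collector = TextCollector()
    collector.feed(markup)
    collector.close()  # flush any text still buffered by the parser
    return "".join(collector.parts)


print(strip_tags('<li><span style="color: red;">Keep <i>this</i></span></li>'))
```

A parser also copes with cases a simple pattern can trip over, such as a `>` inside an attribute value.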

Hi @jesse.paplanus

In addition to @ppr’s post: some samples, the expected output, and information on the pattern would be ideal.

For example, if your sample looks like this:
ABCDEFGHIJL. I want to keep this text. MNOPQRSTUVWXYZ.

You have two options to clean this string up dynamically.

  1. Obtain “I want to keep this text” with one Regex pattern and assign to new variable.
  2. Two Regex patterns to clean up “ABCDEFGHIJL” and “MNOPQRSTUVWXYZ”. Leaving behind: “I want to keep this text”.

In both cases we need to know the pattern in your data to build a robust Regex pattern.

In the meantime, here is an option 1 Regex pattern to give an idea of what we need to build a robust pattern: what text anchors (if any) are there on either side of the text you want?
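As a rough sketch of both options on the sample string above, written in Python for illustration (the same patterns should carry over to .NET's Regex.Match and Regex.Replace). It assumes the leading and trailing blocks are fixed, known anchors, which may not hold for real data.

```python
import re

sample = "ABCDEFGHIJL. I want to keep this text. MNOPQRSTUVWXYZ."

# Option 1: capture the text between two known anchors (assumed fixed here).
match = re.search(r"ABCDEFGHIJL\.\s*(.+?)\s*MNOPQRSTUVWXYZ\.", sample)
kept = match.group(1) if match else ""

# Option 2: delete the unwanted leading and trailing blocks instead.
cleaned = re.sub(r"^ABCDEFGHIJL\.\s*|\s*MNOPQRSTUVWXYZ\.$", "", sample)

print(kept)
print(cleaned)
```

Both variants leave "I want to keep this text." for this sample. Which one is more robust depends on whether the wanted text or the unwanted text is the more predictable part.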

Hi @ppr, thank you for the reply. I provided a sample of the string I am trying to strip the HTML tags from in an image included with my original post (I couldn’t just copy and paste it since the browser applies the tags). Let me know if that does not show up on your end and I can try to upload it as a txt file.

@Steven_McKeering, thank you as well for the reply. Unfortunately neither of those would ultimately work because the layout is not consistent. Ultimately what I am trying to do is use Regex in a “find & replace” function to find all instances of the HTML tags to then strip them out. Due to the <span…> not being the same every time, I am unable to pluck it out like I can the others so that’s why I was hoping there is a way to use the asterisk wildcard.

@ppr and @Steven_McKeering, here is a sample of the text showing the full input that I would like to remove all HTML tags from. Since I am a new user to this forum it is not letting me upload attachments so I am uploading an image. I am using Regex.Replace as a find & replace function replacing all the tags with “”. As you can see, all of the <span…> tags are not the same. Not sure what I can do since they are not consistent like all the other tags.

Hello

Take a look at this pattern. It might be a start.

Pattern:
\<[^>]+\>
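Here is a small sketch of the pattern in action, in Python for illustration. The input string is a made-up stand-in, since the original sample was only posted as an image. I dropped the backslashes, which should not be needed before < and >. The `[^>]*` part plays the role of the asterisk wildcard: it matches any run of characters up to the closing >.

```python
import re

# Hypothetical input; the real sample was only available as an image.
html = (
    '<ol><li><span style="font-size: 12pt;"><i>This is the only text</i></span> '
    '<span style="color: #000000;">I actually want to appear after cleanup</span></li></ol>'
)

# Remove every tag: "<", then one or more characters that are not ">", then ">".
text_only = re.sub(r"<[^>]+>", "", html)

# Narrower variant: remove only the opening span tags, whatever their attributes.
no_span_open = re.sub(r"<span[^>]*>", "", html)

print(text_only)
print(no_span_open)
```

In a .NET expression the first call should correspond to something like Regex.Replace(txtDescription, "<[^>]+>", ""). Note that this is a plain find & replace, so any literal < ... > that is not a tag would be removed as well.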


This works best, thanks! I had several actions in my sequence that would remove certain elements each time, but this solution took care of all of it in one command.
