Regex - clean string from unwanted characters

What would be the best pattern for a regex that leaves only digits, letters, spaces and new lines?

I’m guessing it should be:

System.Text.RegularExpressions.Regex.Replace(myStr, "pattern", "") but I don’t know the right “pattern”.

This myStr comes from OCR (a scan), so it can contain a lot of characters like . , & ^ % and many others.

Hi @Yameso, you can try the pattern below.


Please add all the characters within the square brackets.


Use the regex below to retain only digits, letters, spaces and new lines:

This matches any character other than the ones mentioned above. You can replace the matches with an empty string.


Check as below


I added : and , to the characters to keep; the pattern will match everything else.

Use Regex.Replace(yourString, @"[^0-9a-zA-Z:,]+", "")
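
To make the effect of the bracket contents concrete, here is a minimal sketch written as C# top-level statements (C# 9 or later). The helper names KeepListed and KeepListedAndWhitespace are hypothetical, chosen only for this illustration. It suggests that anything not listed inside the negated class is removed, which with the pattern above includes spaces and new lines; adding \s inside the brackets should keep them.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: keeps ASCII letters, digits, ':' and ','.
// Whitespace is not listed in the class, so it is removed as well.
static string KeepListed(string input) =>
    Regex.Replace(input, @"[^0-9a-zA-Z:,]+", "");

// Hypothetical variant: \s added to the class so spaces and new lines survive.
static string KeepListedAndWhitespace(string input) =>
    Regex.Replace(input, @"[^0-9a-zA-Z:,\s]+", "");

Console.WriteLine(KeepListed("a: 1, b&c"));              // a:1,bc
Console.WriteLine(KeepListedAndWhitespace("a: 1, b&c")); // a: 1, bc
```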

Hope this helps.


There should be a double backslash before the s, that is, two \ characters before the letter s. The forum automatically collapses them to one when the expression is typed here.

I think that would work [^A-Za-z0-9\s]+
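
As a quick check of this pattern, the sketch below (C# top-level statements, C# 9 or later; the helper name CleanAscii is hypothetical) runs it on a made-up OCR-like string. In a C# verbatim string (@"...") a single backslash before s is enough; a regular C# string literal would need "\\s". Note that the class only lists A-Z and a-z, so accented letters are treated as unwanted characters.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: keeps ASCII letters, digits and whitespace only.
static string CleanAscii(string input) =>
    Regex.Replace(input, @"[^A-Za-z0-9\s]+", "");

Console.WriteLine(CleanAscii("Inv. #42, total: 10%")); // Inv 42 total 10
Console.WriteLine(CleanAscii("Ęę"));                   // prints an empty line
```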

But I also need to keep letters with accents like “Ę” “ę” and other Polish characters. Is there another way to keep them too, other than this:



Check this:





Here you can use ASCII encoding instead of Regex.
This will simply replace any Ę with a plain E, and the same goes for other accented characters.

Use it in an Assign:

I used:


and this one leaves marks such as " ? * , . ( ) and a few others, and changes letters from “Ę” to “E”.
I don’t want to replace them with characters without accents. I’d like to leave them as they are and clean out the others.

The pattern [~#%&*{}/:<>?|"-] will not catch something like this:


With this one, [^\p{L}|\p{N}|\s], those dots are also matched for replacement and the letters with accents stay:


Thanks for the help. @Adrian_Star nailed it :wink:
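
For later readers, here is a minimal sketch of the Unicode-category approach, written as C# top-level statements (C# 9 or later). The helper name CleanUnicode and the sample text are hypothetical. \p{L} matches any Unicode letter and \p{N} any Unicode number, so Polish letters are kept. The sketch drops the | separators: inside square brackets | is a literal character, so the pattern as posted would also keep | characters in the text.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: removes everything except Unicode letters,
// Unicode numbers and whitespace (spaces, tabs, new lines).
static string CleanUnicode(string input) =>
    Regex.Replace(input, @"[^\p{L}\p{N}\s]+", "");

Console.WriteLine(CleanUnicode("Ęę: 12%.\nŁódź"));
// Ęę 12
// Łódź
```

If the OCR output stores accents as separate combining marks rather than precomposed letters, those marks fall under \p{M} and would be stripped by this class; adding \p{M} inside the brackets may be needed in that case.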

