Using Regex to replace text in a string

I used OCR to extract text from a PDF file, but the extracted text is rather messy:

[REDACTED]
=~ i.
=>» Goldman w 3
BofA SECURITIES “ZZ” Soman Si @)i@imEms HATONG

Is it possible to remove those non-word characters and the word “[REDACTED]” with Regex to make the text tidier? I don’t want to use too many activities just to replace unwanted words. Thank you very much for your help!

Hi @Komom

Try this:
res = System.Text.RegularExpressions.Regex.Replace(extractedText, "\[REDACTED\]|[^\w\s]", "")
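
A note on the order of the alternatives: the regex engine tries them left to right at each position, so the bracketed literal needs to come before the character class. If `[^\w\s]` comes first, it consumes the `[` on its own and the letters of REDACTED are left behind. Below is a minimal console sketch to check this outside UiPath. The module name and the shortened sample text are only for illustration, and the final whitespace clean-up is an optional extra, not part of the expression above. In UiPath, only the `Regex.Replace` expressions would go into Assign activities.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module CleanOcrText
    Sub Main()
        ' First three lines of the sample from the question, joined with line feeds.
        Dim extractedText As String = "[REDACTED]" & vbLf & "=~ i." & vbLf & "=>» Goldman w 3"

        ' Bracketed word first, then any character that is neither
        ' a word character nor whitespace.
        Dim res As String = Regex.Replace(extractedText, "\[REDACTED\]|[^\w\s]", "")

        ' Optional extra step (an assumption about what "tidier" means):
        ' collapse whitespace runs into one space and trim both ends.
        res = Regex.Replace(res, "\s+", " ").Trim()

        Console.WriteLine(res)   ' prints "i Goldman w 3"
    End Sub
End Module
```

Stray letters and digits that the OCR invented (the "i", "w" and "3" here) are word characters, so this pattern keeps them. They would need a separate rule.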

Have a look at this thread.

Hi @Komom,

To extract [REDACTED] from the given text, you can use the pattern below:
\[[a-zA-Z]+\]

In UiPath:
System.Text.RegularExpressions.Regex.Match(input, "\[[a-zA-Z]+\]").ToString

It will extract [REDACTED] from the given text.

Please try it and let me know if you face any issues.
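
For anyone comparing patterns, here is a small sketch of what `Regex.Match` returns on a shortened version of the sample (the last line is abbreviated and a trailing line feed is assumed). A pattern of letters followed by `\n` cannot match the first line, because the closing `]` sits between the letters and the line break, while a pattern that includes the escaped brackets does. The module and variable names are illustrative.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module MatchBracketedWord
    Sub Main()
        ' Shortened sample; the trailing line feed is an assumption.
        Dim ocrText As String = "[REDACTED]" & vbLf & "=~ i." & vbLf &
                                "BofA SECURITIES HATONG" & vbLf

        ' Letters immediately followed by a line feed: the first match is
        ' the last word of the last line, not the bracketed word.
        Dim byLineEnd As Match = Regex.Match(ocrText, "[a-zA-Z]+\n")
        Console.WriteLine(byLineEnd.Value.Trim())   ' prints "HATONG"

        ' Letters enclosed in literal square brackets.
        Dim byBrackets As Match = Regex.Match(ocrText, "\[[a-zA-Z]+\]")
        Console.WriteLine(byBrackets.Value)         ' prints "[REDACTED]"
    End Sub
End Module
```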

Thank you very much for your help!

Hi @Komom

Try this:

System.Text.RegularExpressions.Regex.Replace(inputString, "\[.*?\]", "")
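
One thing to watch with bracket patterns: `.*` is greedy and runs to the last `]` on the line, while `.*?` stops at the first one. On the sample in the question both behave the same, since there is only one bracketed word, but the difference shows up as soon as a line contains two. A minimal sketch, using a made-up input line (the `[DRAFT]` tag is hypothetical):

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module BracketPatterns
    Sub Main()
        ' Hypothetical one-line input with two bracketed tags.
        Dim sample As String = "[REDACTED] BofA [DRAFT] Goldman"

        ' Greedy: everything from the first "[" to the last "]" is removed,
        ' so "BofA" is lost as well.
        Console.WriteLine(Regex.Replace(sample, "\[.*\]", ""))    ' prints " Goldman"

        ' Lazy: each bracketed tag is removed on its own.
        Console.WriteLine(Regex.Replace(sample, "\[.*?\]", ""))   ' prints " BofA  Goldman"
    End Sub
End Module
```

Both results keep the spaces that surrounded the tags, so a `.Trim()` or a whitespace clean-up may still be wanted afterwards.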

Regards,
Gowtham K