Using Regex to replace texts in string

Komom · October 3, 2024, 4:44am

I used OCR to extract text from a PDF file, but the extracted text is rather messy:

[REDACTED]
=~ i.
=>» Goldman w 3
BofA SECURITIES “ZZ” Soman Si @)i@imEms HATONG

Is it possible to remove those non-word characters and the word “[REDACTED]” with Regex to make the text tidier? I don’t want to use too many activities to replace the unwanted words. Thank you very much for your help!

Supriya_Allada · October 3, 2024, 4:49am

Hi @Komom

Try this:
res = System.Text.RegularExpressions.Regex.Replace(extractedText, "\[REDACTED\]|[^\w\s]", "")
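The order of the alternatives matters in a pattern like this one. The engine tries the left alternative first, so the literal \[REDACTED\] has to come before the [^\w\s] class. Otherwise the brackets are stripped one at a time and the word between them survives. Below is a minimal sketch for checking this outside UiPath. It uses Python rather than VB.NET, which is an assumption made only so the snippet can be run on its own. The constructs involved behave the same way in both engines for this sample. The helper name clean and the whitespace tidy-up step are illustrative additions, not part of the reply above.

```python
import re

# Sample OCR output from the question; "[REDACTED]" is kept as the literal placeholder.
extracted_text = (
    "[REDACTED]\n"
    "=~ i.\n"
    "=>» Goldman w 3\n"
    "BofA SECURITIES “ZZ” Soman Si @)i@imEms HATONG"
)


def clean(text: str) -> str:
    """Hypothetical helper: drop the placeholder and every non-word, non-space character."""
    # The literal alternative comes first so "[" and "]" are removed together with the word.
    no_junk = re.sub(r"\[REDACTED\]|[^\w\s]", "", text)
    # Optional tidy-up (not in the original reply): collapse spaces and drop empty lines.
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in no_junk.splitlines())
    return "\n".join(line for line in lines if line)


# With the class first, only the brackets are removed and the word is left behind.
wrong_order = re.sub(r"[^\w\s]|\[REDACTED\]", "", "[REDACTED]")

print(repr(wrong_order))
print(clean(extracted_text))
```

Letters that OCR misread inside a junk token (for example the "i" and "imEms" in "@)i@imEms") are word characters, so this pattern keeps them. Removing those would need a separate rule.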

sppal.c · October 3, 2024, 4:50am

Have a look at this thread.

yedukondaluaregala · October 3, 2024, 6:15am

Hi @Komom ,

To extract [REDACTED] from the given text, you can use the pattern below:
[a-zA-Z]+\n

In UiPath:
System.Text.RegularExpressions.Regex.Match(input, "[a-zA-Z]+\n").ToString

It will extract [REDACTED] from the given text. The match includes the trailing newline, so append .Trim if you need only the word.

Please try it and let me know if you are facing any challenges
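To see what [a-zA-Z]+\n actually captures, here is a small sketch in Python (chosen only so it can be run standalone, not because the thread uses it). The function name and the word "HEADER" are hypothetical stand-ins. The pattern matches the first run of ASCII letters that is immediately followed by a newline, so a line ending in a bracket, digit, or punctuation mark is skipped.

```python
import re


def first_word_before_newline(text: str) -> str:
    """Hypothetical helper: first run of ASCII letters directly followed by a newline."""
    match = re.search(r"[a-zA-Z]+\n", text)
    if match is None:
        # No line ends in letters, so there is nothing to extract.
        return ""
    # The raw match still contains the newline; strip it before returning.
    return match.group(0).rstrip("\n")


# "HEADER" is a made-up first line standing in for the redacted word.
sample = "HEADER\n=~ i.\n=>» Goldman w 3"
print(first_word_before_newline(sample))

# A first line that ends in "]" does not match, because "]" sits between the letters and "\n".
print(repr(first_word_before_newline("[REDACTED]\n=~ i.")))
```

In .NET, Regex.Match does not return null when nothing matches. It returns a Match whose Success property is False and whose value is an empty string, so checking Success before using the result is the closest equivalent of the None check above.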

Komom · October 3, 2024, 6:26am

Thank you very much for your help!

Gowtham_K115 · October 3, 2024, 6:27am

Hi @Komom

Try this:

System.Text.RegularExpressions.Regex.Replace(inputString, "\[.*?\]", "")

Regards,
Gowtham K
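A note on the quantifier in a bracket-removal pattern: .* is greedy and runs to the last "]" on the line, while .*? is lazy and stops at the first one. The difference only shows up when a line holds more than one bracketed token. A minimal Python sketch of the comparison follows (Python is an assumption for runnability, and the input string is made up for illustration):

```python
import re

# Hypothetical line with two bracketed tokens and text between them.
line = "[A] keep [B]"

# Greedy: matches from the first "[" to the last "]", so "keep" is removed as well.
greedy = re.sub(r"\[.*\]", "", line)

# Lazy: each bracketed token is matched separately and the text in between is kept.
lazy = re.sub(r"\[.*?\]", "", line)

print(repr(greedy))
print(repr(lazy))
```

By default "." does not match a newline, so even the greedy form stays within a single line.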
