Regex.Replace string with quotation marks

Hi,
I am stuck on the following regex issue: I am reading a document with OCR and using part of the recognized text as a new file name, so I have to remove any invalid characters.

The string variable from OCR looks like Var = "NON-DISCLOSURE AGREEMENT\r(“Agreement”)\r\a", and I need to remove the quotation marks around the word Agreement, but I cannot manage it. Removing the escape characters works fine.

I am using Regex.Replace(Var, "\r|\a|""", " ") and variations like [""], but I could not make it work on the quotation marks. Any ideas how to tackle this?
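Here is a minimal sketch of the character-class approach. It is written in Python for illustration, while the thread itself uses VB.NET expressions in UiPath. It assumes the OCR text contains the curly quotes U+201C and U+201D rather than the straight quote, which is what the thread later confirms. In a VB.NET expression the same class would presumably be written "[""\u201C\u201D]", because .NET regular expressions also accept \uXXXX escapes.

```python
import re

# Sample string assumed to mirror the OCR output from the question:
# carriage returns, a bell character (\a) and curly quotes U+201C / U+201D.
ocr_text = "NON-DISCLOSURE AGREEMENT\r(\u201cAgreement\u201d)\r\a"

# Control characters become spaces, as in the original Regex.Replace call.
no_controls = re.sub(r"[\r\a]", " ", ocr_text)

# One character class listing the straight quote AND both curly quotes.
cleaned = re.sub(r'["\u201c\u201d]', "", no_controls).strip()

print(cleaned)  # → NON-DISCLOSURE AGREEMENT (Agreement)
```

Listing the curly quotes explicitly keeps every other character intact. A pattern that only targets " cannot match the OCR text.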

Thanks and regards

Libor

@libork Have you tried Var.Replace("""","")?

Hi SupermanPunch,
Thanks! I have tried that, with no luck. If I type the string manually, e.g. "text (""Agreement"") text ", so that the quotation marks are inside the string, I can remove them with either Regex or Replace as you propose. But once the string is generated by the activity, it does not work.
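To see why a typed string behaves differently from the activity output, it helps to list the code point of every control or non-ASCII character. Below is a Python sketch. The helper name describe_suspicious_chars is hypothetical, and the OCR string is assumed to match the one in the question.

```python
import unicodedata


def describe_suspicious_chars(text):
    """Return (code point, Unicode name) for every control or non-ASCII character."""
    report = []
    for ch in text:
        code = ord(ch)
        if code < 32 or code > 126:
            # Control characters have no Unicode name, so fall back to a label.
            report.append((code, unicodedata.name(ch, "<control>")))
    return report


typed = 'text ("Agreement") text '
ocr = "NON-DISCLOSURE AGREEMENT\r(\u201cAgreement\u201d)\r\a"

print(describe_suspicious_chars(typed))  # → []
print(describe_suspicious_chars(ocr))
# → [(13, '<control>'), (8220, 'LEFT DOUBLE QUOTATION MARK'),
#    (8221, 'RIGHT DOUBLE QUOTATION MARK'), (13, '<control>'), (7, '<control>')]
```

The typed quote is plain character 34. The OCR quotes show up as 8220 and 8221, so a pattern built around character 34 never matches them.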

Cheers Libor

@libork

You can try the Regex Replace activity with the pattern

[^\w ]

It matches any character that is not a word character (letter, digit or underscore) or a space.

Alternatively, you can try Split and pass the characters you want to remove.
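Here is a Python sketch of what [^\w ] does to the sample string from the question, which is assumed to match the OCR output. For str patterns, Python's \w is Unicode-aware. .NET's \w is Unicode-aware as well, so the result should be comparable.

```python
import re

ocr_text = "NON-DISCLOSURE AGREEMENT\r(\u201cAgreement\u201d)\r\a"

# Delete every character that is neither a word character nor a space.
cleaned = re.sub(r"[^\w ]", "", ocr_text)

print(cleaned)  # → NONDISCLOSURE AGREEMENTAgreement
```

The curly quotes are gone, but so are the hyphen and the parentheses. Because \r is deleted rather than replaced with a space, AGREEMENT and Agreement are also glued together. This is the trade-off raised later in the thread.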

Mark as solution if this helps

Thanks

@libork

Another technique is to bring the " character into the expressions with Chr(34).
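The same idea can be sketched in Python with chr(), which plays the role of VB.NET's Chr here. The constant names are hypothetical, and the OCR string is assumed to match the question. The sketch shows that building the pattern from code points works only if the right code points are included.

```python
import re

QUOTE = chr(34)          # the straight double quote "
LEFT_CURLY = chr(8220)   # U+201C LEFT DOUBLE QUOTATION MARK
RIGHT_CURLY = chr(8221)  # U+201D RIGHT DOUBLE QUOTATION MARK

ocr_text = "NON-DISCLOSURE AGREEMENT\r(\u201cAgreement\u201d)\r\a"

# Removing only chr(34) changes nothing, because the OCR produced curly quotes.
only_straight = ocr_text.replace(QUOTE, "")

# A character class built from all three code points removes every variant.
pattern = "[" + re.escape(QUOTE + LEFT_CURLY + RIGHT_CURLY) + "]"
all_quotes_removed = re.sub(pattern, "", ocr_text)
```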

Hi ppr,
This is the clue. I followed the same path and found that the quotation marks produced by the OCR are not Unicode character 34 but characters 8220 and 8221 (U+201C and U+201D). I have no idea how to type those on the keyboard, and I could not get them directly with Chr(8220), so I ended up with something crazy like:

NewText = Text.Replace(Convert.ToChar(8220).ToString, "").Replace(Convert.ToChar(8221).ToString, "")

which works for my case.
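In VB.NET, ChrW(8220) and ChrW(8221) take a Unicode code point and should be a shorter way to get these characters than Convert.ToChar(...).ToString. In Python the same removal can be sketched with str.translate, which deletes any character mapped to None. The names CURLY_QUOTES and strip_curly_quotes are hypothetical.

```python
# Map each curly quote code point to None so str.translate deletes it.
CURLY_QUOTES = {8220: None, 8221: None}


def strip_curly_quotes(text):
    """Equivalent of the chained Replace calls above, in a single pass."""
    return text.translate(CURLY_QUOTES)


ocr_text = "NON-DISCLOSURE AGREEMENT\r(\u201cAgreement\u201d)\r\a"
result = strip_curly_quotes(ocr_text)
```

If the OCR also produces curly single quotes (U+2018 and U+2019 are an assumption here, not something seen in this thread), they can be added to the same mapping.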

Thanks!

Best

libork


@ksrinu070184

This is a smart workaround! I need to keep the other non-alphanumeric characters, though, so I would use it only as a last resort.
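If the other punctuation has to survive, one option is to remove only the characters a file name cannot contain. Below is a Python sketch that assumes a Windows target, where < > : " / \ | ? * and the control characters 0 to 31 are not allowed. The curly quotes are dropped as well. In .NET, System.IO.Path.GetInvalidFileNameChars() returns the platform's invalid set and would be the natural source for this list. The helper to_file_name is hypothetical.

```python
import re

# Assumed target: Windows file names, which may not contain < > : " / \ | ? *
# or the control characters 0-31.
WINDOWS_INVALID = '<>:"/\\|?*' + "".join(chr(c) for c in range(32))
# Curly quotes are legal in file names but unwanted here, so they are deleted too.
QUOTES = '"\u201c\u201d'


def to_file_name(text):
    """Hypothetical helper: strip only what is needed, keep -, ( ) and other punctuation."""
    # Delete quotes outright so "(“Agreement”)" becomes "(Agreement)".
    no_quotes = re.sub("[" + re.escape(QUOTES) + "]", "", text)
    # Replace the remaining invalid characters (e.g. \r, \a) with spaces.
    spaced = re.sub("[" + re.escape(WINDOWS_INVALID) + "]", " ", no_quotes)
    # Collapse runs of spaces left by the replacements and trim both ends.
    return re.sub(" +", " ", spaced).strip()


print(to_file_name("NON-DISCLOSURE AGREEMENT\r(\u201cAgreement\u201d)\r\a"))
# → NON-DISCLOSURE AGREEMENT (Agreement)
```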

Thanks a lot

Cheers

libork


@libork

Mark as solution if this helps

Happy Automation

Thanks
