Remove duplicate pincode from address field

Hi Community.

I am extracting data from PDFs that contain an address. In the address field, the pincode appears twice.

E.g. “B4567, Western Avenue , Banglore, Karnataka India 560068560068”

I tried a regex replace with “([0-9]{6})”, but that removes the pincode entirely. I want to retain one occurrence of the pincode.

Any help would be much appreciated. Thank you

Hello @benjamin.9052

So does that mean the duplicate value is in the address field itself, or is the duplicate created while scraping the data?

Please confirm.

There will be two matches for the same regex, which you can get with the Matches activity. Then use the Replace function to replace the duplicated value with matchoutput(0).

Hi @benjamin.9052 ,

Could you let us know whether the PDF is digital or scanned? In addition, could you tell us which extraction method was used?

Maybe we could solve this at the source data/method, if possible.

An alternative is to use regex, provided we know more details so that the pattern can be strict for your extraction case. The pattern below does match, but do test it for different cases:

extractedAddr = "B4567, Western Avenue , Banglore, Karnataka India 560068560068"

Considering that after extraction you want to keep only one pincode, meaning there is always exactly one pincode, you could check the following:
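
One possible pattern, sketched here in Python rather than as a UiPath expression, uses a backreference to match six digits immediately followed by the same six digits. The function name `collapse_doubled_pincode` is hypothetical, and the sketch assumes the pincode is always six digits and the doubled copy sits at the end of the address.

```python
import re

# Six digits, then the same six digits again (\1), at the end of the string.
DOUBLED_PINCODE = re.compile(r"([0-9]{6})\1\s*$")

def collapse_doubled_pincode(address: str) -> str:
    """Keep a single copy of a pincode that was written twice back to back."""
    return DOUBLED_PINCODE.sub(r"\1", address)

extracted_addr = "B4567, Western Avenue , Banglore, Karnataka India 560068560068"
print(collapse_doubled_pincode(extracted_addr))
# B4567, Western Avenue , Banglore, Karnataka India 560068
```

Because the pattern only fires when the two halves are identical, an address that already has a single pincode is left unchanged.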


If there is a change in the input data pattern, we would need to modify the regex as well.

Let us know if you have differing patterns.


It is not a scanned PDF; it’s a digital copy. I am using a PDF to string activity and a combination of regex and string manipulation to fetch the data.

I found the solution:

Variable = new Regex("([0-9]{6})").Replace(Variable,"",1)

This replaces only one occurrence of the pincode, so the other copy is kept.
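
For readers outside UiPath, the same idea can be sketched in Python, where `count=1` plays the role of the third argument in the .NET call. The function name `remove_first_six_digits` is hypothetical. The second example shows a caveat worth testing: the replacement removes the first run of six digits wherever it appears, which may not be the pincode.

```python
import re

def remove_first_six_digits(address: str) -> str:
    # count=1 stops after the first match, so a second copy of the pincode survives.
    return re.sub(r"[0-9]{6}", "", address, count=1)

doubled = "B4567, Western Avenue , Banglore, Karnataka India 560068560068"
print(remove_first_six_digits(doubled))
# B4567, Western Avenue , Banglore, Karnataka India 560068

# Caveat: an earlier six-digit number is removed instead of the pincode.
tricky = "123456 Western Avenue 560068560068"
print(remove_first_six_digits(tricky))
```

An address that contains only a single pincode would also lose it with this approach, so it fits best when the duplication is guaranteed.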

Thank you for your responses.

Hi @benjamin.9052,

As an alternative, see the attached workflow:

renewedValue.xaml (6.0 KB)

I hope it works for you.


