Hi,
I am having a problem extracting data from a scanned document. I have two issues.
1. I need to extract a postcode, but there is a comma (,) right after it. When I select the postcode area, the comma is captured as well, and I need to remove it. I tried removing it with the Present Validation Station activity, but that didn't work. How can I resolve this?
2. The second issue is with email address extraction. Instead of @, the extraction returns “P”, and I need to replace it with @. How can I do that?
You can use the String method - Replace - to make the changes. The first argument is the text you want to remove and the second argument is what you want to replace it with.
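As a minimal sketch of the Replace call described above, written as plain VB.NET so it can be run outside a workflow (the sample postcode and the variable names are made up for illustration):

```vb
Imports System

Module ReplaceSketch
    Sub Main()
        ' Hypothetical extracted value with the unwanted trailing comma.
        Dim rawPostcode As String = "AB1 2CD,"

        ' First argument: the text to remove. Second argument: what to put in its place.
        Dim cleaned As String = rawPostcode.Replace(",", "").Trim()

        Console.WriteLine(cleaned)
    End Sub
End Module
```

In a workflow, the same expression would typically go on the right-hand side of an Assign activity.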
Hi @aman.sharma1 ,
Thank you. I have shared some screenshots so that you can see how I am extracting the data. Where can I apply the Remove or Replace function when the data is in a dataset?
One way is – after the Export Extraction Results activity, use the For Each Row activity on dataset.Tables(0). Inside the For each row, you can place this Replace logic, ensuring that you work with the correct column data.
So, for example, if the zip code is in the 3rd column of dataset.Tables(0), then inside the For Each Row you would assign the cleaned value back to the column: row(2) = row(2).ToString.Replace(",", "").
But it’s possible I am not understanding the structure of your dataset variable. In which case you will need to display the contents of dataset to figure out how it’s storing the zip code and email.
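The For Each Row idea above can be sketched in plain VB.NET as follows. This assumes the postcode sits in the third column; the table layout and sample values are invented for illustration and will differ from what Export Extraction Results actually produces, so check your own dataset first.

```vb
Imports System
Imports System.Data

Module DataTableCleanupSketch
    Sub Main()
        ' Hypothetical stand-in for dataset.Tables(0).
        Dim table As New DataTable()
        table.Columns.Add("Name", GetType(String))
        table.Columns.Add("Email", GetType(String))
        table.Columns.Add("Postcode", GetType(String))
        table.Rows.Add("Martin", "martinPdef.com", "AB1 2CD,")

        ' Equivalent of a For Each Row activity with an Assign inside it.
        For Each row As DataRow In table.Rows
            ' row(2) is an Object, so convert it to String before calling Replace,
            ' then write the cleaned value back into the same column.
            row(2) = row(2).ToString().Replace(",", "").Trim()
        Next

        Console.WriteLine(table.Rows(0)(2))
    End Sub
End Module
```

Replace returns a new string rather than changing the cell in place, which is why the result has to be assigned back to row(2).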
We would need to know whether the email value is always lowercase and whether the P that appears in place of @ after extraction is always a capital. If that is always the case, we can use the String Replace method.
Otherwise, we would need some more details about your extraction: which OCR engine was used, and which extractor?
Hi everyone. @ushu, the solution you provided worked. @supermanPunch, I used the OmniPage OCR engine, and the extractor is Form Extractor.
I would like to mention one point about the email address. The email address is dynamic and can change. So if the email address contains more than one “P”, those would also be replaced by “@”, which is wrong.
So my idea is to replace only the 8th character from the end (“P”) with “@” (martinPdef.com), because “.com” is common to all the emails and “def” is the company name. That means the @ is located just before the company name. Is it possible to replace a character based on its position?
@emshihab If you want to replace the P that comes just before companyname.com, try the expression below. If there is a P in that position, it is replaced with @; otherwise nothing changes.
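One possible way to do a position-based replacement like this, as a hedged sketch in plain VB.NET and not necessarily the exact expression the poster had in mind. The function name FixAtSign and the fixed offset of 8 are assumptions taken from the “def.com” example in the question.

```vb
Imports System

Module EmailFixSketch
    ' Hypothetical helper: replaces the character that sits offsetFromEnd
    ' positions from the end with "@", but only if that character is "P".
    Function FixAtSign(email As String, offsetFromEnd As Integer) As String
        Dim pos As Integer = email.Length - offsetFromEnd
        If pos >= 0 AndAlso pos < email.Length AndAlso email(pos) = "P"c Then
            Return email.Remove(pos, 1).Insert(pos, "@")
        End If
        ' No "P" at the expected position: leave the value untouched.
        Return email
    End Function

    Sub Main()
        ' "def.com" is 7 characters long, so the misread "@" is 8th from the end.
        Console.WriteLine(FixAtSign("martinPdef.com", 8))
        Console.WriteLine(FixAtSign("PeterPdef.com", 8))
        Console.WriteLine(FixAtSign("martin@def.com", 8))
    End Sub
End Module
```

Because only one position is checked, any other “P” in the name part stays as it is. If the company domain can vary in length, the offset would need to be derived from the domain rather than hard-coded.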