Extract address

I want to extract the address from a scanned PDF. Can someone help me with the LINQ query? @ppr @Yoichi, please help.

Hello @Nissy_Ruth_Prabhu
Kindly share some sample inputs, which will help us provide a better solution for you.

@Nissy_Ruth_Prabhu
You can use a regex to extract the specific part from the string. For example, given this text:

Address: XXX.
YYYY, ZZZZ-01235

this expression returns everything after "Address:":

System.Text.RegularExpressions.Regex.Match(YourString, "(?<=Address:)[A-Za-z\W\n0-9]+").ToString.Trim

The regex can be adjusted to fit your requirements.
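To see what that pattern captures, here is a minimal sketch in Python rather than a UiPath expression, chosen only so it can be run standalone. It applies the same pattern to the sample text from this post; the variable names are illustrative, and real OCR output will look different.

```python
import re

# Sample text copied from the post above; real OCR output will differ.
sample = "Address: XXX.\nYYYY, ZZZZ-01235"

# Same pattern as in the expression above: a lookbehind for "Address:"
# followed by letters, digits, and any non-word character (spaces,
# punctuation, newlines).
pattern = r"(?<=Address:)[A-Za-z\W\n0-9]+"

match = re.search(pattern, sample)

# Trim surrounding whitespace, as .Trim does in the original expression.
address = match.group(0).strip() if match else ""
print(address)
```

Note that the character class excludes only the underscore, so the match keeps going until the end of the text (or the next underscore). If other fields follow the address, the pattern needs something to stop on.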

Thanks a lot @Gokul_Jayakumar, but how can we use string manipulation to extract the address from a .txt file (data extracted from the scanned PDF and saved in a .txt file)?

As I use Citrix, I don't have the option to share the exact sample, but I'll try to write one up manually and share it.

Hello @Nissy_Ruth_Prabhu

You can use the Matches activity and its regex builder to frame the regex for the address.

If the data from the scanned PDF has already been read using Read PDF With OCR and saved to a .txt file, you can read that text file into a string variable. You can then pass that variable to the Matches activity along with the regex.
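The read-then-match flow described here can be sketched outside UiPath as follows. This is a rough Python illustration, not the Matches activity itself. The file name `ocr_output.txt`, the helper `extract_address_from_file`, and the sample content are all assumptions made up for the demo.

```python
import re
import tempfile
from pathlib import Path

# Pattern from earlier in the thread: everything after "Address:".
ADDRESS_PATTERN = re.compile(r"(?<=Address:)[A-Za-z\W\n0-9]+")


def extract_address_from_file(path):
    """Read the OCR text file into a string and apply the regex to it."""
    text = Path(path).read_text(encoding="utf-8")
    match = ADDRESS_PATTERN.search(text)
    # Return None when the "Address:" label is not present in the file.
    return match.group(0).strip() if match else None


# Demo: write a stand-in for the OCR output, then extract from it.
with tempfile.TemporaryDirectory() as tmp:
    sample_file = Path(tmp) / "ocr_output.txt"  # hypothetical file name
    sample_file.write_text(
        "Invoice 001\nAddress: XXX.\nYYYY, ZZZZ-01235\n", encoding="utf-8"
    )
    result = extract_address_from_file(sample_file)

print(result)
```

In a workflow, the read step would correspond to reading the text file into a string variable, and the search step to the Matches activity or the Regex.Match expression.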

Thanks

Thank you so much @Rahul_Unnikrishnan, I will try it and update you soon.

@Nissy_Ruth_Prabhu After reading the document with the OCR activity, pass the resulting string variable to the Regex Match activity or expression to extract the particular text.

To share a sample, just recreate the text file with the address changed. The format should stay the same as the actual file, including the text before and after the address, since that text is used as the keyword to extract the required data from the string.
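The role of the text before and after the address can be illustrated with a small Python sketch that captures whatever sits between two keywords. The helper `extract_between` and the labels "Name:" and "Phone:" are hypothetical; the actual labels depend on the real document, which has not been shared.

```python
import re


def extract_between(text, start_keyword, end_keyword):
    """Return the text found between two keywords, or None if not found."""
    # re.escape keeps characters such as ":" or "." in the keywords literal.
    pattern = re.escape(start_keyword) + r"\s*(.*?)\s*" + re.escape(end_keyword)
    # DOTALL lets "." cross line breaks, since addresses span several lines.
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1) if match else None


# Made-up sample in the same shape as the earlier example.
sample = "Name: AAA\nAddress: XXX.\nYYYY, ZZZZ-01235\nPhone: 000"
address = extract_between(sample, "Address:", "Phone:")
print(address)
```

This is why the sample needs to keep the same layout as the real file: if the keyword after the address changes, the end of the match changes with it.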