I have the text of an invoice PDF as a string, and I want to capture the address from it. What is the best way to do that, given that the address format keeps changing across countries?
@manoj_verma1
Hi,
Use the Read PDF Text activity; if the PDF is scanned, use Read PDF With OCR instead.
Then use regex or string manipulation to pull out the address.
Thanks
If it is a plain-text PDF, you can use the Read PDF Text activity.
You may then need a regex to extract the address, depending on the text the activity returns.
Thanks,
Srini
Use Read PDF Text for native (text-based) PDFs and Read PDF With OCR for scanned PDFs, with an OCR engine of your choice placed inside the Read PDF With OCR activity. The output is stored in a String variable.
Write the String variable to a text file so you can inspect the raw text, then build a regular expression that extracts the address.
Use the Matches activity to apply the regular expression.
Hope it helps!!
If it is a scanned document, use Read PDF With OCR; otherwise use the Read PDF Text activity.
Then use regex to get the required fields.
Regards,
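Since the question is about addresses that change per country, here is a rough sketch of one way to keep a separate pattern per country. It is written in Python only so it can be run outside UiPath; the same patterns can be pasted into the Matches activity. The "Bill To:" label, the country keys, the sample invoice and the function name extract_address are all assumptions for illustration, not something taken from the poster's invoices.

```python
import re

# Assumed postal-code shapes per country; extend for the countries you handle.
POSTAL_CODE = {
    "IN": r"\b\d{6}\b",             # India: 6-digit PIN code
    "US": r"\b\d{5}(?:-\d{4})?\b",  # United States: ZIP or ZIP+4
}


def extract_address(text, country, label="Bill To:"):
    """Return the text between `label` and the first postal code, or None."""
    pattern = re.escape(label) + r"\s*(.*?" + POSTAL_CODE[country] + r")"
    # DOTALL lets "." cross line breaks, since addresses usually span lines.
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        return None
    # Collapse line breaks and repeated spaces into single spaces.
    return " ".join(match.group(1).split())


# Made-up invoice text standing in for the Read PDF Text output.
sample = """Invoice 1001
Bill To:
Jane Doe
742 Evergreen Terrace
Springfield, OR 97477
Total: 120.00"""

print(extract_address(sample, "US"))
# Jane Doe 742 Evergreen Terrace Springfield, OR 97477
```

This only works when the address starts at a known label and ends at the postal code, so invoices with a different layout would need their own label or pattern.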
Hi @manoj_verma1 ,
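If your string has a fixed address format, you can use a regex to get the address.
For example, if the address ends with a 6-digit PIN code, the expression below returns everything from the start of the string up to the first 6-digit number. As written, the dot does not cross line breaks, so pass RegexOptions.Singleline as a third argument if the address spans several lines.
System.Text.RegularExpressions.Regex.Match(yourStringhere,"^.*?\b\d{6}\b").Value
Otherwise, share the address format if possible.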
Thanks!
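To see how the 6-digit expression above behaves on multi-line text, here is a small Python sketch (Python only so it can be tried quickly; re.DOTALL plays the role of RegexOptions.Singleline in .NET). The invoice text is made up.

```python
import re

# Hypothetical PDF text; the layout and the address are made up.
invoice_text = """Acme Traders
12 MG Road, Indiranagar
Bengaluru 560038
Invoice No: 4471"""

PIN_PATTERN = r"^.*?\b\d{6}\b"

# Without DOTALL, "." stops at a line break, so the first line must
# already contain the 6-digit number for the pattern to match.
single_line = re.search(PIN_PATTERN, invoice_text)

# With DOTALL, "." also matches newlines, so the match can span lines.
multi_line = re.search(PIN_PATTERN, invoice_text, re.DOTALL)

address = multi_line.group(0) if multi_line else ""
print(single_line)  # None
print(address)
```

Note that the pattern starts at the very beginning of the string, so it is best applied to the address block rather than to the whole invoice.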
Please share the actual text and the output you need extracted.
That will give us more information to work with.
Hi,
Test_SMS_RegalCompany_CountryName_\d{2}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2}
I hope this will help you
@Umadevi_Sanjeevi @Srini84 @mkankatala @pravallikapaluri @lrtetala
Is there any website you recommend for building regexes?
@manoj_verma1
Regex101 and RegExr.
RegExr is the one I prefer. In Regex101, some lookbehind expressions were not accepted for me; that depends on the regex flavor selected there, and UiPath uses the .NET flavor.
Open the link below to go to RegExr:
https://regexr.com/
Hope it helps!!
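As lookbehind came up, here is a small Python sketch of what it does and why a tool may reject it. The text and the "PIN: " label are made up. Python's re module, like some other flavors, only accepts fixed-width lookbehind, while .NET (and so UiPath) also accepts variable-width ones, so always test with the flavor you will run.

```python
import re

text = "Invoice No: 4471\nPIN: 560038"  # made-up text

# (?<=...) asserts that the label precedes the match without including it.
pin = re.search(r"(?<=PIN: )\d{6}\b", text)
print(pin.group(0) if pin else "no match")  # 560038

# A variable-width lookbehind is rejected by Python's re module,
# although the same pattern is valid in .NET.
try:
    re.compile(r"(?<=PIN:\s*)\d{6}")
    variable_width_ok = True
except re.error:
    variable_width_ok = False
print(variable_width_ok)  # False
```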