I need to capture the addresses when reading a PDF

Hello friends,
I need to capture specific address parts (the parts highlighted in bold in my original post) from the sample text below:
BASIC REPORT
ABCDE CIRCLE OFFICE BUILDING
116 W Market Street
Aberdeen, Washington 98520

PREPARED FOR:
Always Happy
BANK OF THE ASIA
4144 Hannegan Road
Bellingham, Washington 98226

PREPARED BY:
Rama Rao. Cathy, MAO, AO-GPRS
CASSINELLI VALUATION LLC
4804 NW Bethany Boulevard, Suite I-2 Portland, Oregon 97229
(503) 575-9217 / testx@cmail.com

Please help me build the logic for extracting the address parts:

  1. Each address starts with a number (the street number) and ends with a five-digit ZIP code.
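A minimal sketch of that rule as a check on one candidate string, written in Python only so the pattern can be tried outside UiPath. The helper name `looks_like_address` is hypothetical, and the pattern assumes the ZIP code is exactly five digits at the very end (no ZIP+4). For the constructs used here (`(?s)`, `\d`, `\s`, `\b`, `{5}`), Python's `re` and .NET's `Regex` behave the same way.

```python
import re

# Rule from the question: starts with a street number, ends with a five-digit ZIP.
# (?s) lets "." cross line breaks, since some addresses span two lines.
# \b before \d{5} keeps the tail of a longer digit run (e.g. 123456) from counting as a ZIP.
# Assumption: ZIP codes are plain five-digit codes at the very end of the text.
ADDRESS_RULE = re.compile(r"(?s)\d+\s.*\b\d{5}")


def looks_like_address(candidate: str) -> bool:
    """Hypothetical helper: True if the trimmed text fits the rule end to end."""
    # fullmatch requires the pattern to cover the entire string.
    return ADDRESS_RULE.fullmatch(candidate.strip()) is not None


if __name__ == "__main__":
    for text in [
        "116 W Market Street\nAberdeen, Washington 98520",
        "4804 NW Bethany Boulevard, Suite I-2 Portland, Oregon 97229",
        "(503) 575-9217 / testx@cmail.com",
        "BASIC REPORT",
    ]:
        print(looks_like_address(text), repr(text))
```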

Thanks

Hi @cschevuri

How about this expression?

System.Text.RegularExpressions.Regex.Match(InputString,"(?s)^\d+(.*?)\d{5}").ToString.Trim

Regards
Gokul
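A quick way to see how this pattern behaves is a Python port, sketched below under the assumption that the input is the full PDF text from the question. Without the Multiline option, `^` anchors to the start of the whole input in both .NET and Python, so text that begins with "BASIC REPORT" gives no match, and `Regex.Match(...).ToString` returns an empty string in that case. `match_text` is a hypothetical helper that mimics `.ToString.Trim`; since `.ToString` returns the whole match, the capture group `(.*?)` does not change the result.

```python
import re

# First lines of the sample text from the question.
SAMPLE = "BASIC REPORT\nABCDE CIRCLE OFFICE BUILDING\n116 W Market Street\nAberdeen, Washington 98520"

PATTERN = r"(?s)^\d+(.*?)\d{5}"  # same pattern as the VB.NET expression


def match_text(text: str, pattern: str) -> str:
    """Hypothetical helper mimicking Regex.Match(...).ToString.Trim."""
    m = re.search(pattern, text)
    # group(0) is the whole match, like Match.ToString; a failed match gives "".
    return m.group(0).strip() if m else ""


if __name__ == "__main__":
    # "^" (no Multiline) only matches at the start of the whole string,
    # which here is "BASIC REPORT", so the result is empty.
    print(repr(match_text(SAMPLE, PATTERN)))
    # Input that starts with the street number does match.
    print(repr(match_text("116 W Market Street\nAberdeen, Washington 98520", PATTERN)))
```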

Another regex pattern, @cschevuri:

System.Text.RegularExpressions.Regex.Match(InputString,"(?s)^\d+.*?\d{5}").ToString.Trim

Regards
Gokul

Thank you, I will explore this.

Let us know if you face any challenges, @cschevuri.

Main.xaml (7.0 KB)
project.json (1.6 KB)
Hi Gokul, please check the code above; I am getting an error message.
Thanks

Hi @cschevuri

Try with this XAML file

Main (8).xaml (6.5 KB)

Regards
Gokul

Hey!

Try this…

  1. Read the PDF and store the output in strPDFOut.
  2. Add an Assign activity and create a String variable named strAddress.

Use this expression:

LHS: strAddress
RHS: System.Text.RegularExpressions.Regex.Match(strPDFOut,"\d.*\n.*\d{5}$").ToString

Reference:

https://regex101.com/r/rJxTZn/1

Regards,
NaNi
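Note that `$` in this pattern has to mean "end of line", which in .NET requires `RegexOptions.Multiline`; a plain `Regex.Match(strPDFOut, pattern)` treats `$` as the end of the whole input. The Python sketch below stands in for the .NET call (`first_match` is a hypothetical helper) and shows three cases on the sample text: no Multiline, Multiline, and Multiline with Windows-style `\r\n` line endings. In the last case `$` fails right after the ZIP code because both engines match `$` only before `\n`, not before `\r`. Whether the extracted PDF text really contains `\r\n` is an assumption worth checking.

```python
import re

# Full sample text from the question (no trailing newline).
SAMPLE = "\n".join([
    "BASIC REPORT",
    "ABCDE CIRCLE OFFICE BUILDING",
    "116 W Market Street",
    "Aberdeen, Washington 98520",
    "",
    "PREPARED FOR:",
    "Always Happy",
    "BANK OF THE ASIA",
    "4144 Hannegan Road",
    "Bellingham, Washington 98226",
    "",
    "PREPARED BY:",
    "Rama Rao. Cathy, MAO, AO-GPRS",
    "CASSINELLI VALUATION LLC",
    "4804 NW Bethany Boulevard, Suite I-2 Portland, Oregon 97229",
    "(503) 575-9217 / testx@cmail.com",
])

PATTERN = r"\d.*\n.*\d{5}$"  # same pattern as the Assign expression


def first_match(text: str, flags: int = 0) -> str:
    """Hypothetical helper: first match like Regex.Match(...).ToString, or ""."""
    m = re.search(PATTERN, text, flags)
    return m.group(0) if m else ""


if __name__ == "__main__":
    # No Multiline: "$" means end of input, and the input ends in "...cmail.com".
    print(repr(first_match(SAMPLE)))
    # Multiline: "$" matches before each "\n", so the first address is found.
    print(repr(first_match(SAMPLE, re.MULTILINE)))
    # Assumed Windows line endings: "\r" sits between the ZIP and "\n".
    crlf = SAMPLE.replace("\n", "\r\n")
    print(repr(first_match(crlf, re.MULTILINE)))
    # Normalizing line endings first restores the match.
    print(repr(first_match(crlf.replace("\r\n", "\n"), re.MULTILINE)))
```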

Hi Gokul, sorry.
I am getting the same output as the text captured from the input PDF, that is, the full sample text from my first post.

The expected output is:
116 W Market Street
Aberdeen, Washington 98520

4144 Hannegan Road
Bellingham, Washington 98226

4804 NW Bethany Boulevard, Suite I-2 Portland, Oregon 97229

Hi Nani,
I tried your code, but it returns the same input text captured from the PDF.
Thanks
Chandra

Hey!

Try this…

System.Text.RegularExpressions.Regex.Match(StrInput,"\d.*\n.*\d{5}$|\d.*\d{5}$").ToString

Regards,
NaNi
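Since the expected output lists all three addresses and `Regex.Match` only ever returns the first one, collecting every match (`Regex.Matches` in .NET) is likely what is needed here. A Python sketch of that idea with this alternation, assuming Multiline mode and normalizing `\r\n` to `\n` first; `all_addresses` is a hypothetical helper, not part of UiPath.

```python
import re

# Full sample text from the question (no trailing newline).
SAMPLE = "\n".join([
    "BASIC REPORT",
    "ABCDE CIRCLE OFFICE BUILDING",
    "116 W Market Street",
    "Aberdeen, Washington 98520",
    "",
    "PREPARED FOR:",
    "Always Happy",
    "BANK OF THE ASIA",
    "4144 Hannegan Road",
    "Bellingham, Washington 98226",
    "",
    "PREPARED BY:",
    "Rama Rao. Cathy, MAO, AO-GPRS",
    "CASSINELLI VALUATION LLC",
    "4804 NW Bethany Boulevard, Suite I-2 Portland, Oregon 97229",
    "(503) 575-9217 / testx@cmail.com",
])

# First alternative: a two-line address; second: a one-line address.
PATTERN = r"\d.*\n.*\d{5}$|\d.*\d{5}$"


def all_addresses(text: str) -> list[str]:
    """Hypothetical helper: every match, like iterating Regex.Matches."""
    # Assumption: the PDF text may use "\r\n"; "$" only matches before "\n".
    normalized = text.replace("\r\n", "\n")
    # Multiline so "$" means end of line rather than end of input.
    return re.findall(PATTERN, normalized, re.MULTILINE)


if __name__ == "__main__":
    for address in all_addresses(SAMPLE):
        print(repr(address))
```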

Hi @cschevuri,

Can you share the PDF file here?

Regards
Gokul

Hi Gokul, please find the PDF attached.
PDFsearch.pdf (40.3 KB)

Hi Nani,
I tried the above code and got the output shown in the screenshot I attached.
Thanks.
Chandra

Hi @cschevuri

How about this expression?

System.Text.RegularExpressions.Regex.Match(InputString,"(?s)\s+\d+.*?\d{5}").ToString.Trim

Regards
Gokul
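Why this version works on the full text: `\s+` replaces the `^` anchor, so a match can start at any whitespace (such as a line break) that precedes a street number, and `(?s)` lets the lazy `.*?` run across line breaks up to the first run of five digits. `.ToString.Trim` then drops the leading whitespace. Below is a Python port as a sketch (helper names are hypothetical); besides the first match, it also collects every match the way `Regex.Matches` would, which yields all three expected addresses on the sample.

```python
import re

# Full sample text from the question (no trailing newline).
SAMPLE = "\n".join([
    "BASIC REPORT",
    "ABCDE CIRCLE OFFICE BUILDING",
    "116 W Market Street",
    "Aberdeen, Washington 98520",
    "",
    "PREPARED FOR:",
    "Always Happy",
    "BANK OF THE ASIA",
    "4144 Hannegan Road",
    "Bellingham, Washington 98226",
    "",
    "PREPARED BY:",
    "Rama Rao. Cathy, MAO, AO-GPRS",
    "CASSINELLI VALUATION LLC",
    "4804 NW Bethany Boulevard, Suite I-2 Portland, Oregon 97229",
    "(503) 575-9217 / testx@cmail.com",
])

# Whitespace, a street number, then the shortest stretch (newlines included,
# thanks to (?s)) up to five consecutive digits.
PATTERN = r"(?s)\s+\d+.*?\d{5}"


def first_address(text: str) -> str:
    """Hypothetical helper mirroring Regex.Match(...).ToString.Trim."""
    m = re.search(PATTERN, text)
    return m.group(0).strip() if m else ""


def all_addresses(text: str) -> list[str]:
    """Hypothetical helper: every match, trimmed, like iterating Regex.Matches."""
    return [m.group(0).strip() for m in re.finditer(PATTERN, text)]


if __name__ == "__main__":
    print(repr(first_address(SAMPLE)))
    for address in all_addresses(SAMPLE):
        print(repr(address))
```

One caveat: because the pattern has no end anchor, `\d{5}` stops at the first five consecutive digits it reaches, so a longer number appearing after a street number could cut an address short.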

Hey!

Have you used Write Line or Log Message to display the output?

Try a Message Box instead.

Regards,
NaNi

Hi Gokul, this worked well. Thanks a lot.
Regards,
Chandra

Great @cschevuri

Happy Automation

Hi Nani,
I tried the Write Line activity, but it returned empty values.
The code below worked well:
System.Text.RegularExpressions.Regex.Match(PDFtext,"(?s)\s+\d+.*?\d{5}").ToString.Trim

Thanks for your help.