PDF data extraction using Get Text is not working

I am trying to get the invoice number from the document below.

Using Get Text, I selected the invoice number, but I am getting the output as AVPageView.
When I indicate the element, the highlight selects the larger area instead of just the invoice number. Could someone please help me?

@RPADemo_Mail

Install the PDF package in Studio and use the Read PDF Text activity to get the text from the PDF.


Hi @RPADemo_Mail

You can try a Regex expression:

System.Text.RegularExpressions.Regex.Match(Sample, "(?<=@)(\S+\s+)(\d{4})|(?<=Invoice\sNumber\s)(\s+)(\S+)").Value.Trim

System.Text.RegularExpressions.Regex.Match(StrInput, "\d+|\S+\d+").Value
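To see what the two expressions above return, here is a minimal sketch in Python rather than the VB.NET expression syntax used in UiPath. The patterns are copied unchanged; the sample text and the function names are hypothetical and only for illustration, and the real text returned by Read PDF Text may be laid out differently.

```python
import re

# Patterns copied from the two UiPath expressions above.
INVOICE_PATTERN = r"(?<=@)(\S+\s+)(\d{4})|(?<=Invoice\sNumber\s)(\s+)(\S+)"
TOKEN_PATTERN = r"\d+|\S+\d+"


def extract_invoice_number(text):
    """Mimic Regex.Match(text, pattern).Value.Trim: empty string if no match."""
    match = re.search(INVOICE_PATTERN, text)
    return match.group(0).strip() if match else ""


def first_number_token(text):
    """Mimic Regex.Match(text, pattern).Value for the second expression."""
    match = re.search(TOKEN_PATTERN, text)
    return match.group(0) if match else ""


# Hypothetical text, as it might come back from Read PDF Text.
sample = "ACME Corp\nInvoice Number  INV-1234\nTotal 500.00"

print(extract_invoice_number(sample))
print(first_number_token("Invoice 1234"))
```

Note that the second alternative of the first pattern needs at least two whitespace characters after "Invoice Number": one is consumed by the lookbehind and at least one more by `(\s+)`. If the extracted text has only a single space there, the expression returns an empty string.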

Check out this XAML file

ExtractDatafromPDFRegex.xaml (14.4 KB)

Input: (screenshot)

Output: (screenshot)

Regards
Gokul

The PDF package is already installed. The data was being extracted correctly before. Now it is not getting the data from the PDF.

Did you check out this XAML file, @RPADemo_Mail?

Also, change the path in the For Each activity.

Is there any other way to extract specific data correctly using Get Text, other than Regex?

Hi @RPADemo_Mail

You can try the Get OCR Text activity. In your case, though, we are able to get all the elements using a Regex expression.

Can you tell us why you don’t want to use a Regex expression?

Regards
Gokul

Will Get Text not return specific data from a PDF? When I worked with it before, it was returning the correct value. I just wanted to know the reason why I am not able to extract the data from the PDF using Get Text.

Hi @RPADemo_Mail

Maybe the element had not appeared yet, or there is a selector issue; there are many possible causes like this. In this case, you can try Regex, which will get the exact value from the PDF.

Regards
Gokul