Extract Specific Text from PDF

Hi Team,

I have a PDF file and want to extract one specific piece of data from it.
If I open the PDF manually, then copy and paste the content into Notepad, I get the content (the address) one line after another.
If I use the Read PDF Text activity and then the Write Text File activity, the output contains the full text, including the table values beside the address, and I am not able to get the address separately.

Kindly help me extract only this specific content from the PDF.

Note: There is no keyword that marks the start or end of the content.

Hi

When you use the Read PDF Text activity, does the extracted text look like this?
Doorno:,location,city**

Or like this?

Doorno:,
location
,
city
**

Hi @Selvasathya,

Can you please share your PDF file, or a screenshot of that portion?

Thanks & Regards,
Apurba

Output

I have attached the sample PDF and the output results. Kindly help me.

@Selvasathya

So, which value here is the address? Which value do you need to extract from this?

Hi,
Below is the address which needs to be extracted.

Mrs XYA XYZ
909 XYZ XYZ
XYZ
ABC ABC

Hi @Selvasathya,

Please read the PDF first. If it is a normal (text-based) PDF, use the Read PDF Text activity; if it is a scanned PDF, use Read PDF With OCR.

Then use Regex to extract the output you need.

Thanks & Regards,
Apurba

Hi Apurba,

Thank you for the info. I need help with one more thing: I have been trying to work with Regex since Friday but have not been able to find a solution. I want to apply Regex to the content below. Please help with this as well. (Note: the address details appear one per line.)

Details:
Mrs Sathya
10 Greenway Road
Broadway
Chennai
600 001
VAT
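
For the sample above, one possible approach is to treat the `Details:` line and the `VAT` line as the start and end markers. The sketch below is in Python purely for illustration. It assumes both markers are always present and that the text arrives one item per line, as in the sample. `sample_text` and `extract_address` are hypothetical names, and the surrounding lines in `sample_text` are invented filler. The pattern would need checking against the .NET regex engine before being used in a UiPath workflow.

```python
import re

# Sample text laid out like the post above; the first and last lines are
# invented filler standing in for whatever surrounds the address.
sample_text = """Some table header
Details:
Mrs Sathya
10 Greenway Road
Broadway
Chennai
600 001
VAT
Other values"""

# Capture everything after the "Details:" line up to, but not including,
# the line that starts with "VAT". DOTALL lets "." cross line breaks and
# MULTILINE lets "^" match at the start of any line.
ADDRESS_PATTERN = re.compile(
    r"^Details:[ \t]*\r?\n(.*?)\r?\n[ \t]*VAT\b",
    re.DOTALL | re.MULTILINE,
)


def extract_address(text):
    """Return the lines between 'Details:' and 'VAT', or [] if not found."""
    match = ADDRESS_PATTERN.search(text)
    if match is None:
        return []
    # Trim whitespace (including stray carriage returns) and drop empty lines.
    return [line.strip() for line in match.group(1).splitlines() if line.strip()]


address_lines = extract_address(sample_text)
print("\n".join(address_lines))
```

The lazy `(.*?)` stops at the first `VAT` line, so the address can have any number of lines as long as both markers are there.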

Hi @Selvasathya,

If you could share the file you are working with, it would help me understand the problem better.

Thanks & Regards,
Apurba

Hi Apurba,

It is client-protected data, which I am not able to share. That is why I have only given a sample of what it looks like. The text will be in Notepad only.

Details:
Mrs Sathya
10 Greenway Road
Broadway
Chennai
600 001
VAT

Hi @Selvasathya,

I think you are saying that the text you need appears multiple times.

Do you want to extract it only once? And is the last line always VAT?

If the answer to both questions is yes, please try the workflow below:
Main.xaml (7.8 KB)

Thanks & Regards,
Apurba
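
The attached Main.xaml is not reproduced in this thread, so the following is only a sketch of how the two assumptions in the post above (the block may repeat, and every block ends with `VAT`) could be handled with regex. It is written in Python for illustration, not as the workflow's actual logic. `all_blocks`, `first_block`, and `repeated_text` are hypothetical names, and the second address in the sample is made up.

```python
import re

# Each block starts at a "Details:" line and ends just before a "VAT" line.
BLOCK_PATTERN = re.compile(
    r"^Details:[ \t]*\r?\n(.*?)\r?\n[ \t]*VAT\b",
    re.DOTALL | re.MULTILINE,
)


def all_blocks(text):
    """Return every Details..VAT block as a newline-joined string."""
    blocks = []
    for match in BLOCK_PATTERN.finditer(text):
        lines = [line.strip() for line in match.group(1).splitlines() if line.strip()]
        blocks.append("\n".join(lines))
    return blocks


def first_block(text):
    """Return only the first block, or None when no block is found."""
    blocks = all_blocks(text)
    return blocks[0] if blocks else None


# Two blocks in one text; the second block is invented sample data.
repeated_text = """Details:
Mrs Sathya
10 Greenway Road
Broadway
Chennai
600 001
VAT
Invoice total 100
Details:
Mr Example
1 Sample Street
Sampletown
VAT
"""

print(first_block(repeated_text))
print(len(all_blocks(repeated_text)))
```

Taking only the first match covers the "extract it once" case; iterating over all matches covers the case where every occurrence is needed.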

What is regex?

@omkartodkar1999 Please search the forum or use Google.