Extract Specific Text from PDF

Selvasathya · October 10, 2019, 1:22pm

Hi Team,

I have a PDF file and want to extract only one specific piece of data from it.
If I open the PDF manually, then copy and paste the content into Notepad, I get the content (the address) one line after another.
If I use the Read PDF Text activity followed by the Write Text File activity, the output contains the full text, including the table values beside the address, and I am not able to get the address on its own.

Kindly help me extract only a specific piece of content from the PDF.

Note: There is no keyword that marks the start or end point of the content.

allurai_india · October 10, 2019, 1:26pm

Hi

When you use the Read PDF Text activity, is the text extracted on a single line, like this?
Doorno:,location,city

Or across multiple lines, like below?

Doorno:,
location,
city

apurba2samanta · October 10, 2019, 1:30pm

Hi @Selvasathya,

Can you please share your PDF file or a screenshot of that portion?

Thanks & Regards,
Apurba

Selvasathya · October 10, 2019, 2:00pm

Selvasathya · October 11, 2019, 8:10am

I have attached the sample PDF and the output results. Kindly help me.

amaresan · October 11, 2019, 10:42am

@Selvasathya

So, which value here is the address? Which value do you need to extract from this?

Selvasathya · October 11, 2019, 10:45am

Hi,
Below is the address which needs to be extracted.

Mrs XYA XYZ
909 XYZ XYZ
XYZ
ABC ABC

apurba2samanta · October 11, 2019, 11:17am

Hi @Selvasathya,

Please read the PDF first. If it is a normal (text-based) PDF, use the Read PDF Text activity; if it is a scanned PDF, use Read PDF With OCR.

Then use Regex to get your desired output.

Thanks & Regards,
Apurba

Selvasathya · October 14, 2019, 8:49am

Hi Apurba,

Thank you for the info. I would like help with one more thing: I have been trying to work with Regex since Friday but have been unable to find a solution. I want to apply a Regex to the content below. Please help with this as well. (Note: the address details appear one line after another.)

Details:
Mrs Sathya
10 Greenway Road
Broadway
Chennai
600 001
VAT

apurba2samanta · October 14, 2019, 9:20am

Hi @Selvasathya,

If you could kindly share the file you are working with, it would help me understand the problem better.

Thanks & Regards,
Apurba

Selvasathya · October 14, 2019, 9:50am

Hi Apurba,

It is client-protected data, which I am not able to share. That is why I have only given a sample of what it looks like. The text will be in Notepad only.

Details:
Mrs Sathya
10 Greenway Road
Broadway
Chennai
600 001
VAT

apurba2samanta · October 14, 2019, 10:04am

Hi @Selvasathya,

I think you are saying that your required output text appears multiple times.

Do you want to extract it only once? And is the last line always VAT?

If the answer to both questions is yes, please try the workflow below:
Main.xaml (7.8 KB)

Thanks & Regards,
Apurba
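
For readers who cannot open the attachment, here is a minimal sketch of the idea being discussed: treat "Details:" as the start marker and a line containing only "VAT" as the end marker, then capture the lines in between. The contents of Main.xaml are not shown in this thread, so this is not necessarily what that workflow does. It is written in Python for illustration; the function name extract_address_blocks and the sample text are hypothetical, and the two markers are assumptions taken from the sample posted above.

```python
import re

# Sample text shaped like the post above; the real (client) document is not available.
SAMPLE = """Some header text
Details:
Mrs Sathya
10 Greenway Road
Broadway
Chennai
600 001
VAT
Some table values
"""

# Assumption: each block starts right after a "Details:" line and ends just
# before a line that contains only "VAT".
# MULTILINE lets ^ and $ match at line boundaries; DOTALL lets . cross newlines.
ADDRESS_BLOCK = re.compile(
    r"^Details:[ \t]*\n(.*?)^VAT[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def extract_address_blocks(text):
    """Return one list of address lines per Details:/VAT block found in text."""
    # Normalise Windows line endings so the pattern only has to handle "\n".
    normalized = text.replace("\r\n", "\n")
    blocks = []
    for match in ADDRESS_BLOCK.finditer(normalized):
        lines = [ln.strip() for ln in match.group(1).splitlines() if ln.strip()]
        blocks.append(lines)
    return blocks


if __name__ == "__main__":
    blocks = extract_address_blocks(SAMPLE)
    if blocks:
        # Take only the first block if the address is needed a single time.
        print("\n".join(blocks[0]))
```

If the block repeats in the text, the function returns every occurrence, and taking the first element covers the "single time" case asked about above. The same pattern should carry over to a .NET regex with the Multiline and Singleline options, though that has not been verified against the real file.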

omkartodkar1999 · March 3, 2021, 12:13pm

what is regex?

prasath17 · March 3, 2021, 1:00pm

@omkartodkar1999 …please search in the forum or use Google.
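
For anyone landing here with the same question: a regex (regular expression) is a pattern that describes the shape of the text you want to find, rather than the exact text itself. A small illustration in Python, assuming the postal code always looks like three digits, a space, and three digits, as in the sample earlier in this thread (the name find_postal_codes is hypothetical):

```python
import re

# Sample lines taken from the address posted earlier in this thread.
SAMPLE = "Mrs Sathya\n10 Greenway Road\nBroadway\nChennai\n600 001\nVAT"

# \b is a word boundary and \d{3} means exactly three digits, so this pattern
# reads: three digits, one space, three digits, as a standalone token.
# Assumption: this shape is inferred from a single sample value only.
POSTAL_CODE = re.compile(r"\b\d{3} \d{3}\b")


def find_postal_codes(text):
    """Return every substring of text that has the assumed postal-code shape."""
    return POSTAL_CODE.findall(text)


if __name__ == "__main__":
    print(find_postal_codes(SAMPLE))
```

The house number "10" is not matched because it does not have three digits followed by a space and three more digits; only text with the described shape is returned.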
