Get Text from PDF

Hi all,

I’m learning about PDF automation.

But I have a problem with my case. I want to use Get Text on the attached PDF file, but the selector cannot pick the exact element, and I seem to get an error.

Could you tell me why, and how to do that?
Sample PDF.pdf (339.6 KB)

Thank you!


Hi, I’m not getting your question. Can you please explain in detail what you have done so far? It will help me understand better.

Hi Pathrudu.

Sorry if my question was not clear to you.

I want to scrape the price from the PDF file in my previous post, but there seems to be a problem, so I could not scrape it.

I could not use a selector to choose what I want to scrape from the PDF file… I pressed the F3 key and chose a region, but I still did not get the exact result.

Thank you!

Check your Adobe settings. There is a setting to activate Read Out Loud; please find the attached image for your reference.

PDFSettings

Hope this helps.

Regards,
Pathrudu

This post might work for you.

rgds,
J

Hi Jumbo.

Thank you so much.

I have read this thread. I tested with two PDF files: one file is okay, but the other still does not work (the selector still selects the full screen).

Do you know why?


Hi Pathrudu.

Thank you so much. I have fixed about 50% of it using the thread Selecting PDF Elements.

Is the second one a native PDF or OCR?

@pathrudu The second file was saved from an Excel file.

With an OCR file, how do I do it?

I just went through the PDF file you attached; it’s a scanned PDF.
Please find the attached workflow, in which I have extracted two values. Open your PDF and run pdf.xaml, and you should be able to get the two values I extracted.
pdf.xaml (39.8 KB)

Hope this gives you a better understanding.

Regards,
Pathrudu

@pathrudu
Thank you!

But how do I scrape individual values from a scanned PDF file? We always have to collect data from scanned PDF files :expressionless:

You can use relative scraping by identifying an anchor element for each field. Alternatively, you can read the PDF with OCR and do string manipulation to get the required data.
Regards,
Pathrudu
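
To illustrate the "OCR plus string manipulation" idea above outside of UiPath, here is a rough Python sketch. It assumes the OCR step has already returned the page as plain text. The function name extract_after_anchor, the sample text, and the field labels are all made up for illustration; they are not taken from the attached PDF or from any UiPath activity.

```python
import re
from typing import Optional


def extract_after_anchor(ocr_text: str, anchor: str) -> Optional[str]:
    """Return the first number-like value that follows `anchor`, or None.

    Hypothetical helper: `anchor` is a label printed next to the field,
    which plays the same role as the anchor element in relative scraping.
    """
    # Allow an optional separator and currency symbol between label and value.
    pattern = re.escape(anchor) + r"\s*[:\-]?\s*[$€£]?\s*([0-9][0-9.,]*)"
    match = re.search(pattern, ocr_text, flags=re.IGNORECASE)
    if match is None:
        return None
    # Drop punctuation that OCR may have glued to the end of the number.
    return match.group(1).rstrip(".,")


# Made-up OCR output, used only to show the call.
sample = "Invoice No: 12345\nUnit Price: $1,250.00\nTotal: 2,500.00"
print(extract_after_anchor(sample, "Unit Price"))
```

Real OCR output is usually noisier than this (for example, the letter O read in place of zero), so the pattern would likely need tuning for each document layout.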

Or sometimes opening the PDF with Word might help to convert a PDF table to an Excel table.
Open the PDF with Word >> copy and paste into Excel.

Rds,
J,

@pathrudu
I have tried, but I still could not scrape the data I want.

Could you please guide me step by step?

Thank you!

@pathrudu
Please edit my attached workflow. I have attached another PDF file.

Thank you!
GetText from PDF.xaml (16.7 KB)
Sample PDF.pdf (339.6 KB)

Don’t use selectors. Put simply, open the Citrix recorder and choose Scrape Relative. Go through the relative scraping topics once so that you get some idea.

I’ll edit your workflow in some time.

Regards,
Pathrudu

@pathrudu

Thank you so much. I used the Get OCR Text activity and it worked ^^

@pathrudu
Hi, sorry, I have one more question.

I used the Find Element activity, but it has a problem. How can I correct it?

Thank you!
GetText from PDF.xaml (19.3 KB)

Use Find Image instead of Find Element. There won’t be any elements in a scanned PDF.

@pathrudu
Hi, I tried to use Find Image and the program shows the error below. Please guide me on what to do in this case.
1.pdf (24.2 KB)