Read PDFs from a Specific Folder, Get Text from Them, and Put It into an Excel File

Hi, @Palaniyappan @HareeshMR

I want to open every PDF in a particular folder, then use the Get Text activity on certain elements to get the actual value from every file.

Thanks

Fine.
Hope these steps help you resolve this:
- Use an Assign activity like this:
arr_filepath = Directory.GetFiles("yourfolderpath", "*.pdf")
- Here arr_filepath is a variable of type string array.
- Now use a For Each loop, pass the above variable as its input, and in the Properties panel set TypeArgument to String.
- Inside the loop, use a Start Process activity and pass item as the file path.
- Then use a Get Text activity and store the output in a variable of type String.
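
The folder loop described above can be sketched outside UiPath as well. This is only an illustration in Python, not the workflow itself: the function names are hypothetical, and the per-file step is a placeholder for whatever opens the PDF and reads its text.

```python
from pathlib import Path


def list_pdf_files(folder: str) -> list[str]:
    """Return the full paths of the .pdf files directly inside `folder`."""
    # Non-recursive: subfolders are not searched.
    return sorted(str(p) for p in Path(folder).glob("*.pdf") if p.is_file())


def process_folder(folder: str) -> list[str]:
    """Visit every PDF once; the per-file work is only a placeholder here."""
    visited = []
    for file_path in list_pdf_files(folder):
        # In the workflow, this is where the file is opened and its text read.
        visited.append(Path(file_path).name)
    return visited
```

Calling process_folder("yourfolderpath") returns the PDF file names in sorted order, which makes it easy to check that no file in the folder was skipped.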

Hope this would help you
Cheers @omprasad

Thanks @Palaniyappan for your quick reply.

When I use Get Text or Get OCR Text, it selects the whole PDF, not a particular piece of text.

Thanks

Hey @Palaniyappan

Can you please help me with this problem?

Thanks

Sorry for the delayed response.
Kindly try the Screen Scraping method in that case.
I think the PDF is not a native PDF, which is why the whole page is getting selected.

Cheers @omprasad

Hey @Palaniyappan

I got an error saying "Get OCR Text" faulted. But when I scraped the text, it showed me the text and the selector validated. When I click the Highlight button, though, it highlights the whole PDF, not the specific element. After that, when I reopen UiExplorer, it shows that the selector is not valid.

Thanks

Is there any fixed term near that text?
If so, we can use the Anchor Base activity.

Cheers @omprasad

Hey @Palaniyappan

Thanks for your reply.

It again highlighted the whole page, not the text.

Thanks

Hi @omprasad
I suggest reading the whole PDF with Read PDF Text instead of Get Text, because Get Text is not very reliable and you can run into selector issues. Read the whole text, then apply a regex to the output string to extract your required text, using the anchor text next to it.
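
As a rough sketch of the anchor-text idea, assuming the value sits on the same line as a fixed label: the labels and sample text below are made up for illustration, and the helper name is hypothetical. It is written in Python; the same pattern idea should carry over to a .NET regex in the workflow.

```python
import re
from typing import Optional


def extract_after_anchor(text: str, anchor: str) -> Optional[str]:
    """Return the value that follows `anchor` on the same line, or None."""
    # re.escape keeps special characters in the anchor (such as ".") literal.
    pattern = re.escape(anchor) + r"[ \t]*:?[ \t]*(.+)"
    match = re.search(pattern, text)
    return match.group(1).strip() if match else None


# Made-up text standing in for the string read from one PDF.
sample_text = "ACME Ltd\nInvoice No: INV-1042\nTotal Amount: 1,250.00\n"
print(extract_after_anchor(sample_text, "Invoice No"))
print(extract_after_anchor(sample_text, "Total Amount"))
```

Because "." does not match a newline by default, the capture stops at the end of the anchor's line. A missing anchor gives None rather than an error, so a file with a different layout can be flagged instead of faulting the run.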

Hey @jitendra_123
Thanks for your reply.

Can you tell me the regex expression?

Thanks

@omprasad
Can you provide a sample string so I can give you the regex expression?

hey @jitendra_123

But I have multiple PDF files and want to automate every PDF file inside the folder.

Thanks

@omprasad

Fine! Can you confirm that you want to extract the same data (fields) from all the PDF files?

Hey @Sriram07

Yes, the same data.

Thanks

@omprasad
If the format of the PDFs is the same, then it will work for all the PDF files. Just provide me the input and the type of output you want from it.

Then the regex will work well!
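
To illustrate why one set of patterns is enough when every PDF shares a layout, here is a hedged Python sketch. The field labels, patterns, and function names are all hypothetical, since no sample line has been shared yet, and CSV is used as a simple stand-in for the Excel output.

```python
import csv
import io
import re

# Hypothetical field labels; replace them with the labels in the real PDFs.
FIELD_PATTERNS = {
    "invoice_no": re.compile(r"Invoice No[ \t]*:[ \t]*(\S+)"),
    "total": re.compile(r"Total Amount[ \t]*:[ \t]*([\d,]+\.\d{2})"),
}


def extract_fields(text: str) -> dict:
    """Apply every pattern to one document's text; missing fields become ''."""
    row = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        row[name] = match.group(1) if match else ""
    return row


def build_csv(texts_by_file: dict) -> str:
    """Return CSV text with one row per file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["file", *FIELD_PATTERNS], lineterminator="\n"
    )
    writer.writeheader()
    for file_name, text in texts_by_file.items():
        writer.writerow({"file": file_name, **extract_fields(text)})
    return buffer.getvalue()
```

Here build_csv takes a dictionary that maps each file name to the text read from that file. A file where a field is not found still gets a row, with that cell left empty, so layout differences show up in the output instead of being silently dropped.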

Could you give me the statement or line where your data appears in the PDF, and tell me what you want to extract from that line?
I can frame a regex expression for your statement.