Need to extract PDF comment

copy_writes · July 10, 2024, 5:21am

Hi team,

I hope you are doing well. I have a challenge: we need to extract data from a PDF file. The data is hidden inside a comment.

How can I extract the comment from the PDF? @loginerror

Thanks,
Cheers.

mkankatala · July 10, 2024, 5:26am

Hi @copy_writes

You can use the Read PDF Text activity to read the PDF file and store the result in a String variable.

Then use regular expressions to extract the required data from the extracted PDF text.

Hope it helps!!

bavyaravu133 · July 10, 2024, 5:35am

@copy_writes

- Import fitz: this is the PyMuPDF library.
- Open the PDF: the file is opened using fitz.open(pdf_path).
- Iterate through pages: for each page in the PDF, check for annotations.
- Check annotation type: in PyMuPDF, annotations of type 0 are text annotations (sticky-note comments), while type 8 is a highlight annotation. Extract the content of the annotation types you need.
- Store and print comments: the comments are stored in a list and then printed out.
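
The steps above can be sketched in Python roughly as follows. This is only a sketch under some assumptions: PyMuPDF is installed (`pip install pymupdf`), and the names `collect_comments`, `extract_pdf_comments` and the three constants are made up for illustration and do not come from this thread. In PyMuPDF's numbering the sticky-note Text annotation is type 0, FreeText is 2 and Highlight is 8, so the sketch accepts all three; adjust the set to whatever your file actually contains.

```python
# Annotation type numbers as PyMuPDF reports them in annot.type[0].
TEXT_ANNOT = 0       # sticky-note comment (fitz.PDF_ANNOT_TEXT)
FREE_TEXT_ANNOT = 2  # text typed directly onto the page (fitz.PDF_ANNOT_FREE_TEXT)
HIGHLIGHT_ANNOT = 8  # highlighted text (fitz.PDF_ANNOT_HIGHLIGHT)


def collect_comments(pages, wanted_types=(TEXT_ANNOT, FREE_TEXT_ANNOT, HIGHLIGHT_ANNOT)):
    """Return one dict per annotation of a wanted type that has non-empty content.

    `pages` is any iterable of page objects exposing annots(), for
    example an open PyMuPDF document.
    """
    comments = []
    for page_index, page in enumerate(pages):
        for annot in page.annots():
            type_number, type_name = annot.type[0], annot.type[1]
            if type_number not in wanted_types:
                continue
            # annot.info is a dict; "content" holds the comment text and
            # "title" usually holds the author name.
            content = (annot.info.get("content") or "").strip()
            if content:
                comments.append({
                    "page": page_index + 1,  # 1-based page number
                    "type": type_name,
                    "author": annot.info.get("title", ""),
                    "text": content,
                })
    return comments


def extract_pdf_comments(pdf_path):
    """Open the PDF with PyMuPDF and collect its comments."""
    import fitz  # PyMuPDF; imported here so collect_comments has no hard dependency

    with fitz.open(pdf_path) as doc:
        return collect_comments(doc)
```

Calling `extract_pdf_comments("your_file.pdf")` and printing each returned entry covers the "store and print" step. An empty list would suggest that the comments are stored as a different annotation type, or that the text is not in the annotation's content field.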

copy_writes · July 10, 2024, 5:40am

Yes, we can do that, but the hidden text is not extracted. The text is hidden inside a comment and only becomes visible when the mouse hovers over the comment in the UI.
The Read PDF Text activity does not work for this.

copy_writes · July 10, 2024, 5:42am

Are you talking about a package?

mkankatala · July 10, 2024, 5:43am

Okay @copy_writes

In this case, use the Use Application/Browser activity and indicate the PDF. Inside it, use the Get Text activity and indicate the hidden text. If Get Text does not work, use the Get Full Text activity; it is a Classic activity, but it is able to extract the hidden text.

Try this approach.

Hope it helps!!

copy_writes · July 10, 2024, 5:44am

Yes, I tried this, but it only works some of the time.

mkankatala · July 10, 2024, 5:45am

What do you mean by "sometimes"… @copy_writes

copy_writes · July 10, 2024, 5:47am

It does not work every time. It succeeds in only about 4 out of 10 runs.

mkankatala · July 10, 2024, 5:48am

You mean comments, right? Try the thread below:

Hope you understand!!

copy_writes · July 10, 2024, 8:01am

No, it is giving an empty array as output…

It does not return anything.

supermanPunch · July 10, 2024, 4:31pm

Hi @copy_writes ,

Could you provide more details on how you want the extraction to be done and what the expected output format is?

If possible, could you also share a sample file so that we can try it ourselves as well?

copy_writes · July 12, 2024, 7:37am

Hi supermanPunch,

Thanks for your interest in this topic. I need to extract data that is stored in a comment in the PDF. I have shared a test PDF; please take a look at it.

I tried the steps below:
Open the PDF, hover over the comment, and extract the text using Get Full Text.
But I don't want that approach.

Is there any other way?

Test.pdf (18.5 KB)

supermanPunch · July 12, 2024, 12:57pm

Hi @copy_writes ,

Could you check the workflow below:

PDF_GetComments.zip (18.1 KB)

It uses the iText .NET package, which is able to retrieve the annotations present in the PDF document.

Let us know if you are able to get the desired output; otherwise, explain what is required.

copy_writes · July 15, 2024, 9:35am

Excellent work, I am impressed. It solved the problem.

One request: so that others reading this post can also understand it, could you please explain how the code you used inside the Invoke Code activity works?

Thanks,
Copyrights.

system · July 18, 2024, 9:36am

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
