Need to extract PDF comment

Hi Team,

Hope you are doing well. I have a challenge with a PDF file: we need to extract data from it, but the data is hidden inside a comment.

How can I extract the comment from the PDF? @loginerror


Hi @copy_writes

You can use the Read PDF Text activity to read the PDF file and store the result in a String variable.

Then use regular expressions to extract the required data from the extracted PDF text.
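
As a rough illustration of the regex step, here is a minimal Python sketch. It assumes the PDF text is already available as a string; the sample text, the `Invoice No` label, and the `extract_field` helper are all hypothetical, since the thread does not say which data is needed. The same kind of pattern could be used inside the workflow instead of Python.

```python
import re


def extract_field(text, label):
    """Return the value that follows 'label:' on its line, or None if absent."""
    # re.escape keeps any special characters in the label literal.
    pattern = re.compile(
        rf"^{re.escape(label)}[ \t]*:[ \t]*(.+)$",
        re.MULTILINE,
    )
    match = pattern.search(text)
    return match.group(1).strip() if match else None


# Hypothetical text, standing in for the output of a PDF text extraction step.
sample_text = "Customer: ACME\nInvoice No: 12345\nTotal: 99.50\n"
print(extract_field(sample_text, "Invoice No"))
```

Note that this only helps when the wanted text is part of the extracted page text; text stored in annotations may not appear there at all.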

Hope it helps!!


-Import fitz: This is the PyMuPDF library.
-Open the PDF: The PDF file is opened using
-Iterate through pages: For each page in the PDF, we check for annotations.
-Check annotation type: In PyMuPDF, annotations of type 0 ("Text") are the sticky-note comments; type 8 is a highlight. We then extract the content of these annotations.
-Store and print comments: The comments are stored in a list and then printed out.
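
The steps above can be sketched roughly as follows. This is a minimal sketch under assumptions, not the poster's original code: the helper names `collect_comments` and `extract_pdf_comments` are hypothetical, and instead of filtering on one annotation type number it keeps every annotation whose `content` field is non-empty and reports the type name alongside it.

```python
def collect_comments(doc):
    """Return (page_number, type_name, content) for annotations that carry text."""
    comments = []
    for page in doc:
        # page.annots() yields the annotations found on this page.
        for annot in page.annots():
            # annot.info is a dict; "content" holds the comment text.
            content = annot.info.get("content", "")
            if content:
                # annot.type is a pair such as (0, "Text").
                comments.append((page.number + 1, annot.type[1], content))
    return comments


def extract_pdf_comments(path):
    """Open a PDF with PyMuPDF and collect its annotation comments."""
    import fitz  # PyMuPDF; imported here so collect_comments has no hard dependency

    with fitz.open(path) as doc:
        return collect_comments(doc)


# Usage (requires PyMuPDF to be installed; the file name is a placeholder):
# for page_no, kind, text in extract_pdf_comments("Test.pdf"):
#     print(f"page {page_no} [{kind}]: {text}")
```

Printing the type name next to each comment makes it easier to see which annotation kinds a given PDF actually uses before narrowing the filter.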

Yes, we can do that, but the hidden text is not extracted. The text is hidden inside a comment; it only becomes visible when the mouse hovers over the comment icon.
The Read PDF Text activity does not work for this.

Are you talking about a package?

Okay @copy_writes

In this case, use the Use Application/Browser activity and indicate the PDF. Inside the Use Application/Browser activity, use the Get Text activity and indicate the hidden text. If Get Text does not work, use the Get Full Text activity; it is a Classic activity, but it is able to extract the hidden text.

Try this approach.

Hope it helps!!

Yes, I tried this, but it only works some of the time.

What do you mean by "sometimes"? @copy_writes

It does not work every time. It succeeds only about 4 times out of 10.

You mean comments, right? Try the thread below,

Hope you understand!!

No, it is returning an empty array…

It does not return anything.

Hi @copy_writes ,

Could you provide more details on how you want the extraction to be done and what the expected output format is?

If possible, could you also share a sample file so that we can try it ourselves as well?

Hi Supermanpunch,

Thanks for your interest in this topic. I need to extract data that is stored as a comment in the PDF. I am sharing a test PDF; please take a look.

I tried the steps below:
Open the PDF, hover over the comment, and extract the text using Get Full Text.
But I don’t want that approach.

Is there any other way?

Test.pdf (18.5 KB)

Hi @copy_writes ,

Could you check with the workflow below: (18.1 KB)

It uses the iText .NET package, which is able to retrieve the annotations present within the PDF document.

Let us know if you are able to get the desired output; otherwise, explain what is required.
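
For readers who cannot open the workflow, the general idea behind reading annotations can be sketched in Python. This is not the code from the attached workflow (which uses iText in .NET and is not shown in this thread); it is a sketch assuming the pypdf library, where a page may carry an `/Annots` array and the comment text of an annotation is normally stored under `/Contents`. The helper names `annotation_contents` and `read_pdf_annotations` are hypothetical.

```python
def annotation_contents(pages):
    """Yield (page_index, subtype, contents) for annotations that carry text."""
    for index, page in enumerate(pages):
        # A page without any annotations has no /Annots entry.
        if "/Annots" not in page:
            continue
        for ref in page["/Annots"]:
            annot = ref.get_object()  # resolve the reference to the annotation dictionary
            contents = annot.get("/Contents")
            if contents:
                yield index, str(annot.get("/Subtype")), str(contents)


def read_pdf_annotations(path):
    """Read a PDF with pypdf and list the text of its annotations."""
    from pypdf import PdfReader  # third-party; imported lazily

    reader = PdfReader(path)
    return list(annotation_contents(reader.pages))
```

Because the comment text lives in the annotation dictionary rather than in the page content, a plain text extraction does not see it, which would be consistent with the behaviour reported earlier in the thread.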

Excellent work, I am impressed. It solved the problem.

I have one request: so that others reading this post can also understand it, could you please explain how the code you used inside the Invoke Code activity works?