Removing Watermarks in PDF files - OCR

Hi Team,

How can we remove watermarks from PDF documents? Is there a custom activity that can do this?

The problem is that I’m trying to extract the text from the PDF with Azure OCR, but because of the watermarks, I can't get the text underneath them.

Kindly suggest solutions.

Hi @samanthapuri_surya,
Is it a readable PDF or a scanned PDF?

Hi @Vivek.A.S

It contains both images and free text. We are trying to extract the text from both.

@samanthapuri_surya,
Step 1:
Use the Read PDF activity --> store the whole text in a variable --> use the Matches activity to find the watermark text in that variable, then remove it with the Replace function.
Step 2:
Then use the Read PDF with OCR activity.
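Step 1 can be sketched outside UiPath in Python, as a rough illustration of the regex idea only. It assumes the watermark phrase is known in advance and that Read PDF returns it as ordinary text; the function name `strip_watermark` and the sample strings are hypothetical. In a workflow, the same pattern would go into the Matches activity or a Replace expression. Note that this cleans up extracted text; it does not help when the watermark hides characters from OCR.

```python
import re


def strip_watermark(text: str, watermark: str) -> str:
    """Remove every occurrence of a known watermark phrase from extracted text.

    Hypothetical helper. Assumes the watermark text is known in advance.
    Words of the watermark may be split by any whitespace (including
    newlines) in the extracted text, and matching ignores case.
    """
    # Escape each word so characters like '.' or '(' match literally,
    # then allow any run of whitespace between the words.
    words = [re.escape(w) for w in watermark.split()]
    pattern = re.compile(r"\s+".join(words), re.IGNORECASE)
    cleaned = pattern.sub("", text)
    # Collapse runs of spaces/tabs left behind by the removal.
    cleaned = re.sub(r"[ \t]{2,}", " ", cleaned)
    # Drop trailing spaces on each line; line breaks themselves are kept.
    return "\n".join(line.rstrip() for line in cleaned.split("\n"))


if __name__ == "__main__":
    sample = "Invoice No. 123 CONFIDENTIAL\nTotal: 50.00 Confidential  due"
    print(strip_watermark(sample, "CONFIDENTIAL"))
```

Escaping each word keeps watermark text such as "(c) Company Ltd." from being read as regex syntax.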

@Vivek.A.S

I am not looking to remove the watermarks after extraction. The extraction itself is the problem, because the watermarks overlay the text.

Hello, maybe you can use a third-party watermark-removal tool to remove the watermarks from your PDF files. You could try Easepaint Watermark Expert; it is easy to use and cost-effective. Hope it helps.


BTW, how do we recognize a watermark in a PDF file?

How do we get the watermark text?
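One rough heuristic for these questions, offered as an assumption rather than a reliable method: a text watermark is usually stamped on every page, so it tends to show up as a line repeated across the per-page text. The sketch below assumes you already have each page's text as a separate string (for example, from page-by-page extraction); the name `find_repeated_lines` and the `min_fraction` threshold are hypothetical. Running headers and footers repeat too, and watermarks drawn as images won't appear in extracted text at all, so treat the output as candidates to review.

```python
from collections import Counter


def find_repeated_lines(pages: list[str], min_fraction: float = 0.8) -> list[str]:
    """Return text lines that appear on at least `min_fraction` of the pages.

    Hypothetical heuristic: repeated lines are watermark *candidates*;
    headers and footers will also be reported.
    """
    if not pages:
        return []
    counts = Counter()
    for page in pages:
        # Normalize whitespace and case, and count each distinct line once per page.
        lines = {" ".join(line.split()).casefold() for line in page.splitlines()}
        lines.discard("")
        counts.update(lines)
    threshold = min_fraction * len(pages)
    return sorted(line for line, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    pages = [
        "CONFIDENTIAL\nInvoice 1\nTotal 10",
        "Invoice 2\nCONFIDENTIAL\nTotal 20",
        "confidential\nInvoice 3",
    ]
    print(find_repeated_lines(pages))
```

A candidate found this way could then be passed to a regex replace, as in Step 1, once you have confirmed it really is the watermark.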