Extract embedded text from an image, e.g. text from a stamp?

Can we extract embedded text from an image in a PDF, such as text from a logo or a stamp, using Document Understanding and Intelligent OCR?

Hi @SWATI_KAROT, I have used Document Understanding, and I think you can try it. In Document Understanding you specify the coordinates of the area you want to extract from, so if a picture is in the document, you position the coordinates over it to extract text from that section.

There are a lot of topics about that:

There is a specific section in this forum about that:

@dfilipovic, thank you for your response. I have used Document Understanding and Intelligent OCR for PDF extraction, with the Form Extractor. But while training the extractor, when I drag an area around the logo, the text inside it does not get highlighted, so I am not getting any extraction either.
Please see the screenshot below.

@SWATI_KAROT let me check it out in my lab. I have a couple of PDFs with logos.
Cheers,
Dino

Thanks. Awaiting your results. Just to add, I am trying to see whether this can be done out of the box with Document Understanding and Intelligent OCR. I know Microsoft OCR can extract text from images such as stamps and logos.
But I want to know if the framework alone is enough.

Hi @SWATI_KAROT,
I have tried it as well and I have the same problem with Document Understanding, so I tried a different approach.

First, I extracted the images from the PDF document with the Extract Images from PDF activity:

Then I used the Load Image activity:

After Load Image, I used only Tesseract OCR:

Dependency:

Workflow:

Load Image settings:

Tesseract OCR settings:
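
For anyone who wants to try the same extract-then-OCR idea outside UiPath Studio, here is a rough Python sketch of the flow described above: pull the embedded images out of the PDF, then run Tesseract on each one. It is not the UiPath activities themselves. It assumes the third-party packages PyMuPDF, Pillow and pytesseract plus a local Tesseract install, and the function names (`extract_images_from_pdf`, `ocr_image_bytes`, `ocr_embedded_images`) are made up for illustration.

```python
from typing import Callable, List


def extract_images_from_pdf(pdf_path: str) -> List[bytes]:
    """Return the raw bytes of every image embedded in the PDF.

    Assumes PyMuPDF is installed; it is imported lazily so the rest
    of the sketch can be used without it.
    """
    import fitz  # PyMuPDF

    images: List[bytes] = []
    doc = fitz.open(pdf_path)
    try:
        for page in doc:
            for img in page.get_images(full=True):
                xref = img[0]  # first tuple item is the image's xref number
                images.append(doc.extract_image(xref)["image"])
    finally:
        doc.close()
    return images


def ocr_image_bytes(image_bytes: bytes) -> str:
    """Run Tesseract on one image (assumes pytesseract, Pillow and Tesseract)."""
    import io

    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(io.BytesIO(image_bytes)))


def ocr_embedded_images(
    pdf_path: str,
    extract: Callable[[str], List[bytes]] = extract_images_from_pdf,
    ocr: Callable[[bytes], str] = ocr_image_bytes,
) -> List[str]:
    """Extract each embedded image, OCR it, and keep the non-empty results."""
    texts: List[str] = []
    for image_bytes in extract(pdf_path):
        text = ocr(image_bytes).strip()
        if text:  # skip images in which Tesseract found no text
            texts.append(text)
    return texts
```

Calling `ocr_embedded_images("invoice.pdf")` (a hypothetical file name) would return one string per image that contained recognizable text. The `extract` and `ocr` parameters are only there so another OCR engine can be swapped in without touching the loop.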

Thanks for confirming this, @dfilipovic. Happy automation.
