Extract embedded text from an image, e.g. text from a stamp?

Can we extract text embedded in an image in a PDF, such as the text in a logo or a stamp, using Document Understanding and Intelligent OCR?

Hi @SWATI_KAROT, I have used Document Understanding and I think you can try it. In Document Understanding you set the coordinates of the area you want to extract from, so if a picture is in the document, you position that area over the picture to extract the text from that section.

@dfilipovic, thank you for your response. I have used Document Understanding and Intelligent OCR for PDF extraction with the Form Extractor. But while training the extractor, when I drag an area around the logo, the text inside it does not get highlighted, so I am not getting any extraction either.
Please see the screenshot below.

@SWATI_KAROT let me check it out in my lab. I have a couple of PDFs with logos.

Thanks. Awaiting your results. Just to add, I am trying to see whether this can be done out of the box with Document Understanding and Intelligent OCR. I know Microsoft OCR can extract text from images like stamps and logos.
But I want to know if the framework alone is enough.

I tried it as well and I have the same problem with Document Understanding, so I tried it the other way around.

First I extracted the image from the PDF document with the Extract Images from PDF activity:

Then I used the Load Image activity:

After Load Image I used only Tesseract OCR:

Load Image settings:

Tesseract OCR settings:
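
For anyone who wants to try the same three steps outside UiPath Studio, here is a rough Python sketch of the idea (extract the embedded images, load each one, run Tesseract on it). It is not the UiPath workflow itself. It assumes PyMuPDF (fitz), Pillow and pytesseract are installed together with the Tesseract binary, and the function names and the "stamp.pdf" path are made up for illustration.

```python
import io
from typing import Callable, Iterable, List


def extract_images_from_pdf(pdf_path: str) -> List[bytes]:
    """Return the raw bytes of every image embedded in the PDF (assumes PyMuPDF)."""
    import fitz  # PyMuPDF; imported lazily so the rest of the file works without it

    images = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            for img in page.get_images(full=True):
                xref = img[0]  # first item of each tuple is the image's xref number
                images.append(doc.extract_image(xref)["image"])
    return images


def tesseract_ocr(image_bytes: bytes) -> str:
    """Load the image from memory and run Tesseract on it (assumes Pillow + pytesseract)."""
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(io.BytesIO(image_bytes)))


def ocr_images(images: Iterable[bytes],
               ocr: Callable[[bytes], str] = tesseract_ocr) -> List[str]:
    """Run the OCR engine on each image and keep only the non-empty results."""
    texts = []
    for data in images:
        text = ocr(data).strip()
        if text:
            texts.append(text)
    return texts


# Hypothetical usage, with "stamp.pdf" standing in for your own file:
#   for text in ocr_images(extract_images_from_pdf("stamp.pdf")):
#       print(text)
```

The OCR engine is passed in as a parameter, so another engine could be swapped in for Tesseract without touching the extraction step.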

Thanks for confirming this, @dfilipovic. Happy automation.

