Can we extract embedded text from an image in a PDF, such as text from a logo or a stamp, using Document Understanding and Intelligent OCR?
Hi @SWATI_KAROT, I have used Document Understanding, and I think you can try it. In Document Understanding you define the coordinates of the area you want to extract from, so if a picture is in the document, you need to position the coordinates around that section to extract its text.
There are a lot of topics about that:
There is a specific section in this forum about that:
@dfilipovic, thank you for your response. I have used Document Understanding and Intelligent OCR for PDF extraction, with the Form Extractor. But while training the extractor, when I drag an area around the logo, the text inside it does not get highlighted, so I am not getting any extraction either.
Please see the screenshot below.
@SWATI_KAROT let me check it out in my lab. I have a couple of PDFs with logos.
Cheers,
Dino
Thanks. Awaiting your results. Just to add to it, I am trying to see if this can be done out of the box with Document Understanding and Intelligent OCR. I know Microsoft OCR can extract text from images like stamps and logos.
But I want to know if the framework is enough.
Hi @SWATI_KAROT,
I tried it as well and had the same problem with Document Understanding, so I tried a different approach.
First, I extracted the image from the PDF document with the Extract Images from PDF activity:
Then I used the Load Image activity:
After Load Image, I used only Tesseract OCR:
Dependency:
Workflow:
Load Image settings:
Tesseract OCR settings:
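For readers who want to try the same three steps outside UiPath Studio, here is a rough Python sketch of the idea (extract the embedded images, load each one, run Tesseract on it). It is not the UiPath workflow itself. It assumes the third-party packages PyMuPDF (`fitz`), Pillow and `pytesseract` are installed, plus the Tesseract binary. The function names `extract_images_from_pdf`, `ocr_image_bytes` and `ocr_embedded_images` are made up for this illustration.

```python
import io
from typing import Callable, Iterable, List


def extract_images_from_pdf(pdf_path: str) -> Iterable[bytes]:
    """Yield the raw bytes of every image embedded in the PDF.

    Assumption: PyMuPDF is installed; it is imported lazily so the rest
    of the module can be used without it.
    """
    import fitz  # PyMuPDF

    with fitz.open(pdf_path) as doc:
        for page in doc:
            for image_info in page.get_images(full=True):
                xref = image_info[0]  # first field is the image's xref number
                yield doc.extract_image(xref)["image"]


def ocr_image_bytes(image_bytes: bytes) -> str:
    """Load one image from bytes and run Tesseract on it.

    Assumption: Pillow, pytesseract and the Tesseract binary are installed.
    """
    import pytesseract
    from PIL import Image

    with Image.open(io.BytesIO(image_bytes)) as image:
        return pytesseract.image_to_string(image)


def ocr_embedded_images(
    pdf_path: str,
    extract: Callable[[str], Iterable[bytes]] = extract_images_from_pdf,
    ocr: Callable[[bytes], str] = ocr_image_bytes,
) -> List[str]:
    """Run OCR on each embedded image and keep only non-empty results."""
    texts = []
    for image_bytes in extract(pdf_path):
        text = ocr(image_bytes).strip()
        if text:  # skip images where no text was recognised
            texts.append(text)
    return texts
```

Calling `ocr_embedded_images("invoice.pdf")` (a hypothetical file name) would return one string per image in which Tesseract found text. The `extract` and `ocr` parameters are only there so that either step can be swapped, for example for a different OCR engine.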
Thanks for confirming this @dfilipovic . Happy automation.