Extract embedded text from an image, e.g. text from a stamp?

SWATI_KAROT · August 21, 2020, 6:09am

Can we extract embedded text from an image in a PDF, such as text from a logo or a stamp, using Document Understanding and Intelligent OCR?

dfilipovic · August 21, 2020, 6:22am

Hi @SWATI_KAROT, I have used Document Understanding and I think you can try that. In Document Understanding you need to specify the coordinates of the area you want to extract from, so if a picture is in the document, you position the coordinates over that section to extract its text.

There are a lot of topics about that:

There is a specific section in this forum about that:

SWATI_KAROT · August 21, 2020, 6:58am

@dfilipovic, thank you for your response. I have used Document Understanding and Intelligent OCR for PDF extraction, with the Form Extractor. But while training the extractor, when I drag an area around the logo, the text inside it does not get highlighted, so I am not getting any extraction either.
Please see the screenshot below.

dfilipovic · August 21, 2020, 7:18am

@SWATI_KAROT let me check it out in my lab. I have a couple of PDFs with logos.
Cheers,
Dino

SWATI_KAROT · August 21, 2020, 7:27am

Thanks. Awaiting your results. Just to add to it, I am trying to see if this can be done out of the box with Document Understanding and Intelligent OCR. I know Microsoft OCR can extract the text from images like stamps and logos.
But I want to know if the framework is enough.

dfilipovic · August 21, 2020, 7:34am

Hi @SWATI_KAROT,
I have tried it too and I have the same problem with Document Understanding, so I tried a different approach.

First I extracted the image from the PDF document with the Extract Images from PDF activity:

Then I used the Load Image activity:

After Load Image I used only Tesseract OCR:

Dependency:

Workflow:

Load Image settings:

Tesseract OCR settings:
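
For anyone who wants to try the same three steps outside UiPath Studio (extract the embedded images from the PDF, load each image, run Tesseract on it), here is a rough Python sketch. It is not the UiPath workflow itself, only an approximation of it: it assumes the third-party packages PyMuPDF (`fitz`), Pillow and `pytesseract` are installed and that the Tesseract binary is on the PATH. The function names are made up for illustration.

```python
from typing import Callable, List


def extract_images_from_pdf(pdf_path: str) -> List[bytes]:
    """Return the raw bytes of every image embedded in the PDF.

    Assumption: PyMuPDF is installed; it is imported lazily so the rest
    of the sketch can be used without it.
    """
    import fitz  # PyMuPDF (third-party)

    images = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            # Each entry describes one embedded image; item 0 is its xref.
            for img in page.get_images(full=True):
                xref = img[0]
                images.append(doc.extract_image(xref)["image"])
    return images


def ocr_image_bytes(image_bytes: bytes) -> str:
    """Load one image from bytes and run Tesseract OCR on it.

    Assumption: Pillow and pytesseract are installed and the Tesseract
    executable is available on the PATH.
    """
    import io

    import pytesseract  # third-party wrapper around Tesseract
    from PIL import Image  # Pillow (third-party)

    with Image.open(io.BytesIO(image_bytes)) as image:
        return pytesseract.image_to_string(image)


def ocr_embedded_images(
    pdf_path: str,
    extract: Callable[[str], List[bytes]] = extract_images_from_pdf,
    ocr: Callable[[bytes], str] = ocr_image_bytes,
) -> List[str]:
    """Extract images, OCR each one, and keep only non-empty results."""
    texts = []
    for image_bytes in extract(pdf_path):
        text = ocr(image_bytes).strip()
        if text:  # skip images where OCR found no text
            texts.append(text)
    return texts
```

Calling `ocr_embedded_images("invoice.pdf")` (the file name is a placeholder) would return one string per embedded image that contains readable text. The `extract` and `ocr` parameters are only there so a different extractor or OCR engine can be swapped in.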

SWATI_KAROT · August 21, 2020, 7:48am

Thanks for confirming this, @dfilipovic. Happy automation.

system · August 24, 2020, 7:48am

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
