Extracting text within an image (PDF)

poogy112 · November 8, 2017, 6:12am

How do I extract text from an image inside a PDF? An example is shown in the image below. I am trying to extract the figures and the text, for example "68m TEUs handled in 2016". The number 68m is an image, whereas "TEUs handled in 2016" is text.

Thanks in advance.

rkelchuri · November 8, 2017, 6:22am

Have you tried the Read PDF Text activity? If not, please give it a try; it should work.
Once you have read the text, you have to do some string manipulation to get the values. I think this is a repeated post.

poogy112 · November 8, 2017, 6:23am

I used Read PDF and then added Google OCR to retrieve the text, but the outcome is not what I wanted. What do you mean by string manipulation? Can you provide an example?

rkelchuri · November 8, 2017, 6:28am

If you use the Read PDF Text activity, the output value will be something like this (I am assuming):
Output: “68m210030000190000 TEUs handled in Cranes Staff Containers Moved 2016 Daily”
Once you have all the text in the output string, split it on a keyword and take the desired value from the parent string.
This is called string manipulation.
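A minimal Python sketch of the split-on-keyword idea, assuming the Read PDF Text output looks like the sample string above. The helper names `token_before` and `leading_figure` are hypothetical, and the regular expression assumes figures are written as digits with an optional k/m/b suffix. In the sample the figures are glued together, so splitting on "TEUs" alone returns the whole run of digits, and a pattern is needed to cut out "68m".

```python
import re
from typing import Optional

# Sample Read PDF Text output as quoted in the reply (an assumption about
# what the activity actually returns for this PDF).
SAMPLE = "68m210030000190000 TEUs handled in Cranes Staff Containers Moved 2016 Daily"


def token_before(text: str, keyword: str) -> str:
    """Return the whitespace-delimited token immediately before `keyword`."""
    before, found, _ = text.partition(keyword)
    if not found:
        raise ValueError(f"keyword {keyword!r} not found")
    tokens = before.split()
    if not tokens:
        raise ValueError(f"nothing precedes {keyword!r}")
    return tokens[-1]


def leading_figure(token: str) -> Optional[str]:
    """Return a leading number with an optional k/m/b suffix, e.g. '68m'."""
    match = re.match(r"\d+(?:\.\d+)?[kmb]?", token, re.IGNORECASE)
    return match.group(0) if match else None


glued = token_before(SAMPLE, "TEUs")  # the glued run '68m210030000190000'
print(leading_figure(glued))
```

Figures that are glued together without a suffix or separator cannot be separated reliably this way; the suffix is what marks where "68m" ends.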

poogy112 · November 8, 2017, 6:34am

Sorry, I meant that I used Read PDF with OCR to read the text. Here's an image of the outcome.

rkelchuri · November 8, 2017, 6:37am

@poogy112 Read PDF with OCR is less reliable because OCR always treats the required frame as an image. Sometimes it misreads the image and produces characters that differ from the real values. So I recommend using the Read PDF Text activity.
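Because OCR can return characters that differ from the real values, one common mitigation (not something suggested in the thread) is to post-process the OCR text and map look-alike letters back to digits, but only inside tokens that already contain a digit. The confusion table below is an assumed starting point, not an exhaustive or standard list, and `fix_numeric_token` / `fix_ocr_text` are hypothetical helper names.

```python
import re

# Assumed table of look-alike characters that OCR output often confuses
# with digits. Adjust it for your documents; it is not a standard list.
CONFUSIONS = str.maketrans({"O": "0", "o": "0", "D": "0", "l": "1", "I": "1", "S": "5", "B": "8"})

# A numeric-looking token: digits, confusable letters, '.' or ',', optionally
# followed by a lowercase magnitude suffix such as the 'm' in '68m'.
NUMERIC_LIKE = re.compile(r"^([0-9OoDlISB.,]+)([kmb]?)$")


def fix_numeric_token(token: str) -> str:
    """Map look-alike letters to digits, but only in tokens that contain a digit."""
    match = NUMERIC_LIKE.match(token)
    if match is None or not any(ch.isdigit() for ch in match.group(1)):
        return token  # ordinary words such as 'TEUs' or 'I' are left alone
    body, suffix = match.groups()
    return body.translate(CONFUSIONS) + suffix


def fix_ocr_text(text: str) -> str:
    """Apply fix_numeric_token to each whitespace-separated token.

    Runs of whitespace are collapsed to single spaces.
    """
    return " ".join(fix_numeric_token(tok) for tok in text.split())


print(fix_ocr_text("6Bm TEUs handled in 2O16"))
```

Requiring at least one real digit keeps the rule from rewriting ordinary words, at the cost of missing tokens where every digit was misread.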

poogy112 · November 8, 2017, 6:48am

Read PDF works, but I need to retrieve those values from the image. The values "68m TEUs handled in 2016" are inside an image, so I cannot use the Read PDF Text activity. Is there any other solution to the problem?

Mohammed_ali · November 8, 2017, 6:52am

@poogy112 Is it possible for you to attach the PDF file here?

rkelchuri · November 8, 2017, 6:53am

OK, now I understand the problem you are facing. Did you get a chance to convert the PDF into a document?
Check whether any of the images get converted into text.

poogy112 · November 8, 2017, 6:57am

Sorry, I can't convert the PDF document into another document type.
test.pdf (664.0 KB)

rkelchuri · November 8, 2017, 10:22am

I was able to extract your PDF data successfully using the FreeOCR engine + UiPath.

Download FreeOCR from here
http://www.paperfile.net/download2.html

Mohammed_ali · November 8, 2017, 10:42am

Hello poogy

I tried a few approaches and am attaching the one that worked best for me.

Note: Please change the file path and run the bot.

project.json (302 Bytes)
Main.xaml (14.2 KB)

You will then need to perform string manipulation to obtain the desired values from the string.

Here is a screenshot of the output.

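Once a line such as "68m TEUs handled in 2016" has been extracted, the figure and its label can be separated and the figure turned into a number. A small sketch, assuming `k`, `m` and `b` mean thousand, million and billion (here "68m" is presumably 68 million TEUs); the helper `parse_figure_line` is hypothetical.

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

# Assumed meaning of magnitude suffixes: thousand, million, billion.
MULTIPLIERS = {"": 1, "k": 1_000, "m": 1_000_000, "b": 1_000_000_000}

# "<number><optional suffix> <label>", e.g. "68m TEUs handled in 2016".
FIGURE_LINE = re.compile(r"^(\d+(?:\.\d+)?)([kmb]?)\s+(.+)$", re.IGNORECASE)


def parse_figure_line(line: str) -> Optional[Tuple[int, str]]:
    """Split a figure line into (numeric value, label); None if it doesn't match."""
    match = FIGURE_LINE.match(line.strip())
    if match is None:
        return None
    number, suffix, label = match.groups()
    # Decimal avoids float rounding for values such as "1.5m".
    value = int(Decimal(number) * MULTIPLIERS[suffix.lower()])
    return value, label


for line in ["68m TEUs handled in 2016", "2100 Cranes", "TEUs handled"]:
    print(line, "->", parse_figure_line(line))
```

Keeping the label alongside the value makes it easy to check that each figure was matched to the right caption.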

poogy112 · November 9, 2017, 1:14am

How do you add the downloaded FreeOCR into UiPath?

rkelchuri · November 9, 2017, 6:03am

@poogy112 You have to download and install FreeOCR manually. Then load a sample PDF into the FreeOCR application and record all the steps. Convert those steps into a UiPath workflow and use it in your main application.
I hope my input is useful.

poogy112 · November 9, 2017, 6:54am

Sorry, let me rephrase my question: how do we use FreeOCR in UiPath? I've tried FreeOCR on its own and it works as expected. I'm just wondering how to use it inside UiPath.

rkelchuri · November 9, 2017, 6:55am

We can't use it inside UiPath the way we use Google OCR or Microsoft OCR; it has to run as a separate application.
That would only be possible if FreeOCR provided an API, and then we would have to build our own package to expose FreeOCR as an activity inside UiPath. For now, just use it as a separate application for your project.
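In the thread, FreeOCR is driven as a separate application by recording its UI steps. If an OCR tool you use offers a command-line interface instead, the same "separate application" pattern can be sketched as: start the program as its own process, wait for it to finish, then read the text file it produced. The command line is a placeholder (FreeOCR's own options, if any, are not covered here), `run_external_ocr` is a hypothetical helper, and in the demo the Python interpreter itself stands in for the OCR program.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Sequence


def run_external_ocr(command: Sequence[str], output_file: Path, timeout: float = 120.0) -> str:
    """Run an OCR program as a separate process and return the text it wrote.

    `command` is whatever invocation your OCR tool documents (placeholder here).
    Raises CalledProcessError on a non-zero exit code and TimeoutExpired if
    the program runs longer than `timeout` seconds.
    """
    subprocess.run(list(command), check=True, timeout=timeout)
    return output_file.read_text(encoding="utf-8")


def demo_with_stand_in() -> str:
    """Use the Python interpreter as a stand-in 'OCR program' that writes fixed text."""
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "ocr_output.txt"
        script = (
            f"open({str(out)!r}, 'w', encoding='utf-8')"
            ".write('68m TEUs handled in 2016')"
        )
        return run_external_ocr([sys.executable, "-c", script], out)


print(demo_with_stand_in())
```

Inside UiPath the equivalent step would be launching the external program and then reading its output file, after which the string manipulation described earlier applies.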

Related topics

Extract data from PDF using OCR or Text read activity · 6 replies · 9110 views · December 6, 2019
Extract Text as image from PDF · 5 replies · 1057 views · September 21, 2022
Not able to extract data from pdf · 5 replies · 989 views · October 19, 2022
Extracting the data from image based pdf · 4 replies · 965 views · March 20, 2020
How to get a particular text in a PDF file using "Read PDF With OCR" Activity? · 8 replies · 1842 views · June 19, 2023