Extracting text within an image (PDF)

pdf
ocr
activities

#1

How do I extract text from an image inside a PDF? An example is shown in the image below. I am trying to extract both the figures and the text, for example "68m TEUs handled in 2016". The number 68m is an image, whereas "TEUs handled in 2016" is text.

Thanks in advance.


Microsoft OCR doesn't work
#2

Did you try the Read PDF activity? If not, please give it a try… it will work…
You will have to do some string manipulation once you have read the text. I think this is a repeated post.


#3

I used Read PDF and then added Google OCR to retrieve the text… but the outcome is not what I wanted. What do you mean by string manipulation? Can you provide an example?


#4

If you use the Read PDF Text activity, the output value will look something like this (I am assuming):
Output: "68m210030000190000 TEUs handled in Cranes Staff Containers Moved 2016 Daily"
Once you have all the text in the output string, you have to split it on a keyword and pick the desired value out of the parent string.
This is called string manipulation.
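
As a rough illustration of the keyword-based splitting described in this post, here is a minimal sketch in Python (inside UiPath the same idea would be expressed with string methods in an Assign activity; Python is used here only for illustration). The sample string is the assumed output quoted in this post, and `split_on_keyword` is a hypothetical helper name, not a UiPath activity.

```python
import re

# Assumed output of the Read PDF Text activity, copied from the post above.
output = "68m210030000190000 TEUs handled in Cranes Staff Containers Moved 2016 Daily"


def split_on_keyword(text, keyword):
    """Hypothetical helper: return the text before and after the first
    occurrence of `keyword`, with surrounding whitespace removed."""
    before, found, after = text.partition(keyword)
    if not found:
        raise ValueError(f"keyword {keyword!r} not found in text")
    return before.strip(), after.strip()


# Split the parent string on the keyword "TEUs".
before, after = split_on_keyword(output, "TEUs")

# The figure of interest is assumed to be the leading digits plus one unit letter.
match = re.match(r"\d+[a-z]", before)
leading_figure = match.group(0) if match else None

print(leading_figure)  # 68m
print(after)
```

Which keyword to split on, and the pattern used to pick the figure out of the run of digits, depend entirely on what the real output looks like, so both would need adjusting for the actual PDF.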


#5

My apologies, I meant that I use Read PDF with OCR to read the text. Here’s an image of the outcome.

PDF


#6

@poogy112 Read PDF with OCR is not efficient because OCR always treats the target area as an image. Sometimes it misreads the content and produces characters that differ from the real values. So I recommend using the Read PDF Text activity.


#7

Read PDF works, but I need to retrieve those values from the image. The values in "68m TEUs handled in 2016" are inside an image, so I cannot use the Read PDF Text activity. Is there any other solution to the problem?


#8

@poogy112 Is it possible for you to attach the PDF file here?


#9

OK, now I understand the problem you are facing… Did you get a chance to convert the PDF into a document?
Try it and see whether any of the images get converted into text…


#10

Sorry, I can’t convert the PDF document into another document type.

test.pdf (664.0 KB)


#11

I was able to extract the data from your PDF successfully using the FreeOCR engine + UiPath.

Download FreeOCR from here
http://www.paperfile.net/download2.html


#12

Hello poogy

I tried a few approaches and am attaching the one that worked best for me.

Note: Please change the file path and run the bot.

project.json (302 Bytes)
Main.xaml (14.2 KB)

You will also need to perform string manipulation to obtain the desired values from the string.

Here is a screenshot of the output.

Capture


#13

How do you add the downloaded FreeOCR to UiPath?


#14

@poogy112 You have to download and install the FreeOCR application manually. After that, take a sample PDF, load it into the FreeOCR application, and capture all the steps. Then convert those steps into your UiPath workflow and use it in your main application.
Hope my inputs are useful.


#15

Sorry, let me rephrase my question: how do we use FreeOCR in UiPath? I’ve tried using FreeOCR and it works as expected. I’m just wondering how to use FreeOCR inside UiPath.


#16

We can’t use it inside UiPath the way we use Google OCR or Microsoft OCR; we have to use it as a separate application.
That would only be possible if FreeOCR supplied an API, and then we would have to build our own package to include FreeOCR as an activity inside UiPath. For now, just use it as a separate application for your project.