I am after extracting text from jpg file. Not sure if standard screen scraping methods should help as the process I am automating is repetetive - there should be dozens of images and script should iterate over those images in a loop. Data Scraping mechanism allows to extract text once? or am i wrong?
As for now I have to export jpg to pdf and i really want to avoid that, either way OCR with read pdf works really well.
yah thats right and that is what it is actually intended to
–well we can convert that read text from image to a pdf by writing it to a word document with word application scope using append text activity
–then we can use EXPORT TO PDF to convert that as pdf
–to use word application scope download the packge uipath.word.activities from manage packages and once installed we can use the above mentioned activities to get this done
hope this would help you
The easiest solution is to read all images from its respective directory and and then use Load Image Activity which will return you an Image Variable output then use this returned Image Variable output with, based on best results returned by Available free OCR Engines (Paid also you can use if you wanna opt for it ) to process your image and Perform String manipulation on returned Text by OCR Engines.
but you have to keep in mind that this will also be the slowest solution if image size is bug and text existence also because OCR time is proportional with image size.
Hi @Palaniyappan @aksh1yadav,
Thanks for all the datails. So i tried to OCR pdf files it returned me error “Read PDF with OCR: Error performing OCR: Unable to initialize Microsoft engine MicrosoftErrorCreateEngine” . Surfing on forum led me to the topic (Microsoft OCR in Microsoft office 2016 - #3 by rachelfonseca) saying this error can be trouble shoot by locating Optical Character Reconginiton somewhere in settings which I happen to not find running on Windows 7.
Any ideas i can check required version against pre installed one?
Which Office Version You are using? May i know?
It’s Office 2016, pal .
Follow below steps to turn on the OCR feature in Microsoft Office.
Go to the ‘Add or Uninstall Program’ in Microsoft Windows, select ‘Microsoft Office’ from the list.
Click on the ‘Change’ button
Check the ‘Add or Remove Features’ button
Click on ‘Continue’
In the following dialog box, select ‘Office Tool’ > ‘Microsoft Office Document Imaging’ > ‘ Scanning, OCR and Indexing Service Filter’ and under the drop down list choose ‘run from computer’.
Proceed with the installation by clicking the ‘Continue’ button
Wait for the installation and when the installation is successful
I’ve just tested it, and it worked on the first hit. I used Microsoft OCR Engine, an Excel Application scope, and wrote the String result into an Excel cell, simple and easy.
This will help me automate a mundane task at work: I need to extract a serial number from the product packages. I usually take a photo of the package on site, and deal with the rest later on in my office. Now this little sequence completely changes the way how we will be handling the product serial numbers from now on. What an elegant way of extracting text from a .jpg file.
That’s absolutely awesome @ykuzin, @aksh1yadav Thank You!
@PeteGrabec Genuinely glad to share your delight!
I am having the same issue, i am using Office 365 pro plus, could you please help me how to enable OCR in Office 365 ? for change option i am getting only two options those are quick repair and online repair.
Thanks a lot.
i tried scraping data from a jpg file but it is not retrieving the correct data