Hi,
I am after extracting text from jpg file. Not sure if standard screen scraping methods should help as the process I am automating is repetetive - there should be dozens of images and script should iterate over those images in a loop. Data Scraping mechanism allows to extract text once? or am i wrong?
As for now I have to export jpg to pdf and i really want to avoid that, either way OCR with read pdf works really well.
yah thats right and that is what it is actually intended to
âwell we can convert that read text from image to a pdf by writing it to a word document with word application scope using append text activity
âthen we can use EXPORT TO PDF to convert that as pdf
âto use word application scope download the packge UiPath.word.activities from manage packages and once installed we can use the above mentioned activities to get this done
hope this would help you
Cheers @ykuzin
Hey @ykuzin
The easiest solution is to read all images from its respective directory and and then use Load Image Activity which will return you an Image Variable output then use this returned Image Variable output with, based on best results returned by Available free OCR Engines (Paid also you can use if you wanna opt for it ) to process your image and Perform String manipulation on returned Text by OCR Engines.
but you have to keep in mind that this will also be the slowest solution if image size is bug and text existence also because OCR time is proportional with image size.
RegardsâŚ!!
Aksh
Hi @Palaniyappan @aksh1yadav,
Thanks for all the datails. So i tried to OCR pdf files it returned me error âRead PDF with OCR: Error performing OCR: Unable to initialize Microsoft engine MicrosoftErrorCreateEngineâ . Surfing on forum led me to the topic (Microsoft OCR in Microsoft office 2016 - #3 by rachelfonseca) saying this error can be trouble shoot by locating Optical Character Reconginiton somewhere in settings which I happen to not find running on Windows 7.
Any ideas i can check required version against pre installed one?
Itâs Office 2016, pal .
Hey @ykuzin
Follow below steps to turn on the OCR feature in Microsoft Office.
-
Go to the âAdd or Uninstall Programâ in Microsoft Windows, select âMicrosoft Officeâ from the list.
-
Click on the âChangeâ button
-
Check the âAdd or Remove Featuresâ button
-
Click on âContinueâ
-
In the following dialog box, select âOffice Toolâ > âMicrosoft Office Document Imagingâ > â Scanning, OCR and Indexing Service Filterâ and under the drop down list choose ârun from computerâ.
-
Proceed with the installation by clicking the âContinueâ button
Wait for the installation and when the installation is successful
RegardsâŚ!!
Aksh
Iâve just tested it, and it worked on the first hit. I used Microsoft OCR Engine, an Excel Application scope, and wrote the String result into an Excel cell, simple and easy.
This will help me automate a mundane task at work: I need to extract a serial number from the product packages. I usually take a photo of the package on site, and deal with the rest later on in my office. Now this little sequence completely changes the way how we will be handling the product serial numbers from now on. What an elegant way of extracting text from a .jpg file.
Thatâs absolutely awesome @ykuzin, @aksh1yadav Thank You!
@PeteGrabec Genuinely glad to share your delight!
Hi,
I am having the same issue, i am using Office 365 pro plus, could you please help me how to enable OCR in Office 365 ? for change option i am getting only two options those are quick repair and online repair.
Thanks a lot.
i tried scraping data from a jpg file but it is not retrieving the correct data