Unable to Scrape anything using Tesseract OCR

Dear All,

I am unable to use any functionality of the Tesseract OCR method in UiPath (version 2019.6.0 Community Edition). I have tried scraping web pages, Notepad windows, admin consoles, etc. to see if the problem is application-specific, but every time I receive the message “OCR method failed to scrape this UI Element”.

When I try to use the screen scraper with Tesseract OCR, I get the below:

When I try to incorporate the Tesseract OCR in a workflow, I get the run time error:

“Source: Tesseract OCR
Message: Error performing OCR: TessErrorLoadEngine
Exception Type: System.Exception
RemoteException wrapping System.Exception: Error performing OCR: TessErrorLoadEngine ---> RemoteException wrapping System.ServiceModel.FaultException: TessErrorLoadEngine
at System.ServiceModel.Channels.ServiceChannel.HandleReply(ProxyOperationRuntime operation, ProxyRpc& rpc)
at System.ServiceModel.Channels.ServiceChannel.EndCall(String action, Object[] outs, IAsyncResult result)
at System.ServiceModel.Channels.ServiceChannelProxy.TaskCreator.<>c__DisplayClass7_0`1.<CreateGenericTask>b__0(IAsyncResult asyncResult)
at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)”

After checking the logs, I found the following error messages:

“[ERROR] [UiPath.Studio.exe] [1] System.IO.IOException: Cannot locate resource ‘themes/icons.xaml’.
at MS.Internal.AppModel.ResourcePart.GetStreamCore(FileMode mode, FileAccess access)
at System.IO.Packaging.PackagePart.GetStream(FileMode mode, FileAccess access)
at System.IO.Packaging.PackWebResponse.CachedResponse.GetResponseStream()
at System.IO.Packaging.PackWebResponse.GetResponseStream()
at System.IO.Packaging.PackWebResponse.get_ContentType()
at MS.Internal.WpfWebRequestHelper.GetContentType(WebResponse response)
at MS.Internal.WpfWebRequestHelper.GetResponseStream(WebRequest request, ContentType& contentType)
at System.Windows.ResourceDictionary.set_Source(Uri value)
at UiPath.Studio.Plugin.Workflow.Services.ActivityIconFinder.TryGetThemedActivitiesIconDictionary(String assemblyName)
11:14:27.1478 => [ERROR] [UiPath.Studio.exe] [8] $LoadAssembly: Microsoft.Exchange.WebServices.Auth, Version=, Culture=neutral, PublicKeyToken=36bf3926bv369l33: System.IO.FileNotFoundException: Could not load file or assembly ‘Microsoft.IdentityModel, Version=, Culture=neutral, PublicKeyToken=36bf3926bv369l33’ or one of its dependencies. The system cannot find the file specified.
File name: ‘Microsoft.IdentityModel, Version=, Culture=neutral, PublicKeyToken=36bf3926bv369l33’”

I guess something must be off in my system. I need this component to work in order to proceed with my UiPath training. I would highly appreciate it if anyone has any insights. Thanks in advance.

Did you try with Microsoft OCR?

Thank you for your reply!

Yes. But that is an entirely different problem (or so I think?). I have raised that in another post - Unable to Install Microsoft OCR in 2019

Nonetheless, I would like to try both Tesseract and Microsoft OCR. Moreover, the factors that prevent Tesseract OCR from functioning in my system may interfere with Microsoft OCR also.

Dear All,

Another related post, “Tesseract OCR not working (standalone + Screen Scraper)”, has been extremely helpful for digging deeper. I am posting these details here rather than in that thread, in case it is considered closed.

I am having similar issues with Tesseract OCR as in the referenced post. However, the fix does not seem to be the same in my case. I would greatly appreciate any insights that you may have.

#UIPath Studio Community 2019.6.0.
#Windows 7 Ultimate

Below are the steps I have carried out as mentioned in this thread:

1) Installed the debug build package as suggested by @florinszilagyi

While checking the Vision Host logs, I found that the Tesseract package fails at run time with the error:

"10:37:38.4147 Info Starting scrape. Image size: 195150.
10:37:38.8668 Info Scrape options: {“ExtractWords”:false,“Timeout”:null,“ComputeSkewAngle”:false,“Profile”:3,“Language”:“eng”,“Scale”:2.0,“FilterRegion”:null,“Engine”:0,“EngineImplClass”:null,“EngineOptions”:"{“AllowedCharacters”:"",“DeniedCharacters”:"",“Invert”:false}"}
10:37:38.9478 Info Input language:eng, translated language:eng
10:37:38.9598 Info Getting tesseract language for C:\USERS\ADMINISTRATOR.NUGET\PACKAGES\UIPATH.VISION\1.6.0\BUILD\tessdata and language eng
10:37:38.9908 Warn Cannot initialize TesseractEngine with provided path C:\USERS\ADMINISTRATOR.NUGET\PACKAGES\UIPATH.VISION\1.6.0\BUILD\tessdata and language eng
10:37:39.0758 Error Error initializing Tesseract Engine.
10:37:39.0998 Fatal UiPath.Vision.OCR.OCRException: TessErrorLoadEngine ---> UiPath.Vision.OCR.OCRException: TessErrorLoadEngine
   at UiPath.Vision.Engines.TesseractLegacyEngine.Initialize()
   --- End of inner exception stack trace ---"

2) Removed Read-only status of the tessdata folder.

3) I am the administrator on this machine, so everything runs in admin mode.

4) My build path is “C:\USERS\ADMINISTRATOR.NUGET\PACKAGES\UIPATH.VISION\1.6.0\BUILD”, so the special-character issue probably does not apply in my case.

5) The file “eng.traineddata” (as shown below) is available in the “C:\USERS\ADMINISTRATOR.NUGET\PACKAGES\UIPATH.VISION\1.6.0\BUILD” path.

6) I have tried re-copying the “eng.traineddata” file into the tessdata folder as suggested by @florinszilagyi.

7) While I have installed most of the available OCR-related packages (especially those published by UiPath), I could not install the below “Tesseract OCR” package by Google:

Please help.
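For anyone who wants to script the checks from steps 2, 5 and 6, here is a minimal sketch in Python. It is not part of UiPath or Tesseract; the function name `check_tessdata` and the wording of its messages are my own, and it only inspects the folder. It cannot tell you why the engine itself refuses to load.

```python
import os
import stat
from pathlib import Path


def check_tessdata(tessdata_dir, language="eng"):
    """Return a list of human-readable problems found in a tessdata folder.

    Hypothetical helper for illustration: it only looks at the file system
    and does not call Tesseract or UiPath in any way.
    """
    folder = Path(tessdata_dir)
    if not folder.is_dir():
        return [f"tessdata folder not found: {folder}"]

    problems = []
    model = folder / f"{language}.traineddata"
    if not model.is_file():
        problems.append(f"missing language file: {model.name}")
    else:
        info = model.stat()
        if info.st_size == 0:
            problems.append(f"language file is empty: {model.name}")
        if not os.access(model, os.R_OK):
            problems.append(f"language file is not readable: {model.name}")
        # On Windows the read-only attribute shows up as a cleared write bit.
        if not info.st_mode & stat.S_IWUSR:
            problems.append(f"language file is marked read-only: {model.name}")

    # The "special character" case: flag anything outside plain ASCII.
    if not str(folder).isascii():
        problems.append("path contains non-ASCII characters")

    return problems
```

To use it, pass the tessdata path reported in your own Vision Host log, for example `print(check_tessdata(r"C:\path\to\tessdata"))` (a placeholder path). An empty list means none of these particular checks found a problem.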

Hi @Dexpert,

First of all, the Tesseract-OCR from nuget.org is not an activities package, or supported by our platform. It’s meant to be used as a dependency to .NET projects.

It seems that Tesseract will not allow itself to be loaded behind the scenes.
Please make sure you have the prerequisites installed: https://robot.uipath.com/docs/software-requirements
Especially this: https://www.microsoft.com/en-us/download/details.aspx?id=26767

Unfortunately Tesseract is notorious for having all sorts of issues with Windows 7.

Thank you for the quick reply, @florinszilagyi!

This is very good information. I will work on this and report back. :+1:

Hi All,

This issue has been resolved. I am now able to scrape data using Tesseract OCR.

Below are the steps I carried out (I am not sure which one did the trick, though).

  1. Cleared a large number of cache and temp files on the system, especially (but not limited to) those belonging to UiPath.

  2. My Windows updates were years behind, so I manually installed all Windows updates available to date. This did not include the package KB2533623, as Windows Update did not detect it automatically.

  3. Changed my computer name from “MYNAME-PC” to “DEXPERT”. Perhaps the hyphen had something to do with it?

  4. Tried to install KB2533623 from - https://support.microsoft.com/en-us/help/2533623/microsoft-security-advisory-insecure-library-loading-could-allow-remot
    This was as per - https://robot.uipath.com/v2018.4/docs/software-requirements.
    However, the installer kept showing the message “The update is not applicable to your computer”. I even uninstalled KB2758857, as it reportedly supersedes KB2533623, but that did not help either. I scoured the internet for posts on this but did not find a solution.

  5. As per https://robot.uipath.com/v2018.4/docs/software-requirements, Windows 7 SP1 also needs the Universal C Runtime update KB2999226. I confirmed that it was installed; it had been there since 13-May-2019 (i.e. before I faced the issue). To check this, type the command wmic qfe list | find "2999226" into your command prompt and hit Enter. Note that the quotes must be plain straight quotes.
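If you would rather check several updates at once, the lookup in step 5 can be sketched in Python as follows. This is only an illustration: the function names are my own, the script assumes that `wmic` is available on the machine (it is a Windows-only tool), and it simply searches the command's text output for KB identifiers.

```python
import re
import subprocess


def installed_hotfixes(qfe_output):
    """Extract KB identifiers such as 'KB2999226' from `wmic qfe list` text."""
    found = re.findall(r"\bKB\d+\b", qfe_output, flags=re.IGNORECASE)
    return {kb.upper() for kb in found}


def missing_hotfixes(qfe_output, required):
    """Return the required KB identifiers that do not appear in the output."""
    installed = installed_hotfixes(qfe_output)
    return sorted(kb.upper() for kb in required if kb.upper() not in installed)


def read_qfe_list():
    """Run `wmic qfe list` and return its raw text (Windows only)."""
    result = subprocess.run(
        ["wmic", "qfe", "list"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a Windows machine, `missing_hotfixes(read_qfe_list(), ["KB2999226", "KB2533623"])` would list whichever of the two updates is not reported as installed. Keep in mind that an update that has been superseded by a later one may not show up under its own number, which seems to be what happened with KB2533623 in my case.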

After all this, I just tried to scrape using Tesseract OCR and to my surprise, it worked! Since #4 and #5 did not make a difference in my system configuration, it may have been #1, #2 or #3. And I did not have to install any additional packages in UiPath either.

Thank you all for your valuable insights and guidance! You folks are Awesome! :+1:
