Tesseract OCR not working (standalone + Screen Scraper)

qdeliere · March 26, 2019, 8:12am

#UiPath Studio Community 2019.3.0

Hi guys,
I’m having a lot of issues with the Tesseract OCR engine: the Microsoft engine works perfectly, but the Google one does not.

Even with the Screen Scraper Wizard it does not work (see screenshot).

See the error stack below (from log file *_VisionHost):

08:57:26.5493 Info Input language:eng, translated language:eng
08:57:26.6118 Warn Error initializing Tesseract engine System.ArgumentException: Unable to create ocr model using Path 'C:\USERS\QUENTINDELIÈRE\.NUGET\PACKAGES\UIPATH.VISION\1.4.0\BUILD' and language 'eng'.
at Emgu.CV.OCR.Tesseract.Init(String dataPath, String language, OcrEngineMode mode)
at UiPath.Vision.Engines.TesseractLegacyExternalEngine.Initialize(String dataPath, String language, Boolean extractWords)
at UiPath.Vision.Engines.TesseractLegacyEngine.Initialize()
08:57:26.6118 Error Error initializing Tesseract engine UiPath.Vision.OCR.OCRException: TessErrorLoadEngine
at UiPath.Vision.Engines.TesseractLegacyEngine.Initialize()

qdeliere · March 26, 2019, 9:34am

The path above (C:\USERS\QUENTINDELIÈRE\.NUGET\PACKAGES\UIPATH.VISION\1.4.0\BUILD) does contain the “en” language file.

varunk · March 26, 2019, 9:55am

Hi @qdeliere,

Can you explain in more detail?

qdeliere · March 26, 2019, 10:12am

Hi @varunk,

In UiPath Academy, I’m doing exercise 10 of the “Level 1 RPA Developer Foundation” training.
In this exercise, you just have to analyse a PDF document, “Perfect Match.pdf”.

As you can see in the initial screenshot, Tesseract OCR is not working at all (see the error stack above), while the Windows OCR works well.

For further project development, we need both OCR engines, so I must fix this issue.

I also reported a Tesseract issue in the topic “Impossible to install (and use) Tesseract OCR package”.

Kr,
Quentin

varunk · March 26, 2019, 10:14am

Hi @qdeliere,

OK, but for that, have you tried using Google OCR or Microsoft OCR? Either one should work.

florinszilagyi · March 26, 2019, 10:26am

Do you have an antivirus installed?
Does the user under which the studio/robot is running have permissions to the tessdata folder?

qdeliere · March 26, 2019, 10:34am

Hi @varunk,

In UiPath Studio 2019.3.0, Google OCR has been renamed Tesseract OCR.
So Microsoft OCR works on “Perfect Match.pdf”, but Tesseract OCR does not…

Since it’s the simplest PDF document ever, this behavior is not normal.

@florinszilagyi, there is no particular antivirus installed.
I’ve unchecked the “Read-only” option on the tessdata folder. Checking the access rights, everyone is allowed to read and write to it.

Furthermore, I’m running UiPath in admin mode.

Kr,
Quentin

florinszilagyi · March 26, 2019, 11:02am

@qdeliere can you tell me the exact size of the eng.traineddata file?
Also, try copying it into the build folder directly, see if that works.

What OS/architecture are you using?
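The checks asked about above (read access for the account running Studio/Robot, the exact size of eng.traineddata, whether the file sits in the folder the engine loads from) can be scripted. A minimal Python sketch, assuming you already know that folder (in this thread, the UiPath.Vision build folder); `check_traineddata` is a hypothetical helper, not part of any UiPath or Tesseract API:

```python
import os
from pathlib import Path


def check_traineddata(folder, lang="eng"):
    """Hypothetical helper: inspect <folder>/<lang>.traineddata.

    Returns a dict with the file path, whether it exists, its size in
    bytes (None if missing) and whether the current user can read it.
    """
    path = Path(folder) / f"{lang}.traineddata"
    exists = path.is_file()
    return {
        "path": str(path),
        "exists": exists,
        # Exact size, to compare against a known-good copy of the file.
        "size_bytes": path.stat().st_size if exists else None,
        # Read access for the account running this script; run it as the
        # same Windows user as Studio/Robot for the answer to be meaningful.
        "readable": exists and os.access(path, os.R_OK),
    }


if __name__ == "__main__":
    import sys

    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    for key, value in check_traineddata(folder).items():
        print(f"{key}: {value}")
```

Pass the folder as the first argument; comparing `size_bytes` with the same file on a working machine quickly rules out a truncated download.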

qdeliere · March 26, 2019, 12:09pm

@florinszilagyi,

We are working on laptops running Windows 10 Pro.
I’ve tried the same exercise on another laptop after completely uninstalling the antivirus → same issue.

I also tried moving the eng.traineddata file directly into the parent directory, i.e. the build folder (and also into the project folder). No change; the issue is still present.

For the file properties, see below

qdeliere · March 26, 2019, 12:41pm

Hi @florinszilagyi,

The full error stack (see below) mentions “Emgu.CV.OCR”… Is it a .dll file?

13:35:13.7797 Warn Error initializing Tesseract engine System.ArgumentException: Unable to create ocr model using Path 'C:\USERS\QUENTINDELIÈRE\.NUGET\PACKAGES\UIPATH.VISION\1.4.0\BUILD' and language 'eng'.
at Emgu.CV.OCR.Tesseract.Init(String dataPath, String language, OcrEngineMode mode)
at UiPath.Vision.Engines.TesseractLegacyExternalEngine.Initialize(String dataPath, String language, Boolean extractWords)
at UiPath.Vision.Engines.TesseractLegacyEngine.Initialize()
13:35:13.7955 Error Error initializing Tesseract engine UiPath.Vision.OCR.OCRException: TessErrorLoadEngine

If so, it’s not present in my build folder.

florinszilagyi · March 26, 2019, 1:04pm

Please try with this debug build. https://drive.google.com/file/d/1x7omx1SGhZpLKYRKWrN5noVHjinwxtzB/view?usp=sharing
I don’t think it will solve the issue but it should have more logging information which may help us track the issue.

See here how to add a custom package source: Studio - Managing activity packages
Create a new package source pointing to the folder where you place the nupkg file, then install that Vision version.

Thanks

qdeliere · March 27, 2019, 10:52am

Hi @florinszilagyi ,

After several tests on other colleagues’ laptops, the issue turns out to be linked to my username containing a “special” character:
C:\USERS\QUENTINDELIÈRE

We replaced it with QUENTINDELIERE (without the è), and the issue is gone…
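Since the failure disappeared once the user profile path no longer contained “è”, a quick pre-flight check is to look for non-ASCII characters in any path handed to the OCR engine. A minimal Python sketch; `non_ascii_chars` is a hypothetical helper, and treating every non-ASCII character as risky is an assumption (only “è” was tested in this thread):

```python
import os


def non_ascii_chars(path):
    """Hypothetical helper: return the sorted, distinct characters in
    `path` that fall outside 7-bit ASCII (code points above 127)."""
    return sorted({ch for ch in path if ord(ch) > 127})


if __name__ == "__main__":
    # By default NuGet's global packages folder (and so the UiPath.Vision
    # build folder seen in the log) lives under the user profile.
    profile = os.path.expanduser("~")
    offending = non_ascii_chars(profile)
    if offending:
        print(f"{profile!r} contains non-ASCII characters: {offending}")
    else:
        print(f"{profile!r} is ASCII-only")
```

An empty list means the path uses plain ASCII only, which matches the renamed profile that worked here.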

So I have no idea whether this bug should be reported to UiPath or to Google, which manages the OCR engine…

Accented characters such as “è” are common in French and German names, so this issue will come up again.
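The thread does not pin down where the accented character breaks things. One common failure mode with non-ASCII paths on Windows, offered here only as a hypothesis and not confirmed for Emgu CV or Tesseract, is a path encoded with one code page and decoded with another. A small Python illustration, assuming the Windows-1252 (cp1252) code page; the real code page depends on the system locale, and `roundtrip` is a hypothetical helper:

```python
def roundtrip(path, write_enc, read_enc):
    """Encode `path` with `write_enc`, then decode the bytes with `read_enc`.

    Returns the decoded string, or None if the bytes are not valid in
    `read_enc`.
    """
    raw = path.encode(write_enc)
    try:
        return raw.decode(read_enc)
    except UnicodeDecodeError:
        return None


accented = "C:\\USERS\\QUENTINDELIÈRE"
plain = "C:\\USERS\\QUENTINDELIERE"

# cp1252 writes "È" as the single byte 0xC8; followed by "R" it is not valid UTF-8.
print(roundtrip(accented, "cp1252", "utf-8"))   # None
# UTF-8 writes "È" as 0xC3 0x88, which cp1252 reads back as "Ã" + "ˆ".
print(roundtrip(accented, "utf-8", "cp1252"))   # C:\USERS\QUENTINDELIÃˆRE
# ASCII-only paths survive either mismatch unchanged.
print(roundtrip(plain, "cp1252", "utf-8") == plain)  # True
```

If this is the mechanism, it would explain why renaming the profile to plain ASCII sidestepped the problem: cp1252 and UTF-8 encode characters below 128 identically.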

Thanks again for your time.
Regards,
Quentin
