Tesseract OCR not working (standalone + Screen Scraper)

#UIPath Studio Community 2019.3.0

Hi guys,
I’m having a lot of issues with the Tesseract OCR engine. The Microsoft one works perfectly, but the Google one does not.

It does not work even with the Screen Scraper Wizard (see screenshot).

See the stack trace below (from the *_VisionHost log file):

08:57:26.5493 Info Input language:eng, translated language:eng
08:57:26.6118 Warn Error initializing Tesseract engine System.ArgumentException: Unable to create ocr model using Path 'C:\USERS\QUENTINDELIÈRE.NUGET\PACKAGES\UIPATH.VISION\1.4.0\BUILD' and language ‘eng’.
at Emgu.CV.OCR.Tesseract.Init(String dataPath, String language, OcrEngineMode mode)
at UiPath.Vision.Engines.TesseractLegacyExternalEngine.Initialize(String dataPath, String language, Boolean extractWords)
at UiPath.Vision.Engines.TesseractLegacyEngine.Initialize()
08:57:26.6118 Error Error initializing Tesseract engine UiPath.Vision.OCR.OCRException: TessErrorLoadEngine
at UiPath.Vision.Engines.TesseractLegacyEngine.Initialize()

The path specified above (C:\USERS\QUENTINDELIÈRE.NUGET\PACKAGES\UIPATH.VISION\1.4.0\BUILD) does contain the “en” file.

Hi @qdeliere,

Can you explain the problem in more detail?

Hi @varunk,

In UiPath Academy, I’m doing exercise 10 of the “Level 1 RPA Developer Foundation” training.
In this exercise, you just have to analyse a PDF document, “Perfect Match.pdf”.

As you can see in the initial screenshot, Tesseract OCR is not working at all (see the stack trace above),
while the Windows OCR works well.

For further project development, we need both OCR engines, so I must fix this issue.

I also reported a Tesseract issue in: Impossible to install (and use) Tesseract OCR package

Kr,
Quentin

Hi @qdeliere,

OK, but have you tried using both Google OCR and Microsoft OCR?
Is either of them working?

Do you have an antivirus installed?
Does the user under which the studio/robot is running have permissions to the tessdata folder?

Hi @varunk,

In UiPath Studio 2019.3.0, Google OCR has been renamed Tesseract OCR.
So Microsoft OCR works on “Perfect Match.pdf”, but Tesseract OCR does not…

As it’s the simplest PDF document ever, this behavior is not normal.

@florinszilagyi, there is no particular antivirus installed.
I’ve unchecked the “Read-only” option on the tessdata folder. I also checked the access rights: everyone is allowed to read and write to it.

Furthermore, I’m running UiPath in Admin mode.

Kr,
Quentin

@qdeliere can you tell me the exact size of the eng.traineddata file?
Also, try copying it into the build folder directly, see if that works.

What OS/architecture are you using?
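As a rough way to answer the file-size question, the following Python sketch lists every matching traineddata file under a folder together with its size in bytes. The function name `find_traineddata` is made up for illustration, and the folder to pass in is an assumption: use whatever data path your own log reports.

```python
from pathlib import Path


def find_traineddata(folder, language="eng"):
    """Return (path, size_in_bytes) for each <language>.traineddata under folder.

    Hypothetical helper for illustration only; it searches sub-folders too,
    so a file sitting in a tessdata sub-folder is found as well.
    """
    results = []
    for candidate in sorted(Path(folder).rglob(f"{language}.traineddata")):
        results.append((str(candidate), candidate.stat().st_size))
    return results


if __name__ == "__main__":
    # Assumption: replace "." with the data path shown in your own log.
    for path, size in find_traineddata("."):
        print(f"{size:>12} bytes  {path}")
```

An empty result means no such file was found under that folder; a size of 0 would point to a truncated or blocked download.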

@florinszilagyi,

We are working on laptops running Windows 10 Pro.
I’ve tried the same exercise on another laptop after completely uninstalling the antivirus → same issue.

I also tried moving the eng.traineddata file directly into the parent directory, i.e. the build folder (and also into the project folder). No change; the issue is still present.

For the file properties, see below

Hi @florinszilagyi,

The full stack trace (see below) mentions “EMGU.CV.OCR”… Is it a .dll file?

13:35:13.7797 Warn Error initializing Tesseract engine System.ArgumentException: Unable to create ocr model using Path 'C:\USERS\QUENTINDELIÈRE.NUGET\PACKAGES\UIPATH.VISION\1.4.0\BUILD' and language ‘eng’.
at Emgu.CV.OCR.Tesseract.Init(String dataPath, String language, OcrEngineMode mode)
at UiPath.Vision.Engines.TesseractLegacyExternalEngine.Initialize(String dataPath, String language, Boolean extractWords)
at UiPath.Vision.Engines.TesseractLegacyEngine.Initialize()
13:35:13.7955 Error Error initializing Tesseract engine UiPath.Vision.OCR.OCRException: TessErrorLoadEngine

If so, it is not present in my build folder.

Please try this debug build: UiPath.Vision.1.4.7024.27080.nupkg - Google Drive
I don’t think it will solve the issue but it should have more logging information which may help us track the issue.

See here how to add a custom package source: Managing Activities Packages
Create a new package source pointing to the folder where you placed the nupkg file, then install that Vision version.

Thanks

Hi @florinszilagyi ,

After several tests on colleagues’ laptops, the issue turns out to be linked to my username containing a “special” character:
C:\USERS\QUENTINDELIÈRE

We replaced it with QUENTINDELIERE (without the è), and the issue is gone…

So I have no idea whether this bug should be reported to UiPath or to Google, which manages the OCR engine…

The “è” is often used by French and German people so this issue will appear again.

Thanks again for your time.
Regards,
Quentin
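For anyone hitting the same error, here is a small Python sketch that flags non-ASCII characters in a Windows path, which is what turned out to matter in this thread. The helper name `non_ascii_parts` is invented for illustration, and the check only reports such characters; it is an assumption that any non-ASCII character (not just “È”) triggers the same failure.

```python
from pathlib import PureWindowsPath


def non_ascii_parts(path):
    """Return (component, offending_characters) for each non-ASCII path component.

    Hypothetical helper: it only inspects the string, it does not touch the disk.
    """
    findings = []
    for part in PureWindowsPath(path).parts:
        bad = [ch for ch in part if ord(ch) > 127]
        if bad:
            findings.append((part, bad))
    return findings


if __name__ == "__main__":
    # The two user folders mentioned in this thread.
    for candidate in (r"C:\USERS\QUENTINDELIÈRE", r"C:\USERS\QUENTINDELIERE"):
        print(candidate, "->", non_ascii_parts(candidate) or "ASCII only")
```

If the path reported in the log contains a flagged component, trying a user profile or package folder with an ASCII-only path is a quick way to confirm whether this is the cause.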