Invalid Input Language When Using Tesseract OCR Engine

When trying to use UiPath.OCR.Tesseract, why is the error "Read PDF With OCR: Error performing OCR: InvalidInputLanguage" thrown?

Issue Description: When trying to use UiPath.OCR.Tesseract, the following error message is thrown:

  • Read PDF With OCR: Error performing OCR: InvalidInputLanguage



Resolution:

For the Tesseract OCR engine, the Language field must contain the language file prefix, that is, the part of the trained data file name before ".traineddata". For example, use "heb" for Hebrew (heb.traineddata).

Follow the steps below:

  1. Download the trained data language file from GitHub-Tesseract-OCR
  2. Unzip the downloaded file and rename the resulting folder to "tessdata"
  3. Save the tessdata folder in the UiPath Studio installation directory,
e.g.: C:\Program Files (x86)\UiPath\Studio\tessdata or
C:\Program Files\UiPath\Studio\tessdata (for 64 bit installation)
  4. Ensure the language file (for example, heb.traineddata) is in that tessdata folder
  5. Restart Studio
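
Before restarting Studio, it can help to confirm that the language file really is where it is expected. The following is a minimal Python sketch, not part of UiPath or Tesseract: the function name find_traineddata is hypothetical, and the folder list simply repeats the example paths from the steps above, so adjust it to the actual installation.

```python
from pathlib import Path


def find_traineddata(lang_prefix, tessdata_dirs):
    """Return the paths of <lang_prefix>.traineddata found in the given folders."""
    filename = f"{lang_prefix}.traineddata"
    matches = []
    for folder in tessdata_dirs:
        candidate = Path(folder) / filename
        # Only report files that actually exist on disk.
        if candidate.is_file():
            matches.append(candidate)
    return matches


if __name__ == "__main__":
    # Assumption: these are the example install paths from the steps above.
    candidate_dirs = [
        r"C:\Program Files (x86)\UiPath\Studio\tessdata",
        r"C:\Program Files\UiPath\Studio\tessdata",
    ]
    found = find_traineddata("heb", candidate_dirs)
    if found:
        for path in found:
            print(f"Found: {path}")
    else:
        print("heb.traineddata was not found in any of the checked folders")
```

If nothing is found, the prefix typed in the Language field has no matching file in those folders, which is the situation the steps above are meant to fix.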

Alternatively, place the trained data files in the existing tessdata folder found in the Vision package install location:

  • C:\Users\%username%\.nuget\packages\uipath.vision\<version>\build\net461\tessdata (for Vision >= v3.0.0)
or
  • C:\Users\%username%\.nuget\packages\uipath.vision\<version>\build\tessdata (for Vision < v3.0.0)
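
Because the folder depends on the installed Vision version, a short script can list the tessdata folders that actually exist. This is only a sketch under assumptions: the function name find_vision_tessdata_dirs is hypothetical, and it assumes that Path.home() resolves to C:\Users\%username% and that the package folder follows one of the two layouts listed above.

```python
from pathlib import Path


def find_vision_tessdata_dirs(package_root):
    """Return every tessdata folder under a uipath.vision package root."""
    root = Path(package_root)
    if not root.is_dir():
        return []
    found = []
    # First pattern: layout for Vision >= v3.0.0; second: Vision < v3.0.0.
    for pattern in ("*/build/net461/tessdata", "*/build/tessdata"):
        found.extend(p for p in root.glob(pattern) if p.is_dir())
    return sorted(found)


if __name__ == "__main__":
    # Assumption: Path.home() points at the current user's profile folder.
    default_root = Path.home() / ".nuget" / "packages" / "uipath.vision"
    for folder in find_vision_tessdata_dirs(default_root):
        print(folder)
```

Each printed folder is a candidate location for the trained data files. If several versions are listed, the one matching the Vision package used by the project is the relevant one.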

