Why is the error "Read PDF With OCR: Error performing OCR: InvalidInputLanguage" thrown when using UiPath.OCR.Tesseract?
Issue Description: When using UiPath.OCR.Tesseract, the following error message is thrown:
- Read PDF With OCR: Error performing OCR: InvalidInputLanguage
Resolution:
For the Tesseract OCR engine, the Language field must contain the prefix of the trained data file, for example "heb" for Hebrew (which corresponds to the file heb.traineddata).
Follow the steps below:
- Download the trained data file for the required language from GitHub-Tesseract-OCR
- Unzip the downloaded file and rename the folder to "tessdata"
- Save the folder in the UiPath Studio installation directory
e.g.: C:\Program Files (x86)\UiPath\Studio\tessdata or
C:\Program Files\UiPath\Studio\tessdata (for a 64-bit installation)
- Ensure that heb.traineddata (or the file for the required language) is inside the tessdata folder
- Restart Studio
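Before restarting Studio, it can help to confirm that the file actually landed where the steps above say it should. The following is a minimal sketch, not a UiPath tool: the helper name `find_traineddata` is hypothetical, and the candidate folders are simply the two Studio paths listed above (adjust them if Studio is installed elsewhere). It only checks for `<prefix>.traineddata` directly inside each folder, so a file left in a nested subfolder after unzipping will not be reported.

```python
from pathlib import Path

# Candidate Studio tessdata folders, taken from the steps above.
# Assumption: Studio is installed in one of the default locations.
STUDIO_TESSDATA_DIRS = [
    Path(r"C:\Program Files (x86)\UiPath\Studio\tessdata"),
    Path(r"C:\Program Files\UiPath\Studio\tessdata"),
]


def find_traineddata(lang_prefix, dirs=STUDIO_TESSDATA_DIRS):
    """Return every folder in `dirs` that directly contains <lang_prefix>.traineddata.

    Hypothetical helper for illustration. An empty list means the file for
    this Language value was not found in any of the given folders.
    """
    filename = f"{lang_prefix}.traineddata"
    return [d for d in map(Path, dirs) if (d / filename).is_file()]


if __name__ == "__main__":
    # "heb" is the Language value used in the Hebrew example above.
    hits = find_traineddata("heb")
    if hits:
        for folder in hits:
            print(f"Found heb.traineddata in {folder}")
    else:
        print("heb.traineddata not found; check the tessdata folder and the Language value")
```

If the check comes back empty, either the file is in the wrong place or the Language value does not match the file prefix.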
Alternatively, place the trained data files in the existing tessdata folder found in the Vision package installation location:
- C:\Users\%username%\.nuget\packages\uipath.vision\<version>\build\net461\tessdata (for Vision >= v3.0.0)
or
- C:\Users\%username%\.nuget\packages\uipath.vision\<version>\build\tessdata (for Vision < v3.0.0)
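Which of the two folders applies depends on the installed Vision version. The sketch below builds the path from a user name and a version string using the rule listed above. `vision_tessdata_dir` is a hypothetical helper, and it assumes the user profile is under `C:\Users` and that the version string starts with a numeric major version (e.g. "3.2.1").

```python
from pathlib import PureWindowsPath


def vision_tessdata_dir(username, vision_version):
    """Build the tessdata path inside the uipath.vision NuGet package folder.

    Hypothetical helper. Follows the rule above:
    Vision >= 3.0.0 -> build\\net461\\tessdata, older -> build\\tessdata.
    Assumes the default C:\\Users profile location.
    """
    # Only the major version decides the layout; "3.0.0-beta" still gives 3.
    major = int(vision_version.split(".")[0])
    base = PureWindowsPath(
        "C:/Users", username, ".nuget", "packages", "uipath.vision", vision_version, "build"
    )
    if major >= 3:
        return base / "net461" / "tessdata"
    return base / "tessdata"


if __name__ == "__main__":
    print(vision_tessdata_dir("jdoe", "3.2.1"))
    print(vision_tessdata_dir("jdoe", "2.4.1"))
```

`PureWindowsPath` is used so the path is built with Windows separators even if the script runs elsewhere; it does not touch the file system, so combine it with a check that the traineddata file exists there.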