Google OCR engine: Invalid Input Language

I have been going through the Google OCR documentation trying to solve this error: “message”: “Google OCR: Error performing OCR: InvalidInputLanguage”.
The documentation says the supported language prefixes are listed here. I work with Dutch, so I use the “nld” prefix, which is listed in the “Language and Scripts” section.
As a workaround I can use English, but it is not reliable.

Much appreciated!

@ykuzin

In Google Tesseract OCR, only English is available by default, whereas Microsoft MODI OCR lets you select from several languages. You can try the Microsoft one.

If you’d like to stick with Google OCR, then you need to add the extra languages yourself.
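
As a rough illustration of what "adding a language" means: Tesseract loads each language from a file named `<code>.traineddata` in its `tessdata` folder, so a missing `nld.traineddata` is a plausible cause of the InvalidInputLanguage error. Here is a minimal Python sketch that checks for it. The folder name `tessdata` and the helper name `missing_languages` are assumptions made for this example; the real folder location depends on your installation.

```python
from pathlib import Path


def missing_languages(tessdata_dir, languages):
    """Return the language codes that have no <code>.traineddata file."""
    folder = Path(tessdata_dir)
    return [
        code
        for code in languages
        if not (folder / f"{code}.traineddata").is_file()
    ]


if __name__ == "__main__":
    # Hypothetical path: point this at the tessdata folder your OCR engine uses.
    tessdata = Path("tessdata")
    print(missing_languages(tessdata, ["eng", "nld"]))
```

If "nld" shows up in the printed list, the Dutch data file is not where the engine looks for it, and it would need to be added before the “nld” prefix can work.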

  • Language - The language used by the OCR engine to extract the text from the UI element or image. The language name must be fully written, such as “english”, “japanese”, “romanian”. The Microsoft OCR engine uses the languages installed on your system.

Does this mean Dutch has to be installed locally on the PC? Additionally, they say the language name must be fully written, but in the UiPath activity it is shortened.
How do I choose an appropriate scale? Is there a range, like 1-10?

Thank you!

@ykuzin

Yes, you need to install the Dutch language using the Microsoft package. Once it’s installed, try OCR scraping first; in the language drop-down you’ll see the full names of the languages. Once you select one, it gets shortened automatically.

To answer your scale question: I’ve done one use case with OCR, and in my experience a scale of 2-4 gets 90-95% of the values correct. But it still depends on the images and the complexity of the data.

Let me know in case you need any kind of help.


Gladly,
Appreciate that!
