OCR Engines in Studio - Setup and Languages


#1

Guidelines on how to set up OCR engines for different languages:

  • Microsoft OCR
    In Windows 10, you need to add the desired language in Control Panel - Language. Note that OCR support is not available for all languages.
    In Windows 7, the OCR engine is not preinstalled. You need to install MODI (Microsoft Office Document Imaging) from SharePoint 2007, and then the Office 2007 language pack for the desired language.

  • Google OCR
    Google OCR uses the Tesseract engine version 3.04 (at least in UiPath Studio 2016.2 and 2017.1). See this post on how to install additional languages. Note that you must use the tessdata files for version 3.04, not the latest version available.

  • Abbyy OCR
    See this post for what you need to use Abbyy OCR. The installation kit can be found at \\fileserver\Public\ABBYY.
    Trial Codes:
    SWXE-1101-0006-1790-0689-0119
    SWXE-1101-0006-1790-0550-8358
    SWXE-1101-0006-1790-0449-6150
    SWXE-1101-0006-1790-0355-9918
    SWXE-1101-0006-1790-0253-7680
    SWXE-1101-0006-1790-0194-4677
    SWXE-1101-0006-1790-0014-7835
    SWXE-1101-0006-1789-9972-8700
    SWXE-1101-0006-1789-9873-3335
    SWXE-1101-0006-1789-9742-2436
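For Google OCR, each extra language is a separate "<code>.traineddata" file in Tesseract's tessdata folder (for example "deu" for German, "eng" for English). The minimal sketch below, assuming you know where your Studio installation keeps that folder (the path shown is a placeholder, not the real location), lists which of the requested language files are missing. The helper name `missing_languages` is hypothetical, and it only checks that the files exist; it cannot tell whether they are the 3.04-compatible versions.

```python
from pathlib import Path


def missing_languages(tessdata_dir, lang_codes):
    """Hypothetical helper: return the language codes whose
    "<code>.traineddata" file is not present in tessdata_dir."""
    folder = Path(tessdata_dir)
    return [code for code in lang_codes
            if not (folder / f"{code}.traineddata").is_file()]


if __name__ == "__main__":
    # Placeholder path (assumption): point this at your actual tessdata folder.
    tessdata = r"C:\path\to\tessdata"
    print(missing_languages(tessdata, ["eng", "deu"]))
```

If the list printed is not empty, copy the corresponding 3.04 tessdata files into that folder before selecting the language in the Google OCR activity.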


#2

Windows 7 user here. I installed MODI and added German both as a Windows display language and as an Office language. Still, I cannot scrape German text using Microsoft OCR.

Any ideas?