Data Manager - reading Cyrillic with OCR

Hello everyone,

The situation is:
I have Data Manager set up and working. I also have a Microsoft OCR license, and I can use OmniPage OCR for free, as it requires no license.
I have invoices that I want to label and extract data from using Data Manager. Some of them are in English and some are in Cyrillic (Bulgarian).

The problem is:
The invoices in English are read perfectly, but the Cyrillic ones are not. For example, if an invoice contains “Описание”, Microsoft OCR reads it as “OnncaHne”, which is completely wrong.
When using OmniPage OCR “Описание” is read as “OnIcaHNe”.
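
Both misreads are made of Latin look-alike letters instead of Cyrillic ones. To catch this automatically before labeling, you could measure what share of the letters in the OCR output are actually Cyrillic. This is a minimal sketch using only the Python standard library; `cyrillic_ratio`, `looks_misread` and the 0.5 threshold are hypothetical choices, not part of Data Manager.

```python
import unicodedata


def cyrillic_ratio(text: str) -> float:
    """Return the share of alphabetic characters in `text` that are Cyrillic."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    # Unicode names of Cyrillic letters contain "CYRILLIC",
    # e.g. "CYRILLIC SMALL LETTER PE"; Latin ones start with "LATIN".
    cyrillic = sum(1 for ch in letters if "CYRILLIC" in unicodedata.name(ch, ""))
    return cyrillic / len(letters)


def looks_misread(text: str, threshold: float = 0.5) -> bool:
    """Flag OCR output that should be Bulgarian but came back mostly Latin.

    The threshold is an assumption: Bulgarian invoices may contain some
    English words, so a ratio of exactly 1.0 is not expected.
    """
    return cyrillic_ratio(text) < threshold
```

For example, `cyrillic_ratio("Описание")` is 1.0, while both “OnncaHne” and “OnIcaHNe” give 0.0. Only apply the check to documents you expect to be Bulgarian, since English invoices would also be flagged.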

The question:
Can you suggest how to make the OCR work for Cyrillic as well as English?



I think you can use Tesseract OCR with the Bulgarian language; maybe the code you need to specify in the Language property is “bul”. I know that for Microsoft OCR you need to download the language package on your own machine (see Installing OCR Languages), but it never worked for me.
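
Tesseract's own language code for Bulgarian is `bul` (it needs the `bul.traineddata` file installed), and several languages are combined with `+`. To check outside UiPath that the Bulgarian data is installed and reads your invoices, here is a minimal sketch assuming the third-party `pytesseract` wrapper and Pillow are installed; neither is part of UiPath, and `ocr_invoice` and `tesseract_lang` are hypothetical helpers. Whether the UiPath Tesseract activity accepts the same combined string is not confirmed here.

```python
from typing import Iterable


def tesseract_lang(codes: Iterable[str]) -> str:
    """Join language codes the way Tesseract expects them, e.g. 'bul+eng'."""
    # Tesseract codes are lowercase, so 'BUL' (OmniPage style) becomes 'bul'.
    return "+".join(code.strip().lower() for code in codes)


def ocr_invoice(path: str, codes: tuple[str, ...] = ("bul", "eng")) -> str:
    """OCR one invoice image with Bulgarian and English enabled."""
    # Imported here so tesseract_lang() works without these packages installed.
    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(image, lang=tesseract_lang(codes))
```

Calling `ocr_invoice("invoice.png")` on a hypothetical scanned invoice should return “Описание” rather than Latin look-alikes if the Bulgarian data is available; if `bul.traineddata` is missing, Tesseract raises an error instead.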

Good luck!


Thank you for the reply.

I found a workaround. I will explain it in case someone has the same problem and searches for a solution.
Before uploading the files into Data Manager, I run a process that digitizes them with OmniPage OCR, because it lets you specify two languages in its properties (for example: “BUL, ENG”). I need two languages because the Cyrillic documents also contain some English words.
The steps in the process are:
Load Taxonomy → Digitize Document (with OmniPage OCR) → Data Extraction Scope (with a regex extractor containing only one parameter, ‘name’, whose regular expression is ‘abc’) → Train Extractors Scope (with an ML Extractor Trainer that has only the Output Folder specified).
As a result, the specified Output Folder contains a zip file with the metadata. Import this zip into Data Manager, and the documents are ready for labeling.
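
Before importing, it can be worth confirming that the zip was actually written. A minimal sketch using only the Python standard library; it assumes the export is the newest `.zip` in the Output Folder, makes no assumption about the archive's internal layout, and just lists its entries. `latest_export` and `list_export` are hypothetical helper names.

```python
import zipfile
from pathlib import Path


def latest_export(output_folder: str) -> Path:
    """Return the most recently modified .zip in the trainer's Output Folder."""
    zips = sorted(Path(output_folder).glob("*.zip"), key=lambda p: p.stat().st_mtime)
    if not zips:
        raise FileNotFoundError(f"no .zip found in {output_folder}")
    return zips[-1]


def list_export(zip_path: Path) -> list[str]:
    """List the entries in the export so you can inspect it before importing."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()
```

For example, `list_export(latest_export(r"C:\path\to\OutputFolder"))` prints what the ML Extractor Trainer produced; the path is a placeholder for whatever Output Folder you configured.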