How to use the IntelligentOCR Package

Hello, @loginerror
Thanks for the help! :slight_smile:
Can I ask, do the Taxonomy editor and Keyword Based Classifier support Cyrillic?
Let me make it simple: do the Intelligent OCR activities support Cyrillic? :slight_smile:


Hello @Foertsch,

IntelligentOCR is language agnostic. You can define document types in Cyrillic using the Taxonomy Manager, and they should be displayed properly in all wizards and in the Validation Station. The Keyword Based Classifier is language and alphabet agnostic, as long as the text is representable in UTF-8.

I have to warn you, though, that Digitize Document is optimized for left-to-right, top-to-bottom writing and works best for Latin languages. We will be optimizing this for other languages and alphabets as well.



I tried using the IntelligentOCR package and trained it on 5 invoices. Is there a way to remove the Present Validation Station step after training a few times? Can I reuse the learning file, without the validation step, for new invoices with the same format?



How do I create the .json file for the Keyword Based Classifier activity? Is it created from the UiPath interface, like a taxonomy file from the taxonomy editor?


I’m sorry, I found the solution to my question:
"The activity does not automatically create a file at the specified location. A best practice is to create an empty .JSON file at that location."

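For anyone who wants to script that step outside of Studio, here is a minimal Python sketch. The path in the usage line is hypothetical, and the sketch creates a zero-byte file because that is what the quoted guidance describes; whether the activity expects any initial content is not stated in this thread.

```python
from pathlib import Path


def ensure_empty_json(path_str):
    """Create an empty file at path_str if none exists; never overwrite."""
    path = Path(path_str)
    # Create missing parent folders so the file location is valid.
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        path.touch()  # zero-byte file, as the quoted guidance describes
    return path
```

Usage would look like `ensure_empty_json("learning/keywords.json")`. An existing learning file is left untouched, so the call is safe to repeat.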


We are also working on keyword based classifier wizard, to help you get started faster with automating classification.

The wizard will allow you to define keywords (single or multiple words that come one after another) or sets of keywords (multiple groups of words that must all be found at the beginning of the document for classification, but can be in distinct places). It also allows you to review the learning data if you believe junk has crept in, so you can clean it up.
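To illustrate the difference between a keyword and a set of keywords, here is a rough Python sketch. It is not how the classifier is implemented; the function names, the dictionary layout, and the 500-character definition of "the beginning of the document" are all assumptions made for the example.

```python
def matches(entry, text, head_chars=500):
    """Check one learning entry against the start of a document.

    entry is either a phrase (str) or a set of phrases (list of str).
    A phrase matches when its words appear one after another; a set
    matches only when every phrase in it is found, in any position.
    head_chars is an assumed cutoff for "the beginning of the document".
    """
    # Lowercase and collapse whitespace so line breaks do not split phrases.
    head = " ".join(text[:head_chars].lower().split())
    phrases = [entry] if isinstance(entry, str) else entry
    # Plain substring search: works for any alphabet Python can represent.
    return all(" ".join(p.lower().split()) in head for p in phrases)


def classify(text, learning):
    """Return the first document type with a matching entry, else None."""
    for doc_type, entries in learning.items():
        if any(matches(e, text) for e in entries):
            return doc_type
    return None


# Hypothetical learning data: one keyword set and one Cyrillic phrase.
learning = {
    "Invoice": [["invoice", "total due"]],
    "Накладная": ["товарная накладная"],
}
```

With this data, a document whose first lines contain both "invoice" and "total due" is classified as `Invoice`, while one containing only "invoice" is not, because every phrase in a set has to be present.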

We are also introducing a new InArgument, LearningData, through which you can optionally provide the learning data as a string variable. Learning should ideally be centralized so that all robots can use and update it, and with this argument it is easier to read the data from wherever it is stored and feed its content in as a variable.
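The idea of reading the learning data from a central place into a string can be sketched in Python as follows. The function name and the shared path are hypothetical, and returning an empty string for a missing file is an assumption of the sketch, not documented behavior of the activity.

```python
from pathlib import Path


def load_learning_data(shared_path):
    """Read centrally stored learning data into a string.

    Returns an empty string when the file does not exist yet, so a
    first run starts with no learning instead of failing (assumption).
    """
    path = Path(shared_path)
    if not path.exists():
        return ""
    # UTF-8 keeps non-Latin keywords (for example Cyrillic) intact.
    return path.read_text(encoding="utf-8")
```

The returned string is what you would then pass to the argument as a variable.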

Hope this will ease the use of the classifier!


Hey @loana_Gligan, which UiPath Studio version is this? Mine is 2019.1.beta, and it’s not working in my Studio.

Hi @Siddhant_Dimri

The latest stable version is 2019.10.

Could you update and try with it?

@loginerror sure will do that.

Can we remove the Present Validation Station attended activity? This would require manual intervention.

You can remove it if you want to trust your extractors and classifiers 100% (which I don’t recommend), or if you don’t care about 100% accuracy.

You can also move it in a separate process if it suits your business case better.

But overall it is a pretty important component of the entire puzzle, that is why I added it to the sample workflow.


Hi Lona,

I am getting the errors below, and I am not able to repair the dependencies. Could you please suggest a fix?


Harish Vemula

Please try searching for the MachineLearningExtractor on the Official feed with the Include Prerelease checkbox checked. The activity should appear in a 1.0.0.-preview package of UiPath.MachineLearningExtractor.

Got it. Thank you Ioana.



Hi @Ioana_Gligan, thanks for your work on this! This package is awesome and very powerful… when it works.

I’m having issues with the Classify Document Scope properly detecting document types and classifiers. I experience this error after I add a new Document Type to the taxonomy through the Taxonomy Manager. Then I click “Manage Learning” on the Keyword Based Classifier activity inside the Classify Document Scope and I add some new keywords for the new document type.

Then I click the “Configure Classifiers” button in the Classify Document Scope and I check the box next to the new document type. Then I receive this error:

But it is! That document type is definitely in there. So for some reason, an error is showing up even though the required information is there.

It seems a fix to this is to remove the Intelligent OCR package dependency from Studio, then to install it again. After I do that, the error disappears.

Other times, the Taxonomy Manager is glitchy. Sometimes I can’t add new categories. This isn’t fixed by reinstalling the IntelligentOCR package in Studio.

Or if I add a new document type, it doesn’t show up until after I close the taxonomy manager and open it again.

Do you have any thoughts to share on these issues I’m having?

EDIT: I’m trying out the Intelligent OCR package on another computer and I don’t seem to be experiencing the same issues… I’ll continue to investigate…


@oscar thank you for the reports!

Please let me know if you can reproduce the same issues on the other computer.

Also, can you please share:

  • IntelligentOCR package version
  • Studio version
  • if possible a sample workflow reproducing the issue with a step by step guide to do it?

It is indeed weird that this happens.

Related to the Taxonomy Manager - you can add a category once you select a group under which you want to create it. Try selecting an existing group (or creating one), and then creating a category.

Related to the Classify Document Scope - Configure Classifiers, just double checking that after checking the new doc type, you clicked save and not cancel? :slight_smile: Kidding aside, I will try to reproduce independently anyway. Thank you!


Hi @Ioana_Gligan, thanks for your fast response! I see I was just creating categories wrong, oops! My fault :stuck_out_tongue:

I’ll play around with it a bit more to see if I can reproduce my issues on my other PC and come back to share my results.

I’m wondering, is it possible to save these variables (DocumentObjectModel, ClassificationResults, and ExtractionResults) to an external file, then load them back into the workflow as variables from that file later on? This would be like the “Load Taxonomy” activity that reads the taxonomy.json file into a variable.

Except here it would be like “Load DocumentObjectModel” or “Load ExtractionResults”, etc…

My thinking is that I would like to preprocess all of my input documents before I present them to the user to validate. This would make it faster for the user to validate each document, since the document is already digitized, classified, and has the data extracted, and I can compare it to my database before the user validates the content.

I know I can save the DocumentObjectModel variable into a .JSON file using the “Deserialize JSON” activity, but I can’t think of a way to convert the .JSON file back into the DocumentObjectModel variable.

Does that make sense? Is this possible? Or do I need to do everything in the same workflow?

Thanks again for your insight!

Edit: I think this may be possible with the “Document Processing Contracts”? Is that right? Would you be able to explain how I could use this package to convert a DOM into a file, and then how to convert that file back into a DOM?

Edit 2: I figured it out. To save a DOM to a text file, add the “UiPath.DocumentProcessing.Contracts” package to your project, call the Serialize method on the DOM variable, assign the result to a string variable, and save that string to a text file. To load it back, read the text file into a string variable and call the Deserialize method on it to convert it back into a DOM. I’ve attached an image for other people to learn from :slight_smile:


You found the right way! All objects have serialize/deserialize on them so you can store and retrieve them.
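The same save-and-reload pattern, sketched in Python for readers who want to see the round trip spelled out. This is not the UiPath API: `Dom` is a hypothetical stand-in class, and its fields and JSON format are invented for the example. Only the shape of the pattern (serialize to a string, write it, read it, deserialize) comes from the posts above.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Dom:
    """Hypothetical stand-in for a document object model."""
    document_id: str
    pages: list

    def serialize(self):
        # ensure_ascii=False keeps non-Latin text readable in the file.
        return json.dumps(asdict(self), ensure_ascii=False)

    @staticmethod
    def deserialize(text):
        return Dom(**json.loads(text))


def save_dom(dom, path):
    """Serialize the object and write the string to a text file."""
    Path(path).write_text(dom.serialize(), encoding="utf-8")


def load_dom(path):
    """Read the text file and rebuild the object from its string form."""
    return Dom.deserialize(Path(path).read_text(encoding="utf-8"))
```

The classification and extraction results can be stored the same way, one file per object, so a later workflow can pick them up without re-running digitization.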

Also for real life scenarios, you can look into breaking the process in three steps: automatic processing, then human interaction with the validation station, then post processing. This way users don’t have to wait for automatic processing.

You can try to synchronize the three pieces using Orchestrator queues, long running workflows etc.
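As a rough illustration of the three-step split, here is a Python sketch that uses in-memory queues as a stand-in for Orchestrator queues. All function names, the item layout, and the stubbed extraction are assumptions for the example; in a real setup each stage would be its own process.

```python
from queue import Queue


def auto_process(paths, to_validate):
    """Stage 1: unattended digitize/classify/extract (stubbed here)."""
    for path in paths:
        # The extracted fields are placeholders; real results would come
        # from the digitization, classification, and extraction steps.
        to_validate.put({"file": path, "fields": {"total": "?"}, "validated": False})


def validate(to_validate, to_post, review):
    """Stage 2: a human reviews each pre-processed item."""
    while not to_validate.empty():
        item = to_validate.get()
        item["fields"] = review(item["fields"])  # review is the human step
        item["validated"] = True
        to_post.put(item)


def post_process(to_post):
    """Stage 3: collect validated results for downstream use."""
    done = []
    while not to_post.empty():
        done.append(to_post.get())
    return done
```

Because stage 1 fills the first queue ahead of time, the person doing validation only ever sees items that are already digitized, classified, and extracted.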

Have a nice day,



Hi @Ioana_Gligan, thanks for your suggestions. I’ll be sure to try out your ideas soon.

For now, I have a bug report to share with you related to the proper reading of the taxonomy in Studio and in the IntelligentOCR Activities. You can watch the video that shows the bug here: Loom | Free Screen & Video Recording Software | Loom

I’ve also attached the two projects so you can investigate them yourself: (823.8 KB) (821.9 KB)

I imagine that you won’t experience any issues with these files, since both projects will be on your local machine. You could try setting up Google Drive File Stream and copying the project into there, then I am 100% sure you would be able to recreate the issue.

  • I am using UiPath Studio 2019.10.1.
  • IntelligentOCR Activities version 4.0.1 (but it doesn’t matter which version of the activities you use, ALL of them have this same bug, including 4.2.0-preview).
  • Excel.Activities = 2.7.2
  • Mail.Activities = 1.7.2
  • System.Activities = 19.10.1
  • UIAutomation.Activities = 19.10.1

The issue relates to Google Drive File Stream and the Intelligent OCR activities. It seems that when a project that uses Intelligent OCR activities is stored in Google Drive File Stream, it prevents UiPath Studio from properly reading the taxonomy. This causes issues with the “Classify Document Scope” activity, as well as issues with adding new categories and items into the Taxonomy.

Even if the project folder in Google Drive File Stream is marked as “Available offline”, the issue will still persist.

The thing that made me realize the issue was with Google Drive File Stream and not anything else, was that I copied the exact same project from File Stream to my Desktop, and the issue went away.

I hope that you can document this bug and get it resolved soon!

My work around right now is to move the project from File Stream to my Desktop, work on it, then move it back to File Stream when I’m done.
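If you want to script this workaround, a minimal Python sketch could look like the following. The function name is hypothetical, and it copies rather than moves, which is a deliberate change from what I described so that the original stays in place until the copy back succeeds.

```python
import shutil
from pathlib import Path


def copy_project(src, dst):
    """Copy a project folder, overwriting files that already exist in dst.

    Files deleted in src are not removed from dst by this sketch.
    """
    shutil.copytree(src, dst, dirs_exist_ok=True)  # requires Python 3.8+
    return Path(dst)
```

Call it once to copy the project from File Stream to a local folder, then again with the arguments swapped when you are done working.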

Please let me know your thoughts and if you have any questions for me.



Oh my, @oscar,

This is a cool bug report! Thank you, truly! :slight_smile:

I’ll look into this and see how we can fix it.

Again, thank you!