How to use the IntelligentOCR Package


Sorry for the late reply; I wanted to wait until I had something certain to say 🙂

In a matter of days, we will launch a new preview version of the IntelligentOCR package, and lo and behold, it will include a few new activities, one of which is called 🎉 Present Classification Station 🙂

Stay tuned for an out-of-the-box solution for your use case, and thank you for the report!



In the Taxonomy Manager, please define a field called “MyTable” (or whatever name you want to give it), select the Table type, then add the columns one by one. Description, Unit, Qty, etc. should each be a column.

Let me know if it works.
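To picture what that setup amounts to, here is a minimal Python sketch of a table field whose columns are added one at a time, with each extracted line item becoming one row keyed by column name. The `TableField` class and its methods are hypothetical illustrations, not UiPath's taxonomy format or API; only the field and column names come from the post above.

```python
from dataclasses import dataclass, field


@dataclass
class TableField:
    """Hypothetical model of a taxonomy field of type Table (not UiPath's schema)."""
    name: str
    columns: list[str] = field(default_factory=list)

    def add_column(self, column: str) -> None:
        # Columns are added one by one, mirroring the Taxonomy Manager steps.
        if column in self.columns:
            raise ValueError(f"duplicate column: {column}")
        self.columns.append(column)

    def make_row(self, *values: str) -> dict[str, str]:
        # Each extracted line item becomes one row keyed by column name.
        if len(values) != len(self.columns):
            raise ValueError("row length must match the number of columns")
        return dict(zip(self.columns, values))


my_table = TableField("MyTable")
for col in ("Description", "Unit", "Qty"):
    my_table.add_column(col)

print(my_table.make_row("Steel bolts", "pcs", "100"))
# {'Description': 'Steel bolts', 'Unit': 'pcs', 'Qty': '100'}
```

The point of modelling it this way is that one table field holds many rows, rather than defining separate Description, Unit and Qty fields that could each hold only a single value.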



Everyone, a little reminder that we now have on-prem machine learning support: May 2020.
Enterprise users: feel free to reach out to your UiPath contacts if you would like to try custom-trained models 🙂

Hi @Ioana_Gligan,


Awesome, can’t wait for this to be out!
Is it reasonable to say that it will be available locally (that is, presenting the station on the robot PC) and, in due time, make its way to the ‘Human in the Loop’ validation station accessible from Orchestrator?


Yes 🙂


Ladies and Gentlemen,

Another update for you! Check this out: Document Understanding: Document Splitting and Other Wonderful Stories :)

I will publish a new sample project in a few days 🙂



My Regex Based Extractor isn’t picking up the value, even with the capture checkbox selected. It works on the test text, but when I run the workflow the data field isn’t picked up. I’ve tried multiple fields, with no luck.

Any help would definitely be appreciated.

Try cleaning your string; this will probably solve it. There may be some invisible characters in the text, or at least UiPath thinks there are.

Search the forums for the “clean illegal characters” error.

Essentially, you will be replacing all characters except the ones you need to extract.
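To see why a pattern can pass on the test text yet fail at run time, here is a minimal Python sketch. It assumes the document text contains a zero-width space (U+200B), which prints as nothing but still breaks a literal match. The sample text, the pattern, and the `strip_invisible` helper are hypothetical, and the cleanup is a narrower variant of the approach above: it removes only invisible Unicode “format” characters and normalises non-breaking spaces instead of discarding everything except the characters you need.

```python
import re
import unicodedata

# Hypothetical document text: a zero-width space (U+200B) sits after "Invoice".
# It is invisible when printed, but it is still a character in the string.
raw_text = "Invoice\u200b No: 12345"

pattern = re.compile(r"Invoice No:\s*(\d+)")

print(pattern.search(raw_text))  # None: the hidden character breaks the literal match


def strip_invisible(text: str) -> str:
    """Hypothetical helper: turn non-breaking spaces (U+00A0) into regular spaces
    and drop Unicode format characters (category Cf), such as zero-width spaces."""
    text = text.replace("\u00a0", " ")
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


clean_text = strip_invisible(raw_text)
match = pattern.search(clean_text)
print(match.group(1) if match else None)  # 12345
```

Removing only invisible characters keeps punctuation such as “:” and “.” intact, so a pattern that relies on them keeps working after the cleanup.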

Hello @shauny,

Please take a look at the text version of your document. If it has a complex layout, the text representation might differ from what you see visually. Try enabling the UseVisualAlignment flag on the extractor; this might solve your issue.

I’m new to regular expressions, so I’m not sure how to do that. Should my current regex include an expression to clean the illegal characters, or should this go in an Assign activity?

Yes, I would say the document has a complex layout. I’ve set UseVisualAlignment to True, but it still doesn’t work.

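To clean your characters, use an Assign activity with the following expression, where INSERTVARIABLE is the String variable holding your text:
System.Text.RegularExpressions.Regex.Replace(INSERTVARIABLE, "[^a-z A-Z 0-9]", "")

This replaces every character with an empty string EXCEPT characters in the ranges a-z, A-Z and 0-9. The spaces inside the brackets are part of the character class, so spaces are kept as well.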
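Here is a minimal Python sketch of what that expression does, using the same character class with Python's `re.sub` in place of .NET's `Regex.Replace` (for this ASCII-only class the two behave the same). The sample strings and the `clean` helper are hypothetical. It also shows a side effect worth checking: punctuation such as “:”, “,” and “.” is removed too, so a regex that expects “No:” or a decimal amount must be adjusted after cleaning.

```python
import re

# Same character class as the Assign expression above: keep a-z, A-Z, 0-9 and the
# literal spaces inside the brackets; replace every other character with "".
NOT_ALLOWED = re.compile(r"[^a-z A-Z 0-9]")


def clean(text: str) -> str:
    """Hypothetical helper mirroring the Regex.Replace call."""
    return NOT_ALLOWED.sub("", text)


print(clean("Invoice\u200b No: 12345"))  # Invoice No 12345
print(clean("Total: 1,250.50 EUR"))      # Total 125050 EUR
```

If the values you extract depend on punctuation, one option is to add those characters to the class, for example [^a-zA-Z0-9 :.,], so that they survive the cleanup.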


Hello everyone!

If you are visiting this post, you might be interested in this:

See you online!



Hi @Ioana_Gligan,
I am trying to understand the use case for “Wait For Document Validation Action And Resume”. My understanding is that once you complete the action in Orchestrator, this activity should resume the workflow and continue with the remaining steps (if any) or complete the process. In my case, however, the job remains in the “Suspended” state and I have to run it manually to complete the process. Am I missing something here? Any pointers or hints would be really helpful.
P.S.: I am using the Community Edition.

Hi guys, I noticed an issue with the updated version of the ‘Train Classifiers Scope’ activity.

I’m not sure where to find the ‘Human Validated Classification Data’ input variable.

I have been following the video tutorial: UiPath Document Understanding Demo 1: Setting up the framework in Studio (YouTube).
At one point in the video, the ‘Train Classifiers Scope’ activity is configured, and there it only contains the ‘Human Validated Data’ input variable.

I have checked the documentation:

The variable isn’t mentioned there. @Ioana_Gligan, or anybody out there: do you know what this input is?

I know it’s an array of some sort, but I can’t find any further information about it.


Hello @David_Bailey - and welcome to our Community!

The new in argument is optional: you can use EITHER the new one (available as the confirmed data output of the new “Present Classification Station” activity 🎉) OR the old one, Human Validated Data, as input from the Validation Station.

Regarding documentation, please note that the IntelligentOCR chapter has a “folder” of docs for “Preview” releases. You can find all the documentation there.


Is there any way to make both the Classification Station and the Validation Station run in the background instead of manually clicking the Save button?
Thanks in advance.

Hello @preethamchap, and welcome 🙂

Running these without a human actually looking over the documents would defeat their purpose 🙂

If you don’t need human validation, just take it out of your workflow.

On the other hand, if you want 100% accuracy, I recommend always using them with humans reviewing the documents, as OCR errors, at the very least, are always possible.

Hope this helps,


Hello @Ioana_Gligan,

I have been trying to get up to speed with the Document Understanding (DU) feature, at least to see whether I can use it for what I have in mind. I have a few questions, but permit me to mention the top two on my mind right now.

  1. If I use the Machine Learning Extractor, how do I get it (the algorithm) to learn? I have validated the same invoice document more than 5 times, and it keeps making the same error every time.
  2. Also, if one uses the Regex Based Extractor, is it fair to say that “learning” is ruled out, since the whole thing works on predefined regex definitions?

Thanks in advance for your response.