UiPath’s Document Understanding now supports file splitting, custom ML models, better digitization and more!
Version 4.7.0-preview of the Intelligent OCR package is out, ready to help you with even more complex use cases.
The heroes of this new version are a few new activities that let you work with files containing multiple documents.
The Intelligent Keyword Classifier (and its companion, the Intelligent Keyword Classifier Trainer) is here to help you classify and split documents: if you need to process a file that contains multiple documents, give the Intelligent Keyword Classifier a try!
All you need to do is add it inside a Classify Document Scope activity.
Don’t forget to configure the classifier for the document types you want it to recognize.
Its companion, the Intelligent Keyword Classifier Trainer, goes into the Train Classifiers Scope and helps the Intelligent Keyword Classifier get better with each file you process!
But Don’t Panic, as the Hitchhiker’s Guide to the Galaxy recommends: you don’t need to run “the real deal” to train it. You can also train it at design time, using the Manage Learning wizard.
That is, click Start Training (or the Edit icon for a document type that already has training), select a few files that each contain a single sample of that document type (e.g., 3 files each containing one document of type X, not one file containing 3 documents of type X), and let it do its job. You will notice the word vectors start to appear.
This would not be too useful by itself, so we’re also publishing the …
With it, the Document Understanding Framework gets another (pretty cool, yeah) feature: an attended activity that lets humans review and correct automatic classification and split files into multiple document types, all in an awesome and very simple user interface:
- you can view the document and scroll through it on the right side
- you can view and edit page range splits and associated classes on the left side
- you can move pages to adjacent classes by dragging and dropping them
- you can split a range of pages by clicking on the split option between any two pages
- you can merge two ranges by using the “Move all Pages Up/Down” options (under the three-dot menu)
- you can scroll to any page by clicking on it
- you can scroll to any document type by clicking on it
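The Classification Station is a user interface, not an API, but the split and merge operations listed above boil down to simple edits on an ordered list of page ranges. The Python sketch below models that idea only; `PageRange`, `split_range` and `move_all_pages` are hypothetical names for illustration, not part of any UiPath package, and we assume that a merged range keeps the class of the neighbour it is merged into.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class PageRange:
    """One document found in the file: a class plus a 1-based, inclusive page range."""
    doc_type: str
    start: int
    end: int


def split_range(ranges: List[PageRange], index: int, after_page: int) -> List[PageRange]:
    """Split ranges[index] between after_page and after_page + 1; both halves keep the class."""
    r = ranges[index]
    if not r.start <= after_page < r.end:
        raise ValueError("the split point must fall between two pages of the range")
    left = replace(r, end=after_page)
    right = replace(r, start=after_page + 1)
    return ranges[:index] + [left, right] + ranges[index + 1:]


def move_all_pages(ranges: List[PageRange], index: int, direction: str) -> List[PageRange]:
    """Merge ranges[index] into its neighbour above ("up") or below ("down").

    Assumption (hypothetical model): the merged range keeps the neighbour's class.
    """
    if direction not in ("up", "down"):
        raise ValueError("direction must be 'up' or 'down'")
    target = index - 1 if direction == "up" else index + 1
    if not 0 <= target < len(ranges):
        raise ValueError("there is no adjacent range in that direction")
    src, dst = ranges[index], ranges[target]
    merged = PageRange(dst.doc_type, min(src.start, dst.start), max(src.end, dst.end))
    first, last = min(index, target), max(index, target)
    return ranges[:first] + [merged] + ranges[last + 1:]
```

Changing the class of a range is just `replace(r, doc_type="Receipt")` in this model; because ranges stay ordered and contiguous, every operation only touches a range and its direct neighbour.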
Mind you, this is an OPTIONAL step: the Classify Document Scope and the Classification Station output the same type, so you can also pass the classifier’s output straight on without a review. If you do want 100% accuracy, though, we recommend using it.
One related change you might want to make in your processes is handling ALL the document types found within a file. After classifying with splitting (using the Intelligent Keyword Classifier) and having a human confirm the classifications, you can move into the Data Extraction phase with much more confidence… in a FOR EACH loop (you no longer have just one class; you potentially have several, right?).
You don’t have to worry about page boundaries: each extractor receives only the page range it should perform extraction on.
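To picture the FOR EACH loop described above, here is a minimal Python sketch under stated assumptions: the confirmed classification is represented as a hypothetical list of `(doc_type, first_page, last_page)` tuples with 1-based, inclusive pages, and each extractor is a plain function that receives only its own pages. In a real workflow the Document Understanding activities do this for you; `extract_all` and the extractor functions are illustrative names only.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: an extractor maps a document's pages to extracted fields.
Extractor = Callable[[List[str]], dict]
Classified = List[Tuple[str, int, int]]  # (doc_type, first_page, last_page), 1-based inclusive


def extract_all(pages: List[str], classified: Classified,
                extractors: Dict[str, Extractor]) -> List[dict]:
    """Run one extraction per classified document, passing only that document's pages."""
    results = []
    for doc_type, first, last in classified:  # one iteration per document found in the file
        extractor = extractors.get(doc_type)
        if extractor is None:
            continue  # assumption: types without a configured extractor are skipped
        own_pages = pages[first - 1:last]  # convert 1-based inclusive range to a slice
        results.append({
            "doc_type": doc_type,
            "pages": (first, last),
            "fields": extractor(own_pages),
        })
    return results
```

The point of the loop is that each iteration only ever sees its own slice of the file, so an extractor written for a single invoice works unchanged on a file that bundles an invoice and a receipt.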
The Machine Learning Extractor (from the UiPath.DocumentUnderstanding.ML.Activities package) has a new configuration option for using it with an AI Fabric ML Skill.
The ML Skill dropdown is populated with your Document Understanding ML Skills if your robot is connected to a Cloud Orchestrator that has AI Fabric enabled (and, of course, has Document Understanding ML Skills).
AI Fabric is now generally available (GA), and you can use it as the infrastructure for managing your Document Understanding models. Available in our Cloud platform for enterprise accounts, AI Fabric can configure, train, host and serve Machine Learning models for Document Understanding. You can start from our pre-trained Invoices or Receipts model, or from a blank-slate DU model that you train (on data tagged with DataManager) on any fields relevant to your use case.
Starting with this release, you can use the Microsoft Azure Computer Vision OCR and the Google Cloud Vision OCR as engines for design-time training and template setup in Document Understanding.
Google Cloud Vision OCR now has another input argument, DetectionMode. It defaults to “TextDetection” (the existing behavior), but you might want to try the “DocumentTextDetection” mode as well: depending on the use case and the language, one of the two may perform better. The setting is…
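For reference, the two modes appear to correspond to the Google Cloud Vision API’s `TEXT_DETECTION` and `DOCUMENT_TEXT_DETECTION` feature types; that mapping is our assumption about how the activity uses its DetectionMode value, and the activity makes the call for you. The sketch below only builds the JSON body of an `images:annotate` request so you can see where the mode ends up; `build_annotate_request` and `FEATURE_BY_MODE` are hypothetical helpers.

```python
import base64
import json
from typing import Iterable, Optional

# Assumed mapping from the activity's DetectionMode values to Vision API feature types.
FEATURE_BY_MODE = {
    "TextDetection": "TEXT_DETECTION",
    "DocumentTextDetection": "DOCUMENT_TEXT_DETECTION",
}


def build_annotate_request(image_bytes: bytes, detection_mode: str = "TextDetection",
                           language_hints: Optional[Iterable[str]] = None) -> dict:
    """Build an images:annotate request body for one image (no network call is made)."""
    try:
        feature = FEATURE_BY_MODE[detection_mode]
    except KeyError:
        raise ValueError(f"unknown detection mode: {detection_mode!r}") from None
    request = {
        "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
        "features": [{"type": feature}],
    }
    if language_hints:
        # Optional hint about the document language(s), e.g. ["de"].
        request["imageContext"] = {"languageHints": list(language_hints)}
    return {"requests": [request]}


if __name__ == "__main__":
    body = build_annotate_request(b"...image bytes...", "DocumentTextDetection", ["en"])
    print(json.dumps(body, indent=2))
```

POSTing such a body to `https://vision.googleapis.com/v1/images:annotate` with valid credentials returns the detected text; Google describes `DOCUMENT_TEXT_DETECTION` as optimized for dense text and documents, which is why it is worth trying on scanned pages.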
In case you missed it, we are working on our own Document OCR engine, which comes with a companion activity in the UiPath.OCR.Activities package, in community preview.
(see what I did there?)
Please don’t forget to send us your feedback so we can improve these preview features and make them shine in your workflows!
The Document Understanding Team