This release brings a new set of features and capabilities to UiPath’s Document Understanding. Let’s see what we’ve been up to in the past few months!
We will start from the Document Understanding Framework, and go into details for each of its steps:
Fields with No Reference
You can now define whether a field accepts a value even when that value has no reference in the document being processed.
- Word of caution: please use this option wisely, and only when absolutely necessary! When a field does not require a reference, human users will tend to just “type in” a value, which means some extractors will not be able to learn from the human feedback.
- Effect: in Validation Station, fields defined with optional references allow users to add values that have no corresponding reference in the document being processed.
- Type of scenarios covered: when validating a document, a human sometimes needs to make a “judgement” call and decide on a value to report, even if that information is not present in the document.
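To make the rule above concrete, here is a minimal sketch of how a validation UI might decide whether a value can be reported. It is not UiPath code: `Reference`, `FieldDefinition`, `reference_optional`, and `can_report` are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reference:
    """Hypothetical pointer to where a value appears in the document."""
    page: int
    text: str


@dataclass
class FieldDefinition:
    """Hypothetical field definition; `reference_optional` mirrors the new setting."""
    name: str
    reference_optional: bool = False


def can_report(field: FieldDefinition, value: str, reference: Optional[Reference]) -> bool:
    """Return True if a validator may report `value` for `field`."""
    if not value:
        return False
    # A value backed by a document reference is always acceptable.
    if reference is not None:
        return True
    # Without a reference, the field must explicitly allow it.
    return field.reference_optional


total = FieldDefinition("total_amount")
risk = FieldDefinition("risk_category", reference_optional=True)

print(can_report(total, "120.50", Reference(page=0, text="120.50")))  # True
print(can_report(total, "120.50", None))  # False
print(can_report(risk, "high", None))  # True
```

In this sketch the default is to require a reference, which matches the word of caution: the option is something you opt into per field, not a global switch.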
UiPath Document OCR
Our own OCR engine is now available for you to use!
- the UiPath Screen OCR and UiPath Document OCR activities are automatically available if you are using the latest UiAutomation package.
- if not, install the UiPath.OCR.Activities package, where you will find the activities ready to use!
- it is available both as a Service and on-prem, for your convenience.
- it is now publicly available for processing West European languages on regular documents (not yet optimized for edge cases such as Receipts).
OCR Contracts are now Public!
You can now build your own OCR engine to use in both Screen Scraping and Document Understanding.
- Effect: Any OCR engine can now be developed as an Activity that is compatible with the UiPath suite, by using the public OCR contracts package (some more info here).
A new Abbyy Embedded option
For Abbyy fans, a new embedded option is now available. Embedded, that’s right, as in it doesn’t require the FineReader engine to be installed!
- install the UiPath.AbbyyEmbedded.Activities pack to have access to the Abbyy Screen OCR and Abbyy Document OCR activities
- the Abbyy Screen OCR activity is available for your usage, for free, in any UiAutomation context
- the Abbyy Document OCR activity offers enterprise customers 250,000 Abbyy OCR units free of charge to be used in Document Understanding and PDF processing scenarios (if more units are needed, reach out to your UiPath representative for details).
Intelligent Keyword Classifier
An awesome new classifier is available, with file splitting capabilities! (some more info here )
- use the new Intelligent Keyword Classifier as a simple classifier or as a classifier capable of also identifying multiple document types within the same file
- it is available for use both as a Service, as well as on-prem, through the support of AI Fabric
- it is capable of Learning! So do check out the Train Classifiers Scope and the Intelligent Keyword Classifier Trainer activities.
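To illustrate what classifying and splitting a file means in practice, here is a toy sketch that labels each page by keyword hits and groups consecutive pages into document ranges. It is an assumption-laden illustration, not the Intelligent Keyword Classifier's actual algorithm: the `KEYWORDS` table, `classify_page`, and `split_file` are hypothetical, and a real classifier learns its keywords from training data.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical keyword sets per document type; a real classifier learns these.
KEYWORDS: Dict[str, List[str]] = {
    "invoice": ["invoice", "amount due", "bill to"],
    "receipt": ["receipt", "cashier", "change"],
}


def classify_page(text: str) -> Optional[str]:
    """Return the type whose keywords occur most often on the page, or None."""
    lowered = text.lower()
    scores = {doc_type: sum(lowered.count(word) for word in words)
              for doc_type, words in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def split_file(pages: List[str]) -> List[Tuple[str, int, int]]:
    """Group consecutive pages into (doc_type, first_page, last_page) ranges.

    Pages with no keyword hits are treated as continuations of the current
    document; leading pages with no hits are labelled "unknown".
    """
    ranges: List[Tuple[str, int, int]] = []
    for index, text in enumerate(pages):
        doc_type = classify_page(text)
        if ranges and (doc_type is None or doc_type == ranges[-1][0]):
            # Extend the current range to include this page.
            ranges[-1] = (ranges[-1][0], ranges[-1][1], index)
        else:
            ranges.append((doc_type or "unknown", index, index))
    return ranges


pages = [
    "INVOICE 001 Bill to: ACME Corp",
    "Page 2 of 2. Thank you for your business.",
    "Receipt - Cashier: Ann",
]
print(split_file(pages))  # [('invoice', 0, 1), ('receipt', 2, 2)]
```

One limitation of this sketch is that two back-to-back documents of the same type are merged into a single range, which is exactly the kind of case where human review of the split is useful.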
A brand new UI, the Classification Station, is now available for humans to review automatic classification and file splitting! (some more info here)
- available both as an attended activity and integrated into Orchestrator’s Action Center (some more info here), the Classification Station is the perfect tool for validating, correcting, or even manually performing classification and document splitting within your document understanding processes.
Machine Learning Extractor
The Machine Learning Extractor is now fully integrated, from prediction to feedback loop!
- use the Machine Learning Extractor activity within the Data Extraction Scope, by either using one of the public endpoints for data extraction, or with your own custom model hosted in AI Fabric
- use the Machine Learning Extractor Trainer activity with the Train Extractors Scope, to collect feedback data from human validation and prepare it for ingestion back into the underlying machine learning model (more info here )
- import your collected feedback data into Data Manager (or, in preview only, use a Data Labeling session in AI Fabric) to curate it, and then export it ready for machine learning model training
- use the AI Fabric pipelines to retrain your model and to promote a new version to production!
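The feedback collection step in the loop above can be pictured as comparing what the model predicted with what a human confirmed. The sketch below is only a conceptual illustration under that assumption: `collect_feedback` and the record layout are hypothetical and do not reflect the format the Machine Learning Extractor Trainer or Data Manager actually use.

```python
from typing import Dict, List, Optional, TypedDict


class FeedbackRecord(TypedDict):
    """Hypothetical shape of one feedback record."""
    field: str
    predicted: Optional[str]
    validated: str
    corrected: bool


def collect_feedback(extracted: Dict[str, str],
                     validated: Dict[str, str]) -> List[FeedbackRecord]:
    """Build one record per validated field, flagging human corrections."""
    records: List[FeedbackRecord] = []
    for field, final_value in validated.items():
        predicted = extracted.get(field)  # None if the model missed the field
        records.append({
            "field": field,
            "predicted": predicted,
            "validated": final_value,
            "corrected": predicted != final_value,
        })
    return records


extracted = {"total": "100", "date": "2020-01-01"}
validated = {"total": "100", "date": "2020-10-01", "notes": "urgent"}

records = collect_feedback(extracted, validated)
print([r["field"] for r in records if r["corrected"]])  # ['date', 'notes']
```

Records where the human changed or added a value are the ones that carry new information for retraining, which is why curating the collected data before training matters.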
Cloud and On-Prem Full Availability
Everything is now available both in the Cloud and On-Prem, starting with the 20.10 release of Orchestrator and AI Fabric.
- if you are dealing with data-sensitive cases, you can now rest assured that all of the Document Understanding features can be used 100% on-prem!
- on-prem versions are available for Form Extractor, Intelligent Form Extractor, and Intelligent Keyword Classifier, in addition to the Machine Learning Extractor.
- on-prem support is enabled through the use of AI Fabric, where custom DU ML models, pretrained DU ML models, as well as the Form Extractor / Intelligent Form Extractor and Intelligent Keyword Classifier models can be hosted and managed seamlessly.
- you can now track your Document Understanding consumption either in your Cloud account, or in your on-prem Orchestrator
Data Extraction Validation
New Face of Validation Station
A brand new look and feel of the Validation Station is up!
- new hotkey options for easier document tagging
- new color coding for faster visual validation of the automatically extracted information
- new field and value organization in the side panel
- new option to rotate views of certain pages within the document view panel
- new, simpler, area selection functionality (click - drag - release, instead of entering selection mode through a keyboard shortcut)
Validation Station is available, in its “face lifted” form, both as an attended activity and integrated into Orchestrator’s Action Center (more info on this here).