New UiPath Document Understanding features have been released!
Our new UiPath Document Understanding features have been released to Limited Availability in February 2020!
Note: The Receipt, Invoice, and Purchase Order ML Extractors are a Cloud Platform offering and will be released to general availability in April. The on-premises solution will also be released in April.
Key Benefits
Flexible approach to extracting information from documents
You can use templates with the Fixed Form (formerly Position-Based) Extractor, regular expressions with the Regex Extractor, or pre-trained models with the Machine Learning (ML) Extractor. You can also bring your own OCR technology, such as ABBYY or Microsoft Forms, to digitize documents before data extraction.
High accuracy, even for "noisy" documents
We are continuously improving our AI to handle the complexities that make real-world documents hard to read, such as noise, rotation, skew, or low resolution from printing and scanning.
Drag-and-drop AI skills directly into your workflow in Studio
The IntelligentOCR package (to be renamed UiPath Document Understanding) lets you drag and drop our new ML Extractor into a workflow, just like any other extractor. Same infrastructure, new supercharged brain.
How to Install in Studio
Note that you need to create an account on platform.uipath.com and use a license key from the Licenses page for these features to work. Paste the key into the API-Key property of the Document Understanding activities.
To use the Document Understanding framework, first install the UiPath.IntelligentOCR.Activities activity package:
- Open the "Manage Packages" window
- Click "All Packages" in the left navigation, then search for "UiPath.IntelligentOCR.Activities"
- Select the result, make sure the newest version is selected, then click the "Install" button in the right pane
Next, you need to install the UiPath.DocumentUnderstanding.ML.Activities activity pack:
- Check "Include Prerelease", then search for "UiPath.DocumentUnderstanding.ML.Activities"
- Select the result and click the "Install" button in the right pane
Important: make sure you check the "Include Prerelease" checkbox.
The ML Extractor activity should be visible now in UiPath Studio, as shown below:
How to Use
Here are the demo videos on how to use UiPath Document Understanding:
UiPath Document Understanding Demo 1: Setting up the framework in Studio
UiPath Document Understanding Demo 2: Data extraction configuration
Additionally, see these step-by-step instructions to help you get started with the ML models:
- If you haven't yet defined a taxonomy for the documents you intend to process, you can do so using the Taxonomy Manager
- Drag in the Load Taxonomy activity and store the taxonomy in a variable
- Drag in the Digitize Document activity
  - Drag and drop an OCR Engine inside the Digitize Document activity
  - Define variables for both outputs: DocumentObjectModel and DocumentText
- Drag in the Classify Document Scope activity, then drag and drop the keyword-based classifier activity inside it to add a new classifier. Activate the classifiers to enable classification
- Drag in the Data Extraction Scope activity
  - Populate the input variables DocumentObjectModel, DocumentText, Taxonomy and DocumentTypeId. The DocumentTypeId is a string you can see in the Taxonomy Manager when you click the Document Type you want to extract.
  - Define a variable for the output: ExtractionResults
  - Drag in the Machine Learning Extractor and drop it inside the Data Extraction Scope
  - Drag in the Regex Extractor and drop it inside the Data Extraction Scope
  - Drag in the Position-Based Extractor and drop it inside the Data Extraction Scope
  - Populate the Endpoint input property of the Machine Learning Extractor with the URL of the endpoint you would like to use:
    - https://invoices.uipath.com for Invoice processing
    - https://receipts.uipath.com for Receipt processing
    - https://purchaseorders.uipath.com for Purchase Order processing
    - a local custom endpoint in case you have one of these models hosted on-premises
- Click the "Configure Extractors" link within the Data Extraction Scope activity
- Expand the Document Type you are interested in and, in the right-hand column, enter the names of the ML model fields that correspond to the fields in your taxonomy. See below for the full field names you need to fill in on the right side. You can mix and match between values and what you want to extract.
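The endpoint choice described in the steps above can be sketched as a small lookup. This is only an illustration in Python, not part of any UiPath package: the names `ML_ENDPOINTS` and `resolve_endpoint` and the document-kind keys are hypothetical, and only the three URLs come from this post.

```python
# Hypothetical helper for illustration; not a UiPath API.
# The three public URLs are the ones listed in the steps above.
ML_ENDPOINTS = {
    "invoice": "https://invoices.uipath.com",
    "receipt": "https://receipts.uipath.com",
    "purchase_order": "https://purchaseorders.uipath.com",
}


def resolve_endpoint(document_kind, custom_endpoint=None):
    """Return the URL to put in the Endpoint property.

    A custom endpoint (for example a model hosted on-premises)
    takes precedence over the public endpoints when it is given.
    """
    if custom_endpoint:
        return custom_endpoint
    try:
        return ML_ENDPOINTS[document_kind]
    except KeyError:
        raise ValueError(f"No public endpoint known for {document_kind!r}") from None


print(resolve_endpoint("invoice"))
```

Keeping the mapping in one place makes it easier to switch a workflow from a public endpoint to a locally hosted model without touching each extractor configuration.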
Pre-trained ML model for Invoices extracts the following name and line item fields:
"name"
"vendor-addr"
"billing-name"
"billing-addr"
"shipping-addr"
"invoice-no"
"payment-terms"
"due-date"
"po-no"
"date"
"net-amount"
"tax"
"total"
"currency"
"line-no"
"description"
"item-po-no"
"quantity"
"unit-price"
"line-amount"
Pre-trained ML model for Receipts extracts the following name and line item fields:
"name"
"total"
"vendor-addr"
"date"
"phone"
"currency"
"expense-type"
"description"
"line-amount"
"unit-price"
"quantity"
Pre-trained ML model for Purchase Orders extracts the following name and line item fields:
"po-number"
"date"
"client-name"
"client-address"
"vendor-name"
"vendor-address"
"shipping-name"
"shipping-address"
"billing-name"
"billing-address"
"payment-terms"
"delivery-by-date"
"net-amount"
"tax-amount"
"tax-rate"
"total-amount"
"line-number"
"description"
"product-code"
"delivery-date"
"unit-measure"
"unit-price"
"quantity"
"line-net-amount"
"line-tax-rate"
"line-tax-amount"
"line-amount"
"currency"
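Because the field names differ between models (for example, the Invoices model uses "invoice-no" while the Purchase Orders model uses "po-number"), it may help to check a mapping before typing it into the Configure Extractors window. The sketch below is a standalone Python illustration under that assumption. The function `unknown_fields` and the sample taxonomy names are hypothetical and not part of UiPath; the field sets are copied from the lists above.

```python
# Field names copied from the three lists above.
INVOICE_FIELDS = {
    "name", "vendor-addr", "billing-name", "billing-addr", "shipping-addr",
    "invoice-no", "payment-terms", "due-date", "po-no", "date", "net-amount",
    "tax", "total", "currency", "line-no", "description", "item-po-no",
    "quantity", "unit-price", "line-amount",
}
RECEIPT_FIELDS = {
    "name", "total", "vendor-addr", "date", "phone", "currency",
    "expense-type", "description", "line-amount", "unit-price", "quantity",
}
PURCHASE_ORDER_FIELDS = {
    "po-number", "date", "client-name", "client-address", "vendor-name",
    "vendor-address", "shipping-name", "shipping-address", "billing-name",
    "billing-address", "payment-terms", "delivery-by-date", "net-amount",
    "tax-amount", "tax-rate", "total-amount", "line-number", "description",
    "product-code", "delivery-date", "unit-measure", "unit-price", "quantity",
    "line-net-amount", "line-tax-rate", "line-tax-amount", "line-amount",
    "currency",
}
MODEL_FIELDS = {
    "invoice": INVOICE_FIELDS,
    "receipt": RECEIPT_FIELDS,
    "purchase_order": PURCHASE_ORDER_FIELDS,
}


def unknown_fields(model, mapping):
    """Return, sorted, the mapped ML field names that the model does not list.

    `mapping` goes from a taxonomy field name (hypothetical examples below)
    to the ML model field name entered in the right-hand column.
    """
    return sorted(set(mapping.values()) - MODEL_FIELDS[model])


# Hypothetical taxonomy: "vendor-name" is a Purchase Order field,
# so it is flagged when checked against the Invoices model.
mapping = {"Invoice Number": "invoice-no", "Total": "total", "Vendor": "vendor-name"}
print(unknown_fields("invoice", mapping))  # ['vendor-name']
```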