The UiPath Document Understanding team is pleased to announce the release of the much-awaited Machine Learning Document Classification functionality.
What is Machine Learning Document Classification?
Machine Learning Document Classification is a suite of capabilities that helps users classify documents using a custom-trained ML model. It augments the current classifier offerings, such as Keyword Classifier and Intelligent Keyword Classifier. As part of this offering, we are releasing three product features: two activities (Machine Learning Classifier, or MLC, and Machine Learning Classifier Trainer, or MLCT) and one out-of-the-box (OOB) ML package (DocumentClassifier).
When to use Machine Learning Document Classification?
Machine Learning Document Classification can be used in situations where simpler classification techniques, such as Intelligent Keyword Classifier, might not provide accurate results. While it can be used on any reasonably large document set, it is preferable for scenarios where the document set is highly diverse.
How to use Machine Learning Document Classification?
Suppose you want to classify documents into four classes: receipts, invoices, purchase_orders and utility_bills. You can do this using Machine Learning Document Classification in three easy steps:
Step 1: Creation of a labeled dataset for ML model training
Follow these sub-steps:
Create a project in AI Fabric and an empty dataset in the project like this:
In your Studio workflow, add the “Machine Learning Classifier Trainer” inside the “Train Classifiers Scope”. When you refresh the “Project” and “Dataset” fields, you will see a drop-down with possibly multiple entries, including the project and dataset that you created above:
Configure the “Machine Learning Classifier Trainer” using “Configure Classifiers” at the bottom. If you are creating the ML model for the first time (as opposed to creating a dataset to revise an existing ML model), skip the part that asks for an “ML Skill”, proceed to the next screen, manually enter the names of the classification classes, and map them to the Document Types declared in the Taxonomy Manager, as shown below:
If you have an existing ML Skill and want to create more labeled data to improve the model, use the skill to find the class names. In this scenario, you can map the classification classes to the Document Types specified in the taxonomy by first using “Get Capabilities” and then selecting the correct class names from the drop-down:
As you pass documents through the workflow containing the Machine Learning Classifier Trainer, each document is labeled and stored in the appropriate folder on AI Fabric. You will see a structure like this:
Now that we have created a labeled dataset, we can move on to the next step.
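Before training, it can be worth checking that every class has a reasonable number of labeled documents. The following is only a minimal sketch of such a check, run against a local copy of the dataset. It assumes one sub-folder per class directly under the dataset root, which is a hypothetical layout and may not match what AI Fabric actually stores. The function names are illustrative and are not part of any UiPath API.

```python
from pathlib import Path

# Class names taken from the example in this post.
CLASSES = ["receipts", "invoices", "purchase_orders", "utility_bills"]


def count_documents_per_class(dataset_root, classes=CLASSES):
    """Return {class_name: number_of_files} for a local dataset copy.

    Assumption (hypothetical layout): one sub-folder per class
    directly under dataset_root.
    """
    root = Path(dataset_root)
    counts = {}
    for name in classes:
        folder = root / name
        if folder.is_dir():
            # Count every regular file below the class folder.
            counts[name] = sum(1 for p in folder.rglob("*") if p.is_file())
        else:
            # A missing folder is treated as zero labeled documents.
            counts[name] = 0
    return counts


def underrepresented(counts, minimum):
    """Return the classes with fewer than `minimum` documents, sorted by name."""
    return sorted(name for name, n in counts.items() if n < minimum)
```

Calling `underrepresented(count_documents_per_class("path/to/local/copy"), minimum)` with a threshold of your choosing lists the classes that may need more labeled documents before you start a training pipeline.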
Step 2: Create an ML Skill for Document Classification
Follow these simple steps to create the ML Skill (if you have never used Document Understanding packages from AI Fabric, please review the more detailed directions here: ML Packages):
Create a package using the “DocumentClassifier” package available under Out-of-the-box-Packages in UiPath Document Understanding.
Once you create the package, you can go to “ML Packages” and see the package you just created:
Create a train pipeline using the package created above and the dataset you created in Step 1:
Create an ML Skill using the trained package:
Congratulations, you have successfully created an ML model that can now be used in a workflow for document classification!
Step 3: Perform Document Classification Using the Custom ML Model
Follow these simple steps:
In your workflow, drop the Machine Learning Classifier and point it to the ML Skill created in Step 2. You will also need to provide the Document Understanding API Key, as shown below:
Next, click Configure Classifiers to specify the type of classifier you want to use for each document type and to match the “Document Type” names used in the Taxonomy to the ones used by the ML Skill. The latter can be done easily with “Get Capabilities”, either by configuring it when you first drop the activity or by clicking the gear icon, and then selecting the class names from the drop-down.
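The matching between taxonomy Document Types and ML Skill class names is done in the activity's configuration dialog, but the idea can be illustrated outside Studio. The sketch below is not UiPath code. The Document Type names are hypothetical, the class names come from the example in this post, and the check assumes a one-to-one mapping is intended.

```python
from collections import Counter

# Hypothetical Document Type names, as they might be declared in the taxonomy.
DOCUMENT_TYPES = {"Receipt", "Invoice", "Purchase Order", "Utility Bill"}

# Class names the ML Skill was trained on (from the example in this post).
SKILL_CLASSES = {"receipts", "invoices", "purchase_orders", "utility_bills"}

# Taxonomy Document Type -> ML Skill class name.
MAPPING = {
    "Receipt": "receipts",
    "Invoice": "invoices",
    "Purchase Order": "purchase_orders",
    "Utility Bill": "utility_bills",
}


def check_mapping(mapping, document_types, skill_classes):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # Every Document Type should be mapped to some class.
    for doc_type in sorted(document_types - set(mapping)):
        problems.append(f"Document Type has no class: {doc_type}")
    # Both sides of each pair must be known names.
    for doc_type, class_name in sorted(mapping.items()):
        if doc_type not in document_types:
            problems.append(f"Unknown Document Type: {doc_type}")
        if class_name not in skill_classes:
            problems.append(f"Unknown class name: {class_name}")
    # Assumption: each class should be used by at most one Document Type.
    usage = Counter(mapping.values())
    for class_name in sorted(n for n, c in usage.items() if c > 1):
        problems.append(f"Class mapped more than once: {class_name}")
    return problems
```

A check like this mainly catches spelling mismatches, such as `purchase_order` versus `purchase_orders`, which are easy to introduce when class names are typed by hand rather than selected from the drop-down.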
Everything is set. Test your Document Classification model:
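If you run a batch of documents with known classes through the workflow, one way to summarize the test is per-class accuracy plus a tally of the most common mix-ups. The sketch below assumes you have collected the results yourself as (expected class, predicted class) pairs. How you export them from the workflow is not covered here, and the function names are illustrative.

```python
from collections import Counter


def per_class_accuracy(results):
    """Return {expected_class: fraction classified correctly}.

    `results` is a list of (expected_class, predicted_class) pairs.
    """
    total = Counter()
    correct = Counter()
    for expected, predicted in results:
        total[expected] += 1
        if expected == predicted:
            correct[expected] += 1
    return {cls: correct[cls] / total[cls] for cls in total}


def confusions(results):
    """Count each (expected, predicted) pair where the two differ."""
    return Counter((e, p) for e, p in results if e != p)
```

Classes with low accuracy, or pairs that are frequently confused with each other, are natural candidates for additional labeled documents in Step 1 before retraining.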