Document Understanding: New Human-Robot Levels Available :)

You know Life’s a Journey, right? :railway_track:

So is Document Understanding: a journey of new features and capabilities, of constant improvements, and, of course, of your files through your processes! :hiking_boot: :mountain:

To “help you pack :world_map:”, we’ve prepared some nice little artifacts for you:

:sparkler: Public OCR Contracts :tada:

Remember the open and extensible framework we’re building?

Keeping true to our promise, we are now publishing a Public OCR Contract nupkg, so that YOU can build Custom OCR Activities!

If you are using an OCR engine or service very specific to your needs, you can now wrap it in a custom activity and make it available for use in both Screen Scraping and UiAutomation activities, as well as in Document Understanding.

Through this new UiPath.OCR.Contracts version, we are opening the UiPath ecosystem of activities to all customers, partners, and technology partners, so they can publish their own OCR engines, tailored to the markets and use cases they face.

:question: How to Use This

Use the latest preview version of the UiPath.OCR.Contracts package in your Visual Studio project, and start building.

The documentation around the public contracts is available here. Make sure to check out the code samples in the documentation, and you’ll get to…

:arrow_down:

[screenshot]

:arrow_down:

And the journey continues!

:sparkler: Classification Station in Action Center

You’ve probably seen it already as an attended activity. Now, you can take it to the next level, and have your Long Running Processes take a well-deserved “break” while the humans get the chance to look over the Automatic Document Classification results.

(For those of you who missed our previous announcements, see this post for full context on the new document classification, splitting, and validation capabilities.)

:question: How to Use This

Upgrade your workflows to the latest release of the IntelligentOCR package (tick the Include Pre-Release flag please :bowing_woman:), and use the Create Document Classification Action and Wait for Document Classification Action and Resume activities:

[screenshot]

Just like the Document Validation activities (about which you can find out more over here), the Classification Validation activities use Orchestrator Storage Buckets to persist data for the lifetime of the human task, and communicate with one another through the ActionObject output:

:arrow_down:

[screenshot]

:arrow_down:

:arrow_down:

[screenshot]

:arrow_down:

(see the trail? :smile: and it continues…)

:sparkler: :heart: Extractor to Trainer Communication + Machine Learning Extractor Trainer :tada:

So now that classification is done, the Data Extraction part comes into play.

You are probably already using the Data Extraction Scope activity in your workflows. Now it’s time to add the Train Extractors Scope as well, enabling the feedback loop for your ML models!

:question: How to Use This

Straightforward: after the human validation step (be it an attended activity or integrated into Action Center), add a Train Extractors Scope. Within it, add the newly released :tada: Machine Learning Extractor Trainer :tada: activity.

The Machine Learning Extractor Trainer will collect the human feedback for you, in a directory of your choice. Once you have collected enough data and want to retrain your model, just zip the contents of that directory and upload the archive to your Data Manager for curation. After you have reviewed (and, where needed, corrected) what your machine learning model will learn, export your new dataset and start a new training pipeline for your ML model. Whenever the new version of your model performs better than the current one, you can promote it to Production in your AI Fabric instance.
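The “zip the contents of the directory” step can of course be scripted. Here is a minimal Python sketch; the directory layout is just an example of what a Trainer output folder might contain, not something UiPath mandates:

```python
import zipfile
from pathlib import Path

def zip_training_data(data_dir: str, archive_path: str) -> str:
    """Zip the *contents* of data_dir (not the folder itself) for upload."""
    root = Path(data_dir)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for file in sorted(root.rglob("*")):
            if file.is_file():
                # Store paths relative to data_dir, so the files sit at the
                # root of the archive rather than under a wrapper folder.
                zf.write(file, arcname=file.relative_to(root))
    return archive_path
```

Make sure archive_path points outside the data directory, so the archive does not end up zipping itself.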

:arrow_down:

[screenshot]

:arrow_down:

:arrow_down:

:arrow_down:

:arrow_down:

And you’re done :clap:

… well, almost.

Another thing we’ve added is the ability for an Extractor activity to communicate :loudspeaker: with a Trainer activity. To enable this, you need to “get them acquainted” :handshake: by using the Framework Alias boxes that appear next to each extractor and each trainer in the Configure Extractors wizards of the Data Extraction Scope and the Train Extractors Scope.

Tips and Tricks :mage:

If you do not already have an ML Model to retrain, but just want to collect data for a future model, click “Cancel” when the Machine Learning Extractor Trainer’s wizard pops up. This lets you manually enter, in the Configure Extractors wizard, the field names you want your future model to be trained on. If you tick the box without entering anything, the generated field names will be equal to the field IDs in the Taxonomy, so you might want to use prettier ones… :cherry_blossom:

If you have multiple robots executing these processes :desktop_computer: :desktop_computer: :desktop_computer: , make sure you collect the data from all of them… the more the merrier :smile: they say :slight_smile:
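As a sketch of that consolidation step, the snippet below uses plain Python file operations to copy each robot’s export folder into a single dataset directory before zipping. The folder-per-robot layout is a hypothetical convention chosen here to avoid filename collisions, not something UiPath prescribes:

```python
import shutil
from pathlib import Path

def merge_robot_outputs(robot_dirs: list[str], merged_dir: str) -> int:
    """Copy each robot's exported files into merged_dir, one subfolder
    per robot, and return the number of files copied."""
    merged = Path(merged_dir)
    merged.mkdir(parents=True, exist_ok=True)
    copied = 0
    for robot_dir in map(Path, robot_dirs):
        for file in robot_dir.rglob("*"):
            if file.is_file():
                # Keep a per-robot subfolder so files with the same name
                # exported by different robots do not overwrite each other.
                target = merged / robot_dir.name / file.relative_to(robot_dir)
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(file, target)
                copied += 1
    return copied
```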

If you are capturing data for training using the new Machine Learning Extractor Trainer activity for a model that you are also using for prediction in the Data Extraction Scope, then please don’t forget to:

  • give the Machine Learning Extractor an Alias (any string would work)
  • give the Machine Learning Extractor Trainer the SAME Alias

So that the two components know they are paired. Use the Configure Extractors wizard to set these aliases.

:sparkler: … and We Keep Going!

While we keep treading the path towards getting the Document Understanding Framework as complete and robust as possible, do lend us a helping hand by sending us feedback on what you love :heart_eyes: or hate :angry: about it, what you feel is missing :ghost: :face_with_monocle: or not working properly :woozy_face:

Until next time, safe journey to all!

The Document Understanding Team.


@Ioana_Gligan
OMG, two release announcements in one day :partying_face: :rocket: :sunglasses:
What great news :star_struck: :star_struck: :smiling_face_with_three_hearts: :smiling_face_with_three_hearts:


Great news on the trot, I’ll say :smiley:

UiPath Rocks :beers: :beers: :partying_face:


Aah… when I feel I’m done…
you guys make me feel, “Hey man, you are not done yet!”

Simply amazing
@Ioana_Gligan


Looks great: document classification in Action Center, and training the AI model in the Validation Station.


This is the feature to die for. Feels like we have now a complete end-end solution for DU. So excited and can’t wait to get my hands dirty.


Wow… that’s great news :slight_smile:


Looking very, very good! Excellent work by the UiPath team!!


Great :+1:


Is this entire thing available in the Community edition?
What are the limitations compared with a licensed version?


Great enhancements to the existing Document Understanding.


Limitation for community:

  • you have page limits (max 1 or 2 pages, depending on the extraction methods used)
  • you cannot train your own custom ML model.

Great, the Train Extractors Scope is what I was looking for.


Awesome :+1: :+1: :+1: :+1: :+1: great news


Thanks for sharing


Thanks


Hello, @Ioana_Gligan

The Machine Learning Extractor Trainer sounds like a very cool feature and I would really like to test how it works in practice.
However, I got stuck at the Data Manager part:

just zip the content of the directory and upload it in your Data Manager for curation

I am not sure what the registry credentials (username, password) mentioned in the docker login command in the documentation are, or where to find them: AI Center

I am using an Enterprise Cloud trial license. Is there something I am missing?
Thanks.

Yep, that’s another type of license you need to acquire via UiPath sales.

You need to reach out to your UiPath contact to get one - OR, if you can wait another few weeks, you will see Data Manager available directly in AI Fabric Cloud :slight_smile:


I was really hoping to use DU and ML with Excel documents. Do you see that being included in a future release? Or can you use ML outside of DU?
Thanks