Document Understanding: Document Splitting and Other Wonderful Stories :)

:fireworks: :tada: :sparkler: UiPath’s Document Understanding now has support for file splitting, custom ML models, better digitization and more!

The Intelligent OCR package (4.7.0-preview version) is out, and is ready to help you in even more complex use cases.

The heroes of this new version are a few new activities that allow you to work with files that contain multiple documents.

:loudspeaker: Intelligent Keyword Classifier

The Intelligent Keyword Classifier and its companion, the Intelligent Keyword Classifier Trainer, are here to help you classify :tada: and split :tada: documents: if you need to process a file that contains multiple documents, you can give the Intelligent Keyword Classifier a try!

How it works:

All you need to do is add it in a Classify Document Scope activity. Like this:

Don’t forget to Configure your Classifier for the doc types you are targeting for classification, like this:

It has a companion, the Intelligent Keyword Classifier Trainer (that goes into the Train Classifiers Scope) - this activity is used to help the Intelligent Keyword Classifier get better with each file you process!

But Don’t Panic, as the Hitchhiker’s Guide to the Galaxy recommends. You don’t actually need to run “the real deal” to get it trained. You can do this at design time as well, using the Manage Learning wizard. Like this:


That is, click on :books: Start Training (or the :pencil2: Edit icon for a doc type that already has training), select a few files that each contain a single sample of that document type (e.g., 3 files each containing a single document of type X, not one file containing 3 samples of type X), and let it do its job. You will notice that the word vectors start appearing. :ghost:

This would not be too useful by itself, so we’re also publishing the …

:loudspeaker: Present Classification Station

With it, the Document Understanding Framework gets another :ice_cube: (that’s cool, yeah) feature. It is an attended activity that allows humans to review and correct automatic classification, split files into multiple document types, all in an awesome and very simple user interface. Like this:

How it Works:

  • you can view the document and scroll through it on the right side
  • you can view and edit page range splits and associated classes on the left side
  • you can move pages to adjacent classes by dragging and dropping them
  • you can split a range of pages by clicking on the split option between any two pages
  • you can merge two ranges by using the “Move all Pages Up/Down” options (under the three dots)
  • you can scroll to any page by clicking on it
  • you can scroll to any document type by clicking on it.
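
For those who think better in code, here is a minimal sketch of the split and merge operations listed above, written as plain Python over a list of page ranges. This is not the Classification Station's implementation or API; `Segment`, `split_after` and `merge_up` are hypothetical names invented for illustration, and merging is assumed to keep the document type of the range above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One classified page range inside a file (hypothetical model)."""
    doc_type: str
    first_page: int  # 1-based, inclusive
    last_page: int   # 1-based, inclusive


def split_after(segments: List[Segment], page: int) -> List[Segment]:
    """Split the segment containing `page` so that a new segment starts at page + 1."""
    result = []
    for seg in segments:
        if seg.first_page <= page < seg.last_page:
            result.append(Segment(seg.doc_type, seg.first_page, page))
            result.append(Segment(seg.doc_type, page + 1, seg.last_page))
        else:
            result.append(seg)
    return result


def merge_up(segments: List[Segment], index: int) -> List[Segment]:
    """Move all pages of segments[index] into the segment directly above it."""
    if index <= 0 or index >= len(segments):
        raise IndexError("no segment above to merge into")
    above, current = segments[index - 1], segments[index]
    # Assumption: the merged range keeps the document type of the range above.
    merged = Segment(above.doc_type, above.first_page, current.last_page)
    return segments[:index - 1] + [merged] + segments[index + 1:]


segments = [Segment("Invoice", 1, 3), Segment("Receipt", 4, 5)]
segments = split_after(segments, 2)  # Invoice 1-2, Invoice 3-3, Receipt 4-5
segments[1].doc_type = "Contract"    # the reviewer reclassifies page 3
segments = merge_up(segments, 2)     # pages 4-5 join the Contract range
```

After these three steps the file is described by two ranges: an Invoice on pages 1-2 and a Contract on pages 3-5.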

Mind you, this is an OPTIONAL step: the output of the Classify Document Scope and the output of the Classification Station are of the same type. If you do want 100% accuracy though, we recommend you use it.

One associated change that you might want to implement in your processes is handling ALL the doc types found within a file. After performing a classification with splitting (using the Intelligent Keyword Classifier), and after a human confirms the classifications, you can move forward with much more confidence into the Data Extraction phase… in a FOR EACH loop :slight_smile: (now you don’t have only one class, you potentially have multiple classes, right?)

You don’t have to worry about anything, as all Extractors will only get the page range they should be performing extraction on.
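
As a rough illustration of that FOR EACH idea, the sketch below loops over classification results and hands each extractor only the pages of its own range. It is plain Python, not the Document Understanding activities themselves; `ClassifiedRange`, `extract_all` and the lambda extractors are hypothetical stand-ins, and pages are modeled as simple strings.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class ClassifiedRange(NamedTuple):
    """One classification result: a document type and its page range (hypothetical shape)."""
    doc_type: str
    first_page: int  # 1-based, inclusive
    last_page: int   # 1-based, inclusive


Extractor = Callable[[List[str]], Dict[str, str]]


def extract_all(
    pages: List[str],
    results: List[ClassifiedRange],
    extractors: Dict[str, Extractor],
) -> List[Tuple[str, Dict[str, str]]]:
    """Run the matching extractor on each classified page range."""
    extracted = []
    for result in results:
        extractor = extractors.get(result.doc_type)
        if extractor is None:
            continue  # no extractor configured for this document type
        # Each extractor only sees the pages of its own range.
        page_slice = pages[result.first_page - 1:result.last_page]
        extracted.append((result.doc_type, extractor(page_slice)))
    return extracted


pages = ["invoice page 1", "invoice page 2", "receipt page 1"]
results = [ClassifiedRange("Invoice", 1, 2), ClassifiedRange("Receipt", 3, 3)]
extractors = {
    "Invoice": lambda p: {"page_count": str(len(p))},
    "Receipt": lambda p: {"page_count": str(len(p))},
}
output = extract_all(pages, results, extractors)
```

Here the Invoice extractor receives two pages and the Receipt extractor receives one, which mirrors how each extractor is limited to its own page range.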

:loudspeaker: Machine Learning Extractor and AI Fabric - a :heart: story

The Machine Learning Extractor (from the UiPath.DocumentUnderstanding.ML.Activities package) got a new configuration option, if you want to use it with an AI Fabric ML Skill. Like this:

The ML Skill dropdown will be populated with your Document Understanding Skills, if your robot is connected to a Cloud Orchestrator that has AI Fabric enabled (and, of course, has Document Understanding ML Skills).

AI Fabric is now in GA :sparkler:, and you can use it as infrastructure for managing your document understanding models. Available in our Cloud platform for enterprise accounts, AI Fabric can configure, train, host and serve Machine Learning models for Document Understanding. You can choose to start from our pre-trained Invoices or Receipts model, or with a blank-slate DU model that you can train (with data tagged using DataManager) on any fields of interest to your use case.

:loudspeaker: OCR Enhancements

  1. Starting with this release, you can use the Microsoft Azure Computer Vision OCR and the Google Cloud Vision OCR as engines for design-time training and template setup :hammer_and_wrench: in Document Understanding. Like this:

  2. Google Cloud Vision OCR now has another Input Argument, called DetectionMode :sparkler:. This is by default set to “TextDetection” (the current implementation), but you might want to try the “DocumentTextDetection” mode as well. Depending on the use case and the language, one or the other of these two options might perform better. The setting is… Like this:

  3. In case you missed it, we are working on our own Document OCR engine, which comes with a companion activity in the UiPath.OCR.Activities package, in community preview.

Like this?

(see what I did there? :slight_smile: )
Please don’t forget to send us your feedback so we can improve these preview features and make them shine in your workflows!
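
For the curious: the two DetectionMode values mentioned in the OCR section above appear to line up with the `TEXT_DETECTION` and `DOCUMENT_TEXT_DETECTION` feature types of Google Cloud Vision's `images:annotate` REST endpoint. That mapping is an assumption on our side of this sketch, not something the activity documents. The snippet only builds the JSON request body and sends nothing; `build_annotate_request` is a hypothetical helper name.

```python
import base64
import json

# Assumed mapping from the activity's DetectionMode values to Google feature types.
FEATURE_TYPES = {
    "TextDetection": "TEXT_DETECTION",
    "DocumentTextDetection": "DOCUMENT_TEXT_DETECTION",
}


def build_annotate_request(image_bytes: bytes, detection_mode: str) -> dict:
    """Build the JSON body for a Google Cloud Vision images:annotate call."""
    return {
        "requests": [
            {
                # The REST API expects the image content as base64 text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": FEATURE_TYPES[detection_mode]}],
            }
        ]
    }


body = build_annotate_request(b"fake image bytes", "DocumentTextDetection")
print(json.dumps(body, indent=2))
```

Running the same page through both modes and comparing the recognized text is a simple way to decide which one suits a given language or layout.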

The Document Understanding Team

So far, your work helped us so much. Even more fascinating features… thank you! :clap:

Hi @Ioana_Gligan,

I couldn’t find Google Cloud Vision OCR Detection Mode in UiPath.OCR.Activities v2.1.0 & UiPath.IntelligentOCR 4.7.0

Can you check this? Thanks

Hello @alexologica,

Please upgrade the UiAutomation package to the latest preview version.

Hi @Ioana_Gligan,

as always great features and great update.

I wanted to ask whether there is a list of languages handled by the pre-trained Invoices models that are available for us to use. I am working on my master's thesis on invoice classification and extraction in Polish, and I wondered whether the models now also work with the Polish language.

Thank you for your help,
Andrzej

Wow! @Ioana_Gligan and team! Great update. I am looking forward to trying this out. I love the addition of being able to use custom OCR activities.

Do we need to have a Document Understanding API key in order to use the Intelligent Keyword Classifier? Or can we keep that on our local PC without an API key?

Hello @oscar,

You need to use the cloud DU key, for tracking purposes only. No documents and no data about your processes leave your premises.

Wow… awesome features!!!

I already tried out some of these latest features and they look amazing… Exploring more and for sure will share the feedback… I always wanted to see more features in the DU package… and this is exactly what I wanted to see… Awesome work guys…

You guys are like magicians :tophat:

Hello @ab83665 (Andrzej) ,

and Welcome to the Forum!

Polish is not on the list of supported languages AFAIK. It would be great if you would actually try it out and see if it gives any results…

A custom trained model would probably work best in your case…

Ioana

Thank you for answering, @Ioana_Gligan

I do agree that a custom trained model might be my best bet, as unfortunately invoices are not structured enough for Regex extraction. However, is it possible to use our own models in UiPath Studio?

Still, I would like to try testing the pre-trained models first. Is there any documentation available on them, showing which fields can be extracted or perhaps even how the model was built?

Thank you for your answer,
Andrzej

Hello, I’m a total beginner and want to test whether I’m able to use OCR for extracting data from orders. Now I wonder which OCR “program” I should use for this purpose. I’ve seen a video where apparently they used Google OCR. But I cannot find Google OCR in Studio nor in the “packages” for free download. I therefore installed UiPath OCR (Document and Screen) and was then told I need an ApiKey. I copied it in but got the message “compiler fault” and, in German, “Ausdrucksende erwartet” (“end of expression expected”). No idea what that means. So I conclude that I cannot use this program. What should I do now to find an easily accessible, easy-to-use, free OCR program to test my abilities?

You can use Microsoft OCR or Tesseract OCR.

Use the UiAutomation package, version 20.4.2.

Thank you! I found it.

Hello Friends,

If you want to see this in action, let’s meet online tomorrow!

(apologies for the last minute announcement :last_quarter_moon_with_face: )

Ioana

@ab83665,

All you need to do is just use the public pre-trained model with the endpoint https://invoices.uipath.com - with a Document Understanding ApiKey from the Cloud platform. And you can use the ML Extractor - it is open for community as long as you process at most 2 pages per document and at most 50 documents per hour.
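
If you want to stay within those community limits from your own code, a small client-side guard could look like the sketch below. It is only an illustration: `CommunityQuota` is a hypothetical helper, and counting the 50 documents over a sliding one-hour window is an assumption, since how the service itself counts the hour is not stated here.

```python
import time
from collections import deque

MAX_PAGES_PER_DOCUMENT = 2   # community limit quoted above
MAX_DOCUMENTS_PER_HOUR = 50  # community limit quoted above


class CommunityQuota:
    """Client-side guard for the community limits (hypothetical helper)."""

    def __init__(self, now=time.monotonic):
        self._now = now        # injectable clock, handy for testing
        self._sent = deque()   # timestamps of documents sent in the last hour

    def can_send(self, page_count: int) -> bool:
        """Return True if a document with `page_count` pages fits the limits."""
        if page_count > MAX_PAGES_PER_DOCUMENT:
            return False
        # Assumption: a sliding one-hour window; drop entries older than that.
        cutoff = self._now() - 3600
        while self._sent and self._sent[0] <= cutoff:
            self._sent.popleft()
        return len(self._sent) < MAX_DOCUMENTS_PER_HOUR

    def record_send(self) -> None:
        """Remember that one document was just sent."""
        self._sent.append(self._now())


quota = CommunityQuota()
if quota.can_send(page_count=2):
    # ...call the ML Extractor here, then record the request...
    quota.record_send()
```

Documents longer than 2 pages are rejected up front, so they never count against the hourly budget.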

Hope this helps,

Ioana

Hi @Ioana_Gligan! Why is there no option for UiPath Screen OCR in the form extractors? UiPath Screen OCR works really well on images, native or even scanned…
Do we have only receipts and invoices in the ML Extractor for Document Understanding? How do we handle other documents using the ML Extractor?

Hi Ioana,

Is there a cost associated with using the Intelligent Keyword Classifier?

Thanks
Davendra

Hello Everyone,

I am new to Document Understanding and trying to understand the framework. I developed a XAML file to extract the invoice number from an Amazon invoice using the regex-based extractor, but for some reason it is not extracting anything, even after multiple tries.

Could any of you look at the attached XAML file and suggest a solution? The archive should already include a sample invoice.

Thanks,
Rishi

Document_Understanding.7z (82.4 KB)

Hi @shetanshudhar: You can use UiPath Document OCR with Form Extractor. UiPath Screen OCR is meant to be used for Screen Scraping tasks.

Hello @davendra,

No cost associated with it as of now. ApiKey checks are being performed though, and some limitations might be enforced for community keys only. The definitive structure in which it will be officially published (now it is in preview) will be finalized in a couple of months.

ioana
