Document Understanding - November Updates

Greetings, our dear Community!

This month, the UiPath.IntelligentOCR.Activities package received some small but pretty impactful changes, especially under the hood.

Here’s a summary of what we’ve been working on:

:bug: Eradicated Some Bugs :bug:

We’ve fixed 15+ small issues related to Validation Station design, serializable Document Understanding objects for Long Running Workflows (persistence projects), and intermittent errors in some attended activities (you know, the annoying, not-always-reproducible ones :ghost: :fog: ).

:abc: Made Validation Station Faster to Use :abcd:

We’ve added a usability and speed improvement to Validation Station: starting with this version, a small icon on the document side lets you choose how the Area Selection tool behaves:

  • always select the area
  • always select the tokens (aka words)
  • ask you every time (as it did until now).

The default setting will be “always select the tokens”, as this is the most common usage of document selections during data extraction validation.

Here is what this option looks like and where to find it:
[screenshot]

:one: Exposed OCR Confidence in Extraction Results Export :two: :100:

We’ve added an awesome new advanced option to the Export Extraction Results activity, called Include OCR Confidence. Yes, you guessed it right, it’s…


right here!

If you enable it, each field (and each table column) gets one extra value right after the Confidence (aka Extraction Confidence), so you can see and act upon the OCR confidence of a reported value without having to iterate through the ExtractionResults object.
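As a rough illustration of acting on both confidences at once, here is a minimal Python sketch (not UiPath code). It assumes each exported field has been read into a dict with the hypothetical keys `confidence` (extraction confidence) and `ocr_confidence`; the thresholds are made up, and the real column names should be checked in your own export first.

```python
from typing import Dict, List

# Hypothetical thresholds; tune them for your own documents.
MIN_EXTRACTION_CONFIDENCE = 0.8
MIN_OCR_CONFIDENCE = 0.9


def fields_needing_review(fields: Dict[str, Dict[str, float]],
                          min_extraction: float = MIN_EXTRACTION_CONFIDENCE,
                          min_ocr: float = MIN_OCR_CONFIDENCE) -> List[str]:
    """Return the names of fields whose extraction OR OCR confidence is too low.

    `fields` maps a field name to a dict with the hypothetical keys
    "confidence" (extraction confidence) and "ocr_confidence".
    """
    flagged = []
    for name, values in fields.items():
        if values["confidence"] < min_extraction or values["ocr_confidence"] < min_ocr:
            flagged.append(name)
    return flagged


row = {
    "Invoice Number": {"confidence": 0.95, "ocr_confidence": 0.99},
    "Total": {"confidence": 0.97, "ocr_confidence": 0.62},  # well extracted, poorly read
    "Date": {"confidence": 0.55, "ocr_confidence": 0.98},   # well read, weakly extracted
}
print(fields_needing_review(row))  # ['Total', 'Date']
```

The point of checking both values is that a field can be confidently extracted from text that the OCR engine itself read poorly, and only the new OCR confidence value reveals that case.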

As always, to learn the structure of your output DataSet, we strongly recommend that you first iterate through its tables and write each DataTable into an Excel sheet. This way, you can see exactly what information is being reported and what the columns are called! (Do tick the “Include Headers” option :wink: )
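In a UiPath workflow you would do this with the usual For Each and Excel activities. As a language-neutral sketch of the same idea, the Python below models the exported DataSet as a dict of table name to a list of row dicts (an assumption for illustration only), prints each table’s column names, and writes each table to a CSV file with a header row. It assumes table names are safe to use as file names.

```python
import csv
from pathlib import Path
from typing import Dict, List


def dump_tables(tables: Dict[str, List[Dict[str, object]]], out_dir: str) -> List[Path]:
    """Write every table to <out_dir>/<table name>.csv, header row first."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, rows in tables.items():
        # Union of keys in first-seen order, so no column is silently dropped.
        columns: List[str] = []
        for row in rows:
            for key in row:
                if key not in columns:
                    columns.append(key)
        print(f"{name}: {columns}")
        path = out / f"{name}.csv"
        with path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=columns)
            writer.writeheader()  # the equivalent of ticking "Include Headers"
            writer.writerows(rows)  # missing values are written as empty cells
        written.append(path)
    return written
```

Opening the resulting files shows, at a glance, which columns (including any confidence columns) each table actually contains.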

:twisted_rightwards_arrows: Optimized how Document Classification works :dromedary_camel: :camel:

We’ve also changed the way the Classify Document Scope works. This might get a bit technical, so fasten your seatbelts…

When there are multiple classifiers in the Classify Document Scope, they are now handled as follows:

  • from the first Classifier, the results above its minimum confidence threshold are kept for their respective page ranges
  • the second Classifier is called multiple times if necessary, each time with one of the remaining, unclassified page ranges; its results are kept if they are above that Classifier’s minimum confidence threshold
  • … the remaining classifiers act just like the second one, on smaller and smaller pieces of the original file.
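The cascade described above can be sketched in Python roughly as follows. This is a simplified model, not the Classify Document Scope’s actual implementation: the `Classification` type, the half-open page ranges, and the toy `strict`/`broad` classifiers are all hypothetical, and treating a confidence equal to the threshold as accepted is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

PageRange = Tuple[int, int]  # hypothetical: 0-based, start inclusive, end exclusive


@dataclass
class Classification:
    start: int
    end: int
    document_type: str
    confidence: float


# A classifier takes a page range and returns candidate classifications for it.
Classifier = Callable[[PageRange], List[Classification]]


def uncovered(page_range: PageRange, kept: List[Classification]) -> List[PageRange]:
    """Return the sub-ranges of page_range not covered by any kept result."""
    start, end = page_range
    gaps: List[PageRange] = []
    cursor = start
    for c in sorted(kept, key=lambda c: c.start):
        if c.end <= cursor or c.start >= end:
            continue  # entirely before the cursor or outside the range
        if c.start > cursor:
            gaps.append((cursor, c.start))
        cursor = max(cursor, c.end)
    if cursor < end:
        gaps.append((cursor, end))
    return gaps


def classify_cascade(page_count: int,
                     classifiers: List[Tuple[Classifier, float]]) -> List[Classification]:
    """Run classifiers in order; each later one only sees still-unclassified ranges."""
    kept: List[Classification] = []
    pending: List[PageRange] = [(0, page_count)]  # the first classifier sees the whole file
    for classify, min_confidence in classifiers:
        next_pending: List[PageRange] = []
        for page_range in pending:  # one call per remaining unclassified range
            accepted = [c for c in classify(page_range) if c.confidence >= min_confidence]
            kept.extend(accepted)
            next_pending.extend(uncovered(page_range, accepted))
        pending = next_pending
        if not pending:
            break  # every page is classified; later classifiers are not called
    return sorted(kept, key=lambda c: c.start)


# Toy usage: a precise but rigid classifier first, a broad one last.
def strict(page_range: PageRange) -> List[Classification]:
    return [Classification(0, 2, "invoice", 0.9), Classification(2, 5, "receipt", 0.4)]


def broad(page_range: PageRange) -> List[Classification]:
    start, end = page_range
    return [Classification(start, end, "other", 0.6)]


result = classify_cascade(5, [(strict, 0.8), (broad, 0.5)])
print([(c.start, c.end, c.document_type) for c in result])
```

In the toy run, the strict classifier’s low-confidence “receipt” result is discarded, so pages 2 to 4 are handed to the broad classifier on their own, which is exactly the “smaller and smaller pieces” behavior.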

Of course, this doesn’t change the fact that you should put your “most precise” classifier first (even though it might be pretty rigid), and order your classifiers based on this: how accurate and how restrictive they are (most accurate and restrictive first, least accurate and “broad” last).

:ab: Upgraded UiPath.Abbyy.Activities Package :ab:

The latest preview version of the Abbyy FlexiCapture activities (FlexiCapture Classifier and FlexiCapture Extractor) has been upgraded and is fully compatible with the latest Abbyy FlexiCapture Engines.

Starting with this preview release, we also officially support the new 12.1.0.24 version of the engine.

:see_no_evil: :hear_no_evil: :speak_no_evil: Focused on Making it Better for You!

As you’ve probably noticed, we have focused on rounding out and stabilizing the Document Understanding Framework, and we will continue to do so over the next few months.

To ensure that we are capturing all your issues and desires, please do send us feedback on how we’re doing, what you like and don’t like, and what you’d like to see added to the framework!

See you soon,

The Document Understanding Team


Thank you so much for adding such great features!!! :tada:

I love the idea of including the OCR Confidence alongside the existing Confidence field. It will help the robot make better decisions about whether manual verification is needed (as an initial check)…

Also, I tried out the new machine learning classifier (in preview) to classify documents using AI Fabric… It was super awesome!!!

For anyone interested in the ML classifier, here’s the link… :smiley:
