UiPath Community 2023.4 Stable Release - Document Understanding

Monica_Secelean · March 29, 2023, 10:00pm

This topic goes in depth on the improvements in Document Understanding. To read about other products, please navigate to the main topic here.

In this release, following your feedback, we focused on improving the accessibility of Document Understanding and PDF Activities in Studio, as well as the experience of the Field Level Rule feature (details on configuring the rules here and on using them in the Validation Station here).

Right-to-left Language support

We have added support for right-to-left languages like Arabic, Hebrew, and Persian. This feature provides improved accuracy and efficiency in data extraction, streamlining document processing workflows for users who work with right-to-left languages.

Updates on Business Rules

Mathematical Formula Field Rules ∑

With this release, we have added a new rule type that allows defining mathematical formulas for both simple fields and column fields of type Number, referencing other Number fields or number values. A formula can combine one or more of the following:

Field: one of the below:

a simple field of type Number
a column field of type Number
or a fixed value (provided by the user)

Mathematical Operator: +, *, -
Grouping Operator: (,)

Together, these can model use cases like:

Total > 100
Total = Subtotal + Delivery - Discount
Line Amount = Unit Price * Qty (all 3 being column fields, rule applicable for each row of the table)
Total Discount = sum(Discount Value)
Total Price = sum(Unit Price * Qty)
Total Price = sum(Line Amount) + Tax - Total Discount
And many more. Should we be missing anything, do shout out, and keep watching: more rules are on their way.
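
To make the rule shapes above concrete, here is a minimal Python sketch of how such formulas could be checked against extracted values. It is an illustration only, not how the product evaluates rules: the dictionary layout, the function names, and the rounding tolerance are all assumptions made for this example.

```python
from math import isclose

# Hypothetical extraction result: simple fields plus one table of column fields.
simple = {"Subtotal": 90.0, "Delivery": 15.0, "Discount": 5.0,
          "Total": 100.0, "Total Price": 90.0}
rows = [
    {"Unit Price": 10.0, "Qty": 3.0, "Line Amount": 30.0},
    {"Unit Price": 20.0, "Qty": 3.0, "Line Amount": 60.0},
]

TOLERANCE = 0.005  # assumed tolerance for currency rounding


def check_total(fields):
    # Total = Subtotal + Delivery - Discount (simple fields only)
    expected = fields["Subtotal"] + fields["Delivery"] - fields["Discount"]
    return isclose(fields["Total"], expected, abs_tol=TOLERANCE)


def check_line_amounts(table):
    # Line Amount = Unit Price * Qty, evaluated once per table row
    return [
        isclose(row["Line Amount"], row["Unit Price"] * row["Qty"], abs_tol=TOLERANCE)
        for row in table
    ]


def check_total_price(fields, table):
    # Total Price = sum(Unit Price * Qty): a column aggregate compared with a simple field
    expected = sum(row["Unit Price"] * row["Qty"] for row in table)
    return isclose(fields["Total Price"], expected, abs_tol=TOLERANCE)
```

A rule over column fields yields one result per row, while a rule using sum() collapses the column to a single value before comparing it with a simple field.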

Automatically applied rules in the Validation Station

Remember the Field Level Business Rules feature we previewed a while back, where one would check the extraction against certain pre-defined rules in the Validation Station? Until now, the rules were verified only when submitting the validation session. With this release, they are applied automatically, so one can see the results quickly, which reduces the time spent validating documents.

Enhanced the Form Extractor page-matching algorithm

Until now, for the Form Extractor to correctly extract data from a document, the document pages needed to be in the order in which the Template had been configured. With this release, we have enhanced the algorithm: it uses the “page matching info” to identify each page and match its result to the corresponding page of the document received as input to the activity. In this way, we rely on exact matching info instead of page order when identifying and extracting the data, leading to improved extraction results, even for scanned documents whose pages do not follow a particular order.
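
The idea of matching by content instead of by position can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the post does not describe what the “page matching info” contains, so here it is modeled as a hypothetical set of keywords per template page, and match_pages is a made-up helper, not part of any UiPath package.

```python
def match_pages(template_pages, document_pages):
    """Map each document page index to the template page it matches.

    template_pages: dict of template page name -> set of lowercase keywords
                    (a hypothetical stand-in for the "page matching info")
    document_pages: list of page texts, in whatever order they were scanned
    """
    mapping = {}
    for index, text in enumerate(document_pages):
        words = set(text.lower().split())
        # A template page matches only if ALL of its keywords appear on the page.
        mapping[index] = next(
            (name for name, keywords in template_pages.items() if keywords <= words),
            None,  # no template page matched this document page
        )
    return mapping


template = {
    "page_1": {"applicant", "details"},
    "page_2": {"declaration", "signature"},
}
# Pages scanned in reverse order compared with the template.
scanned = [
    "Declaration and signature of the applicant",
    "Applicant details form",
]
result = match_pages(template, scanned)
```

Here the first scanned page is assigned to page_2 and the second to page_1, so the extraction for each template page can be applied to the right document page regardless of scan order.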

Dataset Size Calculator in Semi-structured AI Document Types

This is a new functionality for Dataset Diagnostics which can be accessed by clicking on the Dataset Health indicator in the top bar of the Document Manager as indicated below.

There is a new tab called Calculator in the Dataset Diagnostics dialog. On this tab one can see an up-to-date estimate of the dataset size required for a given Document Type. The number of fields of each of the 3 types is automatically populated based on the schema in the Document Type and on the Out-of-the-Box Document Type selected in the top left dropdown.

Note that you must select the number of Layouts yourself from the bottom right dropdown.

Benefits: Users can adjust the Out-of-the-Box Document Type they want to train on, as well as the number of Languages or Layouts, and see how that impacts the size of the dataset required for a high-performing Extractor.

Justinas_Kazanavicius · March 30, 2023, 4:10pm

Is there a way to access this on the Enterprise version? The DocumentUnderstanding package only offers 23.2.1 as the latest version, and if I choose DocumentUnderstandingPreview I only see version 22.6.1, which I assume is much older.
Does this now use LayoutLMv3?
Any reason not to include Precision and Recall as metrics? F1 is just the harmonic mean of Precision and Recall, and providing these metrics individually would give a better understanding of model performance.
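
To illustrate the point about F1, here is a small Python sketch using the standard definition F1 = 2PR / (P + R). The precision and recall values are made up for the example and do not come from any real model.

```python
def f1_score(precision, recall):
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Two hypothetical extractors with very different behavior:
balanced = f1_score(0.80, 0.80)   # equally precise and complete
skewed = f1_score(1.00, 2 / 3)    # never wrong, but misses a third of the values
```

Both calls give an F1 of about 0.8, even though the first model makes more wrong extractions and the second one misses more fields, which is why the individual metrics are informative.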
