UiPath Community 23.1 Preview Release - Document Understanding

We’re happy to report on our latest work for the Document Understanding product, with updates to both classification & extraction :dancer:t2:

Definition of a default value for a Field :writing_hand:
Have you ever wished for a fallback value to be populated in the Extraction Result when a field could not be extracted? Have you ever wondered “Why keep manually entering the same value when it comes back empty from documents?” - wonder no more! :thinking: With the latest release, you can provide a “Default value” for a field, which is populated in the Extraction Result when no other value for that field is found in the document. This means you no longer need to repeatedly select “False” when no checkbox is checked, or type in a default value just to avoid leaving an input empty because nothing was extracted - it just works!
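To make the fallback behaviour concrete, here is a minimal sketch in Python. It is not the Document Understanding API: `FieldDefinition`, `default_value` and `apply_defaults` are hypothetical names used only to illustrate the idea that a default is used when nothing was extracted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FieldDefinition:
    """Hypothetical stand-in for a taxonomy field with an optional default."""
    name: str
    default_value: Optional[str] = None


def apply_defaults(
    fields: List[FieldDefinition],
    extracted: Dict[str, Optional[str]],
) -> Dict[str, Optional[str]]:
    """Return one value per field, using the default when nothing was extracted."""
    result: Dict[str, Optional[str]] = {}
    for field in fields:
        value = extracted.get(field.name)
        # Treat a missing key, None, or an empty string as "not extracted".
        if value is None or value == "":
            value = field.default_value
        result[field.name] = value
    return result
```

An extracted value always wins in this sketch. The default only applies when the field came back empty, and a field without a default stays empty.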

OCR updates :books:
We have migrated the OmniPage OCR to .NET 5 portable, so you can now use it with Linux robots :robot:

Improved Classification Experience using the Intelligent Keyword Classifier :face_with_monocle:
We are happy to report that we have improved the splitting algorithm: it can now take page numbers into consideration and does a better job of identifying where documents start and end. For example, it looks for “Page 1” or “3/3” or “Page 3 out of 3” to identify the start and end of a document, resulting in more accurate splitting.
And in case you do not want to use the splitting algorithm provided with the Intelligent Keyword Classifier, you now have the option to disable it. Until now, the algorithm would split documents even if splitting wasn’t necessary. Now, the splitting feature can be disabled using a checkbox option.
Finally, we have also improved the splitting algorithm to better split documents of the same type within a file - should you not notice our improvements, please reach out - we’re happy to help out!
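As a rough illustration of how page markers such as “Page 1”, “3/3” or “Page 3 out of 3” could be used to find document boundaries, here is a small Python sketch. It is not the actual Intelligent Keyword Classifier algorithm: the patterns and the names `parse_page_marker` and `find_document_starts` are assumptions made for this example.

```python
import re
from typing import List, Optional, Tuple

# Patterns are tried in order, most specific first.
_PATTERNS = [
    re.compile(r"\bpage\s+(\d+)\s+(?:out\s+of|of)\s+(\d+)\b", re.IGNORECASE),
    re.compile(r"\b(\d+)\s*/\s*(\d+)\b"),  # naive: would also match dates like 12/2023
    re.compile(r"\bpage\s+(\d+)\b", re.IGNORECASE),
]


def parse_page_marker(text: str) -> Optional[Tuple[int, Optional[int]]]:
    """Return (current_page, total_pages or None) for the first marker found."""
    for pattern in _PATTERNS:
        match = pattern.search(text)
        if match:
            current = int(match.group(1))
            total = int(match.group(2)) if match.lastindex == 2 else None
            return current, total
    return None


def find_document_starts(pages: List[str]) -> List[int]:
    """Return the indices of pages that look like the first page of a document."""
    starts: List[int] = []
    for index, text in enumerate(pages):
        marker = parse_page_marker(text)
        # The first page of the file always starts a document;
        # after that, a marker with page number 1 signals a new one.
        if index == 0 or (marker is not None and marker[0] == 1):
            starts.append(index)
    return starts
```

A real splitter would combine signals like these with keyword evidence. This sketch only shows the page-number part.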

Reporting of the Text Type in the Extraction Result :printer::memo:
Text in documents can be handwritten or printed, or it can come from checkboxes or other elements. With the latest release, the Extraction Result also makes this information available for you to consume, enabling use cases in which handwritten documents are sent for validation or checkboxes are collected & processed further.
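For instance, a workflow could route fields based on the reported text type. The Python sketch below only illustrates that idea; `TextType`, its members and `route_fields` are hypothetical names and do not mirror the actual Extraction Result structure.

```python
from enum import Enum
from typing import Dict, List


class TextType(Enum):
    """Hypothetical text types, loosely following the categories named above."""
    PRINTED = "printed"
    HANDWRITTEN = "handwritten"
    CHECKBOX = "checkbox"
    OTHER = "other"


def route_fields(field_types: Dict[str, TextType]) -> Dict[str, List[str]]:
    """Group field names by the follow-up step their text type suggests."""
    routes: Dict[str, List[str]] = {"validation": [], "checkboxes": [], "automatic": []}
    for name, text_type in field_types.items():
        if text_type is TextType.HANDWRITTEN:
            routes["validation"].append(name)   # send to a human for review
        elif text_type is TextType.CHECKBOX:
            routes["checkboxes"].append(name)   # collect for further processing
        else:
            routes["automatic"].append(name)    # printed or other text
    return routes
```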


Waited for this one… will check it out. Will a similar feature be available for the ML Classifier and One Click Classification?

It would be really nice if the default value could accept an expression. Take the Invoice Due Date as an example: if the Due Date is empty, set it to Today + 10 days.


thanks for your feedback @rikulsilva - will add your feedback to our backlog :slight_smile:

Hi Monica,

Those look like really good additions to the DU capability.

Can I please check whether the enhancements will include the capability to split/extract multiple invoices from the same or different vendors in a single PDF? Also, if there are multiple invoices on one page, can they be split/extracted?


@Murtuza_Kapadia yes, ideally your use case of processing a PDF containing multiple invoices from multiple vendors should now be supported: the IKC lets you split the file so that you can iterate over each invoice and extract data from it. However, multiple invoices on the same page are not detected (maybe share some sample docs with us if you can?). We hope you will give your use case a try & let us know how it works :slight_smile:

This is great news, Monica. Am I able to use the feature now? How can I use it? Is it automatic?

Hey @Think_Blue_Management !
Not sure I understand you, which feature do you mean (I have listed several)? If you are referring to the “definition of the default value”, then yes, it works automatically - you define it in the Taxonomy Manager, and the algorithm populates the Extraction Result with it automatically.

Let me know if I didn’t answer,

Hi Monica, my apologies, I was referring to the Improved Classification Experience using the Intelligent Keyword Classifier, specifically the ability to extract invoices that span multiple pages (e.g. 1-3).