UiPath Community 2023.10 Release - Document Understanding

Document Understanding

This topic goes in-depth about the improvements in Document Understanding. To read about other products, please navigate to the main topic here.

The 23.10 release was full of highlights for us: not only did we bring to general availability (GA) a new set of activities, which provide generative capabilities and make Document Understanding easy enough to be used by Business Users too, but we have also launched the ability to consume Document Understanding via Cloud APIs!

New set of Document Understanding Activities, developed for cross-platform projects

With our newest DocumentUnderstanding.Activities package for cross-platform projects, we’re bringing various capabilities to both RPA developers & business users - making document processing more accessible than ever!

Document Data :star:
To work efficiently with documents, we introduced the notion of Document Data - an object which can be used as input or output to Document Understanding activities. What it contains depends on the activities you use it with: document type (populated by the new Classify Document Activity), fields (populated by the Extract Document Data Activity), Text and Document Object Model (populated by whichever Document Understanding Activity first processes the input file in the workflow, and used by all other activities) and others. This object gathers all the information you may require about the processed document into one resource, rather than spreading it across multiple output objects. We encourage you to pass it to all Document Understanding Activities and let them modify and populate it - this improves performance, since the document is digitized once (in the background) and the result is reused from then on.
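
To make the "digitize once, reuse everywhere" idea concrete, here is a minimal Python sketch. It is not the actual Document Data type: the `DocumentData` class, its field names, and the `classify` / `extract` helpers are all hypothetical stand-ins, and the results are hard-coded placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


# Hypothetical stand-in for the Document Data object; field names are
# illustrative only and do not mirror the real UiPath type.
@dataclass
class DocumentData:
    file_path: str
    text: Optional[str] = None            # filled by the first step that processes the file
    document_type: Optional[str] = None   # filled by classification
    fields: Dict[str, str] = field(default_factory=dict)  # filled by extraction
    digitize_calls: int = 0               # counts how often digitization really ran


def ensure_digitized(doc: DocumentData) -> DocumentData:
    """Digitize only if no earlier step has already done so."""
    if doc.text is None:
        doc.text = f"<text of {doc.file_path}>"  # placeholder for real OCR output
        doc.digitize_calls += 1
    return doc


def classify(doc: DocumentData) -> DocumentData:
    ensure_digitized(doc)
    doc.document_type = "invoice"  # placeholder classification result
    return doc


def extract(doc: DocumentData) -> DocumentData:
    ensure_digitized(doc)
    doc.fields["total"] = "100.00"  # placeholder extraction result
    return doc


# The same object travels through both steps, so digitization runs once.
doc = extract(classify(DocumentData("invoice.pdf")))
```

Because both steps receive and return the same object, the second step finds the text already present and skips digitization.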

Classify Documents with the pre-trained Classification Model :muscle:
We’re releasing the Classify Document Activity, which allows you to consume the ML Classification Model for determining the Document Type of a Document: provide the document as input and, in the resulting Document Data, you can find details about the Document Type and its Classification Confidence, which you can then use to select an appropriate Extractor.
Note that this version of the activity only supports the pre-trained classifier model - we will add support for custom classification models soon! Besides this, we’re also working on enabling splitting capabilities - which will populate the list of sub-documents of a Document Data - keep an eye out!
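
As a rough illustration of using the Document Type and Classification Confidence to pick an extractor, here is a small Python sketch. The extractor names, the mapping, and the 0.8 threshold are assumptions made up for this example, not values from the product.

```python
from typing import Dict, Optional

# Hypothetical mapping from document type to extractor name.
EXTRACTORS: Dict[str, str] = {
    "invoice": "invoices-extractor",
    "receipt": "receipts-extractor",
}


def select_extractor(
    document_type: str,
    confidence: float,
    min_confidence: float = 0.8,  # assumed threshold; tune for your own documents
) -> Optional[str]:
    """Return an extractor name, or None when a person should review the document."""
    if confidence < min_confidence:
        return None  # classification too uncertain to route automatically
    return EXTRACTORS.get(document_type)  # None for unsupported document types
```

Returning `None` for both low confidence and unknown types keeps the caller's logic simple: anything without an extractor goes to manual handling.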

Extract Document Data with the pre-trained and custom extraction models
Extract Document Data provides users access to our out-of-the-box specialized extraction models, as well as custom models trained in Document Understanding in Automation Cloud - simply provide the document as input and select the corresponding extractor - everything else happens in the background, you just receive the corresponding Document Data!

Improved validation experience :face_with_monocle:
Besides the “Create Validation Task and Wait” Activity, with this release we also provide two other activities (similar to the ones available in the IntelligentOCR package), namely:

  • Create Validation Task (not suspending the workflow)
  • Wait for Validation Task and resume (suspending the workflow)

These activities allow you to leverage other persistence activities in between: maybe you want to assign the new task? Or add a label to it? Having both, you can easily achieve this after creating the validation task!

Process PDFs in Studio Web :file_folder:
Our latest release brings with it the following activities, meant to process your PDF files in automations, allowing you to:

  • Extract PDF Text - read all text from a PDF file
  • Extract PDF Page Range - generate output PDFs for the specified page range
  • Merge PDFs - join multiple PDF files into a single one
  • Extract PDF Images - extract charts, graphics, logos and all sorts of images from a PDF file
  • Set PDF Password - remove the password from a PDF file or set a new one
  • Get PDF Page Count - retrieve the number of pages from a PDF

We’ve also added generative capabilities to the Classify Document & Extract Document Data activities: processing unstructured documents? Trying to classify various document types not supported by our OOB models? Try out our generative capabilities!

Extraction Automation Builder with predefined Document Types
Setting up a Document Understanding workflow can be overwhelming and complex - with Studio Web, things got easier, not only because of its user-friendly design but also due to activities which are simpler - doing more magic in the background for you.

Still, there is some setup required for processing documents: what extractor should be used? Is validation required? Show me the extracted results! To facilitate this and provide a smooth, easy onboarding experience which allows you to get up & running quickly, we provide the Extraction Automation Builder in Document Understanding. Read more here about how to get started with Document Understanding in a matter of minutes.

Document Understanding Cloud APIs

We’re happy to announce that now, you can consume Document Understanding not only via robots using RPA - but also via APIs hosted on the cloud :partly_sunny: They provide a means to consume all skills available (as pre-trained) or built (for custom Document Types via labelling sessions) in Document Understanding, enabling a runtime experience through various programming languages.

To this end, we announce the launch of Document Understanding Cloud APIs, which will allow you to consume the framework the same way you would via RPA, providing:

  • Discovery APIs - allowing consumers to access the available resources (projects, document types, classifiers, extractors) used for the Document Understanding Framework, as displayed below:

  • Digitization APIs - providing a digitization method - called as a first step, responding with a documentId, which will be referenced by other operations; and a method for retrieving the corresponding result, if required!
  • Classification APIs - allowing you to consume classification models for identifying the Document Type of the input document (similar to how the Machine Learning Classifier enables classification via RPA)
  • Extraction APIs - allowing you to consume extraction models, for retrieving the fields of the Document Type processed by the extractor (similar to how the Machine Learning Extractor provides this capability via RPA)
  • Validation APIs - allowing you to create Validation Tasks in Action Center, leveraging either the Classification Station or the Validation Station, depending on users’ needs.
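
The order in which these API groups are typically chained can be sketched in Python as below. This uses an in-memory fake rather than real HTTP calls: the class, method names, and return values are hypothetical and do not reflect the actual endpoints or payloads, which are described in the Swagger interface.

```python
from typing import Dict, List


class FakeDocumentUnderstandingClient:
    """In-memory stand-in; method names are hypothetical, not real endpoints."""

    def __init__(self) -> None:
        self.calls: List[str] = []
        self._next_id = 0

    def discover(self) -> Dict[str, List[str]]:
        self.calls.append("discover")
        return {"classifiers": ["classifier-a"], "extractors": ["extractor-a"]}

    def digitize(self, file_name: str) -> str:
        self.calls.append("digitize")
        self._next_id += 1
        return f"doc-{self._next_id}"  # the documentId referenced by later calls

    def classify(self, document_id: str, classifier: str) -> str:
        self.calls.append("classify")
        return "invoice"  # placeholder document type

    def extract(self, document_id: str, extractor: str) -> Dict[str, str]:
        self.calls.append("extract")
        return {"total": "100.00"}  # placeholder fields


def process(client: FakeDocumentUnderstandingClient, file_name: str) -> Dict[str, object]:
    resources = client.discover()              # 1. find available resources
    document_id = client.digitize(file_name)   # 2. digitize first, keep the documentId
    document_type = client.classify(document_id, resources["classifiers"][0])  # 3.
    fields = client.extract(document_id, resources["extractors"][0])           # 4.
    return {"documentId": document_id, "documentType": document_type, "fields": fields}


client = FakeDocumentUnderstandingClient()
result = process(client, "invoice.pdf")
```

The point of the sketch is only the sequencing: digitization comes first and its documentId is passed to every later operation.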

Classification & Extraction APIs are available for both synchronous (for documents up to 5 pages) and asynchronous (posting the request via a start method and retrieving the result via polling) consumption, to support various use cases: be it optimizing for performance or processing large documents.
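
For the asynchronous route, the start-then-poll pattern could look roughly like this in Python. The 5-page limit comes from the description above; the status strings (`"Succeeded"`, `"Failed"`), the response shape, and the function names are assumptions for illustration only.

```python
import time
from typing import Callable, Dict

SYNC_PAGE_LIMIT = 5  # synchronous calls cover documents up to 5 pages


def choose_mode(page_count: int) -> str:
    """Pick synchronous calls for small documents, asynchronous otherwise."""
    return "sync" if page_count <= SYNC_PAGE_LIMIT else "async"


def poll_until_done(
    get_result: Callable[[], Dict],           # hypothetical "retrieve result" call
    interval_seconds: float = 1.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call get_result repeatedly until it reports success, failure, or we give up."""
    for _ in range(max_attempts):
        response = get_result()
        status = response.get("status")       # assumed field name
        if status == "Succeeded":
            return response
        if status == "Failed":
            raise RuntimeError("operation failed")
        sleep(interval_seconds)               # wait before asking again
    raise TimeoutError(f"no result after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting; in practice you would leave the default.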

The service is discoverable via a Swagger interface which can be accessed from Document Understanding in Automation Cloud.
Find more details here :slight_smile:

We have worked very hard to deliver all capabilities & hope they come in handy to you - looking forward to your feedback! :pray:

Amazing results with the new AI classification and extraction!
It would be really useful to use variables in the prompts. This would allow custom prompts for users and would allow my workflows to become commercial. Looking forward to the updates!

@THodgson I’m happy to report we’ll be supporting variable inputs to the prompt with the next release - keep an eye out for updates! :dancer: