RPA Framework for Document Understanding

Hello everyone!

Our new RPA Framework for Document Understanding processes is now available for preview and review. :slight_smile:

Key features:

  • Easy to get new Document Understanding projects started; usable in all cases - from small processes to complex solutions
  • Easy to integrate into larger automation flows
  • Production-ready; built-in logging, exception handling and retry mechanisms
  • Common architecture for both Attended and Unattended (+ Action Center) implementations
  • Meant to make development, deployment, debugging and scaling much easier.

I’ve created this 15-min video to give an overview of the solution and help anyone interested to get started.

We encourage you to try it out and let us know what you think. We love getting your feedback - it is key for improving our solutions.

The solution is available for download as a ZIP archive or as a NuGet package that can be used as a Studio template.

Without a doubt a very good base for Document Understanding processes. Good job! :))

Awesome one @Alexandru-Luca.

Question: even if you have a multi-page document containing several document types, explicitly splitting it into separate files using Extract PDF Page Range is still optional, right? You can extract each classification result separately without splitting the document into multiple files, so you won't have to re-digitize it.

This is really great news!
Keep up the good work!

@zell12 exactly.

In fact, for this workflow it might even be useful to combine several PDFs into one.

Imagine a scenario where you have to extract data from email attachments. Often there is not only the attachment you need but also others that are unnecessary. To know which one is useful and which isn't, you need to digitize and analyse them.

As this framework uses one job per input file, the easiest solution is to combine your attachments into a single PDF in the dispatcher (the dispatcher reads the email, combines the attachments and starts the DU job).

The framework then does the whole job of finding out which documents (pages) are useful (classification) and extracting the needed data.
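To make the "combine attachments in the dispatcher" idea more concrete, here is a minimal Python sketch of the bookkeeping side only, assuming the dispatcher knows each attachment's page count before merging. It records which page range of the combined PDF each attachment occupies, so that page-based classification results can be traced back to the original file. `Attachment`, `PageRange`, `build_page_map` and `source_of` are hypothetical names, not part of the framework or of any UiPath API; the actual PDF merging would be done with whatever PDF activity or library you already use and is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Attachment:
    """Hypothetical email attachment: file name plus its page count."""
    name: str
    page_count: int


@dataclass(frozen=True)
class PageRange:
    """1-based, inclusive page range inside the combined PDF."""
    start: int
    end: int


def build_page_map(attachments: list[Attachment]) -> dict[str, PageRange]:
    """Record where each attachment lands when all are appended, in order, to one PDF.

    Assumes attachment names are unique; a duplicate name would overwrite
    the earlier entry in this simple dictionary.
    """
    page_map: dict[str, PageRange] = {}
    next_page = 1
    for att in attachments:
        if att.page_count <= 0:
            continue  # empty attachments contribute no pages to the combined PDF
        page_map[att.name] = PageRange(next_page, next_page + att.page_count - 1)
        next_page += att.page_count
    return page_map


def source_of(page: int, page_map: dict[str, PageRange]) -> str | None:
    """Map a page of the combined PDF (e.g. from a classification result) back to its attachment."""
    for name, rng in page_map.items():
        if rng.start <= page <= rng.end:
            return name
    return None
```

For example, merging a 2-page `invoice.pdf`, a 1-page `signature.pdf` and a 3-page `receipt.pdf` places the receipt on pages 4 to 6, so a classification result covering page 3 can be attributed to `signature.pdf`.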

@Alexandru-Luca

Very good job! I really like it!

Excellent work, @Alexandru-Luca!! Thanks to you and the whole UiPath team!

Hi @Alexandru-Luca, I am not able to add the nupkg. Am I missing something?

Hi @prasath17
You should add the nupkg to the folder where your Studio setup looks for templates.

  1. Save the nupkg in the folder where Studio looks for templates.

  2. Click on More Templates.

  3. Here it is! :stuck_out_tongue:

wow :beers:

Does it work with the current REFramework?

There are teams that work only with the REFramework.

Thank you, @Alexandru-Luca, this is really great!

Thank you, everyone, for your feedback! It is very much appreciated, please keep it coming! :tophat:

@zell12 - indeed, document splitting is optional. In fact, most processing steps are optional and should only be used if needed. As for splitting, it's more of a UX improvement: a bit of a workaround that lets the person doing the data validation view only the page range they should be checking. This makes for a better user experience and makes validation a bit less prone to user error. This optional step will go away once page-range support is implemented in the Validation Station/Validation Action.
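Why splitting is not needed for extraction can be illustrated with a minimal Python sketch, assuming the digitization output is available as a list of per-page texts and that each classification result carries a 1-based, inclusive page range. `ClassificationResult` and `pages_for` are hypothetical names, not the framework's or UiPath's API. Each classified document's pages are sliced from the single digitization of the whole file, so nothing is re-digitized; a physical split would only change what the validator is shown.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ClassificationResult:
    """Hypothetical stand-in for one classified document inside a multi-page file."""
    document_type: str
    start_page: int  # 1-based, inclusive
    end_page: int    # 1-based, inclusive


def pages_for(result: ClassificationResult, digitized_pages: list[str]) -> list[str]:
    """Return the already-digitized pages that belong to one classification result.

    The pages come from a single digitization of the whole file,
    so no physical split and no re-digitization is required.
    """
    if not 1 <= result.start_page <= result.end_page <= len(digitized_pages):
        raise ValueError(f"page range out of bounds: {result}")
    return digitized_pages[result.start_page - 1 : result.end_page]
```

With a 4-page file classified as an invoice on pages 1 to 2 and a receipt on pages 3 to 4, each result gets its own slice of the same digitized pages, and each slice can then be passed to extraction on its own.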

@mmcruzRPA Thank you for explaining how to set up the template

@wagner - the new framework is meant to be used only for implementing Document Understanding processes. For any automation steps that come before or after the Document Understanding part, it should be used in conjunction with the REFramework.

Hey Alex,

Can you please give a practical example of how the DU framework would be used in conjunction with queues?

As a quick example: the Dispatcher puts files in Q1, and Q1 has a trigger that starts a DU job for each item the Dispatcher uploads. When the DU job finishes, it creates an item in Q2, which waits for the Performer to pick it up.

The upsides of using queues are too many to mention. The downside is that Q1 items would become abandoned if they are not validated by a human in the loop (HiL) within 24 hours (if that's the case). Is there any workaround for this, considering I want to use queues end to end?
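The timing problem described in this question can be sketched in a few lines of Python, assuming, as stated in the question, that a queue item becomes Abandoned once it has stayed In Progress for more than 24 hours. This models the behavior as described in the thread, not Orchestrator's actual implementation; `ABANDON_AFTER`, `is_abandoned` and `validation_deadline` are hypothetical names. Any human validation that takes longer than the window pushes the Q1 item past the limit.

```python
from __future__ import annotations

from datetime import datetime, timedelta

# Assumption: the 24-hour window mentioned in the question. In Orchestrator this
# is a server-side setting, not something a process controls.
ABANDON_AFTER = timedelta(hours=24)


def is_abandoned(started_processing: datetime, now: datetime,
                 limit: timedelta = ABANDON_AFTER) -> bool:
    """True if an In Progress item has been in that state for longer than the limit."""
    return now - started_processing > limit


def validation_deadline(started_processing: datetime,
                        limit: timedelta = ABANDON_AFTER) -> datetime:
    """Latest moment the HiL validation must finish before the item would be abandoned."""
    return started_processing + limit
```

For an item picked up at 09:00 on one day, the validation would have to finish by 09:00 the next day. That is rarely guaranteed when a person has to act on it, which is why long-running support on the queue side matters here.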

Hi @ctutu!

Agreed, using queues definitely has huge advantages and we all prefer to use that approach :slight_smile:

The bad news: the only workaround at the moment would be to change an Orchestrator config setting that dictates how much time passes before a queue item becomes abandoned. However, we recommend against this approach: the setting affects all queues, and the potential for running into undesirable problems is quite high. Alternatively, you can allow queue items to become abandoned for a while, but this is also undesirable.

The good news: the Orchestrator product team is working on bringing long-running support to the queues. When this feature is introduced, the framework will be updated to support queues, as well.

Cheers,
Alex.

Could you please add English subtitles? :frowning:
Please! :cry: :cry: :cry: :sob: :sob: :sob: :sob:

Hi @Chen_Kenny!

Unfortunately, I don't have any subtitles prepared :frowning:, but please feel free to download the video itself and use an app that auto-generates captions.

Hi @Alexandru-Luca,

What do you mean by the robot license being an Unattended robot? Does it need access in the Modern Folder, etc.? I've tried creating the Dispatcher as shown in your tutorial, but I'm getting errors in the Action Center part: the job always faults. Can you show us the setup on your Orchestrator?

Hi @_maan!

The most common issue is caused by the robot not having proper access rights to the storage bucket or to the stored files.

Can you please share what error(s) you are facing? A screenshot and a stack trace would be most helpful.

An FYI for everyone: an update has been published for the template - v1.0.2. The links in the initial announcement message point to that version now.