Has Anyone Implemented GenAI-Based Document Understanding in a Real-Time Project?

Hi Community,

I wanted to open a discussion around the recently launched GenAI-based Document Understanding by UiPath.

I did a small round of testing with it, and the results were pretty impressive in terms of extraction accuracy and flexibility. Currently, in our project, we are using the traditional Document Understanding approach, which includes ML Classifier and ML Extractors.

We’re also leveraging Action Center for human validation and exception handling.

I’m now considering whether it’s the right time to transition to GenAI DU, but before taking that route, I wanted to hear from others:

Has anyone implemented GenAI DU in a real-time or production project?

  • How has your experience been?
  • Any limitations or challenges you faced during implementation?
  • How does it integrate with existing workflows, especially those using Action Center?

Would love to hear your thoughts, best practices, or anything to keep in mind while making the switch.

Thanks in advance!

Hi,

I assume you are talking about UiPath IXP (Intelligent Xtraction and Processing). Extraction is entirely prompt-based. It takes far less time to set up than the traditional ML models, because the only work is writing the prompts and defining the taxonomy when you create the model. You do not need 40 to 50 documents for training, so it suits cases where you have few samples. It also extracts table data based on the prompts you provide.

Some of the limitations are:

  • Whenever there is a wrong extraction, you cannot annotate it. The model can be corrected only through prompts.
  • Some admin rights are required to create the project. Developer access alone is not sufficient to create or update a project.
  • You cannot do classification with IXP. Only extraction can be done.

Integrating with existing workflows and the Action Center is straightforward. If you are using the DU Framework, the files will reach Action Center through the same activities and logic that you have already implemented. There is a new activity for extraction through IXP, Extract Document Data, available in the UiPath.DocumentUnderstanding.Activities package.

Hope this answers your questions.

Cheers!

So, there are two things you could be referring to.
The first is the Generative AI extraction in the Document Understanding package; the other is using an AI agent for document understanding.

Both have some advantages and drawbacks.
The one in the Document Understanding package is good, but it’s unable to handle tabular data at this point (I can be corrected on this; I remember them releasing something on ‘complex documents’, but I feel like tables still weren’t available).
It’s also annoying to update, since the prompts are held in the package, so you need to publish a new package version every time you change a prompt.
You need to make sure you build a robust testing framework around the extractor to test your prompt improvements.
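
As a rough illustration of what such a testing framework could look like, here is a minimal Python sketch that scores an extractor field by field against a small hand-labeled set and flags fields that got worse after a prompt change. Everything in it is an assumption for illustration: `extractor_v1` and `extractor_v2` are hypothetical stand-ins for "run the extractor with prompt version N" (they do not call any UiPath API), and the field names and sample values are made up.

```python
from typing import Callable, Dict, List, Tuple

Fields = Dict[str, str]


def normalize(value: str) -> str:
    """Compare values case-insensitively and ignore extra whitespace."""
    return " ".join(value.split()).lower()


def field_accuracy(
    extract: Callable[[str], Fields],
    labeled_docs: List[Tuple[str, Fields]],
) -> Dict[str, float]:
    """Return per-field accuracy of `extract` over (document, expected) pairs."""
    correct: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for doc, expected in labeled_docs:
        actual = extract(doc)
        for name, want in expected.items():
            total[name] = total.get(name, 0) + 1
            if normalize(actual.get(name, "")) == normalize(want):
                correct[name] = correct.get(name, 0) + 1
    return {name: correct.get(name, 0) / total[name] for name in total}


def regressions(
    old: Dict[str, float], new: Dict[str, float], tolerance: float = 0.0
) -> List[str]:
    """List fields whose accuracy dropped by more than `tolerance`."""
    return sorted(n for n in old if new.get(n, 0.0) < old[n] - tolerance)


# Hypothetical labeled set: document id -> expected field values.
labeled = [
    ("doc-1", {"invoice_no": "INV-001", "total": "100.00"}),
    ("doc-2", {"invoice_no": "INV-002", "total": "250.50"}),
]


def extractor_v1(doc: str) -> Fields:
    # Stand-in for the extractor running with the old prompts.
    return {
        "doc-1": {"invoice_no": "INV-001", "total": "100.00"},
        "doc-2": {"invoice_no": "INV-002", "total": "250.5"},
    }[doc]


def extractor_v2(doc: str) -> Fields:
    # Stand-in for the extractor running with the new prompts.
    return {
        "doc-1": {"invoice_no": "inv-001", "total": "100.00"},
        "doc-2": {"invoice_no": "", "total": "250.50"},
    }[doc]


old_scores = field_accuracy(extractor_v1, labeled)
new_scores = field_accuracy(extractor_v2, labeled)
print(old_scores)                           # {'invoice_no': 1.0, 'total': 0.5}
print(new_scores)                           # {'invoice_no': 0.5, 'total': 1.0}
print(regressions(old_scores, new_scores))  # ['invoice_no']
```

In this made-up run the new prompts fix `total` but break `invoice_no`, which is exactly the kind of trade-off you want to see before pushing a new package version. In practice you would replace the stand-ins with whatever exports your real extraction results.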

The one in the agent builder has a lot more flexibility than the one in the Document Understanding Center, as you can represent tables and even more complex data, and you can set up evaluations there to get a good way of testing your changes and publishing different versions discretely.
The biggest downside is that it currently doesn’t integrate with Action Center at all: if you want a human to validate the output of the agent, you need a custom-built form, whereas the method above supports validation out of the box.

Incorrect. Generative Classification has been around for quite a while now.
You can see it in the documentation here.

Hi @Jon_Smith,

I meant that IXP itself does not have the capability to classify documents at the moment. And yes, I agree that there is an activity called Generative Classifier, which classifies documents based on prompts that can be set up directly inside the project. However, unlike the ML Classifier, which works from a trained model, there is no classifier that comes from the IXP model.

Thanks!!

What IXP model are you referring to?

Document Understanding - Document Understanding™ migration to UiPath® IXP

Hmm, I’m not sure I’d call that a model, as in this context that term has a different meaning.
The new layout for IXP doesn’t stop you from using any of the generative classification methods I showed before, and IXP is an umbrella term that UiPath uses for all of its existing IDP methodology. So I stand by my statement: what you said is incorrect, or at least can easily cause confusion, because you can do classification.

Hello Everyone,

Thank you very much for your response.

I am looking forward to learning more about the following:

Has anyone implemented GenAI DU / IXP in a real-time or production project?

  • How has your experience been?
  • Were there any limitations or challenges you faced during implementation?
  • What does the AI Units consumption look like?

I have done a small POC on IXP, and the model performs both classification and data extraction using the Classify Document and Extract Document Data activities.

Since this model is relatively new, I wanted to know if anyone has implemented it and could share their experience before I proceed with using it in a real-time project.

UiPath IXP

Generative Classifier and Generative Extractor using UiPath.DocumentUnderstanding.Activities package