Document Understanding - Extract Data

Mohammed_Shahid01 · May 18, 2026, 6:28am

Hi Team,

I’m working on a use case where we are extracting data from invoices for multiple vendors. Currently, we have implemented this using Modern DU and have trained it for 4 vendor formats so far.

However, for every new vendor, additional effort and time are going into training and improving extraction accuracy. Since all of them are invoices and most of the fields are fairly standard, I wanted to understand whether this is the expected approach or whether I might be missing a better design or practice.

We are expecting 50+ vendors in total, so I would appreciate suggestions on the best scalable approach for this scenario.

Yoichi · May 18, 2026, 6:34am

Hi,

How about trying the Helix extractor, if you have not tried it yet?

Regards,

JarrydScott · May 18, 2026, 6:39am

Welcome to the community @Mohammed_Shahid01

For Document Understanding processes like this, it’s a good idea to implement a classification station in the process.

The classification station routes a document to human review when the “new” invoice does not meet the confidence score that you set. For example, let’s say you have trained the invoice model on 10 different vendors, you have set a confidence level of 90%, and an invoice from a completely new vendor comes in. IXP “should” understand that it’s still an invoice and let it through. But let’s say it doesn’t: the document won’t meet the 90% confidence level and will immediately default to the classification station (human-in-the-loop) in Action Center for manual classification.

What happens at that point is that a human will manually go into Action Center and classify it as an Invoice, and once complete, it will automatically train into your existing invoice model as a new vendor in (almost) real time. Next time around, if that same new vendor comes in, the confidence will now increase. Repeat this process.
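The routing decision described above can be sketched in a few lines of Python. This is only an illustration of the threshold logic, not UiPath's API: `ClassificationResult`, `route`, and the document IDs are hypothetical names, and the 0.90 threshold is taken from the 90% example.

```python
from dataclasses import dataclass


@dataclass
class ClassificationResult:
    """Hypothetical container for one classified document."""
    document_id: str
    document_type: str
    confidence: float  # 0.0 to 1.0


# Assumed threshold, matching the 90% example in this post.
CONFIDENCE_THRESHOLD = 0.90


def route(result: ClassificationResult,
          threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Return 'auto' when confidence meets the threshold, else 'human_review'."""
    if result.confidence >= threshold:
        return "auto"
    return "human_review"


known_vendor = ClassificationResult("inv-001", "Invoice", 0.97)
new_vendor = ClassificationResult("inv-002", "Invoice", 0.62)

print(route(known_vendor))  # confident enough, continues automatically
print(route(new_vendor))    # below threshold, goes to manual classification
```

In the real process, the "human_review" branch corresponds to creating the classification task in Action Center.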

After, let’s say, 6 months, you should theoretically be able to disable the classification station, because the model will have trained automatically on all the documents that were manually classified there. But if the client consistently gets new vendors, then just leave it in.

That’s my personal opinion. I implemented it recently for two clients, and it works like a charm.

MohammedShabbir · May 18, 2026, 6:58am

Hi @Mohammed_Shahid01

Since most of the documents are invoices with largely standard fields, you may not need to train a separate Modern DU model for every vendor format.

You can consider using:

Out-of-the-box Invoice ML models
Public endpoints (https://du.uipath.com/ie/invoices)
Intelligent Xtraction & Processing (IXP)
Agentic extraction approaches for semi-structured invoices

UiPath already provides pre-trained models which can generalize across multiple vendor layouts:

UiPath DU Receipt Extraction Quickstart

With this approach:

onboarding new vendors becomes faster
less vendor-specific training is required
validation can still be handled through Validation Station or Action Center
only edge cases may require additional improvement/training

For 50+ vendors, maintaining individual training cycles for every layout may become difficult operationally. Using IXP/Agent + pre-trained models first, and then selectively training only low-confidence vendors, could be a more scalable approach.
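As a rough sketch of "selectively training only low-confidence vendors", the snippet below groups extraction confidences by vendor and flags the ones whose average falls under a threshold. Everything here is an assumption for illustration: the function name, the 0.85 threshold, the minimum document count, and the sample data. How you export confidences from your process is up to your own logging.

```python
from collections import defaultdict
from statistics import mean


def vendors_needing_training(results, threshold=0.85, min_docs=3):
    """Return vendors whose mean confidence is below the threshold.

    results: iterable of (vendor_name, confidence) pairs.
    Vendors with fewer than min_docs documents are skipped, since
    one or two documents say little about a layout.
    """
    by_vendor = defaultdict(list)
    for vendor, confidence in results:
        by_vendor[vendor].append(confidence)
    return sorted(
        vendor
        for vendor, scores in by_vendor.items()
        if len(scores) >= min_docs and mean(scores) < threshold
    )


# Made-up sample data, not real measurements.
sample = [
    ("Vendor A", 0.95), ("Vendor A", 0.93), ("Vendor A", 0.96),
    ("Vendor B", 0.70), ("Vendor B", 0.65), ("Vendor B", 0.72),
    ("Vendor C", 0.50),
]

flagged = vendors_needing_training(sample)
print(flagged)
```

With this sample, only Vendor B is flagged: Vendor A is above the threshold, and Vendor C has too few documents to judge.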

Monali_Vekariya · May 18, 2026, 9:00am

Hi @Mohammed_Shahid01

What you’re doing is normal, but training a separate model for every vendor is not a scalable approach. Since all documents are invoices, it’s better to use one common Invoice model and handle most fields in a generic way. Then you can improve accuracy by adding a few samples, using regex or rules for specific fields, and using validation for edge cases. Only go for separate models if a vendor’s format is completely different. This way you avoid maintaining 50 models and keep the solution much simpler and more scalable.
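To give an idea of what "regex or rules for specific fields" could look like, here is a small Python sketch that cleans up two extracted values after the generic model has run. The invoice number format and the list of date formats are assumptions for the example; replace them with whatever your vendors actually use. Values that fail the rule come back as `None`, so they can be sent to validation.

```python
import re
from datetime import datetime

# Hypothetical invoice number format: 2-4 letters, a dash, 4-10 digits.
INVOICE_NO_PATTERN = re.compile(r"[A-Z]{2,4}-\d{4,10}")

# Assumed date layouts; extend per vendor as needed.
DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d", "%d.%m.%Y")


def normalize_invoice_number(raw):
    """Strip whitespace, upper-case, and check against the expected pattern."""
    cleaned = re.sub(r"\s+", "", raw).upper()
    return cleaned if INVOICE_NO_PATTERN.fullmatch(cleaned) else None


def normalize_date(raw, formats=DATE_FORMATS):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in formats:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


print(normalize_invoice_number(" inv- 00123 "))
print(normalize_date("18/05/2026"))
print(normalize_date("not a date"))
```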
