How to use and train a custom ML model in Document Understanding

I am using the Document Understanding features for my invoice processing use case.
I have 100+ different invoice formats from which I need to extract information such as invoice number, date, total amount, and supplier name. I also want to extract the tabular data present in the invoices.

When I tried to extract data using the default Machine Learning Extractor, it often gave me unexpected results. Only sometimes does it give the expected result.

How can I use my own custom ML model for extraction in Document Understanding?
@Lahiru.Fernando @Palaniyappan

Hi @netri

The scenario is clear. The only thing I’m not clear on is what you mean when you say it sometimes gives you an unexpected result.
Do you mean that it gives you different values, or does it run into errors?

If we take one invoice,

  • if you run the process a couple of times, do you get the same output, or does it differ?

When dealing with multiple documents, the extractor might work perfectly for some invoices and miss some fields on others. The way to improve that is to retrain the model on the training data generated by the Train Extractor Scope.
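
To answer the consistency question above, it can help to compare the extracted fields from two runs on the same invoice. Here is a minimal sketch in Python, assuming the results of each run have been exported to plain dictionaries. The function name, field names, and values are made up for illustration and are not part of any UiPath API.

```python
def diff_runs(run_a, run_b):
    """Return the fields whose extracted values differ between two runs."""
    differences = {}
    # Look at every field that appears in either run.
    for field in sorted(set(run_a) | set(run_b)):
        value_a = run_a.get(field)
        value_b = run_b.get(field)
        if value_a != value_b:
            differences[field] = (value_a, value_b)
    return differences


# Hypothetical results from two runs on the same invoice.
first_run = {"invoice_number": "INV-001", "total": "120.00", "supplier": "Acme"}
second_run = {"invoice_number": "INV-001", "total": "120.00", "supplier": None}

print(diff_runs(first_run, second_run))
```

An empty result means both runs agree, which would point to the model itself rather than to anything random in the process.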

Hi Everyone,

I have the same question: all of my templates get unexpected results.

Example: it extracts the CUSTOMER CODE when the field is configured as VENDOR CODE (India Invoices Enterprise Endpoint).

If it doesn’t get those details, that’s fine! But getting other details instead is confusing.

Since I have a lot of document types (which are not in a standard format) I don’t think the automatic extractor that UiPath provides will be able to extract the data I want. Is there a method where I can train the ML model by tagging the data I want to extract?

Also, I am unable to find any tutorials on how to use the “Train Extractor Scope”. Can you link some tutorial or documentation on that?

I also need this. How do I train the model on my data? I went to AI Fabric > Out of the box package > invoices, but running a pipeline requires a dataset. Where do I get the dataset?

Hi @gtanjr

You have to build the dataset using the Data Manager tool. Data Manager is part of the Insider Program, so you need to join the program and request access to it.

Does anyone have an answer for this? In my invoice extraction, the product I purchased is not captured correctly. For now, I am showing the model which value is the product. How can I train my model and upload it to AI Center? I have used the Train Extractor Scope and the Machine Learning Extractor Trainer. How do I now upload the output of the Machine Learning Extractor Trainer and train my model? @Lahiru.Fernando @Palaniyappan

Hi @Arun_Singh

Good question…
So, since you have generated the set of files needed for training through the Train Extractor Scope, the next task is to get the Data Manager up and running in your AI Center.

By default, it is not visible or enabled in AI Center. For now, you have to join the Insider Program and request access to Data Manager. The access request form asks for some information, and it guides you on where to find it.

Once it is activated, you will see a new option in AI Center called Data Labeling. This is your Data Manager.

Next, zip the content you generated into a .zip file and upload it to DataSets in AI Center.
You can then use the Data Manager to retrieve the files from there, and train your models…
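
If you prefer to script the zipping step, here is a minimal sketch in Python using only the standard library. The function name and folder paths are hypothetical. It assumes the trainer output sits in a single folder and stores files relative to that folder's root, so please check the AI Center documentation for the layout it actually expects inside the archive.

```python
import zipfile
from pathlib import Path


def zip_training_folder(source_dir, zip_path):
    """Compress every file under source_dir into zip_path and return the path."""
    source = Path(source_dir)
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file in sorted(source.rglob("*")):
            # Skip directories, and skip the archive itself in case it is
            # being written inside the source folder.
            if file.is_file() and file.resolve() != zip_path.resolve():
                # Store paths relative to the folder root, with forward slashes.
                archive.write(file, file.relative_to(source).as_posix())
    return zip_path


# Example call (hypothetical paths):
# zip_training_folder("C:/DU/TrainerOutput", "C:/DU/training_data.zip")
```

The resulting .zip is what you then upload manually through the AI Center interface. The sketch does not upload anything itself.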

Hi @Lahiru.Fernando
I am having issues with Data Manager. When trying to deploy a labeling session, it fails. There is no error description or log which explains why. After some time the status changes from “Deploying” to “Failed”.

However, in my process, I’m using Validation Station to validate (and optimize) the results. From what I understood from your reply, is it possible to zip the results from Validation Station (or Action Center), upload them as a dataset to AI Fabric, and create a pipeline run from there? Is the labeling session therefore not strictly required to retrain the model?

Do you have a video on retraining the model, like the ones for the Document Understanding Process? Those were amazing resources.

Thanks for the support!
Andreas