How to use and train custom ML model in Document Understanding

netri · November 19, 2020, 12:38pm

I am using the Document Understanding features for my invoice-processing use case.
I have 100+ different invoice formats and need to extract invoice information such as invoice number, date, total amount, and supplier name. I also want to extract the tabular data present in the invoices.

When I extract data using the default Machine Learning Extractor, the results are inconsistent: sometimes I get the expected result, and sometimes an unexpected one.

How can I use my own custom ML model for extraction in Document Understanding?
@Lahiru.Fernando @Palaniyappan

Lahiru.Fernando · November 20, 2020, 5:46pm

Hi @netri

The scenario is clear. The only thing I’m not clear on is what you mean when you say it sometimes gives an unexpected result.
Do you mean that it gives you different values, or does it run into errors?

If we take one invoice and run the process a couple of times, do you get the same output, or does it differ?

When dealing with multiple documents, it might work perfectly for some invoices but fail to extract some fields for others. The way to improve that is to retrain the model on the training data that can be collected using the Train Extractors Scope.
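
To make the "run it a couple of times" check concrete, here is a minimal Python sketch that compares the extracted fields from two runs of the same invoice. It assumes you have already written each run's field values to a plain dictionary (for example by exporting them from your workflow to JSON). The function name `diff_runs` and the field names and values in the usage note are made up for illustration and are not part of any UiPath API.

```python
def diff_runs(first: dict, second: dict) -> dict:
    """Return the fields whose extracted values differ between two runs.

    Both arguments map a field name to its extracted value. A field that is
    missing from one run is reported with None for that run.
    """
    differences = {}
    for field in sorted(set(first) | set(second)):
        value_a = first.get(field)
        value_b = second.get(field)
        if value_a != value_b:
            differences[field] = (value_a, value_b)
    return differences
```

For example, calling `diff_runs({"invoice-no": "INV-001", "total": "120.00"}, {"invoice-no": "INV-001", "total": "12000"})` reports only `total` as different. An empty result for the same invoice means the extractor is consistent across runs, so the problem is more likely how well the model handles that layout than random variation.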

Kesavaraj_K · November 21, 2020, 9:41am

Hi Everyone,

I have the same doubt: all of my templates get unexpected results.

Example: it returns CUSTOMER CODE when the field is configured as VENDOR CODE (India Invoices Enterprise Endpoint).

If it doesn’t get those details, that’s fine! But getting other details instead is confusing.

netri · November 24, 2020, 9:26am

Since I have a lot of document types (which are not in a standard format) I don’t think the automatic extractor that UiPath provides will be able to extract the data I want. Is there a method where I can train the ML model by tagging the data I want to extract?

Also, I am unable to find any tutorials on how to use the “Train Extractors Scope”. Can you link a tutorial or some documentation on that?

gtanjr · February 19, 2021, 6:02am

I also need this: how do I train the model? I went to AI Fabric > Out of the box package > invoices, but running the pipeline requires a dataset. Where do I get the dataset?

Lahiru.Fernando · February 22, 2021, 7:01am

Hi @gtanjr

You have to build the dataset using the Data Manager tool. Data Manager is part of the Insider program; you need to join the Insider program and request access to it.

Arun_Singh · March 12, 2021, 5:29pm

Does anyone have an answer for this? In my invoice extraction, the product I purchased is not captured correctly. For now, I am showing the model which item is the product. How can I train my model and upload it to AI Center? I have used the Train Extractors Scope and the Machine Learning Extractor Trainer. How do I now upload the output of the Machine Learning Extractor Trainer and train my model? @Lahiru.Fernando @Palaniyappan

Lahiru.Fernando · March 12, 2021, 5:35pm

Hi @Arun_Singh

Good question…
Since you have generated the set of files needed for training through the Train Extractors Scope, the next task is to get Data Manager up and running in your AI Center.

By default, it is not visible or enabled in AI Center. For now, you have to join the Insider Program and request access to Data Manager. The access request form asks for some information, and it guides you on where to find it.

Once it is activated, you will see a new option in AI Center called Data Labeling. This is your Data Manager.

Next, compress the content you generated into a .zip file and upload it to Datasets in AI Center.
You can then use Data Manager to retrieve the files from there and train your models…
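
If you prefer to script the zipping step, a minimal Python sketch could look like the one below. It only builds the .zip file; the upload to AI Center is still done by hand in the UI. The function name `zip_training_folder` and the paths in the usage comment are hypothetical, so point them at whatever output folder you configured in your own workflow.

```python
import zipfile
from pathlib import Path


def zip_training_folder(source_dir: str, zip_path: str) -> int:
    """Zip every file under source_dir, keeping paths relative to it.

    Returns the number of files written to the archive.
    """
    source = Path(source_dir)
    if not source.is_dir():
        raise NotADirectoryError(source_dir)

    target = Path(zip_path).resolve()
    count = 0
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file in sorted(source.rglob("*")):
            # Skip folders, and skip the archive itself if it lives inside source_dir.
            if not file.is_file() or file.resolve() == target:
                continue
            archive.write(file, arcname=file.relative_to(source).as_posix())
            count += 1
    return count


# Hypothetical usage (both paths are placeholders):
# zip_training_folder(r"C:\DU\TrainingExport", r"C:\DU\TrainingExport.zip")
```

The files are stored with paths relative to the export folder, so the archive opens directly onto the generated content rather than onto an extra wrapping folder.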

Andreas_Kurz · May 15, 2021, 2:00pm

Hi @Lahiru.Fernando
I am having issues with Data Manager. When I try to deploy a labeling session, it fails. There is no error description or log that explains why; after some time the status simply changes from “Deploying” to “Failed”.

However, in my process, I am using Validation Station to validate (and optimize) the results. From what I understood from your reply, it is possible to zip the results from Validation Station (or Action Center), upload them as a dataset to AI Fabric, and create a pipeline run from there. Is that right? Is the labeling session therefore not necessarily required to retrain the model?

Do you also have a video on retraining the model, like the ones for the Document Understanding Process? Those were amazing resources.

Thanks for your support!
Andreas
