We are using Document Understanding with the Google Vision API as the OCR engine, and we created an ML Skill trained on over 5,500 invoice files covering multiple languages and many different templates.
The problem is that data extraction accuracy is not high, and whenever we train on more invoices of a different kind, accuracy seems to go down. In other words, some fields that were extracted correctly before are no longer extracted correctly after training on the new invoices.
It feels like the more invoices we train on, the lower the extraction accuracy gets.
Someone said that training on too many invoices causes this problem, and that the ML Skill should be split by language or template, but I am not sure that is the best solution.
Please advise how we can keep data extraction accuracy high with a suitable ML Skill setup.
Yes, split it that way, so that the algorithm learns the pattern for one language at a time.
If there are multiple formats, make sure you create custom models to handle them. The video below shows the steps:
Building & Training Custom ML Models for Document Processing | RPA | UiPath - YouTube
Regarding the number of documents needed for training, that varies from case to case, but for a single format I recommend at least 7-10 invoices so the algorithm can learn it.
Given how many invoice files you have, 100 samples per language is a good sample size.
The model will improve more if there are separate ML Skills/models per language, since the model is trained by recognizing characters. You can either classify documents first and route them to the right model, or deploy a new model for each language.
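Conceptually, the classify-then-route option looks like the sketch below. This is a minimal Python illustration only, not UiPath workflow code: the langdetect library is just one way to classify the language, and the per-language skill names are placeholders for whatever ML Skills you actually deploy.

```python
# Minimal sketch: route each document to the ML Skill trained on its language.
# Assumptions: OCR text is already available; skill names are placeholders.
from langdetect import detect
from langdetect.lang_detect_exception import LangDetectException

# Hypothetical mapping from detected language code to a per-language ML Skill
SKILL_BY_LANGUAGE = {
    "en": "Invoices_EN_Skill",
    "de": "Invoices_DE_Skill",
    "ja": "Invoices_JA_Skill",
}
FALLBACK_SKILL = "Invoices_Generic_Skill"

def pick_skill(ocr_text: str) -> str:
    """Return the name of the ML Skill that should extract this document."""
    try:
        language = detect(ocr_text)   # e.g. "en", "de", "ja"
    except LangDetectException:       # empty or unreadable OCR output
        return FALLBACK_SKILL
    return SKILL_BY_LANGUAGE.get(language, FALLBACK_SKILL)

if __name__ == "__main__":
    sample = "Rechnungsnummer 4711, Gesamtbetrag 120,00 EUR, Lieferdatum 01.03.2023"
    print(pick_skill(sample))         # expected: Invoices_DE_Skill
```

The same routing can be done inside a UiPath workflow with a document classifier in front of the extractors; the point is simply that each document only ever reaches the model trained on its own language.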
Do try with 100 samples first, then maybe increase the size and see whether accuracy improves; however, there will be a maximum threshold at some point.
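To catch the kind of regression described above (fields that were extracted correctly before retraining but not after), it also helps to score every new model version against the same frozen, labeled validation set before promoting it. Below is a rough Python sketch of that comparison; the field names and data layout are assumptions, not something Document Understanding produces in this exact shape.

```python
# Sketch: compare per-field extraction accuracy of two model versions on the
# same labeled validation set, to spot fields that regressed after retraining.
# Assumption: each record is a dict of field name -> extracted/expected string.
from typing import Dict, List

def field_accuracy(predictions: List[Dict[str, str]],
                   labels: List[Dict[str, str]]) -> Dict[str, float]:
    """Exact-match accuracy per field over the whole validation set."""
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for pred, gold in zip(predictions, labels):
        for field, expected in gold.items():
            totals[field] = totals.get(field, 0) + 1
            if pred.get(field, "").strip() == expected.strip():
                hits[field] = hits.get(field, 0) + 1
    return {field: hits.get(field, 0) / totals[field] for field in totals}

def report_regressions(before: Dict[str, float], after: Dict[str, float]) -> None:
    """Print every field whose accuracy dropped with the new model version."""
    for field, old_score in before.items():
        new_score = after.get(field, 0.0)
        if new_score < old_score:
            print(f"REGRESSION {field}: {old_score:.2%} -> {new_score:.2%}")
```

If a retrained skill shows regressions on fields that used to work, keep the previous version live and look at what changed in the new training set (new language, new template, inconsistent labeling) before publishing it.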