I am developing an invoice extraction process using the out-of-the-box Invoice package, deployed as an ML Skill in AI Center. I have 17 invoices to extract data from, but the extraction fails on some fields in some documents, seemingly at random.
I then tried training the ML Skill: I uploaded those 17 invoices to UiPath Data Manager, exported them as a dataset, and ran a training pipeline to create a new version. The new version now fills in some of the fields that were missing before, but it makes new errors elsewhere. I am expecting 100% success in data extraction. Please let me know if there is any other way.
First of all, you will never have 100% success if your invoices aren’t built exactly the same, and even then it’s highly unlikely.
Also, 17 invoices is not a lot for training a model; some might even say it’s definitely not enough. How far it gets you depends on your invoices and how similar they are.
To run Training or Full pipelines successfully, we strongly recommend at least 25 documents and at least 10 samples of each labelled field in your dataset. Otherwise the pipeline will fail with the error “Dataset Creation Failed”.
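If you want to check those two minimums before starting a pipeline, something like the rough Python sketch below could help. It is not part of UiPath or AI Center: `check_dataset` is a hypothetical helper, and it assumes you have already turned your labelled export into a plain list with one entry per document, each entry holding the field names labelled in that document. It also assumes one labelled document counts as one sample for a field. The field names in the example are made up.

```python
from collections import Counter

# Minimums quoted in the recommendation above.
MIN_DOCUMENTS = 25
MIN_SAMPLES_PER_FIELD = 10


def check_dataset(labelled_docs):
    """Return a list of human-readable problems (empty if none found).

    labelled_docs: one entry per document, each an iterable of the field
    names labelled in that document (hypothetical format, not the actual
    Data Manager export layout).
    """
    problems = []

    if len(labelled_docs) < MIN_DOCUMENTS:
        problems.append(
            f"only {len(labelled_docs)} documents, need at least {MIN_DOCUMENTS}"
        )

    # Count in how many documents each field was labelled
    # (assumption: one document contributes at most one sample per field).
    counts = Counter(field for doc in labelled_docs for field in set(doc))
    for field, n in sorted(counts.items()):
        if n < MIN_SAMPLES_PER_FIELD:
            problems.append(
                f"field '{field}' has {n} samples, need at least {MIN_SAMPLES_PER_FIELD}"
            )

    return problems


if __name__ == "__main__":
    # 17 invoices, all with an invoice number and total, 6 with a PO number.
    docs = [
        ["invoice-no", "total"] + (["po-no"] if i < 6 else [])
        for i in range(17)
    ]
    for problem in check_dataset(docs):
        print(problem)
```

With 17 invoices the document count alone is already below the recommended minimum, and any field that appears on only a handful of them will be flagged as well.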
I recommend using the Document Understanding Template for your automations. That way you have a process where you can continually retrain your model based on documents that weren’t extracted correctly.
If this has helped you, please mark this answer as the solution.
I am using the exact same invoices that I used in the dataset, but it still gives only about 85% accuracy.