Automatically fine-tune templates for data extraction

Hi all,

I am working on a process that extracts data from invoices using both Document Understanding and AI Center.

I have about a dozen invoice templates that I prepare for extraction using the Form Extractor and the Machine Learning Extractor (in the Data Extraction Scope step).

The thing is that I will soon have many more invoice templates to add (around 150), and I don't want to prepare each one manually with the Form Extractor because it would take far too long.

Is there a way to fine-tune these extractions automatically?

Hello Zim,

Are invoices the only document type you're extracting? If so, you can extract with an ML model alone and omit the Form Extractor, since setting it up for a volume like yours is going to be tedious.

In my experience, the out-of-the-box Document Understanding Invoices model is an excellent choice for invoices, and I've seen great results with it.

In AI Center, go to ML Packages > Out of the box Packages > UiPath Document Understanding > Invoices. This will be your ML package.

Once you have added it as your ML package, you need to start labeling. With around 150 invoice types, you can label about 3-4 samples per type (roughly 450-600 documents in total). Once that's done, you can train the pipeline and deploy your skill.
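To keep the labeling set balanced across templates, it helps to pick the same small number of documents from each invoice type instead of labeling whatever arrives first. Below is a minimal sketch of that selection step, assuming you can tag each document with a template identifier (for example the vendor name). `build_labeling_set` and the file names are hypothetical; this is plain Python for preparing the set, not part of AI Center or its labeling tools.

```python
import random
from collections import defaultdict


def build_labeling_set(documents, per_template=4, seed=0):
    """Pick up to `per_template` documents for each invoice template.

    `documents` is a list of (template_id, file_name) pairs. How you assign
    a template_id (vendor name, layout group, ...) is an assumption left to you.
    """
    by_template = defaultdict(list)
    for template_id, file_name in documents:
        by_template[template_id].append(file_name)

    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    selection = {}
    for template_id, files in sorted(by_template.items()):
        # Templates with fewer documents than requested keep all of them
        k = min(per_template, len(files))
        selection[template_id] = rng.sample(files, k)
    return selection


if __name__ == "__main__":
    # Hypothetical pool: 150 templates with 10 scanned invoices each
    pool = [(f"vendor_{t:03d}", f"vendor_{t:03d}_inv_{i}.pdf")
            for t in range(150) for i in range(10)]
    for per_template in (3, 4):
        chosen = build_labeling_set(pool, per_template=per_template)
        total = sum(len(files) for files in chosen.values())
        print(per_template, "per template ->", total, "documents to label")
```

With 150 templates this prints 450 documents for 3 per template and 600 for 4, matching the estimate above.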

Fine-tuning comes after you deploy your ML model. You can run an evaluation pipeline alongside your training to see how well each field is being extracted. If some fields perform noticeably worse than others, you can always retrain to get better results.
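One way to decide which fields need more labeled samples before retraining is to compute accuracy per field over a set of checked documents. The sketch below does that with exact matching after normalizing case and whitespace. It assumes you already have expected and predicted values as plain dictionaries; the field names and the 0.9 threshold are illustrative, and this is not the format of the AI Center evaluation report.

```python
def normalize(value):
    # Ignore differences in case and whitespace
    return " ".join(str(value).split()).lower()


def field_accuracy(samples):
    """Exact-match accuracy per field.

    `samples` is a list of (expected, predicted) dict pairs, one per document,
    each mapping field name -> value (None when nothing was extracted).
    """
    correct, total = {}, {}
    for expected, predicted in samples:
        for field, true_value in expected.items():
            total[field] = total.get(field, 0) + 1
            guess = predicted.get(field)
            if guess is not None and normalize(guess) == normalize(true_value):
                correct[field] = correct.get(field, 0) + 1
    return {field: correct.get(field, 0) / total[field] for field in total}


def weak_fields(accuracy, threshold=0.9):
    # Fields below the threshold are candidates for more labeling and a retrain
    return sorted(f for f, acc in accuracy.items() if acc < threshold)


if __name__ == "__main__":
    samples = [  # hypothetical field names and values
        ({"invoice-id": "INV-001", "total": "120.00"},
         {"invoice-id": "INV-001", "total": "120.00"}),
        ({"invoice-id": "INV-002", "total": "75.50"},
         {"invoice-id": "inv-002", "total": None}),
    ]
    acc = field_accuracy(samples)
    print(acc)               # {'invoice-id': 1.0, 'total': 0.5}
    print(weak_fields(acc))  # ['total']
```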

You can also add the Machine Learning Extractor Trainer (inside a Train Extractors Scope). If your documents go through Action Center validation, the validated documents can then be fed back into the training lifecycle of your automation.

Overall, developing with the ML Invoices model alone will be a great start for you.

All the very best!

Hi,

First, I would like to thank you for your reply.

I should also add that I am already using AI Center with the out-of-the-box Document Understanding Invoices ML package. My process actually uses both extractors (the Machine Learning Extractor and the Form Extractor), because the robot takes the better result of the two. Keeping the Form Extractor is a kind of safety net.
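To make the "take the better result of the two" idea concrete, here is a minimal sketch that keeps, for each field, the higher-confidence value that clears a minimum confidence. `FieldResult`, `pick_best`, and the 0.5 threshold are hypothetical; this only illustrates the selection logic and is not how the Data Extraction Scope is implemented, which applies its own extractor configuration.

```python
from typing import NamedTuple, Optional


class FieldResult(NamedTuple):
    value: Optional[str]
    confidence: float  # assumed to be in the range 0.0 to 1.0


def pick_best(ml_result, form_result, min_confidence=0.5):
    """Merge two extractors' outputs field by field.

    Each argument maps field name -> FieldResult. A field ends up as None
    when neither extractor produced a value above `min_confidence`.
    """
    merged = {}
    for field in ml_result.keys() | form_result.keys():
        candidates = [
            r for r in (ml_result.get(field), form_result.get(field))
            if r is not None and r.value is not None and r.confidence >= min_confidence
        ]
        # max() returns the first maximum, so ties favor the ML result
        merged[field] = max(candidates, key=lambda r: r.confidence) if candidates else None
    return merged


if __name__ == "__main__":
    ml = {"total": FieldResult("120.00", 0.95), "vendor": FieldResult("Acme", 0.40)}
    form = {"total": FieldResult("120.00", 0.80), "vendor": FieldResult("ACME Ltd", 0.70)}
    for field, result in sorted(pick_best(ml, form).items()):
        print(field, result)
```

With many diverse layouts, the form side of this comparison only helps for templates that have actually been configured, which is the concern raised in the question below.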

So the only solution for 150 different types of invoices would be to give up the Form Extractor and rely solely on an ML model?

Correct. The ML model outperforms the Form Extractor when your documents are highly diverse. The Form Extractor is designed for structured documents with little or no variability.

So you should go with the Invoices model alone, using the out-of-the-box Invoices ML package.