I’m retraining the out-of-the-box invoice model on the company’s actual invoices (Norwegian).
I’ve deleted some “Regular fields” and added some.
The problem is that labeling documents is very time-consuming (the calculator says 1200 are needed to give a good result).
In our invoice system/database we already have a lot of PDF invoices together with the correct field values entered into the system. We could export this into a dataset (CSV, Excel).
The company’s dataset would contain the filename of each invoice PDF and all the fields and lines (addresses, account number, invoice date, etc.).
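To make the export idea concrete, here is a rough sketch of how such a CSV could be read into one record per invoice. The column names, the sample rows, and the one-row-per-invoice-line layout are all assumptions made up for illustration; a real export from our system would look different.

```python
import csv
import io

# Hypothetical export: one row per invoice line, header fields repeated on each row.
SAMPLE_EXPORT = """filename,invoice_date,account_number,line_description,line_amount
inv_001.pdf,2023-01-15,1234.56.78901,Consulting,1000.00
inv_001.pdf,2023-01-15,1234.56.78901,Travel,250.00
inv_002.pdf,2023-02-01,9876.54.32109,Hosting,99.00
"""

# Assumed split between invoice-level fields and line-level fields.
HEADER_FIELDS = ("invoice_date", "account_number")
LINE_FIELDS = ("line_description", "line_amount")


def load_export(text):
    """Group the exported rows into one record per PDF filename."""
    invoices = {}
    for row in csv.DictReader(io.StringIO(text)):
        record = invoices.setdefault(
            row["filename"],
            {"fields": {name: row[name] for name in HEADER_FIELDS}, "lines": []},
        )
        record["lines"].append({name: row[name] for name in LINE_FIELDS})
    return invoices


invoices = load_export(SAMPLE_EXPORT)
```

Keying the records by filename is what would let an import match each row to the right PDF in the labeling session.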
NEW FEATURE:
In Data Manager it would be nice if we had a “Predict all” button and an “Import validation dataset” button.
Then all the predicted PDFs could be validated by importing our company dataset.
This means each field in the PDF would be predicted (as in today’s solution), then each predicted field would be validated against the dataset instead of by a human.
So if all the fields in a document are extracted correctly, the labeling of that document is complete.
A human would only need to label the fields where the prediction does not match the dataset.
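The validation step described above could look roughly like the sketch below. This is not how Data Manager works today; the function names, the field names, and the normalization rule are assumptions for illustration. It also only compares values, so it says nothing about where on the page a field was found.

```python
def normalize(value):
    """Collapse whitespace and case so trivial formatting differences are not flagged."""
    return " ".join(str(value).split()).casefold()


def mismatched_fields(predicted, expected):
    """Return the names of fields whose predicted value differs from the dataset value."""
    return sorted(
        name
        for name, truth in expected.items()
        if normalize(predicted.get(name, "")) != normalize(truth)
    )


def triage(predictions, dataset):
    """Split documents into auto-confirmed ones and ones a human must review."""
    auto_confirmed, needs_review = [], {}
    for filename, expected in dataset.items():
        diffs = mismatched_fields(predictions.get(filename, {}), expected)
        if diffs:
            needs_review[filename] = diffs  # only these fields need a human
        else:
            auto_confirmed.append(filename)
    return auto_confirmed, needs_review


# Hypothetical model predictions and company dataset, keyed by PDF filename.
predictions = {
    "a.pdf": {"invoice_date": "2023-01-15", "total": "100.00"},
    "b.pdf": {"invoice_date": "2023-02-01", "total": "99,00"},
}
dataset = {
    "a.pdf": {"invoice_date": "2023-01-15", "total": "100.00"},
    "b.pdf": {"invoice_date": "2023-02-01", "total": "99.00"},
}

auto_confirmed, needs_review = triage(predictions, dataset)
```

Here `a.pdf` would be confirmed automatically, while `b.pdf` would be sent to a human for the `total` field only. A real implementation would probably need type-aware comparison for dates and amounts (for example, Norwegian decimal commas) so that formatting differences are not treated as errors.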
This idea would make training models a lot faster for any kind of document.
There is a lot of data in companies’ data centers that could be imported if UiPath made this possible with import templates and batch labeling.
This could speed up model training enormously! Thanks.