My ML Skill fails to recognize the correct date format for one type of date but succeeds for all other dates

Background:
I need to extract the correct date information from bills using Document Understanding in AI Center. I have created an ML Skill and uploaded around 50 documents to the dataset for pipeline training. After about 15 training runs, I get satisfactory results for most samples.

However, some samples still fail to produce the correct result because the month and day values are swapped. Here is an example:

The expected Bill Date is 2021-08-07. For the Due Date on the same document, the model outputs the correct value.

After more experiments, I found that the failure rate is higher when the month and day values of the Bill Date are interchangeable, i.e. both are 12 or less (e.g. 01-12).
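To make the pattern concrete: 2021-08-07 can be read as 7 August or 8 July because both parts are 12 or less, whereas a date such as 2021-08-25 has only one valid reading. A minimal Python sketch of that check follows. The helper name `is_ambiguous` and the assumption that you have the two numeric parts separately are hypothetical, for illustration only; this is not a Document Understanding or AI Center API.

```python
from datetime import date


def is_ambiguous(year: int, first: int, second: int) -> bool:
    """Return True if (first, second) is a valid date both as (month, day)
    and as (day, month), and the two readings are different dates.

    Hypothetical helper for illustration; not part of UiPath.
    """
    def valid(month: int, day: int) -> bool:
        try:
            date(year, month, day)
            return True
        except ValueError:
            return False

    # Equal parts (e.g. 05-05) give the same date either way, so no ambiguity.
    return first != second and valid(first, second) and valid(second, first)


# Bill Date from the question: 2021-08-07 could also be read as 2021-07-08.
print(is_ambiguous(2021, 8, 7))   # True
# 25 cannot be a month, so only one reading exists.
print(is_ambiguous(2021, 8, 25))  # False
print(is_ambiguous(2021, 1, 12))  # True
```

Tagging your failing samples this way is one way to confirm that the errors really concentrate on ambiguous dates.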

To overcome this issue, what should I do to improve the accuracy of the model? Is it useful to retrain the ML Skill with the same dataset multiple times, e.g. 10 times?

As I am new to Document Understanding and AI Center, please share your experience if you have any ideas on this topic. Thank you very much!

Hey!

Which extractor are you using to extract the data?

It’s better to use Intelligent OCR…

Regards,
NaNi

I am using UiPath Document OCR for digitization.

Hi there, it seems the due date and bill date are getting mixed up. Since some documents are already being extracted properly, you can resolve this error with the following steps:

  1. Check the labelling behind your ML model: make sure all dates are labelled correctly and that you have not accidentally labelled extra dates.
  2. Add more training data, as your dataset is small.
  3. Add an evaluation pipeline to check the ML model's quality.
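When evaluating, it helps to measure the specific failure from the question rather than only an overall score. The Python sketch below assumes you can export expected and predicted Bill Date values as ISO `YYYY-MM-DD` strings; the `summarize` and `swapped` names and that input format are hypothetical and are not the output format of the AI Center evaluation pipeline. It separates day/month swaps from other errors, so you can see whether adding training data actually reduces the swaps.

```python
from datetime import date
from typing import Optional


def swapped(d: date) -> Optional[date]:
    """Return d with day and month exchanged, or None if that is not a valid date."""
    try:
        return date(d.year, d.day, d.month)
    except ValueError:
        return None


def summarize(pairs: list) -> dict:
    """Count exact matches and day/month swaps for (expected, predicted) ISO dates.

    `pairs` is a hypothetical export of evaluation results.
    """
    counts = {"total": 0, "correct": 0, "swapped": 0, "other_error": 0}
    for expected_str, predicted_str in pairs:
        expected = date.fromisoformat(expected_str)
        predicted = date.fromisoformat(predicted_str)
        counts["total"] += 1
        if predicted == expected:
            counts["correct"] += 1
        elif predicted == swapped(expected):
            counts["swapped"] += 1
        else:
            counts["other_error"] += 1
    return counts


results = [
    ("2021-08-07", "2021-07-08"),  # Bill Date with day and month swapped
    ("2021-08-25", "2021-08-25"),  # correct
    ("2021-01-12", "2021-01-12"),  # correct
    ("2021-03-04", "2021-03-05"),  # some other error
]
print(summarize(results))
# {'total': 4, 'correct': 2, 'swapped': 1, 'other_error': 1}
```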

Over time, with more training data, the model should learn to tell the due date and the bill date apart.