Low accuracy of results - Document Understanding

Hello team,

I’m facing some issues while using the cloud version of Document Understanding in AI Fabric. Basically, the final results I’m getting aren’t precise enough to be considered acceptable, even though I’m running a fairly simple and basic test.

I intended to build an ML model on top of the out-of-the-box Document Understanding template and consume it from an RPA flow in UiPath Studio. I’ve followed the documented guides to achieve this, so I will explain the steps I took.

  1. I requested an Enterprise Trial version to get access to AI Fabric in Automation Cloud.
  2. I downloaded, installed and configured Data Manager and the OCR engine locally.
  3. I gathered 5 highly structured invoice documents, which had exactly the same layout and information distribution.
  4. I labelled the documents using Data Manager after creating a single “regular field” for the final total amount of the invoice. Then I exported the results.
  5. In AI Fabric, I created a project and an ML package inside it (using the out-of-the-box templates).
  6. I uploaded the folder generated by the Data Manager export as a new dataset.
  7. I ran a training pipeline for the ML package using that dataset.
  8. Finally, I deployed the ML package as an ML Skill and selected it in the Machine Learning Extractor in the “Document Understanding Framework” flow in UiPath Studio.

After configuring the fields to be extracted by the ML activity (only the field “total”), I tested the flow using one invoice with exactly the same layout, and even with one of the invoices used for training.

In the Validation Station step, the “total” field is filled with the door number from the issuing entity’s address.
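A quick way to flag this kind of mis-extraction before it reaches the Validation Station is to check whether the extracted text is even shaped like a monetary amount. The sketch below is only an illustration, not part of the Document Understanding framework: the function name `looks_like_amount` is hypothetical, and it assumes that totals always carry exactly two decimal digits, which may not hold for every invoice.

```python
import re

# Assumption: totals always have two decimal digits,
# e.g. "1.234,56", "1,234.56" or "987.00". Adjust the pattern otherwise.
AMOUNT_PATTERN = re.compile(
    r"^\s*[^\d\s-]{0,3}\s*"        # optional currency symbol or code, e.g. "$"
    r"\d{1,3}(?:[.,\s]?\d{3})*"    # integer part, optional thousands separators
    r"[.,]\d{2}\s*$"               # decimal separator and exactly two decimals
)


def looks_like_amount(value: str) -> bool:
    """Return True if the extracted text is shaped like a monetary amount."""
    return bool(AMOUNT_PATTERN.match(value))
```

A bare door number such as "1234" fails this check, while "1.234,56" passes, so a failed check is a hint that the extractor picked up the wrong region of the page.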

I’ve repeated the whole process and included additional invoices with some rotations and artificial content-distribution changes using Photoshop.

The results are exactly the same. My question is whether I’m doing something wrong, or whether I’m not training the model with enough documents for it to work properly. I want to stress that this is a fairly simple test: 5 to 10 invoices that look exactly the same, and a single field to extract. I’ll attach one of the sample invoices used for training.

I hope you can help me with this issue. Thanks in advance!

Bye

Antel-Febrero.pdf (254.3 KB)

Hey brother! I have an issue when installing the Data Manager and trying to connect to the Azure container registry. How did you solve it? I see that you were able to use the Data Manager.

I’m stuck at this part:

docker login aiflprodweacr.azurecr.io -u -p

I’ve tried both my Docker credentials and my Azure credentials, and neither works. I’m getting the following error:

unauthorized: Application not registered with AAD.PS

Did this happen to you? Which credentials should I use?

Hello Ernesto, hope you’re doing well!

I haven’t faced that issue. However, as you mention, your issue is related to the credentials you’re using when typing the docker command.

Those are called “registry credentials” in the documentation, and they are not linked to Docker or to your Azure account. I requested them from my company’s contact at UiPath.

After getting the credentials, you should be able to authenticate to the Azure container registry using the docker command. I hope this works for you!
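For reference, here is a minimal sketch of that login step driven from Python, assuming Docker is installed and that you already hold the registry credentials issued by UiPath. The helper names `build_login_command` and `docker_login` are hypothetical; the only Docker features used are the standard `--username` and `--password-stdin` options of `docker login`, which keep the password out of the command line and shell history.

```python
import subprocess
from typing import List


def build_login_command(registry: str, username: str) -> List[str]:
    """Build a `docker login` command that reads the password from stdin."""
    return ["docker", "login", "--username", username, "--password-stdin", registry]


def docker_login(registry: str, username: str, password: str) -> bool:
    """Run `docker login`; return True when Docker exits successfully."""
    result = subprocess.run(
        build_login_command(registry, username),
        input=password.encode(),  # password is piped in, not passed as an argument
        capture_output=True,
        check=False,
    )
    return result.returncode == 0
```

Calling `docker_login("aiflprodweacr.azurecr.io", registry_user, registry_password)` with the registry credentials (not your Docker Hub or Azure account) would then attempt the login; the two variable names here are placeholders for whatever values UiPath provides.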

I would also like to hear about the results you get from Document Understanding once you have tried it.

Cheers!

For sure, man! Thank you very much!

I’ll let you know the outcome of the training and validation of the model once it is done.

Thank you so much for the insight

Ernesto

Not an ML expert here, but from what I understand, 5 invoices is simply not enough for training. Remember that the machine cannot comprehend the concept of an invoice; you will need to train it with many more examples before it starts predicting correctly.

Rotating the invoices is not going to help much: the invoice can be de-rotated, and then you’re just feeding the model the same information again.

If you train it with artificial data, it will only be able to work with artificial documents and will not do a proper job with a real one.

How can I extract tabular information from multiple pages of a single PDF? All the pages have a similar table structure but different data.
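One possible way to handle the multi-page part, sketched under assumptions: suppose the table on each page has already been extracted (by whatever extractor you use) into a list of rows, and that each page repeats the same header row. The function `merge_page_tables` below is a hypothetical helper, not a UiPath activity; it only shows how the per-page tables could be stitched into one.

```python
from typing import List, Optional

Row = List[str]


def merge_page_tables(pages: List[List[Row]]) -> List[Row]:
    """Concatenate per-page tables, keeping the header row only once.

    Assumption: the first non-empty table starts with the header row,
    and later pages may or may not repeat it.
    """
    merged: List[Row] = []
    header: Optional[Row] = None
    for table in pages:
        if not table:
            continue  # page without a table
        if header is None:
            header = table[0]
            merged.append(header)
        # Drop the repeated header on this page, if present.
        body = table[1:] if table[0] == header else table
        merged.extend(body)
    return merged
```

The result is a single list of rows with one header, which can then be written to a DataTable, CSV or spreadsheet in the usual way.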