I’m trying to extract data from multiple invoice PDFs. I’ve added an ML skill after data labelling the fields I need. All fields work fine except one, “DC (location code)”. I tried extracting the data in Studio from the exact same document that I used for data labelling, but I’m still not able to get that particular value. It is a regular field (not a classification or column field), and it is not a multi-line value. Can someone please help me get this resolved?
When I use the Intelligent Form Extractor with the same invoice template, it works fine, but the ML extractor does not return the value.
Hello @AswinSridhar ,
Could you confirm whether you have labelled the element “DC (location code)” properly?
Also, is the name static in all the documents?
Hi @Rahul_Unnikrishnan, thanks for responding!
Yes, I’ve labelled the element properly. The name is not static across documents; it appears in different locations on the first page of every invoice. I’ve trained with 15 different templates that contain the DC number. But when I tried a sample document from Studio using the ML skill, it didn’t work.
- It could be due to the confidence value specified in the ML extractor. Try decreasing the confidence % value.
- If that doesn’t work, you could run a different extractor in parallel with the first ML extractor and, in the manage settings, take the value of the location field from the second extractor (it could be another ML extractor with a different endpoint, or a form extractor).
- If the PDF has a fixed format, you could explore regex-based extractors.
Note: Regex-based and form extractors are mostly used for fixed-format PDFs, while ML extractors give you flexibility with semi-structured PDFs. Try combinations of multiple extractors.
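The fallback idea in the bullets above can be sketched outside the tool as follows. This is only an illustration in Python, not the extractor's actual API: `pick_dc`, `regex_dc`, `min_confidence`, and the `DC_PATTERN` format (the text "DC", an optional "No."/"#"/"Number", then 3 to 6 digits) are all assumptions made for the example, since the thread does not say what the code looks like on the invoices.

```python
import re
from typing import Optional

# Hypothetical pattern: assumes the code is printed as "DC" followed by
# optional "No."/"#"/"Number", optional ":" or "-", and 3 to 6 digits.
DC_PATTERN = re.compile(
    r"\bDC\b(?:\s*(?:No\.?|#|Number))?\s*[:\-]?\s*(\d{3,6})",
    re.IGNORECASE,
)


def regex_dc(page_text: str) -> Optional[str]:
    """Return the first DC code found in the page text, or None."""
    match = DC_PATTERN.search(page_text)
    return match.group(1) if match else None


def pick_dc(
    ml_value: Optional[str],
    ml_confidence: float,
    page_text: str,
    min_confidence: float = 0.5,
) -> Optional[str]:
    """Prefer the ML result; fall back to the regex when it is missing or weak."""
    if ml_value and ml_confidence >= min_confidence:
        return ml_value
    return regex_dc(page_text)


if __name__ == "__main__":
    text = "Invoice 991\nDC: 4021\nTotal 55.00"
    print(pick_dc(None, 0.0, text))
```

A regex fallback like this only helps if the label next to the code is reasonably consistent across templates; if the invoices print it in unrelated ways, a second ML extractor is the more realistic fallback.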
Hope this helps.
Hi @Sagar_Rana , thanks for the reply!
I’ve set the minimum confidence to 0% and it still didn’t fetch that particular value. Also, it is not a fixed template, so I was not able to proceed with the Intelligent Form Extractor or regex approach. I’ve labelled some additional documents that contain the field, but I’m still facing the same issue.
Sorry for the late reply.
If it’s convenient for you, could you share some sample files or screenshots so we can better understand your issue?
Please add some more sample documents to Data Labelling, re-map the same field in all the documents, and do a pipeline run. Then try extracting the data again.
Retraining the model with more labelled samples generally improves its performance.
Hopefully that works.
So far you have trained on 15 samples. Can you confirm whether all 15 invoices are dynamic?
You can run an evaluation pipeline and see whether the ML model is extracting DC (location code). If it isn’t, you need to retrain the model with more data.
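As a rough sketch of what such a check amounts to, the snippet below computes the share of labelled documents in which one field was extracted exactly. It is not the evaluation pipeline's real output format: the `field_accuracy` helper, the `"dc"` key, and the dictionaries of labelled and predicted values are hypothetical and would have to be filled in by hand from your own results.

```python
from typing import Dict, List


def field_accuracy(
    expected: List[Dict[str, str]],
    predicted: List[Dict[str, str]],
    field: str,
) -> float:
    """Share of labelled documents where `field` was extracted exactly."""
    # Only count documents in which the field was actually labelled.
    pairs = [(e, p) for e, p in zip(expected, predicted) if field in e]
    if not pairs:
        return 0.0
    hits = sum(
        1 for e, p in pairs if p.get(field, "").strip() == e[field].strip()
    )
    return hits / len(pairs)


if __name__ == "__main__":
    # Hypothetical values for three documents; the third has no DC label.
    expected = [{"dc": "4021", "total": "10"}, {"dc": "5001"}, {"total": "3"}]
    predicted = [{"dc": "4021"}, {}, {"total": "3"}]
    print(field_accuracy(expected, predicted, "dc"))
```

If this number stays near zero for the DC field while the other fields score well, that points at the labels or training data for that one field rather than at the confidence threshold.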