Document understanding - Unable to extract value for particular text field alone

Hi Team,

I’m trying to extract data from multiple invoice PDFs. I’ve added an ML skill by labelling the fields I need. All fields work fine except for one: “DC (location code)”. I tried extracting the data in Studio from the exact same document I used for data labelling, but I still can’t get that particular value. It is a regular field (not a classification or column field), and it is not a multi-line value. Can someone please help me resolve this?

When I use the Intelligent Form Extractor with the same invoice template, it works fine. It just doesn’t work with the ML Extractor.

Thanks,
Aswin

Hello @AswinSridhar,

Could you confirm whether you have labelled the element “DC (location code)” properly?
Also, is the name static in all the documents?

Hi @Rahul_Unnikrishnan, thanks for responding!

Yes, I’ve labelled the element properly. The name is not static across documents; it appears in different locations on the first page of each invoice. I’ve trained with 15 different templates that contain the DC number. But when I tried a sample document from Studio using the ML skill, it didn’t work.

  1. It could be due to the minimum confidence value specified in the ML Extractor. Try decreasing the confidence %.
  2. If that doesn’t work, you could run a different extractor in parallel with the first ML Extractor and, in the settings, take the location field’s value from the second extractor (this could be another ML Extractor with a different endpoint, or a form extractor).
  3. If the PDF has a fixed format, you could explore regex-based extractors.
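Suggestions 1 and 2 amount to a per-field rule: try the extractors in priority order and keep the first value whose confidence meets the minimum. The sketch below models that rule in plain Python. It is not UiPath's API; the `Extraction` record, the extractor names and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Extraction:
    """One extractor's result for one field (hypothetical structure)."""
    value: Optional[str]
    confidence: float  # 0.0 to 1.0


def pick_field_value(results: dict, priority: list, min_confidence: float) -> Optional[str]:
    """Return the first value, in extractor priority order, that meets min_confidence."""
    for extractor in priority:
        result = results.get(extractor)
        if result is None or result.value is None:
            continue  # extractor not run, or it returned nothing for this field
        if result.confidence >= min_confidence:
            return result.value
    return None  # no extractor produced an acceptable value


# Hypothetical results for the "DC (location code)" field on one invoice.
dc_results = {
    "ml_extractor": Extraction(value=None, confidence=0.0),  # ML model found nothing
    "form_extractor": Extraction(value="DC-0412", confidence=0.91),
}

print(pick_field_value(dc_results, ["ml_extractor", "form_extractor"], min_confidence=0.5))
```

In this model, lowering `min_confidence` only rescues a value the extractor actually returned with low confidence; if an extractor returns nothing for the field (as `ml_extractor` does here), only a fallback extractor can fill it.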

Note: Regex-based and form extractors are mostly used for fixed-format PDFs, while ML extractors give you flexibility with semi-structured PDFs. Try combining multiple extractors.
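A regex-based extractor essentially searches the document text for a pattern, typically anchored on a label. The sketch below is plain Python, not UiPath's regex extractor, and assumes for illustration only that the code follows a `DC:` or `DC (location code):` label and consists of uppercase letters, digits and hyphens; the real DC code format isn't given in this thread.

```python
import re
from typing import Optional

# Hypothetical pattern: a "DC" label (optionally "DC (location code)"),
# a required ':' or '#', then a code of uppercase letters, digits and hyphens.
DC_PATTERN = re.compile(r"\bDC(?:\s*\(location code\))?\s*[:#]\s*([A-Z0-9][A-Z0-9-]*)")


def extract_dc_code(document_text: str) -> Optional[str]:
    """Return the first DC code found in the text, or None if there is no match."""
    match = DC_PATTERN.search(document_text)
    return match.group(1) if match else None


sample = "Invoice No: 12345\nDC: WH-0412\nTotal: 100.00"
print(extract_dc_code(sample))
```

Because this pattern keys on the label text rather than on where the value sits on the page, it depends on the label being worded consistently across invoices; its result could also serve as the second extractor in suggestion 2.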

Hope this helps.

Best Regards,
Sagar Rana

Hi @Sagar_Rana, thanks for the reply!

I’ve set the minimum confidence to 0%, and it still didn’t fetch that particular value. Also, the documents don’t follow a fixed template, so I couldn’t use the Intelligent Form Extractor or the regex approach. I’ve also labelled some additional documents that contain this field, but I’m still facing the same issue.