Document understanding - Unable to extract value for particular text field alone

AswinSridhar · May 9, 2022, 5:52am

Hi Team,

I’m trying to extract data from multiple invoice PDFs. I’ve added an ML skill after labelling the fields I need. All fields work fine except one, “DC (location code)”. I tried extracting the data in Studio from the exact same document I used for data labelling, but I’m still not able to get that one value. It is a regular field (not a classification or column field), and its value is not multi-line. Can someone please help me get this resolved?

When I use the Intelligent Form Extractor with the same invoice template, it works fine, but it does not work with the ML extractor.

Thanks,
Aswin

Rahul_Unnikrishnan · May 9, 2022, 5:57am

Hello @AswinSridhar ,

Could you confirm whether you have labelled the element “DC (location code)” properly?
Also, is the name static in all the documents?

AswinSridhar · May 9, 2022, 6:05am

Hi @Rahul_Unnikrishnan, thanks for responding!

Yes, I’ve labelled the element properly. The name is not static in all documents; it appears in different locations on the first page of every invoice. I’ve trained on 15 different templates that contain the DC number. But when I tried a sample document from Studio using the ML skill, it didn’t work.

Sagar_Rana · May 9, 2022, 12:04pm

It could be due to the confidence value specified in the ML extractor. Try decreasing the confidence % value.
If that doesn’t work, you could run a different extractor in parallel with the first ML extractor and, in the manage settings, take the value of (location) from the second extractor (it could be another ML extractor with a different endpoint, or a form extractor).
If the PDF has a fixed format, you could explore regex-based extractors.

Note: Regex-based and form extractors are mostly used for fixed-format PDFs, while ML extractors give you flexibility with semi-structured PDFs. Try combinations of multiple extractors.

Hope this helps.
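
To make the fallback idea concrete, here is a minimal sketch in plain Python. It is not UiPath API code; in Studio the same precedence is set up in the extractor settings. The function names, the 0.5 threshold, and the label pattern (the text "DC", optionally followed by "(location code)", then a 3 to 8 digit code) are all assumptions made for illustration, since the real format of the field is not shown in this thread.

```python
import re
from typing import Optional

# Assumed label format (hypothetical): "DC" or "DC (location code)",
# an optional separator, then a 3-8 digit code.
DC_PATTERN = re.compile(
    r"\bDC\b(?:\s*\(location code\))?\s*[:#-]?\s*(\d{3,8})\b",
    re.IGNORECASE,
)


def regex_extract_dc(page_text: str) -> Optional[str]:
    """Secondary extractor: look for the DC code with a regular expression."""
    match = DC_PATTERN.search(page_text)
    return match.group(1) if match else None


def pick_value(
    ml_value: Optional[str],
    ml_confidence: float,
    page_text: str,
    min_confidence: float = 0.5,  # hypothetical threshold
) -> Optional[str]:
    """Prefer the ML result when it is confident enough, else try the regex."""
    if ml_value and ml_confidence >= min_confidence:
        return ml_value
    # The ML extractor returned nothing (or a low-confidence value),
    # so fall back to the second extractor.
    return regex_extract_dc(page_text)


if __name__ == "__main__":
    text = "Ship to DC (Location Code) - 7781, total 40"
    print(pick_value(None, 0.0, text))
```

The regex branch only helps when the label text is reasonably stable across layouts. If the label itself changes between templates, a second ML endpoint is the more realistic fallback.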

Best Regards,
Sagar Rana

AswinSridhar · May 12, 2022, 9:16am

Hi @Sagar_Rana , thanks for the reply!

I’ve set the minimum confidence to 0% and it still didn’t fetch that particular value. Also, it is not a fixed template, so I was not able to proceed with the Intelligent Form Extractor or regex approach. I’ve labelled some additional documents that contain the field, but I’m still facing the same issue.

Sagar_Rana · June 6, 2022, 5:34am

Hey,
Sorry for the late reply.
If it’s convenient for you, could you share some sample files or screenshots so we can better understand your issue?

Thanks

suraj.setty · June 6, 2022, 5:38am

Hi @AswinSridhar

Please add some more sample documents to Data Labelling, re-map the same field in all the documents, and do a pipeline run. Then try extracting the data again.

Retraining on more labelled samples generally improves the model’s performance.

Hopefully that works,

Thanks

sharon.palawandram · September 19, 2022, 4:12pm

So far you have trained on 15 samples. Can you confirm whether all 15 invoices are dynamic?

You can run an evaluation pipeline and see if the ML model is extracting DC (location code). If it isn’t, you need to retrain the model with more data.
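
As a rough illustration of what a per-field check looks like, here is a small Python sketch. It does not read the actual evaluation pipeline output; the field names and sample values are made up, and the two lists simply stand in for the labelled values and the model’s predictions for the same documents.

```python
from collections import defaultdict
from typing import Dict, List


def per_field_hit_rate(
    labelled: List[Dict[str, str]],
    predicted: List[Dict[str, str]],
) -> Dict[str, float]:
    """Share of documents in which each labelled field was extracted correctly."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred in zip(labelled, predicted):
        for field, expected in truth.items():
            totals[field] += 1
            # A missing field or a wrong value both count as a miss.
            if pred.get(field) == expected:
                hits[field] += 1
    return {field: hits[field] / totals[field] for field in totals}


if __name__ == "__main__":
    # Hypothetical data: two documents, two fields each.
    labelled = [
        {"invoice_no": "A1", "dc": "7781"},
        {"invoice_no": "A2", "dc": "1200"},
    ]
    predicted = [
        {"invoice_no": "A1"},  # "dc" missing for this document
        {"invoice_no": "A2", "dc": "1200"},
    ]
    print(per_field_hit_rate(labelled, predicted))
```

A field that scores well below the others, as "dc" does in this made-up data, is the one that needs more or better labelled samples.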

Ritaman_Baral · November 20, 2023, 6:05pm

Did you get a solution? I am also facing the same issue. Can someone please help?

Ritaman_Baral · November 20, 2023, 6:06pm

In my case the field (railcar) is extracted for all documents except one. I am unable to find the issue.
