I am using Document Understanding to extract data from several invoices with different structures. For that, we used a mix of the form-based extractor and the Machine Learning Extractor, and we made a separate template for each invoice structure.
Problem 1: We are finding it difficult to extract the table items (Description of goods and Amount) properly. We are getting very low confidence scores, even after using the Machine Learning Extractor. In one PDF, it extracted only 3 out of 6 fields.
We use a Throw activity to throw an exception if the confidence score is less than 0.9, with a message saying that manual intervention is needed. We then validate the results in the Present Validation Station.
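To make the check concrete, here is a rough sketch of the logic in Python rather than in the workflow itself. The names `ExtractedField`, `ManualInterventionNeeded` and `check_confidence` are made up for illustration and are not UiPath APIs; only the 0.9 cutoff comes from our setup.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.9  # same cutoff we use in the workflow


class ManualInterventionNeeded(Exception):
    """Hypothetical exception: at least one field is below the threshold."""


@dataclass
class ExtractedField:
    # Hypothetical container for one extracted value and its score.
    name: str
    value: str
    confidence: float


def check_confidence(fields, threshold=CONFIDENCE_THRESHOLD):
    """Return the fields unchanged, or raise if any score is below threshold."""
    low = [f.name for f in fields if f.confidence < threshold]
    if low:
        raise ManualInterventionNeeded(
            "Manual intervention needed for: " + ", ".join(low)
        )
    return fields
```

In this sketch a single low-scoring field sends the whole invoice to manual validation, which is how our workflow behaves today.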
We used the TRAIN CLASSIFIER SCOPE activity (with the keyword based classifier trainer) to get the human-validated data, and then exported the validated extraction results to get a DataSet automatically. After that, we are able to write the data to Excel using Append Range.
Problem 2: Here we used the Present Validation Station, which is not practical to use in production, so we will probably remove it once we are confident in the extraction. We want to extract the data accurately.
Problem 3: I want to extract data only from the first page of every invoice. How is that possible when the pages are not labelled (there is no "Page 1" or "Page 2" written on them)?
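To show the behaviour I am after, here is a small Python sketch. It assumes each extracted field carries a zero-based page index; the dict layout and the `page` key are hypothetical and are not the actual UiPath extraction result structure.

```python
def first_page_only(fields):
    """Keep only the fields found on the first page (page index 0)."""
    return [f for f in fields if f["page"] == 0]


# Hypothetical extraction output for a two-page invoice.
fields = [
    {"name": "Invoice No", "value": "INV-001", "page": 0},
    {"name": "Amount", "value": "250.00", "page": 0},
    {"name": "Terms", "value": "Net 30", "page": 1},
]

first_page_fields = first_page_only(fields)
```

Here `first_page_fields` would contain only the "Invoice No" and "Amount" entries; anything found on later pages is dropped.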
Please help me solve these problems.