I am using the standard, out-of-the-box prebuilt invoice and purchase order models to extract information from PDFs. However, not all of the information (fields) is extracted. The PDFs are invoices and POs, and they come in various formats.
I trained the models on 15 PDF documents of each type, and the evaluation score increased. Some of the previously missing fields are now extracted, but the model can no longer extract some fields that it extracted before training.
How does the training work? What is the optimal number of documents to train the model on? Will it ever reach 100% accuracy, or extract all of the fields?