I have a requirement involving 10 PDF formats. Across all 10 formats there are at most 25 distinct fields: from one format I need to extract 10 fields, and from another format I need to extract all 25. So the number of fields to extract changes with the PDF format.
You can create all 25 fields in AI Center and train your model. At test time, whichever fields are present in the uploaded file (for example, 10 of them) will be extracted, and the remaining fields will be left blank.
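The "present fields extracted, the rest blank" behaviour can be pictured as mapping each document's results onto one shared schema. This is a minimal sketch, not UiPath's API: `ALL_FIELDS`, the field names `InvoiceNumber` and `InvoiceDate`, and `to_full_record` are hypothetical, and only five of the 25 fields are shown.

```python
# Hypothetical schema: the union of fields across all PDF formats
# (25 in the thread; only five shown here for brevity).
ALL_FIELDS = ["InvoiceNumber", "InvoiceDate", "CGST", "SGST", "IGST"]


def to_full_record(extracted: dict[str, str]) -> dict[str, str]:
    """Map whatever fields a document yielded onto the full schema.

    Fields the document does not contain come back as empty strings;
    keys outside the schema are ignored.
    """
    return {field: extracted.get(field, "") for field in ALL_FIELDS}


# Usage: an intra-state invoice has CGST/SGST but no IGST.
intra_state = to_full_record({"InvoiceNumber": "INV-001", "CGST": "90.00", "SGST": "90.00"})
print(repr(intra_state["IGST"]))  # prints ''
```

Every format then produces records with the same 25 keys, which keeps downstream processing uniform regardless of which format was uploaded.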
I have a small question. Say I have created 25 fields. When I start labelling the data, I have to label all 25 fields even if I only need 10 from a given PDF; if I don't, it throws an error when exporting to AI Center. I have run into this problem.
Yes, AI Center requires at least 10 labelled records for every field; only then can you export the labels, otherwise it shows an error. You also cannot simply duplicate a PDF to reach that count, so you need to plan your labelling accordingly. In my opinion, if you only ever need those 10 fields, create only those 10.
Let me put it this way. Say I have two formats:
In one format, I have CGST and SGST fields, and
in the other format, I have an IGST field.
Now I create three fields: CGST, SGST, and IGST.
When I label a document containing the CGST and SGST fields, I label only CGST and SGST; I cannot label IGST because that field does not exist in the document.
When I label a document containing the IGST field, I label only IGST; in this case I cannot label CGST and SGST.
After labelling is done, exporting the labelled documents to AI Center throws an error. How can I resolve this issue?
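Assuming the export error comes from the per-field minimum described in the earlier reply (at least 10 labelled samples per field), a quick pre-export check is to count, per field, how many documents label it. This is a hypothetical sketch, not part of AI Center: each labelled document is represented as a set of field names, and the threshold of 10 is taken from the thread rather than from official documentation.

```python
from collections import Counter

# Assumed threshold, taken from the earlier reply in this thread;
# check the AI Center documentation for the rule in your version.
MIN_SAMPLES_PER_FIELD = 10


def under_labelled_fields(labelled_docs, schema, minimum=MIN_SAMPLES_PER_FIELD):
    """Return {field: count} for every schema field labelled in fewer than `minimum` documents."""
    counts = Counter()
    for fields in labelled_docs:
        # Count each schema field at most once per document.
        counts.update(set(fields) & set(schema))
    # Counter returns 0 for fields that were never labelled.
    return {field: counts[field] for field in schema if counts[field] < minimum}


# Usage: 12 intra-state documents (CGST + SGST) and 4 inter-state documents (IGST).
schema = ["CGST", "SGST", "IGST"]
docs = [{"CGST", "SGST"}] * 12 + [{"IGST"}] * 4
print(under_labelled_fields(docs, schema))  # {'IGST': 4}
```

Under this reading, leaving CGST and SGST unlabelled on IGST documents is not the problem in itself; the field that falls short of the per-field count is.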
I believe you need to include more training samples for the fields whose labelled sample counts are flagged as low; this is already covered in the documentation on Training High Performing Models: