Dear Forum Members,
I am working on a solution that needs to extract details from different documents. For this, I am using the out-of-the-box models “Passport” and “Document Understanding”, and the model has been trained on enough documents. In some cases the data is extracted completely, and in others it is only partially extracted. For one document type, the model extracts an incomplete number.
Below are the details for the partial and incorrect data extraction cases:
- For Partial Data Extraction, I checked the OCR output under Digitize Document. With UiPath Document OCR the data is present, but the other OCR engines I tried (OmniPage, Tesseract, Microsoft Computer Vision) do not return any data. Also, the input to the ML extractor contains the data, but its output does not.
- For Incorrect Data Extraction, I am trying to extract data from an Aadhaar card. From the back page the data is extracted correctly, but from the front page the extracted Aadhaar number is incomplete. In this case, the UiPath Document OCR output does not contain the complete number. The other OCR engines, apart from Microsoft, return the complete number but not the other details. The Microsoft OCR output is blank.
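To make the incomplete numbers easy to flag, a check like the one below could be run on the extracted value. This is only a minimal sketch in Python, assuming an Aadhaar number is always 12 digits (usually printed in groups of four). The function names are hypothetical and are not part of UiPath.

```python
import re


def normalize_aadhaar(raw: str) -> str:
    """Remove the spaces and hyphens that OCR output often contains."""
    return re.sub(r"[\s-]", "", raw)


def is_complete_aadhaar(raw: str) -> bool:
    """Return True only if the cleaned value is exactly 12 ASCII digits."""
    return re.fullmatch(r"[0-9]{12}", normalize_aadhaar(raw)) is not None


if __name__ == "__main__":
    # Dummy values for illustration only, not real Aadhaar numbers.
    for value in ["1234 5678 9012", "1234 5678"]:
        status = "complete" if is_complete_aadhaar(value) else "incomplete"
        print(f"{value!r}: {status}")
```

A value that fails this check could be sent for manual validation instead of being accepted as extracted.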
I hope I have explained the problem clearly. Could you please suggest a solution? I have already retrained the model multiple times. Is there anything else I can do to get complete and correct data extraction?
Thanks,
Dimple