I am facing an issue while digitizing a document. I have tried several OCR engines by trial and error, and ‘UiPath Document OCR’ works well overall, but the value it returns for the date is not correct.
The date comes out as 2623 or 9.04.2022.
Hi @farheenfatma61,
This is a (not very common) issue that occurs when digitizing a non-native PDF or image in which the target is unclear or has background noise, which I can see is the case here.
My suggestions would be:
Check the output of the digitization (both the Document Text and the DOM) and verify whether the date is being digitized as “29” or not. Do this for the other OCR engines as well. If it is indeed being digitized as “29”, the problem lies in the extraction step rather than the OCR, and you can probably use another method, such as Regular Expressions, to extract the correct date from the text.
Add a Validation Station or an Action Center task to send items for manual verification and correction based on the extraction confidence. You could set a threshold value (say 0.8); whenever the confidence is lower than the threshold, the item is raised for manual intervention.
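To illustrate both suggestions outside of UiPath, here is a minimal Python sketch. It assumes the digitized Document Text is available as a plain string and that dates follow the dd.mm.yyyy layout seen in the question. The names find_dates, needs_manual_review and CONFIDENCE_THRESHOLD are hypothetical and are not part of any UiPath API. Note that a regular expression can only pick out a date that the OCR read correctly; it cannot restore a digit that was lost during digitization.

```python
import re
from datetime import datetime

# Hypothetical threshold, matching the 0.8 example given above.
CONFIDENCE_THRESHOLD = 0.8

# Assumed layout: dd.mm.yyyy with a one- or two-digit day and month,
# not embedded in a longer run of digits.
DATE_PATTERN = re.compile(r"(?<!\d)(\d{1,2})\.(\d{1,2})\.(\d{4})(?!\d)")


def find_dates(document_text):
    """Return every valid dd.mm.yyyy date found in the digitized text."""
    dates = []
    for day, month, year in DATE_PATTERN.findall(document_text):
        try:
            dates.append(datetime(int(year), int(month), int(day)).date())
        except ValueError:
            # Not a real calendar date (for example 32.13.2022), so skip it.
            continue
    return dates


def needs_manual_review(value, confidence, threshold=CONFIDENCE_THRESHOLD):
    """Flag an item when nothing was extracted or the confidence is too low."""
    return value is None or confidence < threshold


if __name__ == "__main__":
    # Made-up sample text and confidence, for illustration only.
    text = "Invoice date: 29.04.2022 Total 2623"
    dates = find_dates(text)
    extracted = dates[0] if dates else None
    print(extracted, needs_manual_review(extracted, confidence=0.75))
```

In a UiPath workflow the same two checks would typically be a regex match on the Document Text followed by a condition on the extraction confidence that decides whether to route the item to manual validation.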