Document Understanding - Digitization


I am facing an issue while digitizing a document. I tried several OCR engines by trial and error, and ‘UiPath Document OCR’ works best, but the output it gives for the date is not correct.

The date comes out as 2623 or 9.04.2022.

Can you please suggest a solution?

Attaching a screenshot of the file.


Hello Farheen, how did you determine that this is a digitization issue? Can you elaborate on the steps you’ve taken?

Poor extraction results could also be an extractor issue. Let me know.

Hi @farheenfatma61,
This is a (not very common) issue that occurs when we digitize a non-native PDF or image in which the target text is unclear or has background noise, which I can see in your case.
My suggestions would be to:

  1. Check the output of the digitization (both Document Text and DOM) and verify whether the date is being digitized as “29” or not. Do this for the other OCR engines as well. If it is indeed being digitized as “29”, then we can probably use another method, such as regular expressions, to extract the correct date.
  2. Add a Validation Station or an Action Center step to send items for manual verification and correction based on the extraction confidence. You could set a threshold (say 0.8), and whenever the confidence is lower than the threshold, the item would be raised for manual intervention.
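As a rough sketch of both suggestions, here is the logic in Python (for illustration only; the same checks would go into your workflow). It assumes the date is written as day.month.year, as in your “9.04.2022” output. The names `DATE_PATTERN`, `find_dates` and `needs_manual_review` are hypothetical, and the sample text is made up.

```python
import re

# Assumption: dates look like 29.04.2022 or 9.04.2022 (day.month.year).
DATE_PATTERN = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b")


def find_dates(document_text):
    """Return every dd.mm.yyyy-style date found in the digitized text."""
    return [match.group(0) for match in DATE_PATTERN.finditer(document_text)]


def needs_manual_review(confidence, threshold=0.8):
    """Return True when the extraction confidence is below the threshold."""
    return confidence < threshold


# Hypothetical digitized text, not taken from the actual document.
sample_text = "Invoice date: 29.04.2022 Ref 2623"
dates = find_dates(sample_text)

# Hypothetical confidence value; send the item to a human if it is too low.
send_to_validation = needs_manual_review(0.65)
```

If the OCR output itself drops a digit (for example “9.04.2022” instead of the real date), the regular expression will still match it, so it cannot restore the missing digit. That is why the confidence check in point 2 is worth having as a fallback.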

Let me know if this helps.


Thank you for your help.
