Document Understanding - Digitization

farheenfatma61 · September 21, 2022, 11:15am

Hi,

I am facing an issue while digitizing a document. I have tried several OCR engines by trial and error, and ‘UiPath Document OCR’ works well overall, but the date it returns is not correct.

The date comes out as 2623 or 9.04.2022.

Can you please suggest a solution?

Attaching a screenshot of the file.

sharon.palawandram · September 21, 2022, 4:41pm

Hello Farheen, how did you find out that this was a digitization issue? Can you elaborate on the process steps you’ve taken?

Poor extraction results could also be an extractor issue. Let me know.

Nishant_Banka1 · September 21, 2022, 5:04pm

Hi @farheenfatma61,
This is a (not very common) issue that occurs when digitizing a non-native PDF or image in which the target is unclear or has background noise, which I can see in your case.
My suggestions would be to:

1. Check the output of the digitization (both Document Text and DOM) and verify whether the date is being digitized as “29” or not. Do this for the other OCR engines as well. If it is indeed being digitized as “29”, then we can probably use some other method, such as Regular Expressions, to extract the correct date.
2. Add a Validation Station or an Action Center step to send items for manual verification or correction based on the extraction confidence. You could set a threshold value (say 0.8); whenever the confidence is lower than the threshold, the item is raised for manual intervention.
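
As a rough illustration of how the two suggestions above could work together, here is a minimal Python sketch (not UiPath code). It assumes you already have the digitized text and a confidence score in hand; the function names `parse_date` and `needs_manual_review`, the dd.mm.yyyy pattern, and the sample strings are all hypothetical and only meant to show the idea.

```python
import re
from datetime import datetime

# Hypothetical threshold, taken from the "say 0.8" suggestion above.
CONFIDENCE_THRESHOLD = 0.8

# Assumed format: day, month, 4-digit year separated by '.', '/' or '-'.
DATE_PATTERN = re.compile(r"\b(\d{1,2})[./-](\d{1,2})[./-](\d{4})\b")


def parse_date(text):
    """Return a datetime for the first calendar-valid date in text, else None."""
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    day, month, year = (int(group) for group in match.groups())
    try:
        return datetime(year, month, day)
    except ValueError:
        # Digits matched the pattern but do not form a real date (e.g. 31.02).
        return None


def needs_manual_review(text, confidence, threshold=CONFIDENCE_THRESHOLD):
    """Flag the item if confidence is low or no valid date can be parsed."""
    return confidence < threshold or parse_date(text) is None
```

With this sketch, a value like "2623" is flagged because it does not match the pattern at all. A value like "9.04.2022" still parses as a valid date, so a regular expression alone cannot tell that a leading digit was dropped; that case is only caught when the confidence falls below the threshold and the item goes to manual review.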

Let me know if this helps.
Thanks,
Nishant

farheenfatma61 · September 23, 2022, 5:40am

Thank you for your help.

