Document Understanding - Invalid date extraction

Date extracted in a different format
Expected Date - Effective Date: Apr 16, 2022
Output - Effective Date: A{{p$rE1ff6D, 2at0e22

Please help me with the solution

Hi @Aman_Jee_US ,
This happens because of improper digitization.

Which OCR are you using?


Hello Aman,

need more information on this to give you a targeted answer.

  1. What’s the document understanding extraction scope you’re using? (eg. Forms Extractor, ML extractor)
  2. What’s the OCR you are using?

This looks like either an extraction or digitization error.

Thanks for the response. I tried UiPath OCR, Tesseract OCR and Omni Page as well

Thanks @sharon.palawandram ,

  1. I am using Machine Learning Extractor, But I also tried Intelligent Form Extractor and Form extractor and the value are coming same for all.
  2. I have tried UiPath, Tesseract & Omni Page OCR

What’s the OCR you use in the Machine learning extractor? Is it uipath OCR?

and is it the same result in all the OCRs?

You may want to check your training data. When you label the date in document manager how does the date come? does it come as Apr 16, 2022 or A{{p$rE1ff6D, 2at0e22?

Yes, I am using UiPath OCR and all other OCRs are giving the same result. In the document manager it comes as Apr 16, 2022. I also tried Intelligent & Form Extractor and in Template manager it comes as A{{p$rE1ff6D, 2at0e22. Also in text only view the data has same A{{p$rE1ff6D, 2at0e22

okay. Can you use UiPath Document OCR for this use case and check if the settings in your digitize activity are set to the following?

In your digitize activity, under ApplyOcrOnPDF, set it to “Yes” as shown in the below picture.

and then write your digitized output into a text using the following :

then check DOM_worked.json file and check if your DOM has the correct date or the wrong date.

There’s no way that all OCRs give the same error. There should be something missing apart from changing OCR and Extraction.

In extraction, can you check if you have properly set your ML skill? Have you set it to auto update?

You can also kick off an evaluation pipeline and check if you’re getting the date accurately.