Document Understanding - Invalid date extraction

Aman_Jee_US · November 28, 2022, 3:05pm

Date extracted in a different format
Expected Date - Effective Date: Apr 16, 2022
Output - Effective Date: A{{p$rE1ff6D, 2at0e22

Please help me with the solution

Nishant_Banka1 · November 28, 2022, 6:20pm

Hi @Aman_Jee_US ,
This happens because of improper digitization.

Which OCR are you using?

Thanks,
Nishant

sharon.palawandram · November 28, 2022, 9:47pm

Hello Aman,

I need more information to give you a targeted answer.

Which extractor are you using in your Document Understanding extraction scope (e.g. Form Extractor, ML Extractor)?
Which OCR are you using?

This looks like either an extraction or digitization error.

Aman_Jee_US · November 29, 2022, 4:23am

Thanks for the response. I tried UiPath OCR, Tesseract OCR and OmniPage as well.

Aman_Jee_US · November 29, 2022, 4:26am

Thanks @sharon.palawandram ,

I am using the Machine Learning Extractor, but I also tried the Intelligent Form Extractor and the Form Extractor, and the value comes out the same for all of them.
I have tried the UiPath, Tesseract and OmniPage OCRs.

sharon.palawandram · November 29, 2022, 11:34pm

Which OCR do you use with the Machine Learning Extractor? Is it UiPath OCR?

And is the result the same with all the OCRs?

You may want to check your training data. When you label the date in Document Manager, how does it appear? Does it come as Apr 16, 2022 or as A{{p$rE1ff6D, 2at0e22?

Aman_Jee_US · November 30, 2022, 6:33am

Yes, I am using UiPath OCR, and all the other OCRs give the same result. In Document Manager it comes as Apr 16, 2022. I also tried the Intelligent Form Extractor and the Form Extractor, and in Template Manager it comes as A{{p$rE1ff6D, 2at0e22. The text-only view also shows the same A{{p$rE1ff6D, 2at0e22.

sharon.palawandram · November 30, 2022, 6:53am

Okay. Can you use UiPath Document OCR for this use case and check that your Digitize Document activity is configured as follows?

In the Digitize Document activity, set ApplyOcrOnPDF to “Yes”.

Then write the digitized output (the DOM) to a text file, DOM_worked.json.

Then open DOM_worked.json and check whether the DOM contains the correct date or the wrong one.
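If the DOM file is large, the check can be scripted instead of read by eye. Below is a minimal Python sketch, run outside the workflow, that scans every string value in the JSON for given text fragments. It makes no assumption about the DOM's schema and simply walks whatever structure it finds; the function names are hypothetical and not part of any UiPath API. Since the DOM may store text word by word, it is probably safer to search for short fragments such as "2022" and "2at0e22" than for the whole date.

```python
import json
from pathlib import Path


def iter_strings(node):
    """Yield every string value found anywhere in a parsed JSON structure."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def find_fragments(dom, fragments):
    """Map each fragment to True if some string value in the DOM contains it."""
    strings = list(iter_strings(dom))
    return {frag: any(frag in s for s in strings) for frag in fragments}


def check_dom_file(path, fragments):
    """Load a JSON file (hypothetical helper) and report which fragments occur."""
    # "utf-8-sig" also accepts files written with a byte-order mark.
    dom = json.loads(Path(path).read_text(encoding="utf-8-sig"))
    return find_fragments(dom, fragments)
```

For example, `check_dom_file("DOM_worked.json", ["2022", "2at0e22"])` returns a dictionary showing which fragments occur. If only the garbled fragment is found, the problem is already present after digitization rather than being introduced by the extractor.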

It is very unlikely that all the OCRs would give the same error. Something else must be wrong, beyond the choice of OCR and extractor.

In extraction, can you check whether your ML skill is set up properly? Have you set it to auto-update?

You can also kick off an evaluation pipeline and check if you’re getting the date accurately.
