Document Understanding Extraction for Handwritten Forms is not Accurate

The Document Understanding feature released in the new update has a problem extracting data from handwritten forms: the data is not extracted properly. A change in the previous version asked us to use the Form Extractor, which gave us fairly good accuracy. I was expecting greater accuracy from the ML extractor, but it seems to be less accurate than the earlier Intelligent Form Extractor approach.
I have attached an image for reference.
#uipathcommunity #documentunderstanding #uipath #uipathacademy

Daniel Lerner Steve Tegeler Radu Pruna


Hi @Kunal_Jain

These documents seem to have overlapping text which makes them harder to recognize. It would help if you can share 10 of these samples so we can try to reproduce and troubleshoot the issue. Feel free to reach out to me over direct message.

Thank you,
Alex.

Yes, I am sharing the forms.
Please find them attached.

ASR - RENTAL Registration - 141-55-167A - 2022 - 9-21-2021.pdf (353 KB)

ASR - RENTAL Registration - 141-74-036 - 2022 - 9-17-2021.pdf (268 KB)

ASR - RENTAL Registration - 141-89-028 - 2022 - 9-22-2021.pdf (206 KB)

ASR - RENTAL Registration - 142-33-182 - 2022 - 9-17-2021.pdf (342 KB)

ASR - RENTAL Registration - 142-33-234 - 2022 - 9-23-2021.pdf (309 KB)

ASR - RENTAL Registration - 142-71-409 - 2022 - 9-23-2021.pdf (305 KB)

ASR - RENTAL Registration - 142-73-714 - 2022 - 9-23-2021.pdf (310 KB)

ASR - RENTAL Registration - 142-75-386 - 2022 - 9-23-2021.pdf (306 KB)

ASR - RENTAL Registration - 200-30-151 - 2022 - 9-17-2021.pdf (266 KB)

ASR - RENTAL Registration - 200-90-552 - 2022 - 9-17-2021.pdf (410 KB)

ASR - RENTAL Registration - 231-10-739A - 2022 - 9-17-2021.pdf (401 KB)

Thanks @Kunal_Jain, these are pretty challenging, for two reasons:

First, the handwritten text overlaps a bit with the surrounding printed text, which confuses the OCR and causes it to miss some words completely.

Second, even when the text is detected correctly, the overlap with surrounding text also confuses Forms AI. Forms AI is aimed at relatively straightforward, clean forms without a lot of variation, and this text overlap introduces more entropy into the document text lines than Forms AI can handle.

In these kinds of situations, ML is the way to go. I recommend training an ML model by labelling 50-100 of these pages in a Document Manager session, and then training a model using the DocumentUnderstanding ML package in AI Center.
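
As a rough sketch of the preparation step only, assuming the scanned forms sit in a local folder, something like the following could pick a random batch of files to upload for labelling. The function name, folder names, and sample size are hypothetical and not part of any UiPath tooling. It counts files rather than pages, so adjust the sample size if the PDFs have several pages each.

```python
import random
import shutil
from pathlib import Path


def pick_labelling_sample(source_dir, target_dir, sample_size=50, seed=0):
    """Copy a random sample of PDFs from source_dir into target_dir.

    Hypothetical helper: it only prepares files on disk. Uploading
    them for labelling is still done by hand.
    """
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    # Sort first so the same seed always yields the same sample.
    pdfs = sorted(source.glob("*.pdf"))
    rng = random.Random(seed)
    chosen = rng.sample(pdfs, min(sample_size, len(pdfs)))

    for pdf in chosen:
        shutil.copy2(pdf, target / pdf.name)

    # Return the chosen file names so the sample can be recorded.
    return sorted(p.name for p in chosen)
```

For example, `pick_labelling_sample("rental_forms", "to_label", sample_size=50)` would copy up to 50 PDFs from a `rental_forms` folder into a `to_label` folder. Sampling at random rather than taking the first files helps the labelled set reflect the variety of handwriting in the full batch.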

Is that a reasonable route you might try?

Alex.
