Sadly I see these issues were reported YEARS ago and have not been fixed.
Consider the following things to OCR:
I have tested this on multiple documents of differing scan qualities and it always reverses “Right of” to “of Right” and “by the” to “the by.”
In other words it gets “Multiple-Party Account - Tenancy the by Entireties” instead of “Multiple-Party Account - Tenancy by the Entireties” and “Multiple-Party Account With of Right Survivorship” instead of “Multiple-Party Account With Right of Survivorship.”
I am using UiPath.System.Activities 24.10.7, UiPath.DocumentUnderstanding.LocalServer 1.5.1, UiPath.OCR.Activities 3.21.2, and Studio 24.3.
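Until the engine itself is fixed, the two swaps described above can be patched in post-processing, since they are consistent from scan to scan. Here is a minimal sketch in Python, just to show the logic (the same replace could be done in the workflow itself). The `KNOWN_SWAPS` table and the `fix_known_swaps` name are made up for illustration and are not part of any UiPath package. It is a blunt replace, so it is only reasonably safe on forms with fixed labels where “the by” and “of Right” should never legitimately appear.

```python
import re

# Hypothetical lookup table: reversed phrase seen in the OCR output -> correct phrase.
# Only the two swaps reported in this post are listed.
KNOWN_SWAPS = {
    "the by": "by the",
    "of Right": "Right of",
}

def fix_known_swaps(text: str) -> str:
    """Replace each known reversed phrase with its correct word order."""
    for wrong, right in KNOWN_SWAPS.items():
        # \b keeps the match on whole words so e.g. "bathe by" is left alone.
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text)
    return text

if __name__ == "__main__":
    print(fix_known_swaps("Multiple-Party Account - Tenancy the by Entireties"))
    print(fix_known_swaps("Multiple-Party Account With of Right Survivorship"))
```

This only hides the symptom for phrases that are already known to be reversed. It does nothing for swaps that have not been spotted yet.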
Also, it does a great job of picking up the checkboxes as ☐ or ☒, but if the line to the right of the checkbox is initialed, it puts the initials before the checkbox instead of after it. In other words I get…
mIL ☒ Multiple-Party Account - Tenancy the by Entireties
Instead of…
☒ mIL Multiple-Party Account - Tenancy the by Entireties
So if I OCR that first section (with Multiple-Party Account checked and initialed) I get…
☐ Single-Party Account ☐ Multiple-Party Account mIL ☒ Multiple-Party Account - Tenancy the by Entireties ☐ Trust-Separate Agreement Dated: ☐
It puts the initials before the checkbox. Also, because there are two checkboxes on one line in that first section, it sometimes puts the Multiple-Party Account initials before Single-Party Account. This honestly makes it useless for automatically analyzing these sections. I had to go nuts with ClippingRegions: detecting each label, then moving the ClippingRegion and OCRing each individual checkbox/initial line. Very time consuming.
For a section like this it’s spot on:
I can OCR the entire section then just use the resulting text to determine what is and isn’t checked. Super simple and useful.
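For anyone wondering what “use the resulting text” looks like, here is a rough sketch in Python. It assumes the OCR result is a single string where every option is a ☐ or ☒ followed by its label, in reading order. The `parse_checkboxes` name and the sample string are hypothetical and not taken from the actual form.

```python
import re

# One checkbox glyph followed by everything up to the next glyph (its label).
CHECKBOX = re.compile(r"([☐☒])\s*([^☐☒]*)")

def parse_checkboxes(text: str) -> dict[str, bool]:
    """Map each label to True (☒, checked) or False (☐, unchecked)."""
    result = {}
    for mark, label in CHECKBOX.findall(text):
        label = label.strip()
        if label:  # skip a checkbox that has no label text after it
            result[label] = (mark == "☒")
    return result

if __name__ == "__main__":
    # Made-up sample text standing in for the OCR output of a well-behaved section.
    sample = "☐ Option A ☒ Option B ☐ Option C"
    for label, checked in parse_checkboxes(sample).items():
        print(label, "->", "checked" if checked else "not checked")
```

This only works when the OCR keeps each label right after its own checkbox. On the initialed sections described earlier, the misplaced initials end up glued to the wrong label, which is exactly why those needed the per-line ClippingRegion approach.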
One last comment about the UiPath Document OCR activity…it doesn’t handle blue ink well. For example:
Unless I make the bounding box around the initials big enough to enclose all of the ink, it won’t recognize that there is anything there. In other words if I OCR this…
It doesn’t see the initials. If I OCR this it does see the initials:
And yeah I know…AI, ML, etc…we don’t have that yet. Hopefully soon.