Hi all, @Ioana_Gligan @Jeremy_Tederry
I’m facing a problem when extracting text from a document that contains overlapping text, e.g. a stamp near or on top of the text. In this scenario, what practice can be followed to extract the background text?
Also, what technique can be used to extract handwritten text from an input document?
Hi @raheelferoze !
If the document is not a native document (i.e. it is a scan or an image), then you should use OCR to read everything, including the stamp.
To do so, you have several options:
If you have already bought an OCR engine (like ABBYY), you should plug it into UiPath.
If you have not bought an OCR engine, you can try the PDF activities from UiPath, which are free, though I don’t know how good the output will be.
To find these activities, go to Manage Packages, then search for PDF activities under “All packages”.
Hi @Hiba_B, my scenario is a little different.
I’ve created a workflow following the Document Understanding Framework. In it, I’ve created a custom ML extractor on AI Center (using Data Manager and the out-of-the-box ML packages) that extracts information from different fields of a semi-structured document.
When the workflow starts, it receives input documents that are scanned PDFs. Some of those PDFs contain stamps that overlap the text, and in those cases the bot retrieves null or wrong information.
Unfortunately there is no clear way of “separating” text that is overlapping…
The only suggestion I can give you is to try a couple of different OCR engines - maybe one of them reports the characters in a better order / shape that can be interpreted closer to reality…
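As a rough illustration of the "try several engines" idea, here is a minimal Python sketch. It assumes each OCR engine is wrapped in a plain function that takes a file path and returns text; the function `first_usable_result` and the engine wrappers are hypothetical and not part of UiPath or any OCR vendor's API.

```python
def first_usable_result(document_path, engines, is_usable=None):
    """Try each (name, run_ocr) pair in order.

    Returns (engine_name, text) for the first engine whose output passes
    is_usable, or None if no engine produces usable text.
    """
    if is_usable is None:
        # Default check: the engine returned some non-blank text.
        is_usable = lambda text: bool(text and text.strip())

    for name, run_ocr in engines:
        try:
            text = run_ocr(document_path)
        except Exception:
            # An engine that fails on this document is skipped, not fatal.
            continue
        if is_usable(text):
            return name, text
    return None
```

In practice `is_usable` could be stricter than "non-empty", for example checking that an expected field label appears in the text.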
Are the stamps variable, or do they always have the same content? Is there any chance you could write some rules to eliminate the extra characters? In cases where no value is extracted, it would be a bit tricky…
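If the stamp content is always the same, the "rules" idea could look something like the sketch below. It is only an assumption of how such a rule might be written: the phrases in `STAMP_PHRASES` are placeholders, and it helps only when the OCR returns text polluted with stamp words, not when it returns nothing at all.

```python
import re

# Hypothetical stamp phrases; replace with the words that appear on your stamp.
STAMP_PHRASES = ["RECEIVED", "APPROVED", "ACCOUNTS DEPT"]


def strip_stamp_text(ocr_text, phrases=STAMP_PHRASES):
    """Remove known stamp phrases from OCR output and tidy leftover spaces."""
    cleaned = ocr_text
    for phrase in phrases:
        # Match the phrase as whole words, ignoring case, and allow any
        # whitespace (including a line break) between its words.
        words = [re.escape(word) for word in phrase.split()]
        pattern = r"\b" + r"\s+".join(words) + r"\b"
        cleaned = re.sub(pattern, " ", cleaned, flags=re.IGNORECASE)
    # Collapse the runs of spaces left behind, line by line.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in cleaned.splitlines()]
    return "\n".join(lines)
```

The same regular expressions could be applied to the OCR text inside the workflow before the extractor's output is validated; the Python version is just the easiest way to show the rule.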
Hello @Ioana_Gligan, the stamps aren’t really variable. They’re mostly the same, but they overlap the text in some cases and not in others. I’ve tried different OCR engines, but each one returns null results whenever there is any overlap, even when it is minimal. Just a slight touch of the stamp and the OCR returns a null value.
@Ioana_Gligan Also, is there any way to read handwritten text? I mean, can Google Vision or Microsoft Azure OCR read and understand handwritten text in input documents?