Read PDF

pdf
activities

#1

Hello Everyone,

I am using the Read PDF with OCR activity because my PDF is an image (scanned), so I want to convert it to text, but I am not getting proper output :(. Could anyone please help me out?

Regards,
Hemal


#2

Hi
OCR is never fully reliable, so we can’t expect a 100% correct result. Licensed OCR engines that run primarily in the cloud, such as Google Cloud OCR or ABBYY Cloud OCR, tend to be more accurate.

However, do note that they are paid services.


#3

Yes, true. So what do we do? And what if the text is handwritten?


#4

Hello Hemal,

From my perspective, OCR is just a last-resort option when it comes to automation. In my own tests only 3% of the PDFs were scanned successfully. If you have more than one PDF, do the math (haha). Therefore I would suggest (depending on how many PDFs there are) copying and pasting the content into a Word document and reading it from there.

Have a great day!


#5

Handwritten text cannot be processed by OCR, as OCR engines work well only on electronic text. You can try looking for a suitable ICR (Intelligent Character Recognition) engine.
However, I haven’t worked with one yet.


#6

For ICR, try Parascript. I have not integrated it with RPA before, but I used it successfully in BPM applications.
It works best when there is something like a table or a confined set of values to check the result against, such as an address database or a list of possible values. Names, emails, etc. are very difficult, as there are infinite possibilities.
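
To illustrate the "list of possible values" idea, here is a minimal Python sketch of how a recognized field could be snapped to the closest entry in a known list. It is not tied to Parascript or to any RPA tool: the function name `snap_to_known_value`, the `cities` list, and the `0.8` cutoff are all assumptions made up for the example, and it uses only the standard library `difflib` module.

```python
from difflib import get_close_matches


def snap_to_known_value(recognized, allowed_values, cutoff=0.8):
    """Return the allowed value most similar to the recognized text, or None.

    Hypothetical helper for illustration only; cutoff is an assumed
    similarity threshold between 0 and 1 and would need tuning on real data.
    """
    # Compare case-insensitively, but return the original spelling.
    lookup = {value.lower(): value for value in allowed_values}
    matches = get_close_matches(
        recognized.strip().lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None


# Hypothetical list of values the field is allowed to take.
cities = ["Mumbai", "Ahmedabad", "Bengaluru", "Hyderabad"]

# A misread close to a known value is corrected to that value.
print(snap_to_known_value("Ahmedabod", cities))

# Text that resembles nothing in the list is rejected (None),
# so it can be routed to manual review instead of being guessed.
print(snap_to_known_value("Xyzzy", cities))
```

This only helps for fields with a closed set of valid answers. For free-form fields such as names or email addresses there is no list to match against, which is why they stay difficult.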