Hi,
I have PDF files that contain scanned table data.
I need to write that data to an Excel file.
When I use Google OCR, it returns only symbols.
I also tried screen scraping, but it returns the data in an unstructured format.
Can anyone give me a suggestion?
Regards,
Try different OCR engines and different scale settings to see which works best for your document. You can also restrict the allowed characters, or extract individual words, depending on your requirements.
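If your OCR or screen-scraping step can return each word together with its position, you can regroup the words into table rows and cells yourself and write them out as CSV, which Excel opens directly. The following is a minimal, hypothetical sketch. It assumes each word arrives as a `(text, x, y, width)` tuple in page pixels, and the tolerances `row_tol` and `col_gap` are assumptions you would tune per document. The optional `allowed` set imitates an allowed-characters restriction as a post-filter on the text, not as an OCR engine setting.

```python
import csv
import io
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical input format: (text, x, y, width) per OCR'd word,
# where (x, y) is the top-left corner of the word's box in pixels.
Word = Tuple[str, float, float, float]


def filter_chars(text: str, allowed: Optional[Set[str]]) -> str:
    """Keep only characters in `allowed`; None keeps everything."""
    if allowed is None:
        return text
    return "".join(ch for ch in text if ch in allowed)


def group_rows(words: Iterable[Word], row_tol: float = 8.0) -> List[List[Word]]:
    """Put words whose y lies within `row_tol` of the row's first word
    into the same row, then sort each row left to right."""
    rows: List[List[Word]] = []
    for word in sorted(words, key=lambda w: (w[2], w[1])):
        if rows and abs(word[2] - rows[-1][0][2]) <= row_tol:
            rows[-1].append(word)
        else:
            rows.append([word])
    return [sorted(row, key=lambda w: w[1]) for row in rows]


def split_cells(row: List[Word], col_gap: float = 40.0) -> List[str]:
    """Merge neighbouring words into one cell; start a new cell when the
    horizontal gap to the previous word's right edge exceeds `col_gap`."""
    cells: List[str] = []
    prev_right: Optional[float] = None
    for text, x, _y, width in row:
        if prev_right is None or x - prev_right > col_gap:
            cells.append(text)
        else:
            cells[-1] += " " + text
        prev_right = x + width
    return cells


def words_to_csv(words: Iterable[Word], row_tol: float = 8.0,
                 col_gap: float = 40.0,
                 allowed: Optional[Set[str]] = None) -> str:
    """Convert positioned words into CSV text (one line per table row)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in group_rows(words, row_tol):
        cleaned = [(filter_chars(t, allowed), x, y, w) for t, x, y, w in row]
        cleaned = [wd for wd in cleaned if wd[0]]  # drop emptied words
        if cleaned:
            writer.writerow(split_cells(cleaned, col_gap))
    return buf.getvalue()
```

For example, the words `("Item", 10, 5, 30)`, `("Qty", 200, 6, 25)`, `("Blue", 10, 30, 30)`, `("pen", 45, 31, 25)` and `("12", 200, 30, 15)` become the rows `Item,Qty` and `Blue pen,12`. Write the returned string to a `.csv` file opened with `newline=""`. Note that this sketch does not detect empty cells, so a missing value shifts the remaining cells left; real tables may need column boundaries taken from the header positions instead of a fixed gap.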
Google OCR and MODI cannot reliably OCR large pages that mix multiple font types and font sizes.
Both work acceptably on text of a single size and font, with no images or other digital noise.
The other approach is to use anchor images and OCR only a targeted area. With invoices, however, the layout keeps changing, and you can't train the robot for every template.
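The anchor approach amounts to locating a fixed landmark on the page (for example, a column-header image) and then OCR'ing a rectangle at a known offset from it. A minimal, hypothetical sketch of that offset arithmetic follows. `Box`, `target_region` and the offset values are illustrative assumptions; the anchor's position would come from whatever image-matching step your tool provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page pixels: top-left corner plus size."""
    x: int
    y: int
    width: int
    height: int


def target_region(anchor: Box, dx: int, dy: int, width: int, height: int,
                  page_w: int, page_h: int) -> Box:
    """Return the area to OCR, offset (dx, dy) from the anchor's top-left
    corner and clipped so that it stays inside the page."""
    left = max(0, anchor.x + dx)
    top = max(0, anchor.y + dy)
    right = min(page_w, anchor.x + dx + width)
    bottom = min(page_h, anchor.y + dy + height)
    if right <= left or bottom <= top:
        raise ValueError("target region falls outside the page")
    return Box(left, top, right - left, bottom - top)
```

For instance, an anchor at `Box(100, 50, 80, 20)` with `dy=25` and a 300 x 400 target on a 1000 x 1200 page gives `Box(100, 75, 300, 400)`. Because the offsets are fixed for one layout, every new template needs its own offsets, which is why this approach does not scale to changing invoice layouts.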
If you want to convert the whole page, you will need something like ABBYY. There are other tools we use, but I can't disclose their names.