Scanned Table Data from PDF to Excel File


#1

Hi,
I have a PDF file that contains scanned table data.
I need to write that data to an Excel file.
When I use Google OCR, it returns only symbols.
I also tried Screen Scraping, but it returns the data in an unstructured format.
Can anyone suggest a solution?
Regards,


#2

Try different OCR engines at different scale settings to see which combination works best for your document. You can also restrict the set of allowed characters, or extract individual words, depending on your requirement.
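
To make that concrete, here is a rough Python sketch of the comparison, not tied to any particular OCR product. The engine functions are hypothetical stand-ins (an assumption for illustration): each one takes a scale factor and returns the recognized text plus a confidence between 0 and 1. In practice you would wrap whichever OCR engines you have available behind the same shape.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical engine shape: scale factor in, (text, confidence 0..1) out.
OcrEngine = Callable[[float], Tuple[str, float]]


def keep_allowed(text: str, allowed: str) -> str:
    """Drop every character that is not in the allowed set (whitespace is kept)."""
    return "".join(ch for ch in text if ch in allowed or ch.isspace())


def best_ocr_result(
    engines: Dict[str, OcrEngine],
    scales: Sequence[float],
    allowed: str,
) -> Optional[Tuple[float, str, float, str]]:
    """Run every engine at every scale and return the highest-confidence
    result as (confidence, engine_name, scale, filtered_text)."""
    best = None
    for name, engine in engines.items():
        for scale in scales:
            text, confidence = engine(scale)
            candidate = (confidence, name, scale, keep_allowed(text, allowed))
            if best is None or candidate[0] > best[0]:
                best = candidate
    return best


# Stand-in engines with made-up output, only so the sketch can run.
def engine_a(scale: float) -> Tuple[str, float]:
    return ("T0tal: 1,250.00 §", 0.40 + 0.1 * scale)


def engine_b(scale: float) -> Tuple[str, float]:
    return ("Total: 1,250.00", 0.60 + 0.1 * scale)


ALLOWED = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789.,:"
result = best_ocr_result({"engine_a": engine_a, "engine_b": engine_b}, [1.0, 2.0], ALLOWED)
```

Picking by reported confidence is only one possible criterion. If you know what the table should contain (for example, numeric columns), checking the output against that expectation may be a better test.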


#3

Google OCR and MODI cannot OCR large pages that contain multiple font types and font sizes.
Both do OK with text of a single font and size, without images or other digital noise.
The other option is to use anchor images and OCR a targeted area. With invoices, though, the structure keeps changing, and you can't train the robot for every template.
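
As a minimal illustration of the anchor idea, in Python: once the anchor image has been located on the page, the area to OCR is just a fixed offset from it. The `Box` type, the function name, and all the numbers below are assumptions for the sake of the example. Locating the anchor itself is left to whatever image-matching tool you use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: int
    top: int
    width: int
    height: int


def region_from_anchor(
    anchor: Box,
    dx: int,
    dy: int,
    width: int,
    height: int,
    page_width: int,
    page_height: int,
) -> Box:
    """Return the area to OCR, positioned relative to the anchor's
    top-left corner and clipped to the page bounds."""
    left = max(0, min(anchor.left + dx, page_width))
    top = max(0, min(anchor.top + dy, page_height))
    right = max(left, min(anchor.left + dx + width, page_width))
    bottom = max(top, min(anchor.top + dy + height, page_height))
    return Box(left, top, right - left, bottom - top)


# Example: the anchor (say, a table header image) was found at (100, 200);
# the table is assumed to start 30 px below it on this template.
header = Box(left=100, top=200, width=80, height=20)
table_area = region_from_anchor(header, dx=0, dy=30, width=600, height=400,
                                page_width=650, page_height=1000)
```

The offsets (`dx`, `dy`, `width`, `height`) are specific to one template, which is exactly why this approach breaks down when every invoice has a different layout.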

You will need something like ABBYY if you want to convert the whole page. There are other tools we use, but I can't disclose their names.