Extract scanned PDF to Excel

Hi Folks,

I need to extract data from a scanned PDF, with columns such as Policy, Eff, Insured, Type, Invoice, Gross Prem, Comm% and Invc Amt Paid, and move that data into Excel.
Note: one PDF may contain 2 pages and another may contain 8 pages. In either case I need to extract the data from all the pages and convert it to Excel.

Kindly help with a solution.
Cochrane (1).pdf (54.5 KB)

@chandra_raju

You can use the Read PDF Text activity if it is a plain-text PDF, or Read PDF With OCR if it is an image PDF,
and write the result to a text file.

From the text file, you then have to apply regex, depending on the requirement.
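As a rough illustration of the regex step, here is a minimal Python sketch (in UiPath the same pattern idea would go into a Matches activity or a Regex expression). The row layout, the sample text, and the names `ROW_PATTERN`, `parse_rows` and `SAMPLE_TEXT` are all assumptions made for the example; the real OCR output from the attached PDF may be spaced differently and the pattern would need adjusting.

```python
import re

# Assumed row layout (hypothetical, not taken from the attached PDF):
# Policy  Eff(MM/DD/YYYY)  Insured  Type  Invoice  Gross Prem  Comm%  Invc Amt Paid
ROW_PATTERN = re.compile(
    r"^(?P<policy>\S+)\s+"
    r"(?P<eff>\d{2}/\d{2}/\d{4})\s+"
    r"(?P<insured>.+?)\s+"
    r"(?P<type>[A-Z]{2,4})\s+"
    r"(?P<invoice>\d+)\s+"
    r"(?P<gross_prem>-?[\d,]+\.\d{2})\s+"
    r"(?P<comm_pct>\d+(?:\.\d+)?)%?\s+"
    r"(?P<amt_paid>-?[\d,]+\.\d{2})$"
)


def parse_rows(text):
    """Return one dict per line that matches the assumed row layout."""
    rows = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


# Made-up sample standing in for the text file written after Read PDF / OCR.
SAMPLE_TEXT = """\
Statement of Account          Page 1
Policy Eff Insured Type Invoice Gross Prem Comm% Invc Amt Paid
ABC123456 01/15/2021 John A Smith NEW 100234 1,250.00 12.5 156.25
XYZ987654 02/01/2021 Acme Widgets LLC REN 100235 980.50 10 98.05
Total 254.30
"""

rows = parse_rows(SAMPLE_TEXT)
for row in rows:
    print(row["policy"], row["insured"], row["amt_paid"])
```

Header, title and total lines do not match the pattern, so they are skipped without any extra handling.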

Hope this helps.

Thanks

Hello,

Two methods:
@Srini84 has already described the first one.
The second is to use the Digitize Document activity from Intelligent OCR with the OmniPage OCR engine; you should get better results.

Thanks @Srini84 & @sachinbhardwaj for the support.
I need to remove the data below, either from the text file or while extracting the data from the PDF.

The PDF may also have multiple pages. I need to extract only the structured data from the PDF and write it into Excel. Is there an approach for this? Below is the reference data.
data.txt (1.5 KB)
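One possible approach, sketched in Python under assumptions: treat each page's OCR text separately, keep only the lines that look like table rows, and write everything to a single CSV file that Excel can open. The row pattern, the `PAGES` sample and the function names are hypothetical and only illustrate the idea; in a UiPath workflow the equivalent would be a loop over pages that builds a DataTable, followed by Write Range.

```python
import csv
import io
import re

HEADERS = ["Policy", "Eff", "Insured", "Type", "Invoice",
           "Gross Prem", "Comm%", "Invc Amt Paid"]

# Assumed row layout; anything that does not match (titles, headers,
# totals, footers) is treated as unstructured text and dropped.
ROW = re.compile(
    r"^(\S+)\s+(\d{2}/\d{2}/\d{4})\s+(.+?)\s+([A-Z]{2,4})\s+(\d+)\s+"
    r"(-?[\d,]+\.\d{2})\s+(\d+(?:\.\d+)?)%?\s+(-?[\d,]+\.\d{2})$"
)


def extract_table(pages):
    """Collect matching rows from every page, whatever the page count."""
    rows = []
    for page_text in pages:
        for line in page_text.splitlines():
            match = ROW.match(line.strip())
            if match:
                rows.append(list(match.groups()))
    return rows


def write_csv(rows, handle):
    """Write the header plus all rows to an open text file handle."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(HEADERS)
    writer.writerows(rows)


# Made-up two-page sample; a real run would have one string per PDF page.
PAGES = [
    "Agency Statement Page 1\n"
    "Policy Eff Insured Type Invoice Gross Prem Comm% Invc Amt Paid\n"
    "ABC123456 01/15/2021 John A Smith NEW 100234 1,250.00 12.5 156.25\n",
    "Agency Statement Page 2\n"
    "XYZ987654 02/01/2021 Acme Widgets LLC REN 100235 980.50 10 98.05\n"
    "Total 254.30\n",
]

table = extract_table(PAGES)
buffer = io.StringIO()  # swap for open("output.csv", "w", newline="") to save
write_csv(table, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Because the loop runs over whatever pages it is given, the same code handles a 2-page and an 8-page PDF without changes.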

Hi @chandra_raju

Below is a workflow for this:
MainPratik.xaml (16.7 KB)
Cochrane (1).pdf (54.5 KB)
abc.txt (2.6 KB)
data.txt (1.5 KB)
output.xlsx (9.1 KB)

Output :-

image

Regex used :-

Mark this as the solution and like it if it helps you :slight_smile:

Happy Automation :raised_hands:

Best Regards
Er Pratik Wavhal
