Hi everyone, I have a PDF file (a bank statement) and I want to extract the transaction table from it into Excel using Tesseract OCR. I tried data scraping, but the screen could not be captured. Since Tesseract OCR returns everything as a single string, is it possible to convert that string into a data table? Below is the output from Tesseract OCR:
Your Transaction Details
Date Details Withdrawals Deposits Balance
Apr 8 Opening Balance 5,234.09
Apr 8 Insurance 272.45 5,506.54
Apr 10 ATM 200.00 5,306.54
Apr 12 Internet Transfer 250.00 5,556.54
Apr 12 Payroll 2100.00 7,656.54
Apr 13 Bill payment 135.07 7,521.47
Apr 14 Direct debit 200.00 7,321.47
Apr 14 Deposit 250.00 7.567.87
Apr 15 Bill payment 525.72 7,042.15
Apr 17 Bill payment 327.63 6,714.52
Apr 17 Bill payment 729.96 5,984.56
Apr 18 Bill payment 223.69 5,710.87
Closing Balance $5,710.87
Hello Deepan,
In this video, I extract tables from a PDF and write the data to Excel:
0:25 Install PDF Activities
1:10 Read PDF text, Get PDF page count, Extract PDF
5:40 Read PDF with OCR
6:55 Join PDF and Manage PDF passwords
9:30 Extract Images From PDF and Export PDF as Image
12:00 Extract table from PDF, use case 1: replace some spaces with | (one column has multiple words)
24:00 Run the robot to see the result
25:40 Extract table from another PDF, use case 2: the delimiter is two spaces, so a simple split works
31:50 Extract table from a complex PDF, use case 3: unstructured data, where the logic is based on IsUpper and IsLower
40:25 Extract the price value from PDF
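For the statement text posted in the question, the same idea as use case 1 (a Details column with several words, single spaces everywhere) can be sketched outside UiPath in plain Python. This is only a rough illustration and not the workflow from the video. It assumes that every row starts with a date like "Apr 8", that every amount has exactly two decimal places, and that an amount counts as a deposit when the balance goes up and as a withdrawal when it goes down, because the OCR string no longer shows which column a number came from. The names `parse_statement`, `to_amount` and `to_csv` are made up for this example.

```python
import csv
import io
import re
from decimal import Decimal

# OCR text copied from the question (single spaces, no column markers).
OCR_TEXT = """\
Your Transaction Details
Date Details Withdrawals Deposits Balance
Apr 8 Opening Balance 5,234.09
Apr 8 Insurance 272.45 5,506.54
Apr 10 ATM 200.00 5,306.54
Apr 12 Internet Transfer 250.00 5,556.54
Apr 12 Payroll 2100.00 7,656.54
Apr 13 Bill payment 135.07 7,521.47
Apr 14 Direct debit 200.00 7,321.47
Apr 14 Deposit 250.00 7.567.87
Apr 15 Bill payment 525.72 7,042.15
Apr 17 Bill payment 327.63 6,714.52
Apr 17 Bill payment 729.96 5,984.56
Apr 18 Bill payment 223.69 5,710.87
Closing Balance $5,710.87
"""

# Date, then free-text details, then one or two numeric tokens at the end.
ROW = re.compile(
    r"^(?P<date>[A-Z][a-z]{2} \d{1,2}) (?P<details>.+?) "
    r"(?P<nums>[\d.,]+(?: [\d.,]+)?)$"
)
FIELDS = ["date", "details", "withdrawal", "deposit", "balance"]


def to_amount(token):
    # Assumption: two decimal places. Dropping every separator also
    # tolerates OCR slips such as "7.567.87" instead of "7,567.87".
    return Decimal(re.sub(r"\D", "", token)) / 100


def parse_statement(text):
    rows = []
    previous_balance = None
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # headings and the closing line are skipped
        amounts = [to_amount(t) for t in match["nums"].split()]
        balance = amounts[-1]
        withdrawal = deposit = None
        if len(amounts) == 2:
            # Assumption: a falling balance means a withdrawal.
            if previous_balance is not None and balance < previous_balance:
                withdrawal = amounts[0]
            else:
                deposit = amounts[0]
        rows.append({
            "date": match["date"],
            "details": match["details"],
            "withdrawal": withdrawal,
            "deposit": deposit,
            "balance": balance,
        })
        previous_balance = balance
    return rows


def to_csv(rows):
    # CSV text that Excel can open; empty cells are written for None.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = parse_statement(OCR_TEXT)
print(to_csv(rows))
```

Saving the returned text to a `.csv` file gives something Excel opens directly. The balance-direction rule is a guess about the statement layout, so it is worth checking a few rows against the original PDF before relying on it.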