Extracting Table from PDF

Hi everyone, I have a PDF file (a bank statement) and I want to extract the transaction table from it into Excel using Tesseract OCR. I tried data scraping, but the screen content could not be captured. Since Tesseract OCR returns everything as a single string, is it possible to convert that string into a data table? Below is the output from Tesseract OCR:

Your Transaction Details
Date Details Withdrawals Deposits Balance
Apr 8 Opening Balance 5,234.09
Apr 8 Insurance 272.45 5,506.54
Apr 10 ATM 200.00 5,306.54
Apr 12 Internet Transfer 250.00 5,556.54
Apr 12 Payroll 2100.00 7,656.54
Apr 13 Bill payment 135.07 7,521.47
Apr 14 Direct debit 200.00 7,321.47
Apr 14 Deposit 250.00 7.567.87
Apr 15 Bill payment 525.72 7,042.15
Apr 17 Bill payment 327.63 6,714.52
Apr 17 Bill payment 729.96 5,984.56
Apr 18 Bill payment 223.69 5,710.87
Closing Balance $5,710.87

Hello @deepan.b,

Welcome to the community.

There is an activity for extracting tables from a PDF to Excel. Have a look at it.

Hi @deepan.b, welcome to the forum.

To extract tables from a PDF, you can use the Document Understanding feature of UiPath.

Check this video by @Parth_Doshi, which explains how to extract tables from a PDF using Document Understanding.

Hope it helps you


Nived N :robot:

Happy Automation :relaxed::relaxed::relaxed::relaxed:

Hello Deepan,
In this video, I extract tables from PDF and write data in Excel:

0:25 Install PDF Activities
1:10 Read PDF text, Get PDF page count, Extract PDF
5:40 Read PDF with OCR
6:55 Join PDF and Manage PDF passwords
9:30 Extract Images From PDF and Export PDF as Image
12:00 Extract table from PDF, use case 1: replace some spaces with | (one column has multiple words)
24:00 Run the robot to see the result
25:40 Extract table from another PDF, use case 2: the delimiter is two spaces, so a simple split works
31:50 Extract table from a complex PDF, use case 3: unstructured data, where the logic is based on IsUpper and IsLower
40:25 Extract the price value from PDF

Cristian Negulescu
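
For readers who want to see the idea behind use cases 1 and 2 outside of UiPath, here is a minimal Python sketch (not the workflow from the video). It assumes the extracted text keeps at least two spaces between columns and only single spaces inside a cell. The function names and the sample line are made up for illustration.

```python
import re


def split_on_wide_gaps(line):
    """Split one text line into cells wherever two or more spaces occur."""
    return [cell for cell in re.split(r" {2,}", line.strip()) if cell]


def to_pipe_delimited(line):
    """Rewrite the same line with | as the column delimiter."""
    return "|".join(split_on_wide_gaps(line))


if __name__ == "__main__":
    # Hypothetical line where the column gaps survived text extraction
    sample = "Apr 12  Internet Transfer  250.00  5,556.54"
    print(split_on_wide_gaps(sample))
    print(to_pipe_delimited(sample))
```

If the extracted text has only single spaces everywhere, as in the OCR output in the question, this split will not work and the columns have to be recognised by their content instead.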

@deepan.b - Please take a look at this post. Exactly the same data has been posted there and a solution has been offered.
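
In case it helps, here is a rough Python sketch of one way to turn the OCR string from the question into table rows. It is not the solution from the linked post, and it rests on assumptions: every transaction line starts with a date like "Apr 8", amounts end in two decimals, and an amount counts as a withdrawal when the balance went down compared to the previous row (the OCR text has lost the column positions, so this has to be inferred). All function and variable names are made up for illustration.

```python
import csv
import io
import re
from decimal import Decimal

# A few lines copied from the OCR output in the question
SAMPLE = """Your Transaction Details
Date Details Withdrawals Deposits Balance
Apr 8 Opening Balance 5,234.09
Apr 8 Insurance 272.45 5,506.54
Apr 10 ATM 200.00 5,306.54
Apr 14 Deposit 250.00 7.567.87
Closing Balance $5,710.87"""

COLUMNS = ["Date", "Details", "Withdrawals", "Deposits", "Balance"]
DATED_LINE = re.compile(r"^([A-Z][a-z]{2} \d{1,2}) (.+)$")  # e.g. "Apr 8 ..."
AMOUNT = re.compile(r"^\d[\d.,]*\.\d{2}$")                   # e.g. "5,234.09"


def to_decimal(token):
    """Treat the last '.' as the decimal point and drop other separators."""
    head, _, cents = token.rpartition(".")
    return Decimal(re.sub(r"[.,]", "", head) + "." + cents)


def parse_statement(text):
    rows, previous_balance = [], None
    for line in text.splitlines():
        match = DATED_LINE.match(line.strip())
        if not match:
            continue  # title, header and closing-balance lines
        date, rest = match.groups()
        tokens = rest.split()
        amounts = []
        # Peel up to two amounts off the right-hand end of the line
        while tokens and len(amounts) < 2 and AMOUNT.match(tokens[-1]):
            amounts.insert(0, to_decimal(tokens.pop()))
        if not amounts or not tokens:
            continue
        balance = amounts[-1]
        withdrawal = deposit = None
        # Assumption: a falling balance means the amount was a withdrawal.
        # Both stay blank when there is no earlier balance to compare with.
        if len(amounts) == 2 and previous_balance is not None:
            if balance < previous_balance:
                withdrawal = amounts[0]
            else:
                deposit = amounts[0]
        rows.append({"Date": date, "Details": " ".join(tokens),
                     "Withdrawals": withdrawal, "Deposits": deposit,
                     "Balance": balance})
        previous_balance = balance
    return rows


def to_csv(rows):
    """Return the rows as CSV text, which Excel can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(parse_statement(SAMPLE)))
```

Note that the amount helper also normalises OCR slips such as "7.567.87" to 7567.87, but it cannot tell whether the digits themselves were read correctly, so it is worth checking the balances against the amounts before trusting the result.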