Extract tabular data from Read-Only PDF

Vinutha · April 26, 2017, 4:46pm

I could have tried that if the column/cell separators in the extracted string were tabs (I wish they were), but every word is separated by a space, regardless of which column it comes from.
Take the first row of the table in the string:

8/7/2012 007 168 RRR DDDD LLL 3633 LOOP LAKE RREEE DDFDF GA 30506 6855 GGGG GL ENN $2,000.00 $753.75 $1,246.25

In the PDF:
RRR DDDD LLL - cell 3
3633 LOOP LAKE RREEE DDFDF GA - cell 4
etc…

In my scraped string, there is no way to tell which cell of the row each word belongs to. Every word in the row is extracted one after another, separated by a space.
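To make the problem concrete, here is a minimal Python illustration using the sample row above. Splitting on whitespace is the only split the string supports, and it turns a single cell such as "RRR DDDD LLL" into three unrelated tokens, so the cell boundaries cannot be recovered from the string alone.

```python
# The first table row exactly as it appears in the scraped string.
row = ("8/7/2012 007 168 RRR DDDD LLL 3633 LOOP LAKE RREEE DDFDF GA "
       "30506 6855 GGGG GL ENN $2,000.00 $753.75 $1,246.25")

# Whitespace is the only separator available, so this yields words, not cells.
tokens = row.split()

print(len(tokens))   # 20
print(tokens[3:6])   # ['RRR', 'DDDD', 'LLL'] (one PDF cell, three tokens)
```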

I raised this issue in the webinar, and they said they don't have an easy or straightforward solution for this right now. The only workaround is screen scraping: scrape each column separately by restricting the scrape to that column's region, which gives a string containing all the cells in that column. Convert that string into an array, repeat for the other columns, and then combine the arrays into a DataTable. In my case, however, the data in the PDF may change later (the PDF is pulled from a source that can be updated, and I have to update my extracted file accordingly), so rows may be added or deleted, and then even this workaround fails.
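A minimal Python sketch of the combine step in that workaround, assuming each column scrape returns one cell value per line. The function name `columns_to_rows` is hypothetical, and a plain list of rows stands in for the UiPath DataTable. It refuses to combine columns whose cell counts differ, which is one symptom of the scrape regions no longer lining up after rows are added or deleted; it cannot catch every misalignment, and a cell value that wraps onto two lines would break the one-line-per-cell assumption.

```python
def columns_to_rows(column_texts):
    """Combine per-column scrape strings into rows (hypothetical helper).

    Assumes each string holds one column, one cell value per line.
    """
    # Turn each column string into an array of non-empty cell values.
    columns = [
        [line.strip() for line in text.splitlines() if line.strip()]
        for text in column_texts
    ]
    # If the columns disagree on how many cells they hold, the scraped
    # regions no longer match the table, so stop instead of guessing.
    counts = [len(col) for col in columns]
    if len(set(counts)) > 1:
        raise ValueError(f"column cell counts differ: {counts}")
    # zip pairs the i-th cell of every column into the i-th row.
    return [list(row) for row in zip(*columns)]


if __name__ == "__main__":
    # Cells 3 and 4 of the sample row, as two single-cell column scrapes.
    name_col = "RRR DDDD LLL\n"
    address_col = "3633 LOOP LAKE RREEE DDFDF GA\n"
    print(columns_to_rows([name_col, address_col]))
    # [['RRR DDDD LLL', '3633 LOOP LAKE RREEE DDFDF GA']]
```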

If anybody has a static PDF that is a scanned image and needs to extract table-format data from a single page, they can use this method. It extracts the data perfectly; I have tried it.

