Extracting PDF data from a website and storing it in an Excel sheet

siddharth · April 25, 2017, 2:52pm

So, as the title says, I’m trying to extract PDF data (a table) from some websites and store it in an Excel sheet. I can already do this using web/data scraping, but I’m running into a couple of issues with some of those PDF files.

ONE: If the PDF has multiple pages and the header appears on each page, extra columns are generated. For instance, if the headers are Name, Address and Phone Number and the PDF has 18 pages, I get 18 times 3 (54) columns instead of 3.

TWO: When scraping a PDF table from one particular website, I could only get part of the data. The table has five columns in total, but data scraping captured only 3 of them.

I’d be grateful if anybody could help me out here.

Thank you. …

vvaidya · April 25, 2017, 8:23pm

For #1: As a last resort, maybe you can scrape page by page (using Range), remove the column header from each page's DataTable, and then use the Append Range activity.
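The page-by-page idea above can be sketched in plain Python to show the logic. This is not UiPath's API: `append_pages` is a hypothetical helper, and it assumes each page was scraped into its own list of rows whose first row is the repeated header. The header is kept once, and every later page has its first row dropped before being appended, mirroring "remove the header, then Append Range".

```python
from typing import List

Row = List[str]


def append_pages(pages: List[List[Row]]) -> List[Row]:
    """Combine tables scraped page by page into one table with one header.

    Hypothetical input: pages[i] is the table scraped from page i, and its
    first row is the header (e.g. Name, Address, Phone Number).
    """
    combined: List[Row] = []
    for page in pages:
        if not page:
            continue  # empty page: nothing to append
        # Keep the header from the first non-empty page only; on later
        # pages drop row 0 before appending (like Append Range without headers).
        rows = page if not combined else page[1:]
        combined.extend(rows)
    return combined


pages = [
    [["Name", "Address", "Phone Number"], ["Ann", "1 Main St", "555-0101"]],
    [["Name", "Address", "Phone Number"], ["Bob", "2 Oak Ave", "555-0102"]],
]
table = append_pages(pages)
for row in table:
    print(row)
```

With 18 pages this still yields 3 columns and a single header row, because headers are removed row-wise before appending rather than being laid out as extra columns.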

Alternatively, once you have read the full PDF into a DataTable, you can remove the repeated column headers using a For Each loop.
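The For Each alternative amounts to looping over the rows of the full table and skipping any row that matches the header. Below is a minimal Python sketch of that loop, assuming the whole PDF has already been read into a list of rows whose first row is the header; `drop_repeated_headers` is a hypothetical name, and trimming whitespace before comparing is an added assumption about how extracted cells look.

```python
from typing import List

Row = List[str]


def drop_repeated_headers(table: List[Row]) -> List[Row]:
    """Remove copies of the header row that reappear later in the table.

    Assumes table[0] is the header and that each later page repeats it.
    """
    if not table:
        return []
    header = table[0]
    # Normalise once: extracted cells often carry stray spaces (assumption).
    norm_header = [cell.strip() for cell in header]
    cleaned: List[Row] = [header]
    for row in table[1:]:
        if [cell.strip() for cell in row] == norm_header:
            continue  # a page's repeated header: skip it
        cleaned.append(row)
    return cleaned


full = [
    ["Name", "Address", "Phone Number"],
    ["Ann", "1 Main St", "555-0101"],
    ["Name ", "Address", " Phone Number"],  # header repeated on page 2
    ["Bob", "2 Oak Ave", "555-0102"],
]
print(drop_repeated_headers(full))
```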

For #2: Why weren’t the column names mapped properly in the image below? For example, the Pin column is mapped to Date info.

siddharth · April 26, 2017, 7:41am

That’s one of the issues I’m facing, and it’s the only PDF where I get it. As for your solution, I’ll try scraping page by page and get back to you.
