So, as the title says, I'm trying to extract PDF data (a table) from some websites and store it in an Excel sheet. I'm able to do that using web/data scraping, but I'm facing a couple of issues with some of those PDF files.
ONE: If the PDF has multiple pages and the header is repeated on each page, extra columns get generated. For instance, if the headers are Name, Address and Phone Number and the PDF has 18 pages, I end up with 18 × 3 = 54 columns instead of 3.
TWO: When scraping a PDF table from one of the websites, I could only get part of the data. The table has five columns in total, but data scraping only captured three of them.
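To make issue ONE concrete, here is a minimal Python sketch of the result I'm after: each page's rows get appended under a single header instead of being added as new columns. This is not the scraping tool I'm using; the function name `stack_pages` and the sample rows are made up purely for illustration, and it assumes each page has already been extracted as a list of rows.

```python
def stack_pages(pages):
    """Combine per-page tables row-wise, keeping the header only once.

    `pages` is assumed to be a list of pages, where each page is a list of
    rows and each row is a list of cell values (hypothetical input shape).
    """
    header = None
    rows = []
    for page in pages:
        if not page:
            continue  # skip pages where nothing was extracted
        if header is None:
            header = page[0]  # treat the first row seen as the header
        # Drop the first row when a page repeats the header; keep the rest
        start = 1 if page[0] == header else 0
        rows.extend(page[start:])
    return header, rows


# Made-up sample: two pages, each repeating the same header row
pages = [
    [["Name", "Address", "Phone Number"], ["A", "1 Main St", "555-0100"]],
    [["Name", "Address", "Phone Number"], ["B", "2 Oak Ave", "555-0101"]],
]
header, rows = stack_pages(pages)
print(header)     # the three column names, once
print(len(rows))  # number of data rows across both pages
```

With 18 pages the output would still have 3 columns, with the number of rows growing instead.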
I’ll be grateful if anybody can help me out here.
Thank you. …