Hey guys, so I’ve run into a problem yet again and want to figure it out once and for all.
So I’m doing data scraping on a specific website. I’m basically going through about 10k URLs to scrape data, and it takes a fair bit of time. I thought about using the next-page button while scraping, so that I wouldn’t have to load a new URL for every page. The problem is that from time to time my internet connection cuts off, and I can’t figure out how to handle that.
Imagine you go to a URL, for example a Google search for “iPhone”, and you want to scrape all the result URLs for that keyword. You run “Extract table data” with the next-page button configured inside it, but let’s say that when you get to page 8, your internet cuts off. How can I make the scraping restart from the same page where the connection dropped?
I hope I managed to explain it clearly, and I’m waiting for your ideas. Thanks in advance!
You can add a row to the config file (or to a separate temporary Excel file) that stores the current page number, and update that row as the process iterates through the pages.
Create a variable called pageNumber and read it from the config file at the beginning of the process. Before the data scraping starts, you can then go directly to the page that pageNumber refers to.
Could I maybe see an example of it? I’m trying to follow what you’re saying, but it’s hard to picture how this would work. I’m really grateful for your response, thanks!
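Here is a rough sketch of the checkpoint idea in Python, just to show the logic; it is not the actual workflow from the tool you are using. Everything in it is an assumption made for illustration: the config file is a small JSON file, `scrape_page` is a hypothetical function that you would replace with your own scraping step for a single page, and a dropped connection is assumed to show up as a `ConnectionError`. Only the name pageNumber comes from the suggestion above.

```python
import json
import os


def load_page_number(config_path):
    """Return the saved page number, or 1 if no checkpoint exists yet."""
    if not os.path.exists(config_path):
        return 1
    with open(config_path, "r", encoding="utf-8") as f:
        return json.load(f).get("pageNumber", 1)


def save_page_number(config_path, page_number):
    """Store the next page to scrape.

    The value is written to a temporary file first and then renamed,
    so a crash in the middle of writing does not leave a broken config.
    """
    tmp_path = config_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump({"pageNumber": page_number}, f)
    os.replace(tmp_path, config_path)


def scrape_all(scrape_page, last_page, config_path):
    """Scrape from the saved page up to last_page.

    scrape_page is a hypothetical callable: it takes a page number and
    returns a list of rows, or raises ConnectionError if the internet drops.
    Returns the rows collected during this run.
    """
    rows = []
    page = load_page_number(config_path)  # resume point
    try:
        while page <= last_page:
            rows.extend(scrape_page(page))
            page += 1
            # Update the checkpoint only after the page succeeded.
            save_page_number(config_path, page)
    except ConnectionError:
        # Stop here; the config still points at the page that failed,
        # so the next run starts from exactly that page.
        pass
    return rows
```

With this layout, if the connection drops on page 8, the config file still says 8, so running `scrape_all` again picks up at page 8 instead of page 1. In a real run you would also want to write the collected rows to a file after each page, so they are not lost if the process itself stops.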