Hey guys, I ran into a problem that I desperately need to solve. I have written a workflow that scrapes manufacturer info from a site. For a single manufacturer there are about 800 pages of info I need to scrape. The starting URLs are listed in an Excel file. When the workflow opens one of the starting URLs, the Extract Data Table activity has the next-page button configured and goes through all 800+ pages. The main problem is that my internet sometimes cuts off. I have a Retry Scope around the process for that, but when the connection drops, the Retry Scope takes the same first starting URL from Excel and starts the whole process over again, instead of resuming from the page it was on when the internet cut off. Can someone suggest some solutions?
The page I'm trying to scrape (I need the code of the auto part, the URL of the auto part, and the price):
TRODO LOPTOP.xaml (28.5 KB)
My excel with urls:
TRODO.xlsx (9.1 KB)
To restate the main problem: when the internet cuts off, the Retry Scope activity restarts the whole scraping process from the beginning. I would like it to simply resume from the page it left off at when the connection comes back. Thanks in advance!
What you can do here is, during execution, make a copy of the Excel sheet to another sheet, and in the process read that new sheet and loop through it. After processing each row, delete it from the sheet.
That way, if the internet cuts off, the retry will continue with only the remaining items in the sheet.
Copy from Sheet1 to Sheet2.
Loop through Sheet2.
After extraction, delete the row.
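The steps above can be sketched outside UiPath as plain Python, just to show the idea: keep a working copy of the URL list on disk and remove each URL once it is done, so a retry only sees what is left. The file name `pending_urls.txt` is made up for illustration; in UiPath you would do the equivalent with Excel/Workbook activities on Sheet2.

```python
# Illustrative sketch (plain Python, not UiPath) of the "process a working
# copy and delete finished rows" resume pattern. All names are assumptions.
import os

PENDING = "pending_urls.txt"  # working copy, playing the role of Sheet2


def load_pending(all_urls):
    # On a fresh run, create the working copy from the full list;
    # on a retry, reuse whatever is still in the file.
    if not os.path.exists(PENDING):
        with open(PENDING, "w") as f:
            f.write("\n".join(all_urls))
    with open(PENDING) as f:
        return [line.strip() for line in f if line.strip()]


def mark_done(url):
    # Remove the finished URL from the working copy,
    # like deleting the row from Sheet2 after extraction.
    remaining = [u for u in load_pending([]) if u != url]
    with open(PENDING, "w") as f:
        f.write("\n".join(remaining))
```

On a retry, `load_pending` finds the existing file and returns only the URLs that were never marked done, so the loop skips everything already scraped.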
Hi Rahul, and thanks for the response.
I'm sorry, maybe I didn't explain the problem clearly. Take the first URL in the Excel file as an example. The URLs are looped over in a For Each Row in Data Table, and that first URL has almost 800 pages inside it. Extract Data Table then does its job, going through all the pages one by one and extracting the required data, until the internet cuts off. Say the workflow opens manufacturer X's URL from Excel and manages to get through 80 pages before the connection drops. The Retry Scope doesn't continue from page 80; it goes back to the very first page of manufacturer X and starts the process all over again, when I just want it to refresh the page and continue from the page where it got interrupted.
Then you can try a counter variable: increase it for each page you extract. If the internet drops and reconnects, the variable will still hold the last page number extracted, so you can navigate back to that page number and continue the extraction from there.
I hope you can implement this approach.
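The counter idea can be sketched outside UiPath as plain Python. The key twist is persisting the counter to a file after each page, so it survives a restart; in UiPath the same thing can be done with Write Text File / Read Text File activities. The file name `last_page.txt` and the function names are made up for illustration.

```python
# Illustrative sketch (plain Python, not UiPath) of a persisted page-counter
# checkpoint. All names here are assumptions for the example.
import os

CHECKPOINT = "last_page.txt"


def load_last_page():
    # Return the last fully scraped page number, or 0 on a fresh start.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return int(f.read().strip())
    return 0


def save_last_page(page):
    with open(CHECKPOINT, "w") as f:
        f.write(str(page))


def scrape_all_pages(total_pages, scrape_page):
    # Resume right after the checkpoint instead of starting at page 1.
    for page in range(load_last_page() + 1, total_pages + 1):
        scrape_page(page)    # stands in for the Extract Data Table step
        save_last_page(page) # checkpoint after each completed page
```

If the run dies after page 80, the next retry reads 80 from the checkpoint file and the loop starts at page 81 instead of page 1.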
Thanks yet again, Rahul, but I can't implement this for now; I'm somewhat of a beginner. Is it possible for you to give me some examples? Thanks!