I am using the Data Scraping Wizard to scrape data from a website.
There is a large amount of data I want to extract, so the scraping takes a long time.
After some time, the website detects that a robot is present and redirects to a CAPTCHA or a 404 Not Found page.
The problem is that I can’t handle these exceptions while data scraping is running,
so I end up with incomplete data.
Is there any technique to hide the robot’s presence or to avoid getting blacklisted while data scraping?
I’ve tried increasing the value of the DelayBetweenPages parameter of data scraping.
@loginerror @ovi @badita @Akash_N_Jain
Maybe you can think about the following methods:
it can return the number of rows in dt
Hi @jmy, thanks for the reply, but I don’t want the row count here.
My problem is how to handle exceptions while data scraping is running, and how to avoid getting blacklisted because of data scraping.
Try adding a ‘Delay’ activity when going to the next page instead of relying on ‘DelayBetweenPages’.
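For what it's worth, here is a rough Python sketch of that idea outside UiPath: go page by page, pause for a random time between pages, and check each page for a block before extracting. Everything named here is an assumption for illustration: fetch_page and extract_rows are hypothetical callables you would supply, the "captcha" text check is only a guess at how a block page might look, and the delay values are arbitrary. It does not hide the robot. It only paces the requests and keeps the rows collected so far if the site starts blocking.

```python
import random
import time


def looks_blocked(status_code, html):
    """Heuristic (assumption): a 404 or a page mentioning 'captcha' means we were blocked."""
    return status_code == 404 or "captcha" in html.lower()


def scrape_pages(fetch_page, extract_rows, max_pages,
                 min_delay=2.0, max_delay=6.0, max_retries=3, sleep=time.sleep):
    """Scrape page by page and return (rows, completed).

    fetch_page(page_number) -> (status_code, html)  # hypothetical, supplied by caller
    extract_rows(html) -> list of rows              # hypothetical, supplied by caller
    """
    rows = []
    for page in range(1, max_pages + 1):
        for attempt in range(max_retries + 1):
            status_code, html = fetch_page(page)
            if not looks_blocked(status_code, html):
                break
            # Blocked: wait longer on each attempt before trying the same page again.
            sleep(max_delay * 2 ** attempt)
        else:
            # Still blocked after all attempts: stop and keep what was collected.
            return rows, False
        rows.extend(extract_rows(html))
        # Random pause so the page requests are not evenly spaced.
        sleep(random.uniform(min_delay, max_delay))
    return rows, True
```

The second return value tells the caller whether the run finished or stopped early on a block, so partial data is not mistaken for a complete result.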
I think DelayBetweenPages is the only parameter there for specifying how long to wait when going to the next page.
Could you please tell me, @itsahmedfiroz, how I can use a Delay activity inside data scraping (the Extract Structured Data activity)?