Hello,
I am trying to scrape data from a Home Depot page, but I am having trouble retrieving the information.
- The data scrape does not retrieve all of the product information
- The data scrape stops when it reaches an advertisement on the webpage
The ad's HTML differs from the HTML of the product tiles you are scraping, so the scraper stops when an ad breaks the pattern. (Sometimes I suspect this is deliberate: it discourages robots from scraping while the site still earns ad revenue.)
One way around this is to scrape only the link from each grid item, then have the Robot navigate to each link and collect the rest of the information there. Because a link is an element common to both the ad and your data, it may be less disruptive.
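To illustrate the link-only idea outside of the Robot, here is a rough Python sketch using only the standard library. The grid markup, the class names, and the `/p/` prefix used to tell product links from ad links are all assumptions made up for the example, not the real Home Depot page structure, so you would need to adjust them to what you actually see in the page source.

```python
from html.parser import HTMLParser

# Hypothetical grid markup: two product tiles with an ad tile between them.
SAMPLE_GRID = """
<div class="grid">
  <div class="product"><a href="/p/drill/1001">Drill</a><span class="price">$99</span></div>
  <div class="sponsored"><a href="/ads/click?id=7"><img src="banner.png"></a></div>
  <div class="product"><a href="/p/saw/1002">Saw</a><span class="price">$59</span></div>
</div>
"""


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, whatever kind of tile it sits in."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def collect_product_links(html, product_prefix="/p/"):
    """Return links that look like product pages (the prefix is an assumption)."""
    parser = LinkCollector()
    parser.feed(html)
    return [link for link in parser.links if link.startswith(product_prefix)]


product_links = collect_product_links(SAMPLE_GRID)
```

Each entry in `product_links` would then be visited in turn to pull the remaining details from the product page itself, where there is no ad tile to break the pattern.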
The other way is to scrape the area above the ad, then the area below it, and merge the two results. I am leaning towards this solution: if the ad shows up in a fixed place on the page, the areas on either side of it are templated and won’t confuse the Robot.
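The merge step could look something like the sketch below, again in plain Python. It assumes each scrape has already been turned into a list of rows, and that every row has a `url` field that can serve as a unique key; both the field names and the sample rows are hypothetical.

```python
def merge_scrapes(above_ad, below_ad, key="url"):
    """Concatenate two scrape results in page order, dropping repeated rows."""
    merged = []
    seen = set()
    for row in above_ad + below_ad:
        if row[key] in seen:
            continue  # the same product was captured by both scrapes
        seen.add(row[key])
        merged.append(row)
    return merged


# Hypothetical rows from the region above the ad and the region below it.
above = [{"url": "/p/drill/1001", "name": "Drill", "price": "$99"}]
below = [
    {"url": "/p/saw/1002", "name": "Saw", "price": "$59"},
    {"url": "/p/drill/1001", "name": "Drill", "price": "$99"},  # overlap
]

merged = merge_scrapes(above, below)
```

Deduplicating on a key is a precaution in case the two scrape regions overlap slightly; if they never overlap, a plain concatenation of the two tables would do.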