Scraping Child Pages from Sales Sites

Hi All,

This is my first post, so apologies if this has already been answered in detail.
I’m trying to find a way to mine information from child pages on, let’s say, Amazon.

Rather than just grabbing the summary info, I want to open each link in the search results and mine its details page. This seems very doable, but I’m stumbling.

My current plan of attack is to set up a flowchart that opens the browser, takes your inputs for the search, then scrapes the results for the URL of each result and stores them in a DataTable or a CSV (I’d like to keep everything in the tool and not generate a mess of files). It would then read in the first URL, type it into the address bar, scrape the details, write them to a CSV, and repeat while there are still URLs left. At least that’s my thinking…

Any help would be greatly appreciated; what a tool this is!!!

Hi @JordyMicheal

This is completely doable :slight_smile: I would personally store the links after retrieving them, because that lets you track which pages you’ve already processed (you can achieve that by using a second column for Status).
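To make the Status column idea concrete, here is a rough sketch in Python rather than in the tool itself. It is only an illustration of the bookkeeping: the column names `Url` and `Status`, the status values, the helper names and the example.com links are all made up for this example.

```python
import csv
import io

def build_link_table(urls):
    """Return one row per URL, each starting with Status 'Pending'."""
    return [{"Url": url, "Status": "Pending"} for url in urls]

def pending_rows(table):
    """Return only the rows that have not been processed yet."""
    return [row for row in table if row["Status"] == "Pending"]

def to_csv_text(table):
    """Serialize the table to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["Url", "Status"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(table)
    return buffer.getvalue()

# Hypothetical links standing in for the scraped search results.
links = build_link_table([
    "https://example.com/item/1",
    "https://example.com/item/2",
])

# After a details page has been scraped, mark its row as done.
links[0]["Status"] = "Done"
```

If the run is interrupted, only the rows still marked as pending need to be picked up again.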

I think you should start with the Academy to learn the basics, but you can also take the hands-on approach and search this forum for the following things:

  1. Excel (saving, reading, how headers work)
  2. For Each Row loop (this one is meant to loop through DataTable elements)
  3. How to access a specific element of the DataTable (both outside and inside the For Each Row loop)
  4. How to get the index of a Row in the DataTable
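For orientation, points 2 to 4 look roughly like this when written out in Python. This is an analogy only, not how the activities are configured in the tool; the table here is a plain list of dictionaries, and the column names and links are invented for the example.

```python
# A tiny stand-in for a table with headers "Title" and "Url".
rows = [
    {"Title": "Item A", "Url": "https://example.com/item/a"},
    {"Title": "Item B", "Url": "https://example.com/item/b"},
]

# Outside a loop: pick a row by position, then a column by header name.
first_url = rows[0]["Url"]

# Inside a loop: enumerate() yields the row index alongside the row itself.
labels = []
for index, row in enumerate(rows):
    labels.append(f"{index}: {row['Title']}")
```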

I think this should get you going.

A more professional approach would be to use the REFramework and its ability to first initialize your data (this is where you would scrape the links) and then treat every link as a transaction, processed with automatic exception handling (so if one link fails, it is retried).
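The "every link is a transaction" idea can be sketched in Python as below. This is not the framework's actual implementation, just the general pattern of retrying a failed item and moving on; `process_all`, `flaky_scrape`, the retry count and the example.com links are all hypothetical.

```python
def process_all(links, scrape, max_retries=2):
    """Process each link as its own transaction, retrying on failure."""
    results = {}
    for link in links:
        for attempt in range(max_retries + 1):
            try:
                results[link] = scrape(link)
                break  # success, move on to the next link
            except Exception as exc:
                if attempt == max_retries:
                    # Out of retries: record the failure and carry on.
                    results[link] = f"Failed: {exc}"
    return results

# Hypothetical scraper that fails the first time it sees the second link.
calls = {}

def flaky_scrape(url):
    calls[url] = calls.get(url, 0) + 1
    if url.endswith("/2") and calls[url] == 1:
        raise RuntimeError("page did not load")
    return f"details for {url}"

demo = process_all(
    ["https://example.com/item/1", "https://example.com/item/2"],
    flaky_scrape,
)
```

The point of the pattern is that one bad page costs a retry or a logged failure, not the whole run.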

It’s all up to you though, have fun in the process! :slight_smile:

Thanks so much for the kickoff, @loginerror!

I’ll keep you posted on the journey!
