How would you screen scrape a "segmented" table like the one at the attached URL?

I was trying to scrape this URL to get all the pizzas and their ingredients,
but the table is segmented into separate chunks.
Any advice is welcome. This is just for fun.

Scraping segmented tables depends on the web page, but it can often be achieved by combining:

  • data scraping
  • dynamic index / selectors
  • Find Children.

So the first thing is to grab as much as possible (shown for the left side; the right side is quite similar).

Then we check for other possible iterators.

All in all, it needs a more detailed analysis to find the best retrieval strategy.

Let us know if you need more help with this.

How about HtmlAgilityPack?

  1. Download HtmlAgilityPack ← Package Manager
  2. Add the namespace (HtmlAgilityPack)
  3. Create a new variable of type HtmlDocument
    There are two ways to load it:
    1. From the HTML text
    2. With HtmlWeb ← see the Html Agility Pack site
  4. Select the nodes by XPath
    XPath of the pizza divs: //div[@class='et_pb_row et_pb_row_3']//div[@class='dsm_pricelist_item_wrapper']
  5. Extract the inner text ← For Each Activity
    node ← For Each item, of type HtmlNode
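
If it helps to see the idea outside of a workflow, here is a rough Python sketch of the selection in steps 4 and 5. It is not HtmlAgilityPack and not the exact XPath engine: it uses Python's standard html.parser as a stand-in and simply matches the two class strings from the XPath above. The sample HTML and the names ItemTextParser and extract_items are made up for illustration; the real page markup may well differ.

```python
from html.parser import HTMLParser

ROW_CLASS = "et_pb_row et_pb_row_3"         # class string taken from the XPath above
ITEM_CLASS = "dsm_pricelist_item_wrapper"   # class string taken from the XPath above


class ItemTextParser(HTMLParser):
    """Collect the text of every item div nested inside the row div."""

    def __init__(self):
        super().__init__()
        self.stack = []     # one marker per open <div>: "row", "item" or None
        self.items = []     # finished text blocks, one per item div
        self._chunks = []   # text pieces of the item currently being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        cls = dict(attrs).get("class")
        if cls == ROW_CLASS:
            self.stack.append("row")
        elif cls == ITEM_CLASS and "row" in self.stack and "item" not in self.stack:
            self.stack.append("item")
            self._chunks = []
        else:
            self.stack.append(None)

    def handle_endtag(self, tag):
        if tag != "div" or not self.stack:
            return
        # Closing an item div finishes one text block.
        if self.stack.pop() == "item":
            self.items.append("".join(self._chunks))

    def handle_data(self, data):
        # Keep text only while we are inside an item div.
        if "item" in self.stack:
            self._chunks.append(data)


def extract_items(html):
    """Return the raw inner text of each matching item div."""
    parser = ItemTextParser()
    parser.feed(html)
    parser.close()
    return parser.items


# Made-up sample markup, only to exercise the sketch.
SAMPLE_HTML = """
<div class="et_pb_row et_pb_row_3">
  <div class="dsm_pricelist_item_wrapper">
    <span>Margherita</span>
    <span>Tomato, Mozzarella</span>
  </div>
  <div class="dsm_pricelist_item_wrapper">
    <span>Salami</span>
    <span>Tomato, Mozzarella, Salami</span>
  </div>
</div>
<div class="et_pb_row et_pb_row_4">
  <div class="dsm_pricelist_item_wrapper"><span>Ignored</span></div>
</div>
"""

items = extract_items(SAMPLE_HTML)
```

Each entry in items is the untrimmed text of one item div, which is roughly what node.InnerText hands you inside the For Each.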

Assign to a String array and …
node.InnerText.Trim.Split(Environment.NewLine.ToCharArray,StringSplitOptions.None).Select(Function(s) s.trim).ToArray ← Assign Activity
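
For anyone who wants to check what that expression produces, here is a small Python equivalent (a sketch, not the VB.NET expression itself). The function names and the sample string are made up. One thing it shows: splitting on the individual characters of a Windows line break leaves an empty entry between "\r" and "\n", so you may want to filter those out afterwards (in .NET, StringSplitOptions.RemoveEmptyEntries is one option for that).

```python
import re


def split_lines(inner_text):
    """Trim the block, split on each CR or LF character, then trim every piece."""
    # Splitting on single characters mirrors
    # Split(Environment.NewLine.ToCharArray, StringSplitOptions.None).
    return [part.strip() for part in re.split(r"[\r\n]", inner_text.strip())]


def split_nonempty_lines(inner_text):
    """Same as split_lines, but drop the entries that end up empty."""
    return [part for part in split_lines(inner_text) if part]


# Made-up inner text with Windows line endings.
sample = "\r\n  Margherita\r\n  Tomato, Mozzarella \r\n"

with_empties = split_lines(sample)
without_empties = split_nonempty_lines(sample)
```

Comparing with_empties and without_empties on your own inner text should tell you quickly whether the extra filtering is needed.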

Try It!