So we’re scraping an HTML table of links (several different URLs, each with anywhere from a dozen to 50 or so links). Unfortunately the table we scrape doesn’t contain some of the data we need, such as the descriptive title of the page being linked to and the date it was last updated. So we’re resorting to iterating over the scraped links and using Browser.NavigateTo to open each page, where we do some text scraping.
This is painfully slow. Trying to think of something a bit peppier, I’ve played with the idea of downloading the raw HTML — without actually navigating to and rendering the page — and then doing string searches etc. to pull out what we need.
I’ve tried using System.Net.WebClient.DownloadString, which works in general, but not with the specific URLs we’re dealing with here (I suspect it has something to do with security certificates, redirects, or something similar).
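For context, the direction I’ve been sketching is something like the following (C#; untested against our actual sites, and the URL is just a placeholder): an HttpClient configured to follow redirects and skip certificate validation, which I’m guessing might get around whatever is tripping up WebClient.

```csharp
using System;
using System.Net.Http;

static class HtmlFetcher
{
    // Build an HttpClient that follows redirects and skips TLS
    // certificate validation (only acceptable for sites you trust).
    public static HttpClient CreateClient()
    {
        var handler = new HttpClientHandler
        {
            AllowAutoRedirect = true,
            ServerCertificateCustomValidationCallback =
                HttpClientHandler.DangerousAcceptAnyServerCertificateValidationCallback
        };
        var client = new HttpClient(handler);
        // Some servers reject requests without a browser-like User-Agent.
        client.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0");
        return client;
    }

    static void Main()
    {
        using var client = CreateClient();
        // Placeholder URL; substitute one of the scraped links.
        string html = client.GetStringAsync("https://example.com/")
                            .GetAwaiter().GetResult();
        Console.WriteLine(html.Length);
    }
}
```

(If it matters, I assume something equivalent could run from an Invoke Code activity rather than a custom library, but I haven’t gotten that far.)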
I’ve also looked at the HTTP Request activity but haven’t been able to get it to work. Though I’ve seen some samples of how to use it to download FILES and such, I haven’t found a clear example of how to use it to download or stream the HTML itself.
Open to any suggestions or insight anyone can offer.
ddk