For a given search string, how can I get the URL of the first Google News result?

  1. In Studio, click Data Scraping
  2. Click Next
  3. Click the first article’s Title
  4. Click Next
  5. Click the last article’s Title

In this box, check both checkboxes and name the columns. The first one is the Title, the second one is the URL.


  6. Click Next
  7. Click Extract Correlated Data
  8. Click the first article’s Summary text
  9. Click Next
  10. Click the last article’s Summary text

Name this column Summary (or whatever you want). Don’t check the second (URL) box, since you already have the URL from the Title. Repeat steps 7-10 for the time posted and the source if you also want those. When you’re done, the wizard asks you to indicate the Next button/link, so do that. You can then run it, watch it go through the pages, and get back a datatable.
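Once the datatable is back, the answer to the original question is the URL column of the first row. Here is a minimal sketch of that lookup in Python rather than in Studio. It assumes the rows are available as a list of dicts with the Title and URL column names chosen in the wizard; `first_url` is a hypothetical helper, not a UiPath activity.

```python
def first_url(rows):
    """Return the URL of the first row that has one, or None if there is none.

    `rows` is assumed to be a list of dicts keyed by the column names
    picked in the wizard ("Title" and "URL").
    """
    for row in rows:
        url = (row.get("URL") or "").strip()
        if url:  # skip rows where the scrape returned no link
            return url
    return None


# Made-up rows for illustration only.
scraped = [
    {"Title": "First headline", "URL": "https://example.com/news/1"},
    {"Title": "Second headline", "URL": "https://example.com/news/2"},
]
top_link = first_url(scraped)
```

Skipping empty URLs is a design choice here, so that a row with a missing link does not hide the first usable result.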

Sharing my workflow won’t help you. You need to go through the Data Scraping wizard yourself. It’s easy.

Here are the results.

Check it out and have fun =D (5.1 KB)

If you need extra help, let me know

Oh, I forgot to comment…

Check if Google provides an API for Google News. It would be much better to get the data via API rather than extract from the interface.

shouldLoad, previousLastNewsLoaded and currentLastNewsLoaded are the variables used to keep scrolling down until all the news has been displayed, since in this case there is no paging.

previousLastNewsLoaded starts with the value “” because on the first pass we don’t yet know the URL of the current last item, and we want to scroll down anyway.

Every time we scroll down, we check whether we should repeat the step (this is the Assign with .Equals(), which returns True or False).

How does the check work? If the URL of the last item in the previous round is different from the URL of the last item after scrolling down, the loop must continue (shouldLoad = True).

As soon as the URL of the last item in the previous round equals the URL of the last item after scrolling down, the last item has not changed, which means there are no more items to load. The step does not need to be repeated (shouldLoad = False).
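The same loop can be sketched outside Studio to make the logic easier to follow. This is only an illustration in Python, not the workflow itself: the variable names mirror the ones above, while `FakePage`, `scroll_down` and `last_item_url` are hypothetical stand-ins for the browser and assume the page shows at least one item.

```python
def load_all_items(scroll_down, get_last_item_url):
    """Scroll until the last item's URL stops changing; return the scroll count."""
    previous_last_news_loaded = ""  # unknown before the first scroll
    should_load = True
    scrolls = 0
    while should_load:
        scroll_down()
        scrolls += 1
        current_last_news_loaded = get_last_item_url()
        # Continue only if the scroll revealed a different last item.
        should_load = current_last_news_loaded != previous_last_news_loaded
        previous_last_news_loaded = current_last_news_loaded
    return scrolls


class FakePage:
    """Hypothetical page that reveals `step` more items on each scroll."""

    def __init__(self, urls, visible=2, step=2):
        self.urls = urls
        self.visible = visible
        self.step = step

    def scroll_down(self):
        self.visible = min(self.visible + self.step, len(self.urls))

    def last_item_url(self):
        return self.urls[self.visible - 1]


page = FakePage([f"https://example.com/news/{i}" for i in range(1, 6)])
scrolls = load_all_items(page.scroll_down, page.last_item_url)
```

Note that the loop always performs one extra scroll at the end: that is the scroll that confirms nothing new was loaded.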

About the Remove Duplicate Rows and Filter Data Table activities…

I didn’t investigate deeply, but I noticed that Data Scraping was returning more rows than it should. After adding the two activities, the number of rows matched the number of news items on the page.
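For reference, here is a rough Python sketch of what those two clean-up steps amount to. The filter condition used in the workflow isn't stated here, so dropping rows with an empty URL is purely an assumption, and `clean_rows` is a hypothetical helper working on a list of dicts with Title and URL keys.

```python
def clean_rows(rows):
    """Drop rows without a URL (assumed filter), then drop duplicates, keeping order."""
    seen = set()
    cleaned = []
    for row in rows:
        url = (row.get("URL") or "").strip()
        if not url:  # assumed filter condition: no link, no row
            continue
        key = (row.get("Title"), url)
        if key in seen:  # the same Title/URL pair was already kept
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Made-up rows for illustration only.
raw = [
    {"Title": "A", "URL": "https://example.com/a"},
    {"Title": "A", "URL": "https://example.com/a"},  # duplicate
    {"Title": "B", "URL": ""},                       # no link
    {"Title": "C", "URL": "https://example.com/c"},
]
cleaned = clean_rows(raw)
```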

If you have any other questions, let me know

If I have free time tonight, I’ll have a look and let you know

