Scrape Google Groups (topics/posts)

I’m a first-time user, and I’m impressed with the product’s feature set.

My initial goal:

Scrape all topics and their related posts from specific Google Groups into CSV files.
I’m using Firefox and have installed the plugin.

I started creating a project but am now stuck. I hope I can get some tips here.

Have a look at this group:

https://groups.google.com/forum/#!forum/forum.centura.web.developer

Starting from that page is difficult because it loads additional topics as you scroll down.
But there is another way.

Clicking the first topic opens a new page showing all posts within that topic.
That page has a NEXT button which jumps to the next topic.

A couple of issues:

  • Initially only the last post is expanded; for the earlier posts only the titles are shown. So the first action would be to expand them all. This can be done using the o key.
  • Some posts have attachments and images.
  • There are 1…n posts per topic
  • The topic title is on top of the page as text.
  • The date is shown, but there is also an internal date/time stamp behind it.

I was able to create a project that does the following:

  • extracts the author
  • extracts the date (not the date/time)
  • extracts the body text (without images or attachments)
  • presses the NEXT button and repeats the extraction

I could not implement the o key press. I also could not fetch the topic title, as it appears only once on the page rather than being repeated for each post.
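
To illustrate what I mean with the topic title: in the output I would want the single title copied onto every post row of that topic. Below is a minimal Python sketch of that idea, independent of the scraping tool. The field names (`author`, `date`, `body`) and the sample values are hypothetical placeholders, not real data from the group.

```python
def flatten_topic(title, posts):
    """Copy the single topic title onto every post of that topic."""
    rows = []
    for index, post in enumerate(posts, start=1):
        # Each row carries the topic title plus the post's position in the topic.
        row = {"topic": title, "post_no": index}
        row.update(post)
        rows.append(row)
    return rows


# Hypothetical sample data, only to show the shape of the result.
sample_posts = [
    {"author": "alice", "date": "2015-01-02", "body": "First post"},
    {"author": "bob", "date": "2015-01-03", "body": "Reply"},
]
rows = flatten_topic("Example topic", sample_posts)
```

Every entry in `rows` then has the same `topic` value, so the title no longer needs to be a repeated element on the page.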

I searched for community projects that scrape Google Groups but could not find any.

In the end I would like to have all topics, posts, images, and attachments saved in a manageable format (CSV, JSON, or whatever).
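
As a rough idea of the target format, here is a minimal Python sketch that turns such rows into CSV and JSON using only the standard library. The column names, the `;` separator for attachment file names, and the sample row are my own assumptions, not something the tool produces.

```python
import csv
import io
import json

# Assumed column layout for the CSV output.
FIELDS = ["topic", "author", "date", "body", "attachments"]


def rows_to_csv(rows):
    """Serialize post rows to CSV text; attachment names are joined with ';'."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        flat = dict(row)
        flat["attachments"] = ";".join(row.get("attachments", []))
        writer.writerow(flat)
    return buffer.getvalue()


def rows_to_json(rows):
    """Serialize the same rows to JSON, keeping attachments as a list."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


# Hypothetical sample row, only to show the shape of the output.
example_rows = [{
    "topic": "Example topic",
    "author": "alice",
    "date": "2015-01-02",
    "body": "Hello, world",
    "attachments": ["diagram.png", "notes.txt"],
}]
csv_text = rows_to_csv(example_rows)
json_text = rows_to_json(example_rows)
```

In this sketch CSV holds only the attachment file names; the images and attachments themselves would still have to be saved as separate files.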

Can someone give me a starting point for implementing this?

Regards,
Dave