Data Scraping on Twitter - need a way to trigger "next page" when there is no button


#1

I want to extract all my past Tweets into Excel (click on Tweets from your home page). A two-minute job, or so I thought. I set up Data Scraping to select the date and Tweet content and entered zero in the number field so that all Tweets would be returned. The Twitter page that presents all your past Tweets doesn’t have a “next page” button, so I left that blank. I followed this activity with a “Write CSV”.

The robot works as required, extracting Date and Tweet body into Excel columns, EXCEPT that it captures only the first 120 Tweets or so (out of >1700).

The problem is that Twitter doesn’t display all the Tweets at once; it loads a few more each time the scrollbar is moved close to the end of its travel. The robot evidently does not trigger this behaviour. Is there something that can be entered into the “NextLinkSelector” to keep loading Tweets, or is an altogether more complex, procedural approach needed?


Data Scraping Activity - trigger "next page" when there is no explicit button
#2

YouTube uses the same mechanism as Twitter for displaying comments “under” a video. So, to extract all comments, a UiPath robot would somehow need to display a “next page” without a “next page” button. That two major Web sites use the same “next page” functionality to display a list makes me suspect the problem is generic and could crop up a lot.

The following video has a (very) large number of comments, so it’s useful for investigation and testing: https://www.youtube.com/watch?v=aircAruvnKk


#3

I have two more Web sites that use the same mechanism to display a further “page” of a list (i.e. that have no explicit “next page” button). One is TweetDeck, the other is the UiPath forum search results!!

If one moves to the end of the search results using multiple Fn+End keystrokes and runs the robot, all the search results are captured. Otherwise, only the results that have been displayed are returned. Surely there’s a way of Data Scraping from one of UiPath’s own Web pages !!?


#4

Hi b4bbler,

For Twitter and YouTube you might get away with emulating scroll-down, and the easiest way would be to send the Page Down hotkey.

You might need to introduce a delay, or wait for some element, to allow the list to populate, and then you should be good.
This is on the UI Automation side.
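
The "delay or wait for some element" step can be sketched as a small polling loop. This is plain Python for illustration, not a UiPath activity; `count_items` is a hypothetical callable standing in for whatever reports how many list items are currently on the page, and the timeout and poll values are arbitrary assumptions.

```python
import time
from typing import Callable


def wait_for_more_items(count_items: Callable[[], int],
                        previous: int,
                        timeout: float = 5.0,
                        poll: float = 0.25) -> bool:
    """Poll until the list holds more than `previous` items.

    Returns True as soon as the count grows, False if `timeout`
    seconds pass without any growth.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if count_items() > previous:
            return True
        time.sleep(poll)  # give the page time to load the next batch
    # One last check in case the list grew during the final sleep.
    return count_items() > previous
```

After each Page Down, a workflow would call something like this before scrolling again, so that it waits only as long as the page actually needs instead of using a fixed delay.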

On the more advanced side, both Twitter and YouTube expose developer APIs (https://developer.twitter.com/en/docs/api-reference-index and https://developers.google.com/youtube/code_samples#youtube-data-api-v3 respectively) which are meant to be used for this kind of access. If this alternative sounds appealing, you might as well go for it, though you would need to check the limitations first. Twitter’s API rate limits are described here: https://developer.twitter.com/en/docs/basics/rate-limits

Hope this helps.


#5

I can’t see how the Send Hotkey activity can be used in conjunction with Data Scraping. Perhaps the answer is to display the whole list then use Data Scraping.

As there doesn’t appear to be a “display to end of list” key press, there would have to be a loop (i.e. “until the end of the list has been reached, display a set of list items by pressing Fn+End”). Is there a way of testing for a label that indicates the end of the list? In Twitter, the end of list is a “Back to top” link.
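
The loop described above can be sketched roughly as follows. It is written in Python only to show the control flow, not as a UiPath workflow. The three callables are hypothetical placeholders: `press_end` would send the End keystroke (and wait for the page to settle), `count_items` would report how many items are displayed, and `end_marker_visible` would check for an end-of-list label such as Twitter’s “Back to top” link. The `max_rounds` and `max_stalls` limits are assumptions added so the loop cannot run forever on a page that has no such label.

```python
from typing import Callable


def load_whole_list(press_end: Callable[[], None],
                    count_items: Callable[[], int],
                    end_marker_visible: Callable[[], bool],
                    max_rounds: int = 500,
                    max_stalls: int = 3) -> int:
    """Keep pressing End until the end-of-list marker appears.

    Also stops if the item count fails to grow `max_stalls` times in a
    row, or after `max_rounds` key presses. Returns the largest item
    count observed.
    """
    seen = count_items()
    stalls = 0
    for _ in range(max_rounds):
        if end_marker_visible():
            break  # e.g. the "Back to top" link is on screen
        press_end()
        now = count_items()
        if now > seen:
            seen, stalls = now, 0  # new items arrived, reset the stall counter
        else:
            stalls += 1
            if stalls >= max_stalls:
                break  # list stopped growing; treat this as the end
    return seen
```

Once this returns, the whole list is on the page and a single Data Scraping pass (with no “next page” selector) should pick up every row.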

It would be ironic if there wasn’t a generic UiPath activity solution to scrape data from a standard UiPath list page (https://forum.uipath.com/search?q=screen%20scraping) without resorting to such a workaround! Writing API calls to individual domains isn’t viable as a generic solution, imho, as it somewhat defeats the object of using an RPA tool.


#6

Can you post the xaml file?