I am trying to scrape data from a website using the Extract Data Table activity.
The data spans multiple pages, but the activity only captures the rows that are currently visible on the screen. It does click the Next button at the bottom of the page and extracts data from the following pages as well, but again only the rows shown on the screen. Test extraction sometimes shows all the data, but not consistently. Is there a way to fetch all the data from inside the Extract Data Table activity (for example by sending the Page Down or End hotkey)? I need all the rows after each “Next” click. There are 300-odd rows in total and the page displays only 25 at a time (there is no option to show more or fewer rows per page).
I suspect the web page loads the table data asynchronously. Can you try the following steps?
First, add a For Each loop that repeats once per page, and put the following activities inside it.
Then add a Send Hotkey activity to scroll to the end of the page.
Next, add an Extract Data activity (usually configured with the wizard) and leave the next-link selector blank.
Finally, add a Click activity to go to the next page.
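The order of those steps can be sketched outside UiPath. This is only an illustration in Python, not a UiPath workflow: `FakePage`, `scroll_to_end`, `extract` and `scrape_all` are hypothetical names, and the page is simulated as a list whose rows only "render" after scrolling.

```python
class FakePage:
    """Hypothetical stand-in for one results page with lazily rendered rows."""

    def __init__(self, rows, initially_visible):
        self._rows = rows
        self._rendered = initially_visible  # rows rendered before any scrolling

    def scroll_to_end(self):
        # Plays the role of Send Hotkey (End): everything is rendered afterwards.
        self._rendered = len(self._rows)

    def extract(self):
        # Plays the role of the extract step: returns only the rendered rows.
        return list(self._rows[:self._rendered])


def scrape_all(pages):
    """Scroll first, then extract, once per page (the loop clicks Next implicitly)."""
    collected = []
    for page in pages:
        page.scroll_to_end()
        collected.extend(page.extract())
    return collected
```

In this simulation, extracting before scrolling would return only the initially rendered rows, which is the symptom described in the question. Whether the real site behaves this way is an assumption.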
Hi Yoichi, thanks a bunch for taking a look. I tried with For Each as well, but it still only picks up the data that's on the screen. While test extraction shows 25 rows, the actual run only grabs the rows shown on the screen.
This might not be a very efficient way, but can you try a double loop (an outer loop over pages and an inner loop over Page Down presses) and then remove the duplicated rows at the end?
The following expression removes duplicated rows:
dt = dt.AsEnumerable.Distinct(DataRowComparer.Default).CopyToDataTable()
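To make it clearer what that expression does, here is a rough Python equivalent of the idea: rows are compared by the values in all of their columns, and only the first occurrence of each is kept. `distinct_rows` is a hypothetical helper name, and each row is assumed to be a sequence of hashable values.

```python
def distinct_rows(rows):
    """Keep the first occurrence of each row, comparing all column values."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # a row counts as a duplicate only if every column matches
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Overlapping screenfuls collected while paging down contain repeated rows.
scraped = [("Alice", "CEO"), ("Bob", "CTO"), ("Alice", "CEO")]
print(distinct_rows(scraped))
```

One thing to watch in the UiPath expression itself: `CopyToDataTable()` throws an `InvalidOperationException` when the source has no rows, so it may be worth checking the row count first.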
Here's what I think: first you need to check a couple of things.
1. How many Page Down presses you need to get all the data.
2. Whether any data will be missed if you use Page Down.
3. Create a condition: if the Next button is visible/found, don't scrape the data yet; press the button first, then scrape again.
Note: check how many Next buttons exist and whether the selector has some unique indicator.
Be careful when building the process, because if you don't make the parameters clear enough it will keep scraping the data indefinitely.
For the duplicate data, I think you can use Yoichi's answer.
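A minimal sketch of the double loop with those safeguards, written in Python against a simulated site rather than UiPath. Everything here is hypothetical (`FakeSite`, `scrape_with_guards`, the `max_pages` cap, the scroll step); the point is only the control flow: a hard cap on pages, a fixed number of Page Down presses, and a check that Next is enabled before clicking it.

```python
class FakeSite:
    """Hypothetical paginated site that shows one screenful of rows at a time."""

    def __init__(self, pages, rows_per_screen, scroll_step):
        self.pages = pages
        self.rows_per_screen = rows_per_screen
        self.scroll_step = scroll_step
        self.page_index = 0
        self.offset = 0

    def visible_rows(self):
        page = self.pages[self.page_index]
        return page[self.offset:self.offset + self.rows_per_screen]

    def page_down(self):
        page = self.pages[self.page_index]
        last_start = max(0, len(page) - self.rows_per_screen)
        self.offset = min(self.offset + self.scroll_step, last_start)

    def next_enabled(self):
        return self.page_index < len(self.pages) - 1

    def click_next(self):
        if not self.next_enabled():
            raise RuntimeError("Next is disabled")
        self.page_index += 1
        self.offset = 0


def scrape_with_guards(site, max_pages, page_downs_per_page):
    collected = []
    for _ in range(max_pages):  # hard cap so the loop cannot run forever
        collected.extend(site.visible_rows())
        for _ in range(page_downs_per_page):  # inner loop: Page Down, then scrape
            site.page_down()
            collected.extend(site.visible_rows())
        if not site.next_enabled():  # stop instead of clicking a disabled Next
            break
        site.click_next()
    return list(dict.fromkeys(collected))  # drop duplicates, keep first occurrence
```

Checking the button state before each click is what avoids a "Next is disabled" style failure on the last page in this sketch; how that state is exposed in the real selector is something you would need to verify on the actual site.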
Thanks Yoichi and Ahmad. I tried the double loop, but it gives an error that Next is disabled, even with just a (0,5) value. There are 13 pages in total, each with 25 entries except the last page. If I reduce the font size, I can fit 18 entries on one screen before it clicks Next. So out of 325 entries, it only returns 290-odd.
I'm also confused as to why Test Extraction shows the correct entries, but the actual run doesn't pick them up.
Welcome back to the community!!
Can you please tell us which site/data you are trying to scrape so that we can have a look at the use case?
Hi Shikhar, it's LinkedIn Sales Navigator.
Hi @Yoichi @Shikhar_Tandon @Ahmad_Rais, is it possible to scrape the data in some other fashion?