How to Scrape Data from a Multi-Page Website When There Is No Next Page Button

Hi everyone.

I want to scrape data from this page:
https://www.hepsiburada.com/kampanyalar/cok-satan-kitaplar?sayfa=1

Fortunately, the URL ends with the page number parameter (sayfa). The image above shows the first page.
I can loop through the following pages with a pageCounter variable that I created.
However, the Write Range activity only works on the first page.

Here is my workflow screenshot.

Any suggestions on what to do? I think I am missing something. :slight_smile:

Thanks in advance. Cheers!

You can get the number of the last page and store it in a variable, for instance lastPage.
Then create another integer variable (int start = 1) and use a For Each activity to iterate from start to the last page: for each item in Enumerable.Range(start, lastPage). Note that the second argument of Enumerable.Range is a count, not an end value, so this covers pages 1 through lastPage only because start is 1.
Of course, perform the actions you need between page switches.
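To see why Enumerable.Range(start, lastPage) works here, remember that .NET's Enumerable.Range(start, count) returns count consecutive integers beginning at start. The Python sketch below mirrors that behavior with a hypothetical enumerable_range helper and derives the count from the first and last page, so the loop stays correct even when start is not 1. The value last_page = 5 is made up for illustration.

```python
from typing import List


def enumerable_range(start: int, count: int) -> List[int]:
    """Mimic .NET Enumerable.Range(start, count): `count` integers starting at `start`."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return list(range(start, start + count))


def page_numbers(start: int, last_page: int) -> List[int]:
    """Pages start..last_page inclusive; the count is last_page - start + 1."""
    return enumerable_range(start, last_page - start + 1)


last_page = 5  # hypothetical value read from the pagination bar
for page_counter in page_numbers(1, last_page):
    # Placeholder for the per-page actions (Data Scraping, Write Range, ...)
    print(f"processing page {page_counter}")
```

For example, enumerable_range(3, 5) yields 3 through 7, which is why passing lastPage directly as the count only gives the intended pages when start is 1.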

I added the For Each as you said, but it didn't work. Just to make sure:
where should I put the For Each activity?

Here is the updated workflow screenshot.

To go to the next page, insert a Click activity and indicate any page number.
Most importantly, in the selector, set the idx attribute to pageCounter (starting from 1).
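The idea is that the selector string is rebuilt on every iteration so its idx attribute follows pageCounter. A minimal Python sketch of that string building follows; the webctrl fragment shape and the tag value 'A' are assumptions for illustration, and the real fragment should be copied from the selector editor for the target page.

```python
def page_link_selector(page_counter: int, tag: str = "A") -> str:
    """Build a selector fragment whose idx attribute follows page_counter.

    The fragment shape and default tag are illustrative assumptions,
    not the exact selector from the original workflow.
    """
    if page_counter < 1:
        raise ValueError("idx is 1-based, so page_counter must start at 1")
    return f"<webctrl tag='{tag}' idx='{page_counter}' />"


for page_counter in range(1, 4):
    # One selector per page link: idx='1', idx='2', idx='3'
    print(page_link_selector(page_counter))
```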

Thanks for your answer. Clicking on the other page numbers works perfectly, but the Write Range activity still isn't working. The output is always empty after running through a couple of pages.
Would you mind taking a look at the XAML file?
HepsiBuradaBooks.xaml (11.7 KB)
Thanks

Hi @sehz4d, I checked your XAML file. In this logic you hard-coded the page count, but in practice we don't know how many pages there are. You said 33 pages, but now there are 34, so the page count can grow or shrink over time and we need to determine it dynamically. I did that; please check my XAML file.

One more thing: did you face any issues with the data scraping? I ran into a parentID issue: the site generates a new ID on every page, so I read the parentID each time and added it to the selector.
Another solution is to remove the ID and use the class instead, but there is a drawback: when the website changes, the classes will almost certainly change too, so I used the parentID.
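Refreshing the parentID amounts to rewriting one attribute of the selector string before each extraction. The Python sketch below does that with a regular expression; the hypothetical with_parent_id helper, the sample selector, and the ID values are all assumptions for illustration, and in the workflow the fresh ID would be read from the page on every iteration.

```python
import re

# Matches an existing parentid='...' attribute inside a selector fragment.
PARENT_ID_PATTERN = re.compile(r"parentid='[^']*'")


def with_parent_id(selector: str, parent_id: str) -> str:
    """Return selector with its parentid attribute replaced by parent_id."""
    if not PARENT_ID_PATTERN.search(selector):
        raise ValueError("selector has no parentid attribute to update")
    # A function replacement keeps backslashes in parent_id from being read as group references.
    return PARENT_ID_PATTERN.sub(lambda _match: f"parentid='{parent_id}'", selector, count=1)


# Hypothetical selector and IDs.
template = "<webctrl parentid='abc123' tag='DIV' class='product-list' />"
print(with_parent_id(template, "xyz789"))
# prints <webctrl parentid='xyz789' tag='DIV' class='product-list' />
```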

Dataextraction.xaml (35.0 KB)


Hi @sehz4d,

I am attaching a workflow that extracts the data you want from every page without relying on a Next button.
Approach

  1. Determine the last page
  2. Extract the data of the current page
  3. Navigate to the next page by manipulating the URL.
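The three steps above can be sketched in Python as follows. This is not the attached _test.xaml; get_last_page and extract_rows are hypothetical stand-ins for the scraping activities, passed in as functions so the loop itself can run without a browser. Only the sayfa query parameter is taken from the URL in the question.

```python
from typing import Callable, List
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

BASE_URL = "https://www.hepsiburada.com/kampanyalar/cok-satan-kitaplar?sayfa=1"


def page_url(base_url: str, page: int) -> str:
    """Return base_url with its `sayfa` query parameter set to `page`."""
    parts = urlparse(base_url)
    query = parse_qs(parts.query)
    query["sayfa"] = [str(page)]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


def scrape_all_pages(
    base_url: str,
    get_last_page: Callable[[str], int],
    extract_rows: Callable[[str], List[dict]],
) -> List[dict]:
    # Step 1: determine the last page from the first page.
    last_page = get_last_page(page_url(base_url, 1))
    rows: List[dict] = []
    for page in range(1, last_page + 1):
        # Step 3: navigate by rewriting the URL instead of clicking a button.
        url = page_url(base_url, page)
        # Step 2: extract the data of the current page.
        rows.extend(extract_rows(url))
    return rows
```

Building the URL directly avoids a Click activity per page, which is also why this approach tends to run faster than clicking page numbers.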

You could change the logic to click the specific page-number element instead, but I found the current approach works fine.

_test.xaml (11.3 KB)


Use the Navigate To activity with a dynamic URL and loop through the pages.


Hey @pradeepRPA,
Thank you so much for your help. I finally solved my problem with your solution, and it's working flawlessly now.

Yes, it was 33 pages at first, and it has obviously changed since.
With your solution, it won't throw an error even if the total number of pages changes.

I wasn't able to extract any data. As you noticed, the site generates a new parentID on every page. I followed each of your steps and it's working great.

Thanks again


Hey @Ishmeet_Bindra

Thanks for your reply. Your solution works great as well, but I had to edit the parentID as @pradeepRPA mentioned in his reply. Thank you so much.

I have a question though.
Could you explain the function you used here in more detail?

Thanks in advance.

Thanks @ashishsinha1504. At the beginning I was able to loop through the next pages using the pageCounter variable, but the issue was the new parentID on every page.

@pradeepRPA, because your solution uses a Click activity, it is slower than @Ishmeet_Bindra's solution.
Of course, without your help I couldn't have solved the problem :slight_smile:

This function simply extracts the last number (the last page) from the previously extracted data.
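The same idea in Python, as a sketch only: the exact expression in the workflow is not shown in the thread, so the hypothetical last_page_from_text helper below simply pulls every integer out of the scraped pagination text and keeps the last one. The sample text "1 2 3 4 5 ... 34" is made up.

```python
import re
from typing import Optional


def last_page_from_text(pagination_text: str) -> Optional[int]:
    """Return the last integer found in the scraped pagination text, or None if there is none."""
    numbers = [int(n) for n in re.findall(r"\d+", pagination_text)]
    return numbers[-1] if numbers else None


print(last_page_from_text("1 2 3 4 5 ... 34"))  # prints 34
```

If the pagination text might end with something other than the highest page number, taking max(numbers) instead of the last element is the safer choice.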


Hi @sehz4d

Here I used Element Exists to check whether the page number exists, which is useful when the page count changes over time. I also added a delay because that particular site is sometimes slow. Anyway, I'm glad you finally got a solution.
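The Element Exists plus Delay pattern can be sketched as a loop that keeps going while the next page is found. In this Python sketch, page_exists and process_page are hypothetical callbacks standing in for the Element Exists check and the scraping steps, time.sleep stands in for the Delay activity, and max_pages is an added safety limit that the original workflow may not have.

```python
import time
from typing import Callable, List


def visit_pages_until_missing(
    page_exists: Callable[[int], bool],
    process_page: Callable[[int], None],
    delay_seconds: float = 2.0,
    max_pages: int = 1000,
) -> List[int]:
    """Process pages 1, 2, 3, ... for as long as page_exists(n) reports the page."""
    visited: List[int] = []
    page = 1
    # max_pages guards against an endless loop if the check never returns False.
    while page <= max_pages and page_exists(page):
        process_page(page)
        visited.append(page)
        page += 1
        # Give a slow site time to load before checking the next page.
        time.sleep(delay_seconds)
    return visited
```

Because the loop stops as soon as a page is missing, it does not need the page count up front, so it keeps working whether the site has 33 or 34 pages.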


Yes, it's working fine. Thanks!
