What's the best way to iterate through webpages with a variable and while loop?

I need to iterate through all the pages of a website and scrape some data. I can’t use the Next button (in the Extract Data activity) because the Next arrow disappears on the second-to-last page (screenshot 2). I’m able to iterate through all the pages with the ‘PageIndex’ variable used in the selector of the Click activity, but how do I exit the While loop without using a ‘Check App State’ activity? I’d appreciate the help.

Hi!

Have you tried using Navigate To?

Check each page’s URL and notice which part of it changes.

If you can find that part, store the changing values in a temporary data table.

Iterate through each URL.

Use Navigate To with the modified URL.

This visits each page in turn, and it also avoids the selector issue.

Try it this way.

Regards,
NaNi
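A minimal Python sketch of the Navigate To idea, assuming the page number is the only part of the URL that changes. The URL template, `build_page_urls`, and `scrape_all` are hypothetical names, and `navigate_and_scrape` is a placeholder for a Navigate To followed by an Extract Data; a plain list stands in for the temporary data table.

```python
from typing import Callable, Iterable, List

# Hypothetical URL pattern: assumes only the page number changes between pages.
PAGE_URL_TEMPLATE = "https://example.com/results?page={page}"


def build_page_urls(template: str, first_page: int, last_page: int) -> List[str]:
    """Return one URL per page by substituting the changing part (the page number)."""
    return [template.format(page=n) for n in range(first_page, last_page + 1)]


def scrape_all(urls: Iterable[str], navigate_and_scrape: Callable[[str], List[str]]) -> List[str]:
    """Visit each URL in order; navigate_and_scrape stands in for Navigate To + Extract Data."""
    rows: List[str] = []
    for url in urls:
        rows.extend(navigate_and_scrape(url))
    return rows


if __name__ == "__main__":
    for url in build_page_urls(PAGE_URL_TEMPLATE, 1, 3):
        print(url)
```

Note that this version needs the last page number up front; it avoids selectors entirely, but it does not by itself detect when the pages run out.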

Hello, thank you for the quick reply! I have indeed tried a Navigate To followed by a Check App State inside a While loop, to check whether the page can still be scraped. It works, but it’s really slow. I also really want to understand how to make the method I’m currently trying work. I wanted to upload my XAML so you could see the overview, but I can’t since I’m a new user.

Hello.

Inside the While loop, I added a check (an Element Exists, for example) to see whether the next page exists.
If it exists, click it. If it doesn’t, the loop is over.

You can also try using Navigate To.

Hug
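The existence-check loop can be sketched in Python roughly as below. This is only an illustration: `page_link_exists` stands in for Element Exists / Check App State, `click_page` for the Click activity with PageIndex in its selector, and `FakeSite` is a hypothetical in-memory site used just to exercise the loop.

```python
from typing import Callable, List


def scrape_while_next_exists(
    scrape_page: Callable[[], List[str]],
    page_link_exists: Callable[[int], bool],
    click_page: Callable[[int], None],
) -> List[str]:
    """Scrape page 1, then keep clicking the next page link only while it exists."""
    rows: List[str] = list(scrape_page())  # page 1 is already open
    page_index = 2
    # The existence check plays the role of Element Exists / Check App State.
    while page_link_exists(page_index):
        click_page(page_index)
        rows.extend(scrape_page())
        page_index += 1
    return rows


class FakeSite:
    """Hypothetical in-memory site used only to exercise the loop."""

    def __init__(self, pages: List[List[str]]) -> None:
        self.pages = pages
        self.current = 1

    def scrape_page(self) -> List[str]:
        return self.pages[self.current - 1]

    def page_link_exists(self, index: int) -> bool:
        return 1 <= index <= len(self.pages)

    def click_page(self, index: int) -> None:
        self.current = index


if __name__ == "__main__":
    site = FakeSite([["a", "b"], ["c"], ["d"]])
    print(scrape_while_next_exists(site.scrape_page, site.page_link_exists, site.click_page))
```

Checking for the numbered page link rather than the Next arrow sidesteps the arrow disappearing early, at the cost of one existence check per page.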

Hi @L2RPA,

You can use Do While instead of While. Do While runs its body once and then keeps repeating it as long as the condition is met. If you know how many pages there are before you start scraping, you can use that number in the condition: keep scraping while the page index has not exceeded the page count.

Regards,
MY
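Python has no Do While, so a rough sketch emulates it with `while True` and a condition checked after the body. It assumes the page count is already known; `scrape_page` and `click_page` are hypothetical stand-ins for Extract Data and the Click activity.

```python
from typing import Callable, List


def scrape_known_page_count(
    page_count: int,
    scrape_page: Callable[[], List[str]],
    click_page: Callable[[int], None],
) -> List[str]:
    """Do While style loop: the body runs once, then repeats while pages remain."""
    rows: List[str] = []
    page_index = 1
    while True:  # emulate Do While with while True + break
        rows.extend(scrape_page())  # body: scrape the page that is currently open
        page_index += 1
        if page_index > page_count:  # condition is checked after the body
            break
        click_page(page_index)
    return rows
```

Because the body always runs once, page 1 is scraped even if `page_count` were 0, which matches Do While semantics.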

Hello, yes, that’s a possibility with a ‘Check App State’ (the modern activity equivalent of Element Exists), but it’s very slow when I use it, and I was wondering whether there was another way.

This is also a solution, but I want to keep it dynamic, so that if they add more pages, they will still all be scraped. Something like: the PageIndex variable looks for a next page, and when it can’t find another page to click, it exits the loop. I don’t know if that makes sense.


If the site shows a note with the total number of records, this approach stays dynamic. Tables usually display the total number of records somewhere. If you divide that total by the number of rows on each page, you get the page count dynamically.

Of course, I don’t know whether your site shows this, but you can check.

The last page isn’t completely filled with rows like the other pages, so I don’t think that will work?

You can overcome this with the mod and round functions.

Compute the remainder (mod) first. If it is 0, the quotient is the page count. If it is greater than 0, add 1 to the quotient for the partly filled last page.
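The mod rule above, written as a small Python function. It assumes you can read the total record count and the rows per page from the site; `page_count` is a hypothetical name. It is equivalent to rounding the division up (a ceiling).

```python
def page_count(total_records: int, rows_per_page: int) -> int:
    """Number of pages needed when the last page may be only partly filled."""
    if rows_per_page <= 0:
        raise ValueError("rows_per_page must be positive")
    full_pages, remainder = divmod(total_records, rows_per_page)
    # remainder == 0: every page is full, so the quotient is the page count
    # remainder > 0: one extra, partly filled last page
    return full_pages + (1 if remainder > 0 else 0)


if __name__ == "__main__":
    print(page_count(95, 10))  # → 10
```

For example, 95 records at 10 rows per page gives a quotient of 9 and a remainder of 5, so there are 10 pages.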

Ok I’ll try, thank you so much for your help!


Just wanted to give an update. I wrapped the Click activity in a Try Catch, so when it doesn’t find a next page to click, it exits the loop.
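The Try Catch approach can be sketched in Python as a loop that breaks when the click fails. `ElementNotFound` is a hypothetical exception standing in for whatever error the Click activity raises when its PageIndex selector matches nothing; `scrape_page` and `click_page` are hypothetical stand-ins for Extract Data and Click.

```python
from typing import Callable, List


class ElementNotFound(Exception):
    """Hypothetical stand-in for the error a Click raises when its selector matches nothing."""


def scrape_until_click_fails(
    scrape_page: Callable[[], List[str]],
    click_page: Callable[[int], None],
) -> List[str]:
    """Scrape page 1, then click page 2, 3, ... until a click fails."""
    rows: List[str] = list(scrape_page())  # page 1 is already open
    page_index = 2
    while True:
        try:
            click_page(page_index)  # Click activity with PageIndex in its selector
        except ElementNotFound:
            break  # Catch block: no link for this page index, so leave the loop
        rows.extend(scrape_page())
        page_index += 1
    return rows
```

Catching only the "not found" error keeps unrelated failures visible instead of silently ending the loop. The last, failing click likely still waits out the activity’s timeout before the exception is raised, so a shorter timeout on that Click may help speed.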
