Properly defining NextLinkSelector to scrape data from multiple pages


#1

Hi !

I’m trying to scrape data from multiple pages, but UiPath seems to stay on the first page of my results.
The number of results may vary depending on the period the bot is searching, so I have to find a dynamic way to grab all results…

I already searched with UI Explorer and in the HTML code, without any result.
It seems like it cannot find the selector to move to the next page, and I’m kind of blocked as I don’t understand where I’m making a mistake.

Has anyone already experienced this?

Here is the site I’m trying to scrape data from:
Curia

For the selector to navigate through the pages, I tried several options but never got any result. I think the most accurate one must be this:

<html title='CURIA - List of results' />
<webctrl parentid='mainForm:*' parentname='mainForm' idx='1113' />

Thanks in advance !


#2

Hey @JFK,

I was able to scrape the data from both pages of the link that you provided.

I used the element below to move to the next page. Do you mean that the robot doesn’t click on this element when you run it?
[image: screenshot of the next-page arrow]

Am I missing something here?

Thanks,
Rammohan B.


#3

Hi @Rammohan91,

Yes indeed, even if I write the code in the NextLinkSelector, the bot doesn’t go any further and only takes the data from the first page. I tried with the “usual UI selector” from the wizard and with self-made code.

Normally the code I wrote should refer to the same arrow as yours, so I really don’t understand what the problem is. I even checked the maximum number of results to scrape and set it to 0 to make sure it would take everything.

Did you take my code for the selector? I suppose I’m missing something there…

Could you share your code with me?

Thanks !


#4

No. I just followed the data scraping wizard.

Here is my workflow.

Curia_Test.xaml (8.6 KB)


#5

@Rammohan91 I got it!

Apparently it works properly in Chrome; the issue probably comes from Internet Explorer, which I had to use by default…

Thanks a lot! I’ll see if there is a workaround, otherwise I’ll switch to Chrome for all my bots :)


#6

It seems I spoke too soon…

Apparently the selector for the data scraping changes depending on the search performed.

The arrow to the next page can be defined like this for the link I provided:
<webctrl class='btn_pagination' parentid='mainForm:j_id269' />

The problem is that if the search criteria change, the parentid also changes… I then tried to use this selector:
<webctrl class='btn_pagination' parentid='mainForm:j_id*' />

But then the problem is that the bot cannot tell the navigation arrows apart…
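
To illustrate why the wildcard selector becomes ambiguous, here is a minimal Python sketch (not UiPath code). It assumes selector wildcards behave like shell-style globs, where `*` matches any run of characters. Apart from `mainForm:j_id269`, the element list and its `parentid` values are made up for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical pagination arrows on the results page. Only the parentid
# 'mainForm:j_id269' comes from this thread; the other values are invented.
ARROWS = [
    {"class": "btn_pagination", "parentid": "mainForm:j_id263"},
    {"class": "btn_pagination", "parentid": "mainForm:j_id266"},
    {"class": "btn_pagination", "parentid": "mainForm:j_id269"},
    {"class": "btn_pagination", "parentid": "mainForm:j_id272"},
]


def find_matches(selector, elements):
    """Return the elements whose attributes all match the selector patterns."""
    return [
        element
        for element in elements
        if all(
            fnmatchcase(element.get(name, ""), pattern)
            for name, pattern in selector.items()
        )
    ]


exact = {"class": "btn_pagination", "parentid": "mainForm:j_id269"}
wildcard = {"class": "btn_pagination", "parentid": "mainForm:j_id*"}

print(len(find_matches(exact, ARROWS)))     # one arrow matches
print(len(find_matches(wildcard, ARROWS)))  # every arrow matches
```

With the exact `parentid` only one element is selected, while the wildcard version selects all of them, which matches the behaviour described above.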

Has anyone run into something similar?

Thanks !


#7

I found a solution. Apparently writing on the forum helps with thinking outside the box :)

I’m answering my own question in case it helps someone else.

It’s important to rely exclusively on UI Explorer when trying to determine a selector. I thought it was not possible to use the HTML title and that selectors were limited to the class, id or tag, but other properties work as well!

The syntax is not exactly the same as in the HTML, though, so it’s important to go through UI Explorer. If the ID in the selector varies, there are other attributes that must be unique, like the name or the title displayed on screen, which can also be found in the HTML code.

It’s then possible to isolate the selector based on those, by looking the properties up in UI Explorer to make sure they are written correctly.
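
The idea above can be sketched in Python (again, not UiPath code): given the attributes UI Explorer shows for the target and for similar elements, keep only the attributes whose value is unique to the target and that are not known to change between searches. All attribute values here, including the titles, are hypothetical and only serve the illustration.

```python
def distinguishing_attributes(target, others, volatile=()):
    """Return attributes of `target` whose value no other element shares,
    skipping attributes listed in `volatile` (known to change between runs)."""
    result = {}
    for name, value in target.items():
        if name in volatile:
            continue
        if all(other.get(name) != value for other in others):
            result[name] = value
    return result


# Hypothetical attributes, as they might be listed in UI Explorer.
next_arrow = {
    "class": "btn_pagination",
    "parentid": "mainForm:j_id269",
    "title": "Next page",
}
other_arrows = [
    {"class": "btn_pagination", "parentid": "mainForm:j_id263", "title": "First page"},
    {"class": "btn_pagination", "parentid": "mainForm:j_id266", "title": "Previous page"},
    {"class": "btn_pagination", "parentid": "mainForm:j_id272", "title": "Last page"},
]

# parentid changes with the search criteria, so it is excluded up front.
stable = distinguishing_attributes(next_arrow, other_arrows, volatile=("parentid",))
print(stable)
```

In this made-up case only `title` survives: `class` is shared by every arrow and `parentid` is excluded as unstable, so the selector would be built on the title.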