Data Scraping activity is pulling URL and Data of only first Page, not of subsequent pages

Hello all,
In the Data Scraping activity, I am extracting a URL along with the text, but the URL and data are pulled from the first page only. Strangely, this happens only with the field on which the URL option is checked; the other fields are pulled perfectly. If I uncheck the URL option on that field, it starts pulling data from all pages.

Thanks

Kindly ensure that these steps were followed

Cheers @Alok_Shrivastava

Hi,

I have taken all these steps several times.

Hi @Alok_Shrivastava,

Check the NextLinkSelector in the Properties panel. If it is not stable, that is probably the reason the robot is not going to the subsequent pages.

Hello,

It is going to all the pages and pulling all the data rows, except for the field on which URL is also checked. That happens only while URL is checked; as soon as I uncheck URL, it starts pulling all rows.
Here is a screenshot.

Thanks

After choosing the URL option, you should have been shown a preview of the table you are going to get.
Did that preview contain the URLs?
Also make sure that this option is set to Yes after the preview is shown.

I suspect we are missing a step in the data scraping setup. No worries, let’s try to solve this.
Cheers @Alok_Shrivastava

Hello Palaniyappan,

I am sharing all the screenshots. One set is for the process with “Details field checked URL” and the other is for “Address field checked URL”. The resulting Excel outputs are included as well.

You can see that when:

  1. Details was associated with the URL, the Details field is blank on the 2nd page, whereas the Addresses come through flawlessly.
  2. Address was associated with the URL, the Address field is blank on the 2nd page, whereas the Details come through flawlessly.

Hello Friends

Kindly help me. :worried:

Hello All,

Any help, ideas, or suggestions?

One possibility is that the selector for the table you’re scraping on page 2 is not the same as the one you modeled using page 1.

Can you please do the following?
Navigate manually to Page 2 or Page 3, then run the Data Scraping wizard and compare the XML it generates with the one that you originally generated with Page 1.

It may lead to an answer.

I did post a response to another forum on an issue I resolved similar to yours. I’ll find it and post it here.

And here is the response from an earlier post.

Hey @Alok_Shrivastava

I totally agree with this -

I have run into a few such scenarios where the Data Scraping wizard is not a universal solution. You might have to tweak things a little.

Scrape the data from page 1 and page 2, for example, and compare the Data Definitions (you can get them as shown below).

I am pretty sure you will notice some differences. If the differences are simple, you might be able to fix them with wildcard characters. If not, you might need to perform separate extractions and merge them, whichever works best.
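If comparing the two definitions by eye is error-prone, the comparison can be done mechanically. Below is a minimal sketch in plain Python (not UiPath) that reports only the lines that differ between two data definitions. The XML in `PAGE1_DEFINITION` and `PAGE2_DEFINITION` is a made-up placeholder, not the real definition for any site, and `diff_definitions` is a hypothetical helper name; paste the XML copied from the wizard in place of the samples.

```python
import difflib

# Hypothetical data definitions, for illustration only.
# Replace both strings with the XML copied from the Data Scraping wizard.
PAGE1_DEFINITION = """<extract>
  <row exact="1">
    <webctrl tag="div" class="result-card" />
  </row>
  <column exact="1" name="Address" attr="text">
    <webctrl tag="a" class="address-link" />
  </column>
  <column exact="1" name="AddressUrl" attr="href">
    <webctrl tag="a" class="address-link" />
  </column>
</extract>"""

# Assumed difference: page 2 uses a slightly different class on the link.
PAGE2_DEFINITION = PAGE1_DEFINITION.replace(
    'class="address-link"', 'class="address-link featured"'
)


def diff_definitions(first, second):
    """Return only the added/removed lines between two data definitions."""
    # Strip indentation so whitespace-only differences are ignored.
    first_lines = [line.strip() for line in first.splitlines()]
    second_lines = [line.strip() for line in second.splitlines()]
    diff = difflib.unified_diff(
        first_lines, second_lines,
        fromfile="page1", tofile="page2", lineterm="", n=0,
    )
    # Keep changed lines, drop the '---' / '+++' file headers and '@@' hunks.
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


if __name__ == "__main__":
    for change in diff_definitions(PAGE1_DEFINITION, PAGE2_DEFINITION):
        print(change)
```

An empty result means the two definitions are identical apart from indentation, in which case the selectors are probably not the cause and the problem lies elsewhere.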


Hello @kaderms @AndyMenon

Here is the data definition of page 1

Here is the data definition of page 2

Here is a comparison line by line

I could not find any difference. :slightly_frowning_face:

The point to note is that it happens only when the URL option is checked, as you can see in the screenshots.

Thank you very much

Hi @Alok_Shrivastava,
Is it possible for you to share the URL so that I can check from my end?

Hello @Jyotika_Halai

Sure
https://www.rightmove.co.uk/ -> Commercial -> Commercial property to rent ->
enter ‘B1’ -> click ToRent -> click Find Properties

I need the Address and its URL.

Thanks

Hi @Alok_Shrivastava,

Kindly check the attached file.
RightMoveCommercial.zip (19.9 KB)

Regards.

Thanks @Jyotika_Halai

On the first page I am also getting the URL and data; the problem starts from the 2nd page.

There are 69 records in the commercial rent section for the B1 postcode.
Please try to capture all 69 rows, which are spread over 3 pages (up to 24 per page).

Thanks

Hi @Alok_Shrivastava,
I am able to scrape the data from the three pages individually. Please check the attached file.
Data2.zip (17.3 KB)
I have followed the below steps:

  1. Open the website and enter the required input to search data for B1 commercial.
  2. Load the page that you want to scrape data from.
  3. Run the code.
  4. It will scrape the data from that page.

I hope this will help you.
Cheers
Jyotika

Hello @Jyotika_Halai,

Thanks a lot.

Yes, if we load each page individually and run the code, it works. But it should work automatically for all the pages. Whether there are 3 pages or 100, it should crawl through all of them and capture all the data. That is what I need. There are 130+ postcodes, so it is not feasible to load each page of every postcode and capture the data by hand. I do not want to do this repetitive job; that is why the robot is required.

Kindly try to pull the data and URL in a single process, where the robot presses the Next button and captures all the data with URLs until the last page.

Thanks again.
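
To make the requirement concrete, here is a minimal sketch of the loop in plain Python. It is not UiPath code and does not touch the real site: `FAKE_SITE`, `fetch_page`, `scrape_all_pages`, and the sample rows are all hypothetical stand-ins, and a missing page plays the role of a missing Next button. It keeps paging until nothing is left and records which pages came back with a blank URL, which is the symptom described in this thread.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, str]

# Hypothetical stand-in for a paginated results site: page number -> rows.
# The blank "Url" on page 2 imitates the symptom reported in this thread.
FAKE_SITE: Dict[int, List[Row]] = {
    1: [{"Address": "Sample address A", "Url": "/sample/a"},
        {"Address": "Sample address B", "Url": "/sample/b"}],
    2: [{"Address": "Sample address C", "Url": "/sample/c"},
        {"Address": "Sample address D", "Url": ""}],
    3: [{"Address": "Sample address E", "Url": "/sample/e"}],
}


def fetch_page(page_number: int) -> Optional[List[Row]]:
    """Hypothetical extractor: rows of one page, or None past the last page."""
    return FAKE_SITE.get(page_number)


def scrape_all_pages(
    fetch: Callable[[int], Optional[List[Row]]],
) -> Tuple[List[Row], List[int]]:
    """Collect rows from every page and note pages that contain a blank URL."""
    all_rows: List[Row] = []
    pages_with_blank_urls: List[int] = []
    page_number = 1
    while True:
        rows = fetch(page_number)
        if rows is None:  # nothing to fetch: equivalent to no Next button
            break
        if any(not row["Url"] for row in rows):
            pages_with_blank_urls.append(page_number)
        all_rows.extend(rows)
        page_number += 1
    return all_rows, pages_with_blank_urls


if __name__ == "__main__":
    rows, blank_pages = scrape_all_pages(fetch_page)
    print(len(rows), "rows collected; pages with blank URLs:", blank_pages)
```

Checking for blank URLs page by page, rather than only inspecting the final spreadsheet, shows exactly where the extraction starts to fail.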