Table extraction not scraping all data and pages on Website

Hi All,

The Table/Screen scraping is not working.
The wizard recognizes some but not all elements, and only goes to page 14 of 90.

Any advice?

Thank you

@jaco

Did you try enabling Simulate in the options?

Also, is there any change? If not, Simulate should solve the issue.

Cheers

Hey, could you try adding a delay between pages (2 sec), or check that the maximum output property is not set to 100, which would allow only 100 records to be read?

Thanks.

Hi @prateek.mehandiratta9 and @Anil_G

I have done that, with Simulate and a 2-second delay, and still no luck.

@jaco

Please use strict selectors. Maybe you are also using image and fuzzy targeting, so it might not be reliable.

Cheers

Hi @Anil_G

Still not scraping all data

Only checking, but why does the table extraction wizard not recognize this section?
I’m trying to see if there are different elements between those that have been recognized and those that have not.

@jaco

Please change the selector and check. It looks like the two have different classes or similar, so use selectors that match both.

Cheers

Hi @jaco

Can you verify whether the Next Link selector changes between the 13th and 14th pages?

Thanks.

If you haven’t already done that, can you temporarily set the “Continue on Error” property to false to check whether there’s an error being thrown by the activity?

Hi @Irene ,

The flow stopped at page 2 of 90 when this was set to False, but no error was thrown.

Hi @prateek.mehandiratta9 ,

Yes, it goes from 1 to 14 as normal.

Then you could compare the selectors between the 14th and 15th pages.
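
As a rough illustration of that comparison, here is a minimal Python sketch that splits two selector strings into attributes and reports which ones differ. The two selector strings are made up for this example (they are not taken from the actual site), and `parse_attributes` is a hypothetical helper, not a UiPath function.

```python
import re

def parse_attributes(selector):
    """Return the name='value' pairs found in a selector string as a dict."""
    return dict(re.findall(r"([\w-]+)='([^']*)'", selector))

# Hypothetical selectors copied from two different pages (assumed values).
selector_page_14 = "<webctrl tag='A' aaname='Next' class='pull-right' />"
selector_page_15 = "<webctrl tag='A' aaname='Next' class='pull-right disabled' />"

attrs_14 = parse_attributes(selector_page_14)
attrs_15 = parse_attributes(selector_page_15)

# Keep only the attributes whose values are not identical on both pages.
differences = {
    name: (attrs_14.get(name), attrs_15.get(name))
    for name in sorted(set(attrs_14) | set(attrs_15))
    if attrs_14.get(name) != attrs_15.get(name)
}

for name, (before, after) in differences.items():
    print(f"{name}: {before!r} -> {after!r}")
```

Any attribute that shows up in the output is a candidate to remove from the selector or replace with a wildcard.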

Thanks for the confirmation!
I’ve tried doing some testing on Property24, and it really looks like the selectors are not consistent and the structure of the HTML is not necessarily linear, so it is not easy to use Extract Data Table to scrape all the data across all pages. I’m afraid the only option will be to dig deeper into the selectors used, in order to identify all the relevant elements. During my tests, I “played around” with the metadata in the Extract Data Table activity and couldn’t get any clean and comprehensive results: all I could come up with was a lot of dirty workarounds (i.e., scraping a lot of unnecessary data and then removing it from the DataTable with a loop, a regex, or LINQ), but none of them was brilliant or clean.
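
To give an idea of what such a clean-up workaround looks like, here is a minimal sketch in Python rather than in a UiPath workflow. The rows, the column names, and the price format are all assumptions made up for the illustration; the real scraped data will look different.

```python
import re

# Hypothetical rows, as a broad scrape might return them:
# real listings mixed with ads and incomplete entries.
rows = [
    {"title": "3 Bedroom House", "price": "R 1 250 000"},
    {"title": "Sponsored", "price": ""},
    {"title": "2 Bedroom Apartment", "price": "R 895 000"},
    {"title": "", "price": "POA"},
]

# Assumed price format: "R" followed by digits that may be grouped with spaces.
PRICE_PATTERN = re.compile(r"^R\s*\d[\d\s]*$")

def keep_listing(row):
    """Keep a row only if it has a title and a price in the assumed format."""
    has_title = bool(row["title"].strip())
    has_price = bool(PRICE_PATTERN.match(row["price"].strip()))
    return has_title and has_price

clean_rows = [row for row in rows if keep_listing(row)]

for row in clean_rows:
    print(row["title"], "|", row["price"])
```

The same filter could be written as a For Each Row loop with an If condition in a workflow; the point is only that the unwanted rows are dropped after the scrape instead of being avoided by the selectors.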

Thanks @Irene , do you mind sharing your flow perhaps?
I do not have regex or Linq experience.

If I can scrape all the data to Excel, I can then also try to clean it in Excel.

Will this query be helpful to UiPath devs in upgrading/updating the table extraction?

Can you check if this approach would work for you? Note that I barely tested it, so I’m not sure if it works :) but it’s just to show you an alternative approach that might be more reliable than the Extract Data Table activity for this site.
ICA_Property24.zip (7.2 KB)

The idea is the following:

  • Use the Extract Data Table activity to scrape the URLs of each item on 1 page only
  • Loop through the retrieved listings
  • For each listing, extract the relevant info by using the extracted URL as a variable in the selector (in my test I only extracted the Price, but you can use the same approach for the other info)
  • If the next button exists, move to the next page and repeat the same procedure

Again, this is just an idea, and you’ll have to add all additional “Get Text” activities to retrieve the remaining info. The main points to retain are the following:

  • This approach allows you to better control the selectors used, so it’s a lot more reliable
  • Instead of using the URL of the listing, you can maybe extract the URL of the image of each listing, and then use it as an anchor to get each property
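
The procedure above can be sketched in Python as follows. This is only a sketch of the control flow, not the contents of the attached workflow: `FAKE_SITE`, `FAKE_DETAILS`, and the three helper functions are hypothetical stand-ins for the Extract Data Table, Get Text, and Element Exists steps.

```python
# Hypothetical stand-in for the website: listing URLs per page and
# whether a Next button is present on that page.
FAKE_SITE = {
    1: {"urls": ["/listing/101", "/listing/102"], "has_next": True},
    2: {"urls": ["/listing/103"], "has_next": True},
    3: {"urls": ["/listing/104", "/listing/105"], "has_next": False},
}

# Hypothetical details that a "Get Text" step would read for each listing.
FAKE_DETAILS = {
    "/listing/101": "R 1 250 000",
    "/listing/102": "R 895 000",
    "/listing/103": "R 2 100 000",
    "/listing/104": "R 640 000",
    "/listing/105": "R 1 780 000",
}

def extract_listing_urls(page):
    """Stand-in for scraping only the listing URLs of a single page."""
    return FAKE_SITE[page]["urls"]

def extract_details(url):
    """Stand-in for reading the fields of one listing, located via its URL."""
    return {"url": url, "price": FAKE_DETAILS[url]}

def next_button_exists(page):
    """Stand-in for checking whether the Next button is on the page."""
    return FAKE_SITE[page]["has_next"]

def scrape_all_pages():
    results = []
    page = 1
    while True:
        # Scrape the current page first...
        for url in extract_listing_urls(page):
            results.append(extract_details(url))
        # ...and only then decide whether there is another page to visit.
        if not next_button_exists(page):
            break
        page += 1
    return results

all_listings = scrape_all_pages()
print(len(all_listings), "listings scraped")
```

Because the Next check happens after the current page has been scraped, the last page is still processed before the loop ends.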

Hi @Irene ,

This makes sense, yes. I’ll have a look at it today and potentially reply tomorrow.

Thank you very much.

Hi @Irene ,

Thank you, I added the title and will do the rest soon.
I see the flow does not extract the last page.

In your example, there are 3 pages, but I am only getting 2 pages.

I’m looking into that now.

Yeah, I did the project more as a “proof of concept” and didn’t really test the selectors. I think the one for the “Next” button is not correct, but as long as the approach works for you, you can adapt it to your case. Let me know if you need more help. :)
