Data Scraping - multi layer?

Hi
I have got data scraping working and can happily bring back a list of books:

https://www.amazon.co.uk/s?k=rpa&i=stripbooks&ref=nb_sb_noss_1

However, if I wanted to click each individual item on this list and bring back lower-level data, could that be done through data scraping? For example, each book has a Product Details field on its individual web page that includes the number of pages, so I would like to bring the page count in as a field.

Alternatively, some web items have abbreviated text where you have to click on the title to ‘read more’ of the description. Example:

https://www.jobsite.co.uk/jobs/rpa/in-london?radius=0

Any way of getting to this?

I have looked at tutorials and other posts but can't see anything that suits.

Thanks

Hi @ghdunn,

Welcome to the Community! Data scraping can only extract data that is currently loaded/available. The scenarios you have mentioned require you to navigate to different page(s) and/or load additional information. And that’s exactly what you will need to automate to be able to retrieve them. Hope this helps.

Venkat…hi…thanks…Can that be done within a loop using data from the initial scrape? Or is that too complicated?

Gerald

Hello Gerald,

If it’s a simple use case, you can use a loop construct to drive the workflow based on data from the initial scrape. For complex ones, I suggest you look at REFramework.

Hi @ghdunn,
Sharing a sample workflow based on your scenario. And yes, you can loop through the URLs extracted during data scraping, open each page, and then do further work on it, such as extracting other text like descriptions.

_Test.xaml (9.3 KB)
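
For readers who cannot open the attachment, the same two-pass idea can be sketched outside UiPath. This is not the contents of _Test.xaml, only a rough Python illustration: the first pass scrapes titles and URLs from the list page, then a loop visits each URL and reads one extra field. The `PAGES` dictionary, the `fetch` helper, the `book` class and the `pages` id are all made up for the example; a real site would need its own selectors and a real page load.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the website: URL -> HTML. In a real run these
# pages would come from the browser or an HTTP client, not a dictionary.
PAGES = {
    "/list": '<a class="book" href="/book/1">Book One</a>'
             '<a class="book" href="/book/2">Book Two</a>',
    "/book/1": '<span id="pages">320 pages</span>',
    "/book/2": '<span id="pages">198 pages</span>',
}


def fetch(url):
    """Return the HTML for a URL (assumed helper; here a dictionary lookup)."""
    return PAGES[url]


class LinkScraper(HTMLParser):
    """First pass: collect title and URL for every <a class="book"> element."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "book":
            self._href = attrs.get("href")

    def handle_data(self, data):
        if self._href is not None:
            self.rows.append({"title": data.strip(), "url": self._href})
            self._href = None


class FieldScraper(HTMLParser):
    """Second pass: read the text of the element with a given id."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.value = None
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == self.element_id:
            self._inside = True

    def handle_data(self, data):
        if self._inside and self.value is None:
            self.value = data.strip()

    def handle_endtag(self, tag):
        self._inside = False


def scrape_with_details(list_url):
    """Scrape the list page, then visit each item's page for one more field."""
    listing = LinkScraper()
    listing.feed(fetch(list_url))
    for row in listing.rows:
        detail = FieldScraper("pages")
        detail.feed(fetch(row["url"]))
        row["pages"] = detail.value
    return listing.rows


rows = scrape_with_details("/list")
```

In UiPath terms, the first pass corresponds to the data scraping output and the loop body to the navigate-and-extract steps run for each row.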


Indrajit,

Many thanks…could you just help me further…how do I change this config?

![image|690x417](upload://j0ucyrKmCvNuJppGCTYX9DCybea.png)

Gerald

Hi @ghdunn,
Can you please tell me which version of Studio you are working with right now?

The activity is shown as missing because of a package version mismatch for UiPath.System.Activities. Try reinstalling this package.

As far as the workflow is concerned, please see the snapshot below.

I am using 2019.5.0 Community edition. Do I reinstall the package using 'Tools > Project Dependencies Mass Update Tool'? It asks me for a project to put that in…I am not sure how to proceed.

I will try to emulate your screenshot and get it working that way…will let you know how it goes!

Gerald

Thank you Venkat…I will work through that.

I think I am very close…but not quite there.

If I try to scrape an item on each of the underlying pages, the output comes back with the literal value “Text” instead of the item's content.

Hi @ghdunn,

I have tried simulating the steps and I am able to extract the text for the “Paperback” value.
Just double-check the selector.
I am using the following selector for Get Full Text →



Thanks Indrajit,

I think the Paperback object you selected is a different kind of object to the one I selected…I could only capture mine as a Generic type. Anyway…got it working, so thank you very much for your help.

Gerald
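
Once the Paperback text has been read from each book page, the page count still has to be pulled out of the string before it can be used as a numeric field. A minimal sketch in Python, assuming the scraped text looks something like "Paperback: 320 pages" (the exact wording on the site is an assumption, and `page_count` is a hypothetical helper name):

```python
import re


def page_count(detail_text):
    """Return the number of pages found in the scraped text, or None."""
    # Look for digits (optionally with thousands separators) before "pages".
    match = re.search(r"(\d[\d,]*)\s+pages", detail_text, re.IGNORECASE)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))
```

Returning None when nothing matches makes a bad read, such as the literal value "Text", easy to spot when checking the output.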
