Web scrape with sublinks

Hello data scraping gurus,

I need to scrape all information for a list of light bulbs on multiple websites. This means accessing each of the 3 URLs provided below, scraping the initial landing page to get a list of sub URLs to loop through, and then automating the scraping on those pages. My idea is that each URL would need its own workflow, which is fine.
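For readers who want to see the first step (collecting the sub URLs from a landing page) outside UiPath, here is a minimal Python sketch using only the standard library. The `SAMPLE` markup, the `LinkCollector` class, and the `must_contain` filter are assumptions made for illustration; the real pages will have different HTML, and this is not how the UiPath wizard works internally.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag whose href contains a given fragment."""

    def __init__(self, base_url, must_contain=""):
        super().__init__()
        self.base_url = base_url
        self.must_contain = must_contain
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.must_contain in href:
            # Turn relative links into absolute ones.
            absolute = urljoin(self.base_url, href)
            if absolute not in self.links:  # keep order, drop duplicates
                self.links.append(absolute)


def extract_sub_urls(html, base_url, must_contain=""):
    """Return the absolute product links found in a landing page."""
    parser = LinkCollector(base_url, must_contain)
    parser.feed(html)
    return parser.links


# Hypothetical landing-page snippet; the real markup will differ.
SAMPLE = """
<div class="product"><a href="/light-bulbs/fluorescent/item-a">Item A</a></div>
<div class="product"><a href="/light-bulbs/fluorescent/item-b">Item B</a></div>
<a href="/contact">Contact</a>
"""

sub_urls = extract_sub_urls(
    SAMPLE,
    "http://www.westinghouselighting.com/light-bulbs/fluorescent/",
    must_contain="/light-bulbs/fluorescent/",
)
print(sub_urls)
```

The filter on the link path is one simple way to skip navigation links; a real page may need a different rule.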

My problem is that I cannot perform the initial data scrape to get the sub links for each light bulb/product on the primary URL landing page. I’m not sure if this just isn’t possible with these specific sites or if there is something wrong with my UiPath/data scraping wizard. I have uninstalled and reinstalled the extension on Chrome and IE, restarted everything, and still have trouble getting this first data table of sub URLs.

Any ideas or fixes for this would be greatly appreciated!

http://www.westinghouselighting.com/light-bulbs/fluorescent/
https://www.menards.com/main/electrical/light-bulbs/fluorescent-tubes/c-7478.htm
https://www.acehardware.com/departments/lighting-and-electrical/light-bulbs/fluorescentcfl-bulbs

Many thanks,
Shelby

Hi @Shelby_Pons

Have you enabled the URL column?
Click Proper Details, click the watt row, and enable the URL check box.

Regards.
Gulshiiyaa

@Shelby_Pons I managed to scrape the data from the first web page. As an initial step, I clicked Show Details and then used Data Scraping on the whole row, which gave me all 46 records. Can you try it that way and check whether it works? If you still cannot get the data, I will send you the workflow; I just need to tidy it up a bit first.

Hello @Shelby_Pons

As @gulshiyaa said, we can get the URL. Using that URL, navigate to the sub URL and get the text or, if you need a table, use data scraping.

http://www.westinghouselighting.com/light-bulbs/fluorescent/

I used the link above to extract the data, navigated to each sub URL, and extracted the data there. I am sharing the workflow and the Excel sheet; please take a look.

WebScrapeWithSublinks.xaml (18.1 KB)

westinghouselighting.xlsx (8.1 KB)
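As a rough illustration of the loop described above (visit each sub URL, extract the data, write it to a sheet), here is a minimal Python sketch. It is not a translation of the attached workflow. The `fake_fetch` and `parse_title` helpers and the example.com URLs are hypothetical stand-ins so the sketch runs offline; a real run would pass a function that downloads the page and a parser written for the actual product markup.

```python
import csv
import io
import re


def scrape_products(sub_urls, fetch, parse):
    """Visit each sub URL and return one dict per product page."""
    rows = []
    for url in sub_urls:
        try:
            html = fetch(url)
        except OSError as exc:  # urllib's network errors derive from OSError
            rows.append({"url": url, "error": str(exc)})
            continue
        rows.append({"url": url, **parse(html)})
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text, using the union of all keys as columns."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical stand-ins so the sketch runs without network access.
FAKE_PAGES = {
    "http://example.com/item-a": "<h1>Item A</h1>",
    "http://example.com/item-b": "<h1>Item B</h1>",
}


def fake_fetch(url):
    if url not in FAKE_PAGES:
        raise OSError("page not found")
    return FAKE_PAGES[url]


def parse_title(html):
    # Assumed page layout: the product name sits in the first <h1>.
    match = re.search(r"<h1>(.*?)</h1>", html)
    return {"title": match.group(1) if match else ""}


rows = scrape_products(
    ["http://example.com/item-a", "http://example.com/item-b", "http://example.com/missing"],
    fake_fetch,
    parse_title,
)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Recording a failed page as a row with an `error` column, rather than stopping, keeps one bad link from losing the rest of the run.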

Thank you, Gulshiyaa!

Whenever I try to use the data scraping wizard on the website you used, I get this result:
image

No matter where I select (whole box with all text or just URL), it just returns the “whole” data table as one swimlane. Do you think this could be an issue with my browser or something similar?

Thank you!

Similar to the issue I mentioned to Gulshiiyaa above, for some reason, when I use the data scraping wizard, it only pulls information from the first swimlane (only one item). I’ve tried selecting both the first piece of info (Item #) and the whole swimlane to see if it would allow me to click the second one and capture all of them. I have also tried this in both Chrome and IE. Do you have any advice on why this isn’t working?

image

Thank you!

Hi Pradeep,

Thank you for sending the files! I will try to replicate the scraping of sub-URLs and report back on whether I am able to recreate your workflow.

Thanks!

Hi @Shelby_Pons

I understand your problem. It's not a browser issue.

image

--> Don't click Yes; click No instead.

image

--> Then click Next.

--> Then click the 2nd row and check Extract URL.

--> Next, you will get the result. If you want the full data, click the option below.

--> Next, click the other data.

Repeat this for each additional piece of data, and you will get all the data you want to scrape.

After scraping all the data, click the Finish button.

Then a pop-up message will appear. If you want to scrape multiple pages, click Yes.

image

Then indicate the Next button.

That's it.

One more point I want to add here:

After finishing the data scrape, you can see the maximum number of results in the Properties panel. By default it is 100, which means it will scrape only 100 records. If you want the whole site's data, set it to 0 so that it fetches all the data for that site.
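To make the "100 by default, 0 means everything" behaviour concrete, here is a small Python sketch of a paginated scrape with a result cap. It only mirrors the behaviour described above and is not UiPath's implementation; `scrape_all_pages`, the `PAGES` data, and the helper names are hypothetical.

```python
def scrape_all_pages(first_page, get_rows, get_next, max_results=100):
    """Collect rows page by page; max_results=0 means no limit."""
    results = []
    page = first_page
    while page is not None:
        for row in get_rows(page):
            results.append(row)
            # A cap of 0 is falsy, so the limit check is skipped entirely.
            if max_results and len(results) >= max_results:
                return results
        page = get_next(page)  # stands in for clicking the Next button
    return results


# Hypothetical site: three pages of 60 items each, linked by a "next" pointer.
PAGES = {
    1: {"rows": [f"item-{i}" for i in range(0, 60)], "next": 2},
    2: {"rows": [f"item-{i}" for i in range(60, 120)], "next": 3},
    3: {"rows": [f"item-{i}" for i in range(120, 180)], "next": None},
}


def get_rows(page):
    return PAGES[page]["rows"]


def get_next(page):
    return PAGES[page]["next"]


limited = scrape_all_pages(1, get_rows, get_next)  # default cap of 100
everything = scrape_all_pages(1, get_rows, get_next, max_results=0)
print(len(limited), len(everything))
```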

Thank You,
Regards,
Gulshiyaa

Hi Gulshiyaa,

Thank you so much. Clicking “no” on the initial pop-up was what was confusing me.

Thank you!!
Shelby
