Train 1 URL and then upload a list of URLs to scrape similar data

Hey,

I’m trying to scrape data from a website and ran into an issue. Obviously, I can scrape a list of URLs one by one, but is there any chance I can train on 1 URL and then upload the list of URLs somehow (e.g. as a .csv) so it does the rest following the same logic? Maybe I can create some sort of loop for that?

I found that’s possible with Import.io but I need more URLs to scrape.

Thanks

Yes, it’s doable. The only thing to note is that the tables on each of the URLs need to be similar enough that the selector used for the data scrape works on all of them.

Load the URLs into a DataTable (DT)
Use a For Each Row on the DT with the following nested inside {
    Navigate to URL
    Scrape table
    Do what you want with the info
}
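For anyone who wants to see the same loop logic outside UiPath, here is a rough Python sketch of the steps above. It is only an illustration, not the UiPath workflow itself: the function names (`load_urls`, `fetch_html`, `scrape_all`) and the `url` column name are made up for the example, and the per-page `extract` step is passed in because it depends entirely on the site.

```python
import csv
import io
import urllib.request
from typing import Callable, Dict, Iterable, List


def load_urls(csv_text: str, column: str = "url") -> List[str]:
    """Read one column of URLs from CSV text (the 'load URLs into a DT' step).

    The column name "url" is an assumption for this sketch.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row.get(column) or "").strip()
            for row in reader
            if (row.get(column) or "").strip()]


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page with the standard library (the 'navigate to URL' step)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def scrape_all(urls: Iterable[str],
               fetch: Callable[[str], str],
               extract: Callable[[str], Dict[str, object]]) -> List[Dict[str, object]]:
    """Apply the same fetch + extract logic to every URL (the 'for each row' loop)."""
    results = []
    for url in urls:
        try:
            record = dict(extract(fetch(url)))
        except Exception as exc:
            # One bad page should not stop the rest of the list.
            record = {"error": str(exc)}
        record["url"] = url
        results.append(record)
    return results
```

A call would look something like `scrape_all(load_urls(csv_text), fetch_html, my_extract)`, where `my_extract` is whatever function turns one page's HTML into the fields you want.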

Can anyone help me with this? I’m stuck. It should be pretty easy to do, but I’m new to UiPath and not sure how to proceed. If there’s an article or a YouTube video about this, a link would be highly appreciated.

Thanks

@Dima_Makei If you can provide a sample of the URLs you are working with, I can take a look at putting something together for you.

Hi @Jarzzz ,
I appreciate you getting back to me.

I’m attaching the list of URLs I’ve managed to scrape with UiPath. That was pretty easy. The tough part for me is creating a rule (a “loop”) that will go to each of those URLs and scrape data like:

-Brand name
-Brand URL (actual brand’s website)
-Full copy from the [paragraph] elements
-Number of reviews
-Ideally an image, but if that’s too complicated, the 4 above are more than enough.

brands.xlsx (11.0 KB)

Thank you very much.

Lol, a heads-up that these links could contain NSFW material would have been nice.

Anyhow… I’ll provide a walkthrough of how to achieve this using Amazon, and hopefully you can figure out how to do the same for the website you want.

haha, didn’t even know it could be NSFW, completely fine where I am lol

Thanks!

@Dima_Makei
No worries at all!

Take a look at this workflow: webpageScrape.xaml (22.9 KB). I put it together real quick. You’ll need to change things, but the functionality/logic is there. Take a look at the selectors and how I defined them for each URL, then do the same for your URLs and make changes accordingly. Find selectors that stay true for each URL you visit.
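To illustrate what "selectors that stay true for each URL" means in practice, here is a small Python sketch (not taken from the .xaml) that pulls the same fields out of any page sharing a layout. The markup it looks for is purely an assumption for the example: the brand name in the first `<h1>`, the copy in `<p>` elements, and the brand's own site in a link with a hypothetical `brand-site` class. On the real pages you would swap in whatever elements are actually constant.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class BrandPageParser(HTMLParser):
    """Collects the first <h1>, all <p> text, and the first a.brand-site link.

    The tag and class names are placeholders, not the real site's markup.
    """

    def __init__(self) -> None:
        super().__init__()
        self.brand_name: Optional[str] = None
        self.brand_url: Optional[str] = None
        self.paragraphs: List[str] = []
        self._current: Optional[str] = None  # tag whose text is being collected
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag in ("h1", "p"):
            self._current = tag
            self._buffer = []
        elif tag == "a" and self.brand_url is None:
            classes = (attr_map.get("class") or "").split()
            if "brand-site" in classes:
                self.brand_url = attr_map.get("href")

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse runs of whitespace so the text is clean.
            text = " ".join("".join(self._buffer).split())
            if tag == "h1" and self.brand_name is None:
                self.brand_name = text
            elif tag == "p" and text:
                self.paragraphs.append(text)
            self._current = None


def extract_brand(html: str) -> Dict[str, object]:
    """Run the same extraction rules against one page's HTML."""
    parser = BrandPageParser()
    parser.feed(html)
    parser.close()
    return {
        "brand_name": parser.brand_name,
        "brand_url": parser.brand_url,
        "paragraphs": parser.paragraphs,
    }
```

Because the rules refer only to structure (tag and class), not to anything specific to one page, the same function can be applied to every URL in the list. A review count would need its own stable element in the same way.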

Let me know if you have any questions on how/why things are done.


You can ignore the Build Data Table activity. I was going to add each URL’s results to a DT but never did =D
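If you do want to collect each URL's results into one table, the idea is just to append one row per URL and write the lot out at the end. A minimal Python sketch of that idea follows; it is not what the workflow does, and the `rows_to_csv` name and column names are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List


def rows_to_csv(rows: Iterable[Dict[str, str]], columns: List[str]) -> str:
    """Write one CSV line per scraped URL, leaving missing fields empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({col: row.get(col, "") for col in columns})
    return buffer.getvalue()
```

For example, `rows_to_csv(results, ["url", "brand_name"])` gives text you can save to a file and open in Excel.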