Building a scrape bot

Hi!

Apologies if this is a very basic question. I thought I would ask in the forums as I’m not 100% sure UiPath can do what I want, and wondered if you could help.

I’ve been building a scraping tool that extracts data from our customers’ websites; we use this data as part of a back-end processing function for them. I’ve been using the Scrapy framework, and 50% of the time it works every time. :)

The problem: Scrapy uses XPath expressions, so we build up a path to each element we want to scrape. This works great until the customer changes their website, which turns out to happen quite a lot.

I’ve turned to RPA because I want to see if we can build a bot that extracts data based on attributes rather than paths. For example, on each website we are only really interested in a set of core things:

  • Size
  • Colour
  • Type, etc

Our customers’ websites universally have these, along with an image.

So the question is, can I use UiPath to tell a bot to head to a website, look for these things and extract the data?

Thanks in advance and sorry if this is a stupid question.

Hi @B3ndy, of course you can scrape these details with UiPath.

Use Data Scraping.

Thanks for the reply!

I guess my question is a bit vague.

What I want to do is build a bot that knows a website may have elements called colour, size, etc., looks for them and scrapes them. I don’t want to have to build a separate scrape tool for each site.
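To make the idea concrete, here is a minimal Python sketch of label-based extraction: it ignores where an element sits in the page and instead looks for visible text that matches a known label, then takes the value next to it. It uses only the standard library rather than Scrapy. The `LABELS` table, the `TextCollector` class and the `extract_by_label` function are hypothetical names made up for this illustration, and the "value is inline after a colon or in the next text chunk" rule is an assumption that will not hold on every site.

```python
from html.parser import HTMLParser

# Hypothetical label synonyms; extend to match each customer's vocabulary.
LABELS = {
    "size": {"size"},
    "colour": {"colour", "color"},
    "type": {"type"},
}


class TextCollector(HTMLParser):
    """Collects non-empty visible text chunks in document order."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip:
            self.chunks.append(text)


def extract_by_label(html):
    """Return {field: value} for every known label found in the page text."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    chunks = parser.chunks
    found = {}
    for i, chunk in enumerate(chunks):
        # Handles "Colour: Red" in one chunk, or "Colour" followed by "Red".
        label, _, inline_value = chunk.partition(":")
        key = label.strip().lower()
        for field, names in LABELS.items():
            if key in names and field not in found:
                value = inline_value.strip()
                if not value and i + 1 < len(chunks):
                    value = chunks[i + 1]
                if value:
                    found[field] = value
    return found
```

The same function can be run against a table layout and a list layout without any per-site paths, for example `extract_by_label("<ul><li>Color: Red</li><li>Size: Large</li></ul>")`. It trades precision for resilience: a template change that keeps the labels visible should not break it, but a page that uses the word "Size" elsewhere may produce a wrong match.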

Hi @B3ndy,
You can use a UiPath bot as a scraper. Before going to a solution, I will mention two things:
→ As you mentioned, if the website changes, there is a chance the bot’s UI element selectors will need to change too. In that case you will need to repair the robot.
→ Dynamic UI elements can also cause the bot to fail.

Hello @B3ndy

Does that mean the entire layout of the table can also change? I think you can try Table Extraction, which is the more advanced method.

Also, whenever there is a failure, cross-check the selector. If any part of it varies, make it stable by using wildcards.
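As a rough illustration of the wildcard idea outside UiPath: selectors there accept `*` (zero or more characters) and `?` (exactly one character) in attribute values, and Python's `fnmatchcase` happens to use the same two symbols. The element ids below are made up for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical element ids whose middle part changes between page loads.
ids = ["product-1041-size", "product-2217-size", "product-2217-colour"]

# A pattern with a wildcard in place of the varying part stays stable.
stable = [i for i in ids if fnmatchcase(i, "product-*-size")]
```

Here `stable` keeps the two `-size` ids and drops the `-colour` one, whichever number appears in the middle.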

Thanks

That’s a good idea, I’ve not used that before. Often these websites use templates that get changed at short notice. We have thousands of sites we scrape, so going through the errors one by one isn’t really practical.
