Data Scraping - Item Selection

Hi guys, I am new to UiPath and have just started on the basics of data scraping. I have a few questions, and I would greatly appreciate your input!

First off, I am trying to scrape an item name off a website, but it won’t let me select the item name. When I hover my mouse over the item during scraping (Extract Wizard), it selects the whole item as one block, whereas on another website I can select and extract the individual components such as the item name, price, rating, etc. separately. My question is, how can I extract the data separately into my data table if I face this issue again? I’ve added a picture below for your reference!

image

Another problem I faced was that when I extracted the data from the website, some of the rows were empty and some of the information was incorrect (column B, cells B1 to B19, is supposed to be my price list). Does it have to do with the pattern? If so, how can I fix this problem?

https://ibb.co/BsBbKMC

Thanks for reading!

Hello @Conning

Welcome to the UiPath community!

I hope you are using Table Extraction. You can check the video below. Also, would it be possible to share the URL that you are accessing?

Thanks


It seems that the website you are trying to scrape from has a structure that makes it difficult to select individual components separately using the Extract Wizard in UiPath StudioX. In such cases, you can use other scraping techniques to extract the data separately into your data table. Here are a few alternatives:

  1. Manual Extraction: If the website structure is not amenable to automated scraping, you can manually extract the data by using activities like “Find Element” or “Get Text”. These activities allow you to indicate specific elements on the webpage using selectors or other methods and extract the desired data. You can use multiple manual extraction activities to collect different components separately and populate them into your data table.

  2. Data Extraction with Regex: If the data you want to extract follows a specific pattern or has identifiable markers, you can use Regular Expressions (regex) to extract the relevant information. UiPath provides the “Matches” activity that allows you to apply regex patterns to text and extract specific data based on the pattern matching.

  3. Custom JavaScript Injection: In some cases, you may need to inject custom JavaScript code into the webpage to retrieve the desired data. This approach requires advanced knowledge of JavaScript and web scraping techniques. You can use the “Inject Js Script” activity in UiPath to run JavaScript code on the webpage and extract the data programmatically.

  4. Use Third-Party Libraries: If the website allows it and you are not restricted by any legal or ethical concerns, you can consider using third-party libraries or tools specifically designed for web scraping. These libraries often offer more advanced scraping capabilities and flexibility to handle complex website structures. However, ensure that you comply with the terms of service of the website and any legal obligations when using third-party tools.
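
As a rough illustration of the regex option (item 2), the sketch below splits one combined block of text, such as you might get when the wizard only selects the whole item, into name, price and rating. It is written in Python rather than as a UiPath workflow, and the sample text, the pattern, and the `parse_item` helper are all assumptions made for illustration; the text from the real page will need its own pattern.

```python
import re

# Hypothetical text of one item, scraped as a single block.
SAMPLE = "Wireless Mouse M185 $12.99 4.5 out of 5 stars"

# Assumed layout: <name> $<price> <rating> out of 5 stars
ITEM_PATTERN = re.compile(
    r"^(?P<name>.+?)\s+"                 # item name (lazy, stops before the price)
    r"\$(?P<price>\d+(?:\.\d{2})?)\s+"   # price such as $12.99
    r"(?P<rating>\d(?:\.\d)?) out of 5 stars$"
)

def parse_item(text):
    """Return a dict with name, price and rating, or None if the text does not match."""
    match = ITEM_PATTERN.match(text.strip())
    if match is None:
        return None
    return {
        "name": match.group("name"),
        "price": float(match.group("price")),
        "rating": float(match.group("rating")),
    }

row = parse_item(SAMPLE)
print(row)
```

If you carry a pattern like this over to the “Matches” activity, note that UiPath uses .NET regular expressions, where named groups are written `(?<name>...)` rather than Python's `(?P<name>...)`.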

It’s important to note that when scraping data from websites, you should always be mindful of the website’s terms of service, legal restrictions, and the ethical implications of data scraping. Make sure to respect the website’s policies and seek permission if necessary.

In summary, if the Extract Wizard is not able to select individual components separately, you can resort to manual extraction, regex, JavaScript injection, or third-party libraries to scrape the data and populate it into your data table. Choose the method that best suits the website structure and your specific scraping requirements.


Hi @Conning

You are using the Extract Wizard, which belongs to the classic activities. For this type of website, use the modern activities instead; they extract the data by pattern.

  1. Enable the modern design experience for the project so that the modern activities are available.
  2. Add a Use Application/Browser activity and indicate the website that you are automating.
  3. Inside it, add the Extract Datatable activity to extract the data as a data table.
  4. While you are indicating, a table settings window appears. Click Add data, click the item name on the page, and change the column name.
  5. Click Add data again and click the next field, such as the rating or the price of the product. The activity then extracts that field for all items automatically.
  6. To extract from all pages, click the Next page option in the Extract Datatable settings window, then indicate the next-page control on the webpage.
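
To show why extracting item by item tends to avoid empty or shifted rows, here is a small Python sketch of the idea. It is not how UiPath implements extraction; the markup, the class names and the `extract_rows` helper are hypothetical. Each row is built from one item container, so an item without a price gets an empty cell instead of shifting the prices of the items after it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a product listing page.
PAGE = """
<div>
  <div class="item"><span class="name">Mouse</span><span class="price">12.99</span></div>
  <div class="item"><span class="name">Keyboard</span></div>
  <div class="item"><span class="name">Monitor</span><span class="price">149.00</span></div>
</div>
"""

def extract_rows(markup):
    """Build one row per item container; a missing field becomes an empty string."""
    root = ET.fromstring(markup.strip())
    rows = []
    for item in root.findall(".//div[@class='item']"):
        name = item.findtext("span[@class='name']", default="")
        price = item.findtext("span[@class='price']", default="")
        rows.append({"name": name, "price": price})
    return rows

rows = extract_rows(PAGE)

# Collecting a column on its own loses the link between a name and its price:
# only two prices come back for three items, so they cannot be lined up safely.
all_prices = [el.text for el in ET.fromstring(PAGE.strip()).findall(".//span[@class='price']")]

for r in rows:
    print(r)
```

A price column with blank or wrong cells may have a similar cause, so it is worth checking whether every item on the page actually has the field you indicated.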

Follow the workflow below.

See the image below for the Extract Datatable activity settings window.

Hope it helps!!
