Data Scraping of Images

datascraping

#1

Hi,
I want to scrape the details of an e-commerce website, including the images of all the products in all the categories. However, data scraping does not save the images in .png or .jpg format.
They come out in .html format instead. What would be the best way to handle this?
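
For context, one common approach outside the data scraping wizard is to collect the `src` URL of each `<img>` tag and download the files separately. Below is a minimal Python sketch of the URL-collection step using only the standard library. The sample HTML, the `shop.example.com` URLs, and the names `ImageSrcCollector` and `extract_image_urls` are assumptions made up for illustration. They are not taken from any real site or from UiPath.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageSrcCollector(HTMLParser):
    """Collects the absolute URL of every <img> tag that has a src attribute."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths such as "/images/shoe.png" against the page URL.
            self.image_urls.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url):
    """Return the image URLs found in `html`, in document order."""
    parser = ImageSrcCollector(base_url)
    parser.feed(html)
    return parser.image_urls


# Hypothetical product-page fragment, for illustration only.
sample_html = """
<div class="product"><img src="/images/shoe.png" alt="Shoe"></div>
<div class="product"><img src="https://cdn.example.com/hat.jpg" alt="Hat"></div>
"""
urls = extract_image_urls(sample_html, "https://shop.example.com/catalog")
print(urls)
```

Each collected URL could then be saved to disk, for example with `urllib.request.urlretrieve(url, filename)`, assuming the site permits automated downloads.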

Thanks in advance :slight_smile:


#2

This is a good question and I’m having a similar problem.

I am scraping a webpage where I can get about 95% of the data I need. However, the one item the scraper is struggling with is an image. The information in the image is valuable, and the data is there (I confirmed this by viewing the page source, where the actual number I need is present).

Is there any way I can scrape all this information together into one document, as opposed to scraping it into separate documents and then merging them?

Looking forward to your response.


#3

Okay, I think I’m getting closer. I can see the source with Edit Data Definition. However, after I click OK, the data definition does not change. How can I make the change permanent? Here’s where I’m at.

Thanks!

edit data definition


#4

Did you try modifying it at the activity level, after the data scrape is translated into an activity?


#5

Hi @vvaidya, thank you, this is great.

I’m still having problems extracting the stars in a review.
I was watching the Advanced UI Automation, Lesson 5 video, and it said (and showed) the following (subtitles at 00:15:40 to 00:15:50):

"each block: A title, a cover, a date, the name of the author, the price, rating, and many others."

However, what do I need to do to the XML code to extract the data? Right now, I’m just getting blank columns like the one below. FYI, I’m doing this on a popular restaurant review website, not Amazon. This method does work on Amazon.

I feel like I’m so close, but there’s something I’m missing.


#6

Alright, getting a little further. Through the UiElement finder I discovered the rating option, but when I select it, it gives me the following error. Any ideas why I’m getting this error?


#7

You need the title? Try unchecking the css-selector and checking the title instead.


#8

That worked for the title. Thanks! I’m still having problems executing the action that I want, getting the ratingValue of the review. But I think I’ll just do it in Python. Thanks for all your help.


#9

Python worked, but I don’t understand why UiPath recognized the image and not the number that was included in the HTML source.

For example, it picked up this property, which didn’t work because it’s rendered as an image. Data scraping just didn’t recognize it.

<div itemprop="reviewRating" itemscope itemtype="http://schema.org/Rating">

And NOT this one. It didn’t pick up the hidden figure 4.0, which is embedded in the HTML source code.

<meta itemprop="ratingValue" content="4.0">

If anyone has any ideas on how to make this work in UiPath, that would be great. I’m still puzzled about what the program can and can’t pick up.
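
For anyone landing here later, this is a minimal sketch of the kind of Python workaround mentioned above, not the exact script that was used. It reads the `content` attribute of the `<meta itemprop="ratingValue">` tag with the standard-library `html.parser`. The sample HTML (including the nesting and the `<img>` line) and the names `ItempropMetaParser` and `extract_itemprop` are assumptions for illustration. A likely reason the wizard missed the value is that a `<meta>` tag has no visible rendering on the page, though that is a guess rather than something confirmed in this thread.

```python
from html.parser import HTMLParser


class ItempropMetaParser(HTMLParser):
    """Collects the content attribute of <meta> tags with a given itemprop."""

    def __init__(self, itemprop):
        super().__init__()
        self.itemprop = itemprop
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        content = attr_map.get("content")
        if attr_map.get("itemprop") == self.itemprop and content is not None:
            self.values.append(content)


def extract_itemprop(html, itemprop):
    """Return the content strings of all matching <meta> tags, in order."""
    parser = ItempropMetaParser(itemprop)
    parser.feed(html)
    return parser.values


# Hypothetical review markup modeled on the two tags quoted above.
sample_html = """
<div itemprop="reviewRating" itemscope itemtype="http://schema.org/Rating">
  <meta itemprop="ratingValue" content="4.0">
  <img src="/stars_4.png" alt="star rating">
</div>
"""
ratings = [float(value) for value in extract_itemprop(sample_html, "ratingValue")]
print(ratings)
```

The same function works for other microdata properties by passing a different `itemprop` name.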


#10

Hi vvaidya,
I am also trying to scrape images from an e-commerce website and have been unable to. After some Google searching, I stumbled upon this thread and was trying to figure out what exactly you suggested modifying at the activity level. Below is my XML