Data scraping using selector

Hi all,

I am trying to extract customer feedback, such as a customer's rating for a product, from an online consumer review website. Using Data Scraping, I am extracting the customer name, date, rating and review text from the Trustpilot website into a DataTable variable and then writing it to an Excel sheet.
URL: https://uk.trustpilot.com/review/bt.com?page=2

However, the Rating column written to the Excel sheet shows up as null (blank).

I have tried the same on other websites like Amazon and can fetch the ratings along with the comments there.
I am not sure whether this is caused by some UI element inconsistency when fetching the data from this website.

I have tried updating the selectors, but I am still not able to get the rating.
Please find the screenshot below:

Hi @Sathish_S

You might need to tune it a bit manually. When you scrape the data, don't finish the scraping wizard; instead, edit the XML manually. The output will then refresh, and hopefully at some point it will give you the results :slight_smile:

The editing strategy would be to remove one line at a time from the top:

However, this will only work if the element you end up with is exactly the one that holds the property you want to get from the website. You can check this with Inspect Element on the target website.

If the value is stored as a class or id name, you will want to change attr='text' into attr='className'.

EDIT
Actually, it is a bit trickier than I originally thought. The rating is just a color change of particular elements, and I am not sure how to get it. I hope someone will be intrigued by it and solve it for us :smiley:

It is actually an interesting challenge.


Hi,
Many thanks for your response and for giving it a try.

Yes, the rating part is rendered by a script and also relies on a color-code combination.

I am awaiting a solution. Could someone please help me with this? I have been trying to solve it for a long time.
Many thanks in advance.

Regards,
Sathish

5 posts were split to a new topic: Trouble datascraping a website

I have already raised a question about my issue. I don't think it is appropriate to post your new query in my existing thread before mine is resolved.
Anyway, I am checking your issue.

Could you all please help me sort out the query I raised above?

Many thanks for your help in advance.

Thanks,
Sathish

Hi @Sathish_S

You will need to extract the class property of the DIV that contains all 5 stars, like so:

And then run some basic post-processing to extract your value :slight_smile:

This would be the solution:

And the XML to get it:

<extract>
	<row exact="1">
		<webctrl class="review-list" tag="div" idx="1" />
		<webctrl class="review-card  " tag="div" />
		<webctrl class="review" tag="article" idx="1" />
	</row>
	<column attr="class" name="Column1" exact="1">
		<webctrl class="review-list" tag="div" idx="1" />
		<webctrl class="review-card  " tag="div" />
		<webctrl class="review" tag="article" idx="1" />
		<webctrl class="review__content" tag="section" idx="1" />
		<webctrl class="review-content" tag="div" idx="1" />
		<webctrl class="review-content__header" tag="div" idx="1" />
		<webctrl class="review-content-header" tag="div" idx="1" />
		<webctrl tag="div" idx="1" />
	</column>
	<column attr="text" name="Column2" exact="1">
		<webctrl class="review-list" tag="div" idx="1" />
		<webctrl class="review-card  " tag="div" />
		<webctrl class="review" tag="article" idx="1" />
		<webctrl class="review__consumer-information" tag="aside" idx="1" />
		<webctrl class="consumer-information" tag="a" idx="1" />
		<webctrl class="consumer-information__details" tag="div" idx="1" />
		<webctrl class="consumer-information__name" tag="div" idx="1" />
	</column>
</extract>
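As a sketch of the "basic post-processing" step: the class value lands in Column1 as a plain string, and the star count still has to be pulled out of it. The Python below assumes the class string contains a token like `star-rating-4`; that token name is an assumption about the page markup, so confirm it with Inspect Element before relying on it. The same regex could be applied inside UiPath, for example with `System.Text.RegularExpressions.Regex.Match` in an Assign activity.

```python
import re
from typing import List, Optional

# Assumed markup: the star DIV's class looks like
# "star-rating star-rating-4 star-rating--medium". Verify with Inspect Element.
STAR_TOKEN = re.compile(r"\bstar-rating-(\d)\b")


def rating_from_class(class_value: str) -> Optional[int]:
    """Return the star count embedded in a scraped class string, or None."""
    match = STAR_TOKEN.search(class_value or "")
    return int(match.group(1)) if match else None


def ratings_from_column(column: List[str]) -> List[Optional[int]]:
    """Convert every class value scraped into Column1."""
    return [rating_from_class(value) for value in column]


if __name__ == "__main__":
    scraped = [
        "star-rating star-rating-4 star-rating--medium",
        "star-rating star-rating-1 star-rating--medium",
    ]
    print(ratings_from_column(scraped))  # [4, 1]
```

Returning None instead of raising keeps rows with an unexpected class visible in the Excel output, so they can be checked by hand.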


Hi,

Thanks a lot for your response and solution. The solution you suggested is working fine for me.
I am also trying to extract the same set of data from a different website.
URL: https://www.reviews.co.uk/company-reviews/store/bt

On this site the rating is displayed differently, and even changing the attribute type to class does not give me the rating.


Please assist with this.

Many thanks for your help.
Regards,
Sathish

Hi @Sathish_S

I'm glad it worked out for the previous website. Unfortunately, this second website is most likely a bit trickier (but not impossible).

This method might be slower, because no attribute of the div tag holds the number of stars in a review. This means that you will need to attach the star count to the other data after you have scraped the page, which comes with a few additional challenges.
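A minimal Python sketch of that merge step, assuming the review rows and the star counts were collected from the same page in the same order, so the i-th count belongs to the i-th review. The dict-per-row shape and the `Rating` column name are illustrative stand-ins for the scraped DataTable, not UiPath types.

```python
from typing import Dict, List


def attach_ratings(rows: List[Dict[str, str]],
                   star_counts: List[int],
                   column: str = "Rating") -> List[Dict[str, object]]:
    """Pair each scraped review row with the star count found for it.

    Assumes both lists come from the same page in the same order.
    """
    if len(rows) != len(star_counts):
        # A mismatch usually means one pass missed or duplicated a review.
        raise ValueError(f"{len(rows)} rows but {len(star_counts)} star counts")
    return [{**row, column: stars} for row, stars in zip(rows, star_counts)]
```

Failing loudly on a length mismatch is deliberate: silently zipping lists of different lengths would shift every rating onto the wrong review.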

But I will focus just on getting the star rating value.

The only way I see to determine the value of a review is to get the children of the div that encapsulates all the stars:


and then iterate through those children and count how many have the class:
icon-full-star-01
(an empty star has the class icon-empty-star-01 instead)
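The counting step can be sketched in Python with the standard-library `html.parser`, assuming you already have the HTML of the star container as a string (however you obtain it). The child tag names in the example are hypothetical; only the `icon-full-star-01` class comes from the page. Note that this counts every element inside the snippet carrying that class, which equals the child count as long as the stars are not nested further.

```python
from html.parser import HTMLParser


class FullStarCounter(HTMLParser):
    """Counts start tags whose class list contains the full-star class."""

    def __init__(self, full_class: str = "icon-full-star-01") -> None:
        super().__init__()
        self.full_class = full_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.full_class in classes:
            self.count += 1


def count_full_stars(star_container_html: str) -> int:
    """Return how many elements in the snippet carry the full-star class."""
    parser = FullStarCounter()
    parser.feed(star_container_html)
    parser.close()
    return parser.count


if __name__ == "__main__":
    example = ('<div class="stars">'
               + '<span class="icon-full-star-01"></span>' * 4
               + '<span class="icon-empty-star-01"></span></div>')
    print(count_full_stars(example))  # 4
```

Splitting the class attribute on whitespace, rather than comparing the whole string, keeps the match working when the star element carries extra classes alongside the full-star one.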

It would be ideal if you could scrape the innerHtml of an element with the Data Scraping tool, but I don't think this is currently possible.

I hope it helps you a bit.

@loginerror, I am facing a similar problem. Could you explain the solution more clearly?

@ppr,
I have gone through the Data Scraping documentation, but I am stuck finding a way to scrape the star rating count from a website. Could you help with this?
https://www.healthgrades.com/physician/dr-ruel-stoessel-2wlv9
This is the website.

This site is geo-blocked. However, please keep the discussion in your original thread, where the initial approach was already mentioned. See you there.

@ppr, Fetch complete reviews for a set of doctors
This is my original thread. The solution I got isn't working for me. Please take a look at it.