Extract different data from website - hotels.com

I want to extract data from a website. I can extract the first field, but when I try to extract the second one it says it can't find a pattern because the elements differ by tag. I am trying to extract the price. I think the error appears because some hotels show a discounted price. How can I solve it?

@Mahabub_Hasan
Would it be possible to share the link with us? Thanks

www.hotels.com. I want to extract each hotel name and its corresponding price.

@Mahabub_Hasan
The discount inserts another element, which blocks the regular price.
After manually editing the extract config, the following retrieval works:

<extract>
	<row exact="1">
		<webctrl tag="li" class="hotel"/>
		<webctrl tag="article" idx="1"/>
		<webctrl tag="section" class="hotel-wrap" idx="1"/>
	</row>
	<column exact="1" name="Column1" attr="text">
		<webctrl tag="div" class="price" idx="1"/>
	</column>
	<column exact="1" name="Column2" attr="text">
		<webctrl tag="div" class="price" idx="1"/>
	</column>
</extract>

It is still showing an error. Could you please do this for this page?

https://www.hotels.com/search.do?resolved-location=COUNTRY%3A10233082%3AUNKNOWN%3AUNKNOWN&destination-id=10233082&q-destination=Italy&q-check-in=2020-11-30&q-check-out=2020-12-05&q-rooms=1&q-room-0-adults=2&q-room-0-children=0

And you used the Data Scraping activity to extract the data, right? Sorry, I am new to this field, so it takes me time to understand.

Extraction also worked with the same extract config for the second link.

What was displayed?

Yes, data scraping was used.
Ensure that the selector of the data scraping activity is general and reliable enough that it also works with other search results.

It is working now, thanks. By the way, is it possible to extract only the discounted price when a hotel has a discount, and also to remove the text?

@Mahabub_Hasan
Extracting either the price or the reduced price into one column goes against the principle of keeping strict structures. But we can work around it by extracting both into separate columns with this config:

<extract>
	<row exact="1">
		<webctrl tag="li" class="hotel"/>
		<webctrl tag="article" idx="1"/>
		<webctrl tag="section" class="hotel-wrap" idx="1"/>
		<webctrl tag="aside"/>
		<webctrl tag="div" class="price" idx="1"/>
	</row>
	<column exact="1" name="OriginPrice" attr="text">
		<webctrl tag="a" class="price-link" idx="1"/>
	</column>
	<column exact="1" name="UnreducedPrice" attr="text">
		<webctrl tag="a" class="price-link" idx="1"/>
		<webctrl tag="div" class="strike-tooltip-block" idx="1"/>
		<webctrl tag="del" class="strike-through-price widget-tooltip widget-tooltip-multiline widget-tooltip-tl" idx="1"/>
	</column>
	<column exact="1" name="Reduced" attr="text">
		<webctrl tag="div" class="price" idx="1"/>
		<webctrl tag="a" class="price-link" idx="1"/>
		<webctrl tag="ins" idx="1"/>
	</column>
</extract>

Later we can consolidate the columns by cleansing the data (Regex, checking for empty values, etc.).
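
The cleansing step could look roughly like the sketch below. It is written in Python only to illustrate the logic outside UiPath; the sample rows, the helper names `parse_price` and `consolidate`, and the assumption that "," is a thousands separator are all hypothetical. The column names match the config above.

```python
import re

# First number in a cell, e.g. "$1,234.50 nightly price" -> "1,234.50"
PRICE_RE = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def parse_price(text):
    """Return the first number in text as a float, or None if there is none."""
    match = PRICE_RE.search(text or "")
    if not match:
        return None
    # Assumption: "," is a thousands separator, "." the decimal point.
    return float(match.group().replace(",", ""))


def consolidate(row):
    """Prefer the Reduced column; fall back to OriginPrice when it is empty."""
    reduced = parse_price(row.get("Reduced"))
    if reduced is not None:
        return reduced
    return parse_price(row.get("OriginPrice"))


# Hypothetical rows as they might come out of the extraction.
rows = [
    {"OriginPrice": "$120 $95 nightly price per room", "UnreducedPrice": "$120", "Reduced": "$95"},
    {"OriginPrice": "$80 nightly price per room", "UnreducedPrice": "", "Reduced": ""},
]
prices = [consolidate(row) for row in rows]
```

Inside a UiPath workflow the same two steps (regex match, empty check) would be applied per row of the extracted DataTable.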

OK. Another thing: if I search for Dubai, the extracted data is not kept in the correct order.

Sequence4.xaml (12.2 KB)

This is often an indicator that the column definitions are not within a strict structure. Explore the element structure in more detail and set up the column selectors strictly on the basis of the row definition.
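
A minimal sketch of why this matters, using Python's standard library on hypothetical, heavily simplified markup (not the real hotels.com page): when a column is collected across the whole page, a hotel without a discount shifts every later value up by one row, while a lookup relative to each row keeps the values aligned.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: Hotel B has no discount, so it has no <ins> element.
PAGE = """
<ul>
  <li class="hotel"><h3>Hotel A</h3><div class="price"><del>$120</del><ins>$95</ins></div></li>
  <li class="hotel"><h3>Hotel B</h3><div class="price"><span>$80</span></div></li>
  <li class="hotel"><h3>Hotel C</h3><div class="price"><del>$200</del><ins>$150</ins></div></li>
</ul>
"""
root = ET.fromstring(PAGE.strip())

# Loose: collect each column over the whole page, then pair by position.
names = [h.text for h in root.iter("h3")]
reduced = [i.text for i in root.iter("ins")]
loose = list(zip(names, reduced))  # Hotel B is paired with Hotel C's price

# Strict: resolve each column relative to its own row element.
strict = [
    (li.findtext("h3"), li.findtext("div/ins", default=""))
    for li in root.iter("li")
]  # Hotel B keeps an empty value, Hotel C keeps its own price
```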

However, the principles are demonstrated. For more advanced extraction scenarios, also have a look here:

Thanks for your help.
