Data scraping on a daily basis from a website that changes its class names every day

Hello,

I have built a robot to check prices on a website on a daily basis. On the first day it ran perfectly multiple times, but the next day it couldn’t find some elements. I checked the XML and found that the website changes a sequence in the class names.

As shown in the images, the class name changes from day to day.
Any idea how to fix this, or is there a workaround? For example, instead of ‘String-002244’, could we use something like ‘String-*’ to find the XML elements without having to scrape all the data all over again?

Thanks for your time :slight_smile: !!

@Eddy_El_Rahi
Welcome to the Forum

Give it a try: replace the dynamic part of the class with a * (wildcard).

Hello ppr,

I have tried to use the wildcard (*), but the robot couldn’t find the element for some reason.
Example of what I used: with ‘classname123’ (old script) the robot was working perfectly; with ‘classname*’ (new script) the robot lost the element.

Is there any other way to replace or ignore dynamic parts in the XML?

@Eddy_El_Rahi
The approach described above is the common one. In some tricky cases, regex selectors can help out.
Let's do one thing: use the </> button in the editor and share the entire selector, including the wildcard, with us. Thanks.
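
As a rough illustration of the two approaches mentioned here, a single-element selector with a wildcard, and the same selector with a regex, might look like the sketch below. The class prefix is borrowed from the scraping definition posted later in this thread, and the hex pattern for the suffix is an assumption based on the suffixes visible there. This applies to ordinary activity selectors; whether the data-scraping metadata XML honours either form is not confirmed in this thread. The XML comments are for explanation only.

```xml
<!-- Wildcard: * stands for zero or more characters in the attribute value -->
<webctrl tag='span' class='accommodation-list__price--*' />

<!-- Regex: the matching:class attribute switches the class comparison to a regular expression -->
<webctrl tag='span' class='accommodation-list__price--[0-9a-f]+' matching:class='regex' />
```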

This is the code. I tried the wildcard (*) trick in the Lowest Price column, but it didn’t work for me. Notice that the hotel name is extracted successfully every day because it doesn’t have dynamic class names.

Please check your post; I cannot see any selector.

<extract>
<row exact='1'>
	<webctrl tag='li' class='hotel-item item-order__list-item js_co_item'/>
	<webctrl tag='div' idx='1'/>
	<webctrl tag='article' class='item bg-white' idx='1'/>
	<webctrl tag='div' class='pos-relative item__wrapper' idx='1'/>
	<webctrl tag='div' class='item__flex-column' idx='1'/>
</row>
<column exact="1" name="Hotel Name" attr="text">
	<webctrl tag="li" class="hotel-item item-order__list-item js_co_item"/>
	<webctrl tag="div" idx="1"/>
	<webctrl tag="article" class="item bg-white" idx="1"/>
	<webctrl tag="div" class="pos-relative item__wrapper" idx="1"/>
	<webctrl tag="div" class="item__flex-column" idx="1"/>
	<webctrl tag="div" class="item__details item__details--layout" idx="1"/>
	<webctrl tag="div"/>
	<webctrl tag="h3" class="m-0" idx="1"/>
	<webctrl tag="span" class="item-link name__copytext" idx="1"/>
</column>
<column exact='1' name='Lowest Price' attr='text'>
	<webctrl tag='li' class='hotel-item item-order__list-item js_co_item'/>
	<webctrl tag='div' idx='1'/>
	<webctrl tag='article' class='item bg-white' idx='1'/>
	<webctrl tag='div' class='pos-relative item__wrapper' idx='1'/>
	<webctrl tag='div' class='item__flex-column' idx='1'/>
	<webctrl tag='section' class='accommodation-list__prices--505b9' idx='1'/>
	<webctrl tag='div' class='accommodation-list__row--8f2f6' idx='1'/>
	<webctrl tag='article' class='accommodation-list__cheapest--18cc5 accommodation-list__article--7e948' idx='1'/>
	<webctrl tag='div' class='accommodation-list__prices--96830' idx='1'/>
	<webctrl tag='button' class='accommodation-list__button--b8d61' idx='1'/>
	<webctrl tag='span' class='accommodation-list__deal--0ecf2 accommodation-list__deal--96362' idx='1'/>
	<webctrl tag='span' class='accommodation-list__price--*' idx='1'/>
</column>
<column exact='1' name='Lowest Source' attr='text'>
	<webctrl tag='li' class='hotel-item item-order__list-item js_co_item'/>
	<webctrl tag='div' idx='1'/>
	<webctrl tag='article' class='item bg-white' idx='1'/>
	<webctrl tag='div' class='pos-relative item__wrapper' idx='1'/>
	<webctrl tag='div' class='item__flex-column' idx='1'/>
	<webctrl tag='section' class='accommodation-list__prices--505b9' idx='1'/>
	<webctrl tag='div' class='accommodation-list__row--8f2f6' idx='1'/>
	<webctrl tag='article' class='accommodation-list__cheapest--18cc5 accommodation-list__article--7e948' idx='1'/>
	<webctrl tag='div' class='accommodation-list__prices--96830' idx='1'/>
	<webctrl tag='button' class='accommodation-list__button--b8d61' idx='1'/>
	<webctrl tag='span' class='accommodation-list__deal--0ecf2 accommodation-list__deal--96362' idx='1'/>
	<webctrl tag='span' class='accommodation-list__partner--869af' idx='1'/>
</column>
<column exact='1' name='Other Price' attr='text'>
	<webctrl tag='li' class='hotel-item item-order__list-item js_co_item'/>
	<webctrl tag='div' idx='1'/>
	<webctrl tag='article' class='item bg-white' idx='1'/>
	<webctrl tag='div' class='pos-relative item__wrapper' idx='1'/>
	<webctrl tag='div' class='item__flex-column' idx='1'/>
	<webctrl tag='section' class='accommodation-list__prices--505b9' idx='1'/>
	<webctrl tag='div' class='accommodation-list__row--8f2f6' idx='1'/>
	<webctrl tag='article' class='accommodation-list__specialRate--a00b0 accommodation-list__article--7e948 js_co_link' idx='1'/>
	<webctrl tag='div' class='accommodation-list__prices--85dc9' idx='1'/>
	<webctrl tag='button' class='accommodation-list__button--b8d61' idx='1'/>
	<webctrl tag='span' class='accommodation-list__deal--54e08 accommodation-list__deal--96362' idx='1'/>
	<webctrl tag='span' class='accommodation-list__price--8f92e' idx='1'/>
</column>
<column exact='1' name='Other Source' attr='text'>
	<webctrl tag='li' class='hotel-item item-order__list-item js_co_item'/>
	<webctrl tag='div' idx='1'/>
	<webctrl tag='article' class='item bg-white' idx='1'/>
	<webctrl tag='div' class='pos-relative item__wrapper' idx='1'/>
	<webctrl tag='div' class='item__flex-column' idx='1'/>
	<webctrl tag='section' class='accommodation-list__prices--505b9' idx='1'/>
	<webctrl tag='div' class='accommodation-list__row--8f2f6' idx='1'/>
	<webctrl tag='article' class='accommodation-list__specialRate--a00b0 accommodation-list__article--7e948 js_co_link' idx='1'/>
	<webctrl tag='h3' class='accommodation-list__heading--7785d accommodation-list__heading--ec283' idx='1'/>
</column>
<column exact='1' name='Rating' attr='text'>
	<webctrl tag='li' class='hotel-item item-order__list-item js_co_item'/>
	<webctrl tag='div' idx='1'/>
	<webctrl tag='article' class='item bg-white' idx='1'/>
	<webctrl tag='div' class='pos-relative item__wrapper' idx='1'/>
	<webctrl tag='div' class='item__flex-column' idx='1'/>
	<webctrl tag='div' class='item__details item__details--layout' idx='1'/>
	<webctrl tag='div' class='item__name item__name--link' idx='1'/>
	<webctrl tag='button' class='reviews reviews--hover' idx='1'/>
	<webctrl tag='span' class='review' idx='1'/>
	<webctrl tag='span'/>
	<webctrl tag='span' class='item-components__pillValue--8a352 item-components__value-sm--ed35c item-components__pillValue--8a352' itemprop='ratingValue' idx='1'/>
</column>
</extract>

@Eddy_El_Rahi
It was too early in the morning and I did not notice that this is about the data scraping ExtractMetadata XML.
Sorry for that.

In this case I would suggest the following strategy:

  • as you are relying heavily on idx, remove the class info from the selectors wherever it has a dynamic portion
  • if the selectors need to be made more reliable, analyze the elements in UiExplorer for other selector attributes

Just start with this. If it fails, we will simplify the selectors step by step as needed.
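
A minimal sketch of the first point, applied to the Lowest Price column from the definition above: every class attribute that carries a hashed suffix is dropped, and only tag and idx remain. The idx values here are carried over unchanged as an assumption. Without the class, idx counts among all elements with that tag, so the values likely need to be re-checked in UiExplorer. In particular, Lowest Price and Lowest Source would otherwise resolve to the same span.

```xml
<column exact='1' name='Lowest Price' attr='text'>
	<!-- Stable part: these class names do not change from day to day -->
	<webctrl tag='li' class='hotel-item item-order__list-item js_co_item'/>
	<webctrl tag='div' idx='1'/>
	<webctrl tag='article' class='item bg-white' idx='1'/>
	<webctrl tag='div' class='pos-relative item__wrapper' idx='1'/>
	<webctrl tag='div' class='item__flex-column' idx='1'/>
	<!-- Dynamic part: class attributes with a hashed suffix removed, tag and idx kept -->
	<webctrl tag='section' idx='1'/>
	<webctrl tag='div' idx='1'/>
	<webctrl tag='article' idx='1'/>
	<webctrl tag='div' idx='1'/>
	<webctrl tag='button' idx='1'/>
	<webctrl tag='span' idx='1'/>
	<!-- idx of the final span is an assumption; verify which span holds the price -->
	<webctrl tag='span' idx='1'/>
</column>
```

The same edit would apply to the Lowest Source, Other Price, Other Source and Rating columns, each with its own idx values confirmed against the live page.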

OK man, I will try my best and let you know if it works.


Thanks Peter, you are the best man :slight_smile: It worked for me: I removed the dynamic classes and relied on idx. @ppr
