Hello,
I have made a video to check prices from a website on a daily basis. On the first day it ran perfectly multiple times, but the next day it couldn’t find some elements. I checked the XML and found that the website changes a sequence in the class names.
As shown in the images the class name has changed from day to day.
Any idea how to fix this issue, or is there a workaround? For example, instead of ‘String-002244’, could we use something like ‘String-*’ to find the XML elements without having to scrape all the data all over again?
Thanks for your time !!
ppr
(Peter)
July 14, 2020, 6:44am
2
@Eddy_El_Rahi
Welcome to the Forum
Give it a try: replace the dynamic part within the class with a * (wildcard).
Hello ppr,
I have tried to use the wildcard (*), but the robot couldn’t find the element for some reason.
For example, with ‘classname123’ (old script) the robot was working perfectly; with ‘classname*’ (new script) the robot lost the element.
Is there any other way to replace or ignore dynamic parts in the XML?
ppr
(Peter)
July 14, 2020, 7:29am
5
@Eddy_El_Rahi
The described approach is the common one. In some tricky cases, regex selectors can help out.
Let's do one thing: use the </> button from the editor and share the entire selector, including the wildcard, with us. Thanks.
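For reference, a regex selector for the price element might look like the sketch below. It assumes the `matching:class='regex'` attribute described in the UiPath selector documentation, and it assumes the dynamic suffix is a run of lowercase hex characters, as in the class names seen in this thread. Whether regex matching is honoured inside data scraping ExtractMetadata XML is not confirmed here.

```xml
<!-- Sketch only: match the stable class prefix and let the regex absorb the changing suffix. -->
<!-- The suffix pattern [0-9a-f]+ is an assumption based on values such as 8f92e in this thread. -->
<webctrl tag='span' matching:class='regex' class='^accommodation-list__price--[0-9a-f]+$' idx='1'/>
```

Validate the selector in UiExplorer before relying on it.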
This is the code. I tried the wildcard (*) trick in the Lowest Price column, but it didn’t work for me. Notice that the hotel name is extracted successfully every day because it doesn’t have dynamic class names.
ppr
(Peter)
July 14, 2020, 7:52am
7
Please check your post; I cannot see any selector.
<extract>
<row exact='1'>
<webctrl tag='li' class='hotel-item item-order__list-item js_co_item'/>
<webctrl tag='div' idx='1'/>
<webctrl tag='article' class='item bg-white' idx='1'/>
<webctrl tag='div' class='pos-relative item__wrapper' idx='1'/>
<webctrl tag='div' class='item__flex-column' idx='1'/>
</row>
<column exact="1" name="Hotel Name" attr="text">
<webctrl tag="li" class="hotel-item item-order__list-item js_co_item"/>
<webctrl tag="div" idx="1"/>
<webctrl tag="article" class="item bg-white" idx="1"/>
<webctrl tag="div" class="pos-relative item__wrapper" idx="1"/>
<webctrl tag="div" class="item__flex-column" idx="1"/>
<webctrl tag="div" class="item__details item__details--layout" idx="1"/>
<webctrl tag="div"/>
<webctrl tag="h3" class="m-0" idx="1"/>
<webctrl tag="span" class="item-link name__copytext" idx="1"/>
</column>
<column exact='1' name='Lowest Price' attr='text'>
<webctrl tag='li' class='hotel-item item-order__list-item js_co_item'/>
<webctrl tag='div' idx='1'/>
<webctrl tag='article' class='item bg-white' idx='1'/>
<webctrl tag='div' class='pos-relative item__wrapper' idx='1'/>
<webctrl tag='div' class='item__flex-column' idx='1'/>
<webctrl tag='section' class='accommodation-list__prices--505b9' idx='1'/>
<webctrl tag='div' class='accommodation-list__row--8f2f6' idx='1'/>
<webctrl tag='article' class='accommodation-list__cheapest--18cc5 accommodation-list__article--7e948' idx='1'/>
<webctrl tag='div' class='accommodation-list__prices--96830' idx='1'/>
<webctrl tag='button' class='accommodation-list__button--b8d61' idx='1'/>
<webctrl tag='span' class='accommodation-list__deal--0ecf2 accommodation-list__deal--96362' idx='1'/>
<webctrl tag='span' class='accommodation-list__price--*' idx='1'/>
</column>
<column exact='1' name='Lowest Source' attr='text'>
<webctrl tag='li' class='hotel-item item-order__list-item js_co_item'/>
<webctrl tag='div' idx='1'/>
<webctrl tag='article' class='item bg-white' idx='1'/>
<webctrl tag='div' class='pos-relative item__wrapper' idx='1'/>
<webctrl tag='div' class='item__flex-column' idx='1'/>
<webctrl tag='section' class='accommodation-list__prices--505b9' idx='1'/>
<webctrl tag='div' class='accommodation-list__row--8f2f6' idx='1'/>
<webctrl tag='article' class='accommodation-list__cheapest--18cc5 accommodation-list__article--7e948' idx='1'/>
<webctrl tag='div' class='accommodation-list__prices--96830' idx='1'/>
<webctrl tag='button' class='accommodation-list__button--b8d61' idx='1'/>
<webctrl tag='span' class='accommodation-list__deal--0ecf2 accommodation-list__deal--96362' idx='1'/>
<webctrl tag='span' class='accommodation-list__partner--869af' idx='1'/>
</column>
<column exact='1' name='Other Price' attr='text'>
<webctrl tag='li' class='hotel-item item-order__list-item js_co_item'/>
<webctrl tag='div' idx='1'/>
<webctrl tag='article' class='item bg-white' idx='1'/>
<webctrl tag='div' class='pos-relative item__wrapper' idx='1'/>
<webctrl tag='div' class='item__flex-column' idx='1'/>
<webctrl tag='section' class='accommodation-list__prices--505b9' idx='1'/>
<webctrl tag='div' class='accommodation-list__row--8f2f6' idx='1'/>
<webctrl tag='article' class='accommodation-list__specialRate--a00b0 accommodation-list__article--7e948 js_co_link' idx='1'/>
<webctrl tag='div' class='accommodation-list__prices--85dc9' idx='1'/>
<webctrl tag='button' class='accommodation-list__button--b8d61' idx='1'/>
<webctrl tag='span' class='accommodation-list__deal--54e08 accommodation-list__deal--96362' idx='1'/>
<webctrl tag='span' class='accommodation-list__price--8f92e' idx='1'/>
</column>
<column exact='1' name='Other Source' attr='text'>
<webctrl tag='li' class='hotel-item item-order__list-item js_co_item'/>
<webctrl tag='div' idx='1'/>
<webctrl tag='article' class='item bg-white' idx='1'/>
<webctrl tag='div' class='pos-relative item__wrapper' idx='1'/>
<webctrl tag='div' class='item__flex-column' idx='1'/>
<webctrl tag='section' class='accommodation-list__prices--505b9' idx='1'/>
<webctrl tag='div' class='accommodation-list__row--8f2f6' idx='1'/>
<webctrl tag='article' class='accommodation-list__specialRate--a00b0 accommodation-list__article--7e948 js_co_link' idx='1'/>
<webctrl tag='h3' class='accommodation-list__heading--7785d accommodation-list__heading--ec283' idx='1'/>
</column>
<column exact='1' name='Rating' attr='text'>
<webctrl tag='li' class='hotel-item item-order__list-item js_co_item'/>
<webctrl tag='div' idx='1'/>
<webctrl tag='article' class='item bg-white' idx='1'/>
<webctrl tag='div' class='pos-relative item__wrapper' idx='1'/>
<webctrl tag='div' class='item__flex-column' idx='1'/>
<webctrl tag='div' class='item__details item__details--layout' idx='1'/>
<webctrl tag='div' class='item__name item__name--link' idx='1'/>
<webctrl tag='button' class='reviews reviews--hover' idx='1'/>
<webctrl tag='span' class='review' idx='1'/>
<webctrl tag='span'/>
<webctrl tag='span' class='item-components__pillValue--8a352 item-components__value-sm--ed35c item-components__pillValue--8a352' itemprop='ratingValue' idx='1'/>
</column>
</extract>
ppr
(Peter)
July 14, 2020, 8:51am
10
@Eddy_El_Rahi
It was too early in the morning and I did not notice that this is related to the data scraping ExtractMetadata XML.
Sorry for this.
In this case I would suggest the following strategy:
As you are relying heavily on idx, remove the class info from the selectors where it has a dynamic portion.
If the selectors need to be made more reliable, analyze the elements in UiExplorer for other selector attributes.
Just start with this; if it fails, we will simplify the selectors further as needed.
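Applied to the Lowest Price column from the post above, this strategy could look like the sketch below. The stable class names are kept, and the class attribute is dropped from every element whose class carried a changing suffix. The idx values are simply copied from the original and are an assumption: once the class is gone, idx alone has to pick the right sibling, so some of them may need adjusting in UiExplorer.

```xml
<column exact='1' name='Lowest Price' attr='text'>
<!-- Stable class names: kept as they were. -->
<webctrl tag='li' class='hotel-item item-order__list-item js_co_item'/>
<webctrl tag='div' idx='1'/>
<webctrl tag='article' class='item bg-white' idx='1'/>
<webctrl tag='div' class='pos-relative item__wrapper' idx='1'/>
<webctrl tag='div' class='item__flex-column' idx='1'/>
<!-- Dynamic class names removed; only tag and idx remain. -->
<!-- The idx values are assumptions and must be checked against the page. -->
<webctrl tag='section' idx='1'/>
<webctrl tag='div' idx='1'/>
<webctrl tag='article' idx='1'/>
<webctrl tag='div' idx='1'/>
<webctrl tag='button' idx='1'/>
<webctrl tag='span' idx='1'/>
<webctrl tag='span' idx='1'/>
</column>
```

Note that columns which previously differed only by class (for example the cheapest and special-rate articles) would now need different idx values to stay distinct.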
OK man, I will try my best and let you know if it succeeds.
Thanks Peter, you are the best, man. It worked for me: I removed the dynamic classes and relied on idx. @ppr