XML metadata and output

Hello, this is my first post here.
I just started as an RPA dev and am still exploring UiPath. However, I have a problem with data scraping
on a flights website.

I really have no idea why the XML is outputting both results in 1 column instead of 2. In the example posted, it should show the origin airport and the destination airport; however, the value comes as 
Also, the date + times + flight prices come out looking terrible and I can’t find a pattern.

Since it’s a public site you’re working with, would you mind sharing a link? That makes it easier for us to help identify workable selectors.

Just looking at what you posted though, it seems like the name=‘org’ is the origin (London Gatwick) and the name=‘dest’ is the destination (Hurghada)

It isn’t storing it in a nice & neat HTML table though, which I believe is what the data scraping tool expects. What happens if you just use UI Explorer to look at the individual selectors yourself? Are you able to identify something usable that way?

It seems the whole problem is with the XML metadata.
When I use

it gives me the origin airport and exports it fine to Excel; however, if I add another column tag
it returns an empty result.

https://www.easyjet.com/ You have to search for a flight (origin + destination + 2 dates) to get to the same page I’m trying to scrape.

Here is what I would recommend using as selectors for each circled item. Note that they are not full selectors, as I’ll assume you have them within a browser scope.

Outbound Flight, Origin: <webctrl parentid='outbound-flight-panel' tag='SPAN' class='origin-name' />

Outbound Flight, Destination: <webctrl parentid='outbound-flight-panel' tag='BUTTON' class='ej-link-button destination-name' />

Outbound Flight, Date (NOTE: This shouldn’t be necessary as you would already have this data as it is required to navigate to this page): <webctrl tag='H4' class='day-title' parentclass='flight-grid-day' idx='2' />

Outbound Flight, Previous day Flights: <webctrl tag='DIV' parentclass='flight-grid-day' idx='2' />

Outbound Flight, Current Day Flights: <webctrl tag='DIV' parentclass='flight-grid-day' idx='3' /> NOTE: This gets the whole box. You could then use string manipulation to choose individual departure/arrival times, cost, identify lowest fare, etc. This is what I would recommend instead of getting individual selectors since getting the whole box text like this will make your robot more resilient.

If you also want outbound flight, next day flights - it is the same selector, just change the idx to 4 (2 = previous day, 3 = current day you searched, 4 = next day)
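To make the idx mapping concrete, here is a small Python sketch (not UiPath code) that builds the three partial selectors from one template. The helper name `day_selector` and the labels in `DAY_IDX` are made up for illustration; only the selector string itself comes from the post above. In a workflow the same idea would normally be done by putting a variable into the selector string.

```python
# Partial selector from the post above, with idx left as a placeholder.
SELECTOR_TEMPLATE = "<webctrl tag='DIV' parentclass='flight-grid-day' idx='{idx}' />"

# idx values as described above: 2 = previous day, 3 = searched day, 4 = next day.
# The dictionary keys are illustrative labels, not anything the site defines.
DAY_IDX = {"previous": 2, "current": 3, "next": 4}


def day_selector(day: str) -> str:
    """Return the partial selector for one day column (hypothetical helper)."""
    return SELECTOR_TEMPLATE.format(idx=DAY_IDX[day])


if __name__ == "__main__":
    for day in DAY_IDX:
        print(day, "->", day_selector(day))
```

Keeping a single template means only the idx changes between the three 'get text' calls, which is easier to maintain if the site's class names change later.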

Flight origin & destination is exactly how I’m doing it.
If I do a single column it works, but when I do 2 columns it doesn’t work.

<column  exact='1' name='org' attr='text'>
	<webctrl parentid='return-flight-panel' tag='SPAN' class='origin-name' parentclass='flight ej-text' />
<column  exact='1' name='dest' attr='text'>
	<webctrl parentid='return-flight-panel' tag='SPAN' class='destination-name' parentclass='flight ej-text' />

I will give the prices a go, but even a simple header isn’t working, and I know the issue is in the XML.

As I stated before, the data scraping tool expects a nice & neat html table. Since easyjet doesn’t work that way, the data scraping tool should not be used. You should use individual activities such as ‘get text’ and use the individual selectors like I have posted.

So let’s say I’m going to scrape dates / times / prices for something like 3 months of flights.
I can’t do that with the scraping tool?
Do I have to do everything individually?

And about my XML format: does it look normal as I posted above, or am I missing something?
Because a single element in a single column works fine, but nothing more than 1 does.

Yes, of course. You use a loop and pass in the relevant information as variables (e.g. OriginAirport, DestinationAirport, DepartureDate, ReturnDate). You would do it individually for each flight with the robot. Whether you get 6 individual elements or 1 table all at once isn’t going to affect the transaction time. The longest portion of time will always be the searching of the flights and navigating the website.
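As a rough illustration of that loop, here is a Python sketch that only generates the search inputs, one set per flight search. The function name `search_parameters`, the start date, the 90-day window and the 7-night stay are all assumptions for the example; the dictionary keys mirror the variable names mentioned above. The actual searching and 'get text' steps would happen inside the robot for each generated item.

```python
from datetime import date, timedelta
from itertools import product


def search_parameters(origins, destinations, first_departure, days, stay_nights):
    """Yield one dict of search inputs per flight search (hypothetical helper)."""
    for origin, destination in product(origins, destinations):
        if origin == destination:
            continue  # a route needs two different airports
        for offset in range(days):
            departure = first_departure + timedelta(days=offset)
            yield {
                "OriginAirport": origin,
                "DestinationAirport": destination,
                "DepartureDate": departure,
                "ReturnDate": departure + timedelta(days=stay_nights),
            }


if __name__ == "__main__":
    # Assumed example values: roughly 3 months of daily departures, 7-night stays.
    searches = list(search_parameters(
        ["London Gatwick"], ["Hurghada"], date(2024, 1, 1), days=90, stay_nights=7))
    print(len(searches), "searches to run")
    print(searches[0])
```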

Yes, the XML is fine. However, the website is built in such a way that it formats each element as its own “table”, which is why it isn’t working for you and why I recommend just grabbing the individual elements.

For multiple correlated data items, the data scraping extract definition requires a row definition that can be used to iterate over all the correlated data.

The available web structure does not support this. There are also variations to handle (e.g. unavailable flights).
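To show why a row definition matters, here is a conceptual Python sketch. It is not how UiPath works internally, and the markup in `SAMPLE` is invented and much simpler than the real page. The point is that columns are looked up relative to a repeating row element; without such a repeating container there is nothing to iterate over, so two columns cannot be lined up.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: each flight sits in its own repeating container.
SAMPLE = """
<div class="results">
  <div class="flight"><span class="org">London Gatwick</span><span class="dest">Hurghada</span></div>
  <div class="flight"><span class="org">Hurghada</span><span class="dest">London Gatwick</span></div>
</div>
"""


def extract_rows(markup: str) -> list:
    """Return one dict per repeating 'flight' element (illustrative only)."""
    root = ET.fromstring(markup)
    rows = []
    # The "row definition": the repeating element we iterate over.
    for flight in root.findall(".//div[@class='flight']"):
        # The "column definitions": looked up relative to the current row.
        rows.append({
            "org": flight.findtext("span[@class='org']"),
            "dest": flight.findtext("span[@class='dest']"),
        })
    return rows


if __name__ == "__main__":
    for row in extract_rows(SAMPLE):
        print(row)
```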

A compromise result can perhaps be achieved using this extract XML:

	<column exact="1" name="Column1" attr="text">
		<webctrl tag="div" class="funnel-leg" idx="1"/>
		<webctrl tag="div" idx="1"/>
		<webctrl tag="div" class="funnel-flight outbound has-return" idx="1"/>
		<webctrl tag="div" idx="3"/>
		<webctrl tag="div" class="flight-grid animations-enabled" idx="1"/>
		<webctrl tag="div" class="flight-grid-window" idx="1"/>
		<webctrl tag="div" class="flight-grid-slider-wrapper" idx="1"/>
		<webctrl tag="div" class="flight-grid-slider" idx="1"/>
		<webctrl tag="div"/>

The retrieved content looks reliable enough to be processed further, e.g. by parsing it with regex. The starting / destination locations can be retrieved individually. Finally, the information can be consolidated into a datatable with whatever structure is needed.
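A minimal Python sketch of that parse-and-consolidate step follows. The text in `SAMPLE_TEXT` is invented, since the real scraped text is not shown in this thread, so both regular expressions are assumptions that would need adjusting to the actual output. The function name `parse_flights` and the row keys are also illustrative.

```python
import re

# Hypothetical scraped text; the real text from the site will look different.
SAMPLE_TEXT = """Mon 1 Jan
Dep 07:05 Arr 14:20 £123.99
Dep 15:40 Arr 22:55 £98.50
Tue 2 Jan
No flights
"""

# Assumed line formats, matching only the invented sample above.
DAY_RE = re.compile(r"^(Mon|Tue|Wed|Thu|Fri|Sat|Sun) \d{1,2} [A-Z][a-z]{2}$")
FLIGHT_RE = re.compile(r"Dep (\d{2}:\d{2}) Arr (\d{2}:\d{2}) £(\d+\.\d{2})")


def parse_flights(text, origin, destination):
    """Turn one block of scraped text into a list of row dicts."""
    rows = []
    current_day = None
    for line in text.splitlines():
        line = line.strip()
        if DAY_RE.match(line):
            current_day = line  # remember which day the next flights belong to
            continue
        match = FLIGHT_RE.search(line)
        if match:  # lines such as "No flights" simply produce no row
            rows.append({
                "origin": origin,
                "destination": destination,
                "day": current_day,
                "departure": match.group(1),
                "arrival": match.group(2),
                "price": float(match.group(3)),
            })
    return rows


if __name__ == "__main__":
    table = parse_flights(SAMPLE_TEXT, "London Gatwick", "Hurghada")
    for row in table:
        print(row)
    print("lowest fare:", min(table, key=lambda r: r["price"]))
```

Origin and destination are passed in separately because, as noted above, they are retrieved individually rather than from the scraped block.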

In some cases data scraping can be set up by processing the HTML with an XML API, but I would recommend spending the effort on the approach described above.

Thank you for the detailed answer. I think that’s the perfect solution: put in a loop with multiple conditions for whether there is a flight or not.
As for the data structure, I understand the website is complicated, but I even tried to test scraping 2 single words that exist in the same parent element in the HTML:
Origin & Destination. I was trying to put them in separate columns, but the XML isn’t reading right, which makes me wonder if something is wrong with the format or if I’m missing something.

<column  exact='1' name='org' attr='text'>
	<webctrl parentid='return-flight-panel' tag='SPAN' class='origin-name' parentclass='flight ej-text' />
<column  exact='1' name='dest' attr='text'>
	<webctrl parentid='return-flight-panel' tag='SPAN' class='destination-name' parentclass='flight ej-text' />

It should get each element’s text and put it in column 1 & column 2, or am I wrong?