Trying to extract data from a website

Hi. I am trying to extract data from a website using “Extract Table Data” from Activities.
However, the website has an overlay that turns each table entry into a clickable button, which prevents me from extracting the data. What is a good method to try? A good example is https://www.realtor.com/apartments/New-York. As you can see, there is pricing data and other info, but the box is a button.
Thanks for any input on this.


Hi

Did you try the screen scraping method instead of data scraping here?

Have a look at this for an example:
https://docs.uipath.com/studio/docs/output-or-screen-scraping-methods

Cheers @ymstudio1

@ymstudio1

You can try using the For Each UI Element activity with a Get Attribute activity inside it, and read the innertext attribute to get all the details as needed.

Cheers

Yes, the overlay can be confusing, and the page structure needs a deeper analysis.

But by post-editing the extract metadata we have a chance to extract more granular parts.


OR

<extract>
	<row exact="1">
		<webctrl tag="div" id="placeholder_property*" idx="1"/>
	</row>
	<column exact="1" name="Column1" attr="text">
		<webctrl tag="div" class="price-wrapper"/>
	</column>
	<column exact="1" name="Column2" attr="text">
		<webctrl tag="div" idx="2" />
		<webctrl tag="ul" idx="1"/>
		<webctrl tag="li" idx="1"/>
	</column>
	<column exact="1" name="Column3" attr="text">
		<webctrl tag="div" idx="2" />
		<webctrl tag="ul" idx="1"/>
		<webctrl tag="li" idx="2"/>
	</column>
	<column exact="1" name="Column4" attr="text">
		<webctrl tag="div" class="content-row" />
	</column>
</extract>

Kindly note: this was only a quick check to test whether we have a chance to dig deeper or not.


Hi @Palaniyappan, I appreciate your reply. With screen scraping, wouldn’t there be an issue with multiple pages?

Thanks @Anil_G. Would you please be able to elaborate? I am trying to use Extract Table Data and then edit the metadata in the XML Editor, but I am having trouble.

Thank you for your reply. I tried putting the XML in the extract metadata and ran it, but had no luck. I also edited the XML to pull just the pricing, as follows:

<extract>
	<column exact='1' name='Column1' attr='text'>
		<webctrl tag='div' class='price-wrapper' />
	</column>
</extract>

I do apologize for the novice questions.

I was able to get the data using the extract metadata. Here is what I have. I basically used a CSS selector and it was a success. However, the challenge now is that some of the text is inside <li> tags and I can’t seem to get it working. For example, in order to extract the “bed” value from the HTML below, I am assuming I should use data-testid='property-meta-beds'?

    <li data-testid="property-meta-beds" class="PropertyBedMetastyles__StyledPropertyBedMeta-rui__a4nnof-0 cHVLag"><span data-testid="meta-value">2</span>bed</li>
    
    <extract>
    	<column css-selector='.price-wrapper' name='Price' attr='text' />
    	<column css-selector='.card-reduced-amount' name='ReducedAmount' attr='text' />
    </extract>
    

I was finally able to get the data. Here is my solution for reading the text in the <li data-testid="property-meta-beds"> element:

<column css-selector='li[data-testid="property-meta-beds"]' name='Beds' attr='text' />

The XML above retrieves the data.
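
For anyone landing here later, the pieces above can be combined into one extract definition. This is only a sketch: it assumes the `css-selector` column form used above keeps working when several columns are listed together, and the `BedsValue` column with its nested selector is a hypothetical addition based on the `<span data-testid="meta-value">` in the HTML sample, not something tested in this thread.

```xml
<extract>
	<!-- Price and reduced amount, as in the earlier working metadata -->
	<column css-selector='.price-wrapper' name='Price' attr='text' />
	<column css-selector='.card-reduced-amount' name='ReducedAmount' attr='text' />
	<!-- Whole <li> element: the text likely includes both the number and the "bed" label -->
	<column css-selector='li[data-testid="property-meta-beds"]' name='Beds' attr='text' />
	<!-- Hypothetical column: only the inner meta-value span (the number) -->
	<column css-selector='li[data-testid="property-meta-beds"] span[data-testid="meta-value"]' name='BedsValue' attr='text' />
</extract>
```

The attribute selector `li[data-testid="property-meta-beds"]` is standard CSS, so matching on `data-testid` may hold up better than the generated class names (such as `cHVLag`), which look auto-generated and could change between site builds.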

Thanks everyone who gave me the help and direction to solve the problem!
