Web Scraping - how to get html code from cell


#1

Hi,

I would like to use the Web Scraping tool to get all the HTML content from one cell in a table, or to get two similar elements from one cell.

Here is the HTML code from one row of the table:


			

I’m able to get, for example, the src from the first image or from the second, but what kind of XML should I use in Web Scraping to get either all the content of the “a” element or the “src” data from both “img” elements? There can be any number of images, so I cannot just use two columns in Web Scraping.


#2

Use the Get Attribute activity with the "innerHtml" attribute.
Afterwards, you can parse the retrieved text.
See attached: GetAttribute.xaml (6.5 KB)
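
As a rough illustration of the parsing step, here is a minimal Python sketch (not UiPath code) that pulls every src value out of an innerHtml string. The sample markup and the names `ImgSrcCollector` and `extract_img_sources` are assumptions made for this example, since the row HTML from the first post is not shown; inside a workflow the same idea would be expressed with .NET string or regex operations.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "img":
            src = dict(attrs).get("src")
            if src is not None:
                self.sources.append(src)


def extract_img_sources(inner_html):
    """Return the src values of all <img> tags in an HTML fragment, in order."""
    collector = ImgSrcCollector()
    collector.feed(inner_html)
    collector.close()
    return collector.sources


# Hypothetical cell content; the real markup is not shown in the thread.
sample_cell = '<a href="#"><img src="a.png"><img src="b.png"/><img src="c.png"></a>'
print(extract_img_sources(sample_cell))
```

Because the result is a list, it does not matter how many images the cell contains.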


#3

I have a lot of other stuff in the same table as well, and I would like to use the Web Scraping functionality.
“innerHtml” does not seem to work in a Web Scraping data definition like this:

<column name='Symbol' attr='innerHtml' exact='1'>
	<webctrl class='row1' tag='tr' />
	<webctrl tag='td' idx='1' />
	<webctrl tag='a' idx='1' />
</column>

Or do I have some other bug above?

Br, Mikko


#4

Indeed, web scraping does not work with Get Attribute, but there is a manual workaround: iterate through all of the table’s elements using the Find All Children activity and apply Get Attribute to each child in your collection.
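
The shape of that loop can be sketched in Python, purely as an illustration and not as UiPath code. The list `rows_inner_html` is a hypothetical stand-in for the innerHtml that Get Attribute would return for each row, and `collect_row_images` is a made-up helper name. Joining the values into one string per row is one way to cope with a varying number of images without needing a fixed number of columns.

```python
import re

# Matches the src value of an <img> tag with single or double quotes.
IMG_SRC = re.compile(r'<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']*)["\']', re.IGNORECASE)

# Hypothetical per-row innerHtml values; the real table markup is not shown in the thread.
rows_inner_html = [
    '<td><a href="#"><img src="up.png"><img src="star.png"></a></td>',
    '<td><a href="#">no images here</a></td>',
    '<td><a href="#"><img src="down.png"></a></td>',
]


def collect_row_images(rows, separator=";"):
    """Return one string per row holding all image sources joined by separator."""
    result = []
    for inner_html in rows:
        sources = IMG_SRC.findall(inner_html)  # every src in this row, in order
        result.append(separator.join(sources))
    return result


for cell_value in collect_row_images(rows_inner_html):
    print(repr(cell_value))
```

A regular expression is enough for a simple, predictable fragment like this; for messier markup a real HTML parser would be the safer choice.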