HTML Table - Table Extraction Activity Issue

I am having an issue extracting table data from a website. The site uses a malformed HTML table that throws off the UiPath Table Extraction activity, so it cannot process any data.

Here is an example of the tag structure, with generic data, for the problem HTML table.
Because the rows are nested inside the <thead>, the activity fails to perform properly.

Does anyone know a way to work around this issue so I can extract the data into a datatable?

<table width="100%" border="1" cellspacing="0" cellpadding="2" id="malformedTable">
	<thead>
		<tr>
			<td>ColHeader 1</td>
			<td>ColHeader 2</td>
			<td>ColHeader 3</td>
		</tr>
		<tr>
			<td>123456</td>
			<td>jon</td>
			<td>doe</td>
		</tr>
		<tr>
			<td>456789</td>
			<td>amy</td>
			<td>smith</td>
		</tr>
		<!-- n more <tr></tr> -->
	</thead>
</table>
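One possible workaround outside the Table Extraction wizard, sketched here in Python with the standard-library `html.parser` rather than any UiPath activity, is to read every `<tr>` in the table regardless of whether it sits inside `<thead>`, `<tbody>` or directly under `<table>`. This is a minimal sketch that assumes you already have the table's raw HTML as a string, that every `<tr>`/`<td>` is explicitly closed, and that there are no nested tables; `RowCollector` and `extract_table` are hypothetical names.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td>/<th>, grouped by <tr>, ignoring
    whether the row sits in <thead>, <tbody> or directly in <table>."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only text inside an open cell is kept; comments go to
        # handle_comment, which HTMLParser ignores by default.
        if self._cell is not None:
            self._cell.append(data)


def extract_table(html):
    """Return (header, data_rows), treating the first <tr> as the header.
    Assumes well-closed rows/cells and no nested tables."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return [], []
    return parser.rows[0], parser.rows[1:]


sample = """<table id="malformedTable"><thead>
<tr><td>ColHeader 1</td><td>ColHeader 2</td><td>ColHeader 3</td></tr>
<tr><td>123456</td><td>jon</td><td>doe</td></tr>
<tr><td>456789</td><td>amy</td><td>smith</td></tr>
<!-- n more <tr></tr> -->
</thead></table>"""

header, rows = extract_table(sample)
print(header)  # ['ColHeader 1', 'ColHeader 2', 'ColHeader 3']
print(rows)    # [['123456', 'jon', 'doe'], ['456789', 'amy', 'smith']]
```

Because the parser only looks at `<tr>` and `<td>`/`<th>`, the misplaced `<thead>` has no effect on the result.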

Try setting the THEAD as your source object, rather than the TABLE.

This is what seemed to work, though it requires more workarounds for the awful website structure.
I had to add a single column just to keep the initial activity from throwing an error. Then I had to manually change the target to the THEAD element.

Hi @trwalsh

Are you using the Table Extraction wizard for scraping this HTML table? Which versions of Studio and the UIAutomation library are you using?

Using the same sample HTML table you provided, I was able to extract it to a DataTable variable. The column names were Column0, Column1 and Column2, though, and it read the ColHeader row as a data row. We can easily loop over the rows to output them as needed.
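The "loop over the rows" step amounts to promoting the first data row to column names. In UiPath this would be done on the DataTable itself (for example in a For Each Row loop); the language-neutral Python sketch below only models the idea, assuming the extracted table is represented as a list of dicts keyed by the generic names Column0, Column1, Column2 and that every row has the same keys. `promote_header_row` is a hypothetical helper, not a UiPath API.

```python
def promote_header_row(extracted):
    """extracted: list of dicts keyed by generic names (Column0, Column1, ...),
    where the first dict holds the real header text (assumed layout)."""
    if not extracted:
        return []
    generic_names = list(extracted[0].keys())
    # Map each generic name to the header text found in the first row.
    rename = {name: extracted[0][name] for name in generic_names}
    # Re-key every remaining row by the real header names.
    return [{rename[k]: v for k, v in row.items()} for row in extracted[1:]]


extracted = [
    {"Column0": "ColHeader 1", "Column1": "ColHeader 2", "Column2": "ColHeader 3"},
    {"Column0": "123456", "Column1": "jon", "Column2": "doe"},
    {"Column0": "456789", "Column1": "amy", "Column2": "smith"},
]

for record in promote_header_row(extracted):
    print(record["ColHeader 1"], record["ColHeader 2"], record["ColHeader 3"])
# 123456 jon doe
# 456789 amy smith
```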

I am using the Data Extraction wizard in Studio 2022.4.3, and I have tried UIAutomation package versions 22.4.4, 22.4.7 and 22.8 (preview).

I edited the HTML in my post to match the site 1 to 1, as the third-party site's table does not have a caption.

[Screenshot: DataExtractFailure]

The HTML I provided is a generic example from a site I am trying to scrape data from, and I am aware it is malformed. My problem is that I can't control their site or HTML structure, but I still need the data regardless.

It is strange that you are running into issues. I was able to extract the data using the HTML from your post in Studio 22.4.4.

I got the same result after running the workflow.

I see you are using Firefox. Could you try Chrome and see if you get the same results? Which versions of the UiPath.System.Activities and UiPath.UIAutomation.Activities dependency packages are you using?


Hi @trwalsh
It worked in Edge too.

I’m using Studio 2022.4.4 with the packages that came with the Studio installation:
UiPath.System.Activities 22.4.4
UiPath.UIAutomation.Activities 22.4.7
