ExtractData incorrect column names


#1

I am extracting data from a simple html table, where the first row should be the column names.

when I extract the columns names are incorrect and show Column0, Column-1… ColumnN

I do have the ExtractMetadata as <extract-table get_columns_name='1' />

Am I missing something? Is the something else I need to do?

This is the HTML of the table


#2

Hi @stephanus,

When you write to the excel, in the properties tab of write range, please check the option to add headers.


#3

Please read my issue before responding. Have a look at the second image attached in more detail.
Problem is not with writing to the excel file. My “Add Headers” selection is ticked in any case.

The problem is with the column names that were incorrectly assigned to some auto generated name “Column-N” as apposed to the to the column heading from the HTML table.


#4

Apologize for incorrectly interpreting the problem.

Is it possible for you to forward us the webpage or the HTML file to further check on the issue?

Thanks,
Sachin


#5

Please see the dummy html attached. I created this to show what happens.
I hope this helps.
extract.html (1.8 KB)


#6

I got the below output, Is this incorrect?

extract.xlsx (3.3 KB)


#7

That is correct.
Can you please provide the details of the activity?


#8

I pasted your table content in the body (keeping the css intact) to below link and did data scraping.

https://www.w3schools.com/html/tryit.asp?filename=tryhtml_table_intro


#9

I think you completely missing the point of this issue… Please see the original posts…

Did you actually extract the data in UiPath with the ExtractData acttivity?


#10

No I used DataScraping


#11

I tried to understand and totally lost at the above line that’s the reason I asked you for the result before sending the code. Sorry if it doesn’t help.


#12

Thanks for trying to help

I used the scraping, it then created the ExtractData activity which fail to detect the column names from the first row.
That is why I also asked if someone can point me to the documentation of the ExtractMetadata property…


#13

@stephanus

I used the scraping, ----Correct

it then created the ExtractData activity —Correct

which fail to detect the column names from the first row. – Can you simplify this

  1. You don’t want col1,col2,col3…and want the 1st row as header in your excel?

  2. Col1,col2…colN are mapped to wrong columns(actual) in the table?

  3. If you wanted your first row as columns, you need to modify your html (all td should be th)

    SPM Number Source Scheme Incident No Claim Number Carrier Broker Branch Section Peril Insured Vehicle Location Claim Specialist SP Allocated By Service Type Service Provider Company Service Provider Date to be Assessed Date Allocated SP Status Claim Status    


#14

Apart from what @vvaidya suggested, what you can do is after using data scraping activity on your web page, when you write the data in the excel, do not check the “Add Headers” property to avoid adding "Column* headers in the excel. This is because the header row of your table is being considered as a data row due to the way the HTML table is designed as also initially mentioned above by @vvaidya.


#15

Thanks for the reply.
We do not have control over that system and we cannot request a change for that. That is exactly the point of UiPath - to build bots that work with existing systems…

I believe it should be a feature of the ExtractData activity to define that the first row - regardless of TH or TD should be used as column names. It like reading data from CSV - you can normally define that the 1st row of data is actually the column names…

I guess I’ll just have to work around this…

Thanks for your responses


#16

It’s still doable, just not as straightforward as with automatic extraction.

Let’s break it down:

  • Webpage is structured that the header row is not distinguishable from data rows.
  • You want the datatable columns to be named correctly (obviously). By correctly is meant that the automatic column names should be overwritten with data from the first row.

What you can do:

  1. Read the data as you did before.
  2. Overwrite the column names using a loop (I’ll use foreach in this example):
    foreach (int index in Enumerable.Range(0, mydt.Columns.Count) { mydt.Columns(index).ColumnName = mydt.Rows(0)(index).ToString) } RemoveDataRow(DataTable = mydt, RowIndex = 0)
    What this does is it overwrites column names with string values from first row, then removes it.
    RemoveDataRow is an activity.

This should fill your needs. Automatic extraction is very good (IMHO), but it depends on how the source is structured. Sometimes if the input is less than ideal, you need to improvise.

Regards.


#17

May sound funny but…

How about perfrom Write Range (with Add Headers unchecked) will eliminate column1,columns2…

and Do Read Range (with Add header Checked) into a datatable so the 1st row the user wanted will be the Header.


#18

For large DT’s this could easily get inefficient.

Also might be just my paranoia, but I’m always hesitant to read-write-read data (especially with anything relating to excel), as there’s always a possibility that something (like formatting) might get altered in the middle. In memory alterations are IMHO safer in that regard, since they don’t involve additional software that may throw a fit in some edge cases.


#19

Agreed!


#20

Hi @andrzej.kniola

Regarding the following:

  1. Read the data as you did before.
  2. Overwrite the column names using a loop (I’ll use foreach in this example):
    foreach (int index in Enumerable.Range(0, mydt.Columns.Count)
    {
    mydt.Columns(index).ColumnName = mydt.Rows(0)(index).ToString)
    }
    RemoveDataRow(DataTable = mydt, RowIndex = 0)

Please will you explain where each element of the loop statement needs to be written in the, for each, activity.

Thank you