Delete duplicate columns in a CSV

Hi Team,

A bot of mine downloads a csv file from a portal and copies it a Google sheet. Today, it happened to see that there occured 2 columns with the same name . Hence, bot threw error saying that column already exists in the data table. So, in future, if any columns appear more than one time, I need to delete it in the data table itself and later write it to the google sheet. Could anyone help me how to do this.

Below attaching the csv file downloaded file from portal:

@Yoichi could you please help here?

1 Like

You’d have to read in the CSV without headers. Then your first row of data is the headers.

Loop through the header names in row 0 of the datatable, look for duplicates, and delete the necessary columns.

Then write the cleaned up datatable to a temp CSV without headers, then read it back in with headers.

Hi @SAMANTA_COTTACKAL1 ,


The column names in the data table are unique so there can be no duplicates, please uncheck ‘has header’ in this activity

Hope it help,

Hi,

The following post and sample will help you. (This renames either of duplicated column name. )

Regards,

Hi @Nguyen_Van_Luong1 and @postwick ,

when i removed the headers checkbox, i am getting the below error.

[Column1,Column2,Column3,Column4,Column5,Column6,Column7,Column8,Column9,Column10,Column11,Column12,Column13,Column14,Column15,Column16
First Name,Last Name,Email,Username,Role,Account Created,Groups (sharing),Organizational Group,Lucid Suite License,Lucid Suite License,Lucidscale Creator License,Lucidscale Explorer License,Trial - Lucid Suite License,Trial - Lucid Suite License,Trial - Lucidscale Creator License,Trial - Lucidscale Explorer License

image

You can use index
index of data column from 0
image
You can put index of that column at Column Index
image

Hope it hep

You have to find the index of the column you want to remove. In this example, Organizational Group is index 7 (8th column). Use that column index in the ColumnIndex property of Remove Data Column.

You can’t remove the column by name because the column names are Column1, Column2, etc and your headers are the first data row after the headers.

I’m curious why you’re removing Organizational Group, though, since it’s not a duplicate.

Hi @postwick ,

I had to remove that column as per a business requirement. Additionally, i would need to add 3 columns - Status, Product Name and Extract date. With this index , how to change it ?

image

You wouldn’t add the new columns until after you’re done removing the duplicate columns then renaming the headers.

  1. Read without headers
  2. Use data row 0 to determine duplicate headers and delete by index
  3. Write to CSV without headers
  4. Read back in from CSV with headers
  5. Delete any other columns you need to, like Extract Date, which can now be done by name instead of index
  6. Add additional columns you need

Hi @postwick ,

Team Users_.zip (409 Bytes)

I am not sure how to do the step two you mentioned. Could you please help me out with the file attached

I came up with a better way.

  1. Read CSV without headers (so column names are in row 0)
  2. Loop through row 0 values. If value appears more than once, delete column
  3. Loop through columns and rename from row 0
  4. Remove row 0

Main.xaml (22.3 KB)

2 Likes

Thank you so much @postwick ,!!! It worked perfectly. :innocent: :innocent:

1 Like

Please mark as solution.

Hi @postwick ,

Done…

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.