How to remove rows with duplicate columns

I’m sorry if this has been covered already, but I haven’t been able to find a solution. I need to remove rows from a DataTable based on a few duplicate columns.

For example, my input could be something like:

input

and I need to filter this to return:

output

The problem I’m having is that because of the ID column, none of the rows are completely unique. I need to filter to get unique rows based on “col1” and “col2” but still return all 3 columns for each “unique” row. In other words, I only want to consider col1 and col2 when determining which rows are duplicates.

I’ve tried both Excel Remove Duplicates Range and Data Table Remove Duplicate Rows but wasn’t able to get the required results. I would imagine it could be done with Linq, but I’m not too familiar with it.

Any ideas? I hope this is understandable.

Thanks,

Carlo.

2 Likes

Hi @carlor did you get any error when using the remove duplicates activities

2 Likes

You want to remove row ID3 and ID7? Is my perception correct?

  1. read excel data to collection
  2. sort collection by col&col2
  3. use ‘for each’ to collection, if col1&col2= previous col1&col2 then delete row
  4. sort collection by ID

I didn’t get any errors. I just couldn’t get it to work. Remove Duplicate Rows would only remove rows that were exactly the same. Remove Duplicates Range would remove duplicate cells in the range I selected but leave the other columns/rows in tact. It seems to be more of a “Remove Duplicate Cells”.

Carlo.

Yes. Remove row ID3 because col1 and col2 are the same as ID2 and similar with ID7.

I was hoping to avoid avoid sorting and running through a “for each” loop. There should be an easier way to do it. I’m pretty sure Linq can do it but I don’t know it well enough.

Thanks,

Carlo.

@carlor

  1. First use Read Range activity to read the data from excel file and will give you output as DataTable and say ‘inputDT’.

  2. And then try below expression to filter duplicate data.

         newDT = inputDT.DefaultView.ToTable(True,"col1","col2")
    
  3. Finally use Write Range activity to write into Excel file and pass that newDT to it.

1 Like

Thanks, but I already tried that. It only returns col1 and col2 but not the ID column. In essence I wanted to return all the columns but only use col1 and col2 when determining the “duplicate” rows.

Thanks,

Carlo

1 Like

Thank you everyone for your suggestions. After giving it some more thought, I’ve determined that I don’t need the “ID” column anymore so filtering based only on col1 and col2 is simple.

However, if anyone has a simple solution to my original query, I’d love to see it for the experience.

Thanks,

Carlo.

1 Like

@carlor

It will filter based on those mentioned columns but will give output as all columns only.

I beg to differ. I just tried it and it only returns col1 and co2. the ID column has been removed. Taking a look at my actual work (not the sample), I have 17 columns in my dataTable when I read the full range. After using your example with the DefaultView, my output dataTable only has 2 rows.

Carlo.

Hello @carlor
you can use linq here use this code in an Assign Activity

(From roww In DT.AsEnumerable() Group By id=New With {key .i=roww.item("col1"),key .d=roww.item("col2")} Into gg=Group Select gg(0)).copytodatatable

it’ll group the two columns in your case col1 and col2

the workflow should look like this

Group%20by%20calor

8 Likes

Ding! Ding! Ding! We have a winner! Give the man a cigar :slight_smile:

Thank you so much. I knew it could be done, but it would have taken me forever to figure out that line of code.

I guess I was too tired last night when I thought I didn’t need the ID, but as it turns out I actually do need it. This works out perfectly.

Thanks,

Carlo.

3 Likes

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.