How to group duplicate rows and write each duplicate row to another table minus 1 of each duplicate group

I have a data table that contains duplicate records ( See table pasted below). I want to group all duplicates (duplicate values vary in qty) and write to another table all duplicate values minus 1. The end state is to provide a listing of all duplicate values to delete except 1.
Is there a Linq statement out there? I tried coming up with looping logic to compare current row with next row but I would come across a scenario where my if statements would get to nested and when a new record that did not match previous record was encountered I would have to reset and my brain hurts now. Any help would be great!

ID 1 ID 2 (Boss)
005891271484731W111517BA 005891271484731W111517BA
005891271484731W111517BA 005891271484731W111517BA
005891271480501W111520DA 005891271480501W111520DA
005891271480501W111520DA 005891271480501W111520DA
005891271480501W111520DA 005891271480501W111520DA
005891271480501W111520DA 005891271480501W111520DA
0058912711330339W111518AA 0058912711330339W111518AA
0058912711330339W111518AA 0058912711330339W111518AA

New Table to provide for deletion
|005891271484731W111517BA|005891271484731W111517BA|
|005891271480501W111520DA|005891271480501W111520DA|
|005891271480501W111520DA|005891271480501W111520DA|
|005891271480501W111520DA|005891271480501W111520DA|
|0058912711330339W111518AA|0058912711330339W111518AA|

we can help for a LINQ as well

Do a start with below and get all duplicated rows grouped by the the 2 cols

(From d in dtData.AsEnumerable
Group d by k1=d(0).toString.Trim, k2=d(1).ToString.Trim into grp=Group
Where grp.Count > 1
Select g = grp).SelectMany(Function (x) x).CopyToDataTable

Sorry, the two columns are already grouped by like items. I shouldve only posted one column for our purpose here. In any case, I did what you said but only did it on one column. My real data table has more columns, so I will need group and copy the whole data table by the defined duplicates minus 1 in the end. I thought it was worth mentioning


just provide a sample input and the expected output for the sample input data.

not clear about this part. Maybe you can rephrase

Input Dt_1 - Duplicates in table
image

Expected Output Dt_2 - Grouped duplicates minus 1 for each group
image

Hi @re123 ,

Assuming you aren’t sure of the number of columns, and wish to order the table on the basis of the entire row, then could you give this a try?

(From row In dt.AsEnumerable()
Group row By k =String.Join(";",row.ItemArray) Into grp = Group
Let rowCount = If((grp.Count()-1).Equals(0),1,grp.Count()-1)
From i In Enumerable.Range(0,rowCount)
Let ra = k.Split({";"}, StringSplitOptions.RemoveEmptyEntries).ToArray()
Select dt_res.Rows.Add(ra)).CopyToDataTable()

image

This assumes that in case there is a unique value, that row is not to be substracted.

removeOneDuplicate.xaml (8.4 KB)

Kind Regards,
Ashwin A.K

2 Likes

when duplication should be detected by checking all columns we can do it with the string concat trick for the grouping

(From d In dt.AsEnumerable()
Let kj = String.Join("_",d.ItemArray.Select(Function (x) x.ToString.ToUpper.Trim))
Group d By k =kj Into grp = Group
Where grp.Count > 1
Select g = grp.Skip(1)).SelectMany(Function (x) x).CopyToDataTable

with the skip(1) we take the duplicates -1 from the grp

1 Like

Thank you this worked!

What are the best resources out there to give a good break down of what is happening in that code? I want to understand.

1 Like

Hi @re123 ,

Its hard to condense everything thats going on inside of that snippet of code by I’ll try my best.

LINQ is basically a code that loops.
I’ve nested two loops which you will know when you see the From keyword.

From row In dt.AsEnumerable()

What we are doing next is grouping the dataset on the basis of the entire row, and that bit is taken care of by:

Group row By k =String.Join(";",row.ItemArray) Into grp = Group

This generates an IGrouping i.e., a key(which is the string concatenated row) and linked to it are all its duplicates(if any).
This one probably went over your head, and you will only “get” this concept if you tinker around with the GroupBy Method long enough to get a feel for what it does.

Next I am being a little creative with the iterator so that it includes unique values i.e., if I didn’t perform this step which I’m about to explain, then the unique values will get omitted.

Let rowCount = If((grp.Count()-1).Equals(0),1,grp.Count()-1)

Basically, I’m ensuring that the RowCount doesn’t go below zero, because if it does the in the next step this is what I will end up with:

From i In Enumerable.Range(0,0) //this does not iterate

It won’t add the row to our final table.
The next step splits our group to return it back to its original form, and since we have fashioned it into an array, the compiler will treat it as a DataRow.

Let ra = k.Split({";"}, StringSplitOptions.RemoveEmptyEntries).ToArray()

The last step adds the currently iterated row back to our result datatable.
I hope that helps, if not I can convert this into an article and publish it on my blog.

Kind Regards,
Ashwin A.K

1 Like

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.