Isolate multiple rows with duplicate column values

Hisuhong · October 16, 2019, 3:39pm

Say I have the following table:

How should I isolate the rows where the first 4 column values are the same (so the rows highlighted in yellow and orange)? I want to store all the rows with duplicates in one sheet and the rows without duplicates in another sheet.

I looked at this link, but the method there does not do exactly what I need (Remove Duplicate Rows By Combination of Multiple Column Values).

lakshman · October 16, 2019, 3:47pm

@Hisuhong

Use a Read Range activity to read the data and save it in a variable, say ‘DT’.

 uniqueDT = DT.DefaultView.ToTable(True,"Company","Customer","Entry","Reference")


 duplicateDT = DT.DefaultView.ToTable(False,"Company","Customer","Entry","Reference")

Hisuhong · October 16, 2019, 3:55pm

I tried this, but the problem is that the one with the True argument only removes the extra copies, not the row that was duplicated. What I want to remove is both the duplicates and the row that has been duplicated.

AshwinS2 · October 16, 2019, 5:18pm

hi @Hisuhong
Use this

(From p in dt.Select() where( From q in dt.Select() where string.Join(",",q.ItemArray).Equals(string.Join(",",p.ItemArray)) Select q).ToArray.Count>1 Select p).ToArray.CopyToDataTable()

Th@nks
@shwin S

bcorrea · October 16, 2019, 5:31pm

This will compare ALL columns, right? He doesn't need that, just the first 4… I think you will first need to identify all rows that have equal values in the 4 columns, then delete them all…

Hisuhong · October 16, 2019, 5:34pm

Yes, just the first 4 columns. I’ve already tried this: DT_output.DefaultView.ToTable(True,"Company","Customer","Entry","Reference") but it does not give me what I need. Furthermore, this gets rid of the values in the columns after the "Reference" column, which I do not want.
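A minimal sketch of the behaviour described in this post, assuming a small made-up table (the "Amount" column and all values are hypothetical; only the four key column names come from the thread). It suggests why `DefaultView.ToTable(True, ...)` is not enough here: it returns one row per distinct key combination and keeps only the listed columns. It assumes a project that references System.Data.

```vb
Imports System
Imports System.Data

Module ToTableDemo
    Sub Main()
        ' Hypothetical sample data: two rows share the same first four values.
        Dim dt As New DataTable()
        For Each name As String In {"Company", "Customer", "Entry", "Reference", "Amount"}
            dt.Columns.Add(name, GetType(String))
        Next
        dt.Rows.Add("A", "C1", "E1", "R1", "10")
        dt.Rows.Add("A", "C1", "E1", "R1", "20")
        dt.Rows.Add("B", "C2", "E2", "R2", "30")

        ' distinct:=True collapses equal key combinations into a single row
        ' and the result contains only the columns named here.
        Dim distinctKeys As DataTable =
            dt.DefaultView.ToTable(True, "Company", "Customer", "Entry", "Reference")

        Console.WriteLine(distinctKeys.Rows.Count)     ' prints 2
        Console.WriteLine(distinctKeys.Columns.Count)  ' prints 4 ("Amount" is gone)
    End Sub
End Module
```

The duplicated key ("A", "C1", "E1", "R1") still appears once in the result, so this call alone cannot separate duplicated rows from non-duplicated ones.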

bcorrea · October 16, 2019, 5:52pm

This is what you want:
MyNewDataTable = (From p in SourceDataTable.Select() where( From q in SourceDataTable.Select() where Not q(0).Equals(p(0)) And Not q(1).Equals(p(1)) And Not q(2).Equals(p(2)) And Not q(3).Equals(p(3)) Select q).ToArray.Count>1 Select p).ToArray.CopyToDataTable()

Hisuhong · October 16, 2019, 5:58pm

Sorry if this sounds dumb, but what are p and q?

bcorrea · October 16, 2019, 6:01pm

You don't need to know; they are internal to the expression and you do not need to change them. Only SourceDataTable must change to your DataTable name, and if you prefer, you can assign the result to the same variable so it overwrites the source with the new table…

Hisuhong · October 16, 2019, 6:24pm

So I tried this : (From p in DT_output.Select() where( From q in DT_output.Select() where Not q(0).Equals(p(0)) And Not q(1).Equals(p(1)) And Not q(2).Equals(p(2)) And Not q(3).Equals(p(3)) Select q).ToArray.Count>1 Select p).ToArray.CopyToDataTable(), where DT_output is my source datatable, but the result I get is this

…Am I not understanding this properly?

bcorrea · October 16, 2019, 6:26pm

Did you write the DataTable back to excel?

Hisuhong · October 16, 2019, 6:28pm

Yes, I used Write Range

bcorrea · October 16, 2019, 6:45pm

Did you inspect the NewDT in debug? Did it not remove any rows?

Hisuhong · October 16, 2019, 6:51pm

I did. But it keeps giving me the same weird table I already posted

bcorrea · October 16, 2019, 6:53pm

This can only be happening if the column values in your Excel file are not really the same, because I have tested it here and it really “deletes” all rows… Do you want to share your sample Excel file so I can try it here?

Hisuhong · October 16, 2019, 6:56pm

Input.xlsx (15.4 KB)
MEH.xaml (9.0 KB)

Here are my workflows and excel doc.

Dave · October 16, 2019, 7:33pm

I’m not going to attempt to debug the LINQ query you’re working on, but p and q are variables. When working with LINQ and/or lambda expressions, you name the variables inline. In this case p is a DataRow variable and q is also a DataRow variable. You can change them to anything you’d like to make it more readable.
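To illustrate the point about naming, here is a hedged sketch of a nested query with descriptive range variables (`currentRow` and `otherRow` are illustrative names standing in for p and q; the table contents are made up). Note that the query quoted earlier in the thread puts `Not` in front of every `Equals`, so it counts rows that differ in all four columns; the sketch below instead counts rows that match on the first four columns. It assumes references to System.Data and, on .NET Framework, System.Data.DataSetExtensions.

```vb
Imports System
Imports System.Data
Imports System.Linq

Module RangeVariableDemo
    Sub Main()
        ' Hypothetical sample data: the first two rows share their first four values.
        Dim dt As New DataTable()
        For Each name As String In {"Company", "Customer", "Entry", "Reference", "Amount"}
            dt.Columns.Add(name, GetType(String))
        Next
        dt.Rows.Add("A", "C1", "E1", "R1", "10")
        dt.Rows.Add("A", "C1", "E1", "R1", "20")
        dt.Rows.Add("B", "C2", "E2", "R2", "30")

        ' For each row, count the rows (including itself) that match on columns 0..3.
        ' A count above 1 means the key combination occurs more than once.
        Dim duplicated As DataRow() =
            (From currentRow In dt.AsEnumerable()
             Where (From otherRow In dt.AsEnumerable()
                    Where otherRow(0).Equals(currentRow(0)) AndAlso
                          otherRow(1).Equals(currentRow(1)) AndAlso
                          otherRow(2).Equals(currentRow(2)) AndAlso
                          otherRow(3).Equals(currentRow(3))
                    Select otherRow).Count() > 1
             Select currentRow).ToArray()

        Console.WriteLine(duplicated.Length)  ' prints 2
    End Sub
End Module
```

Changing `> 1` to `= 1` would select the rows whose key combination occurs only once. The nested query compares every row with every other row, which is fine for small sheets but grows quadratically with the row count.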

ppr · October 16, 2019, 9:29pm

@Hisuhong
Find attached a sample that separates the duplicates and non-duplicates and writes them out to 2 different worksheets.
Hisuhong.xaml (10.0 KB)
Result for Reference: Result.xlsx (9.1 KB)

The main idea was to identify the duplicates / non-duplicates by their number of occurrences, evaluated over the 4 columns.
This was solved by using the keys for a GroupBy and Count.

E.g. NonDuplicates:
(From r In dtOrigin.AsEnumerable()
Select C1 = r(0).ToString.Trim, C2 = r(1).ToString.Trim, C3 = r(2).ToString.Trim, C4 = r(3).ToString.Trim
Group By C1, C2, C3, C4 Into Group
Select C1, C2, C3, C4, Count = Group.Count
Where Count = 1
Select New String() {C1,C2, C3, C4}).ToList

Then the rows were retrieved based on this key information.
This was solved with a Join statement.

E.g. Duplicates:
(From k In DupKeys
Join r In dtOrigin
On k.ElementAt(0) Equals r(0).ToString.Trim And k.ElementAt(1) Equals r(1).ToString.Trim And k.ElementAt(2) Equals r(2).ToString.Trim And k.ElementAt(3) Equals r(3).ToString.Trim
Select r).CopyToDataTable

Kindly note: there are several other ways to solve this as well. Let us know if you have any open questions.
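As a self-contained illustration of the group-and-count idea above, the following sketch splits a table into duplicated and non-duplicated rows. It is not the content of the attached workflow: it uses method syntax and a dictionary lookup instead of the Join, and the helper `KeyOf`, the "|" separator, the "Amount" column, and the sample values are all assumptions made for this example. It assumes references to System.Data and, on .NET Framework, System.Data.DataSetExtensions.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Linq

Module SplitDuplicatesDemo
    ' Hypothetical helper: builds one key string from the first four columns, trimmed.
    ' Assumes the values never contain the "|" separator.
    Function KeyOf(row As DataRow) As String
        Return String.Join("|", Enumerable.Range(0, 4).Select(Function(i) row(i).ToString().Trim()))
    End Function

    Sub Main()
        ' Made-up sample data; the first two rows share their first four values.
        Dim dtOrigin As New DataTable()
        For Each name As String In {"Company", "Customer", "Entry", "Reference", "Amount"}
            dtOrigin.Columns.Add(name, GetType(String))
        Next
        dtOrigin.Rows.Add("A", "C1", "E1", "R1", "10")
        dtOrigin.Rows.Add("A", "C1", "E1", "R1", "20")
        dtOrigin.Rows.Add("B", "C2", "E2", "R2", "30")

        ' Count how often each key occurs.
        Dim counts As Dictionary(Of String, Integer) =
            dtOrigin.AsEnumerable().
                GroupBy(Function(r) KeyOf(r)).
                ToDictionary(Function(g) g.Key, Function(g) g.Count())

        ' Full rows (all columns) are kept; only the key decides the bucket.
        Dim dupRows = dtOrigin.AsEnumerable().Where(Function(r) counts(KeyOf(r)) > 1).ToList()
        Dim uniqueRows = dtOrigin.AsEnumerable().Where(Function(r) counts(KeyOf(r)) = 1).ToList()

        ' CopyToDataTable throws on an empty sequence, so fall back to an empty clone.
        Dim dtDuplicates As DataTable = If(dupRows.Any(), dupRows.CopyToDataTable(), dtOrigin.Clone())
        Dim dtNonDuplicates As DataTable = If(uniqueRows.Any(), uniqueRows.CopyToDataTable(), dtOrigin.Clone())

        Console.WriteLine(dtDuplicates.Rows.Count)     ' prints 2
        Console.WriteLine(dtNonDuplicates.Rows.Count)  ' prints 1
    End Sub
End Module
```

In a workflow, the two resulting tables could then each be passed to a Write Range activity targeting a different sheet. The fallback to `dtOrigin.Clone()` keeps the column headers even when one of the two groups is empty.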

bcorrea · October 16, 2019, 9:44pm

well done! <3

ppr · October 16, 2019, 9:47pm

@bcorrea thanks for your words
