Isolate multiple rows with duplicate column values

Say I have the following table:

image

How should I isolate the rows where the first 4 column values are the same (So the rows highlighted in yellow and orange) ? I want to store all the ones with duplicates into a separate sheet and the ones without duplicates into another sheet

I looked at this link but the method in here does not do exactly what I need to do, (Remove Duplicate Rows By Combination of Multiple Column Values)

@Hisuhong

  1. use Read Range activity to read the data and save it in a variable say ‘DT’.

     uniqueDT = DT.DefaultView.ToTable(True,"Company","Customer","Entry","Reference").CopyToDataTable
    
    
     duplicateDT = DT.DefaultView.ToTable(False,"Company","Customer","Entry","Reference").CopyToDataTable
    

I tried this but the problem is, for the one with the True statement, this only removes the duplicates but not the row with the duplicates. What I want to remove is both the duplicates and the row that has been duplicated

hi @Hisuhong
Use this

(From p in dt.Select() where( From q in dt.Select() where string.Join(“,”,q.ItemArray).Equals(string.Join(“,”,p.ItemArray)) Select q).ToArray.Count>1 Select p).ToArray.CopyToDataTable()

Th@nks
@shwin S

This will compare ALL columns right? he dont need that, just the first 4… i think you will first need to identify all rows that have equal values in the 4 columns then delete them all…

Yes, just the first 4 columns. I’ve already tried this: DT_output.DefaultView.ToTable(True,“Company”,“Customer”,“Entry”,“Reference”) but it does not give me what I need. Furthermore, this gets rid of the values in the columns after the “Reference” column, which I do not want. :frowning:

This is what you want:
MyNewDataTable = (From p in SourceDataTable.Select() where( From q in SourceDataTable.Select() where Not q(0).Equals(p(0)) And Not q(1).Equals(p(1)) And Not q(2).Equals(p(2)) And Not q(3).Equals(p(3)) Select q).ToArray.Count>1 Select p).ToArray.CopyToDataTable()

1 Like

Sorry if this sounds dumb, but what are p and q?

you dont need to know, they are internal of the command, you do need to change them, only SourceDataTable must change to your datatable name, and if prefer can pass directly to the same so it overwrites the source with new… :slight_smile:

1 Like

So I tried this : (From p in DT_output.Select() where( From q in DT_output.Select() where Not q(0).Equals(p(0)) And Not q(1).Equals(p(1)) And Not q(2).Equals(p(2)) And Not q(3).Equals(p(3)) Select q).ToArray.Count>1 Select p).ToArray.CopyToDataTable(), where DT_output is my source datatable, but the result I get is this

image

…Am I not understanding this properly? :sweat_smile:

Did you write the DataTable back to excel?

Yes, I used Write Range
image

did you inspect the NewDT in debug? did it not remove any rows?

I did. But it keeps giving me the same weird table I already posted :frowning:

This can only be happening if in excel your columns values are not really the same, because i have tested here and it really “deletes” all rows… do you want to give your sample excel so i can try here?

Input.xlsx (15.4 KB)
MEH.xaml (9.0 KB)

Here are my workflows and excel doc.

1 Like

I’m not going to attempt to debug the linq query you’re working on, but the p and q are variables. When working with linq and/or lamda expressions, you name the variables inline. In this case p is a datarow variable and q is also a datarow variable. You can change them to anything you’d like to make it more readable

2 Likes

@Hisuhong
find attached sample seperating the Duplicates and NonDuplicates and writes out into 2 different Worksheets
Hisuhong.xaml (10.0 KB)
Result for Reference: Result.xlsx (9.1 KB)

the main Ideas were to Identify the Duplicates / NonDuplicates on its occurence validating the 4 columns
this was solved with using the Keys for a GroupBy and Count

E.g. NonDuplicates:
(From r In dtOrigin.AsEnumerable()
Select C1 = r(0).ToString.Trim, C2 = r(1).ToString.Trim, C3 = r(2).ToString.Trim, C4 = r(3).ToString.Trim
Group By C1, C2, C3, C4 Into Group
Select C1, C2, C3, C4, Count = Group.Count
Where Count = 1
Select New String() {C1,C2, C3, C4}).ToList

Then Rows were retrieved based on this Key Information
this was Solved with a Join Statement

E.g. Duplicates:
(From k In DupKeys
Join r In dtOrigin
On k.ElementAt(0) Equals r(0).toString And k.ElementAt(1) Equals r(1).toString And k.ElementAt(2) Equals r(2).toString And k.ElementAt(3) Equals r(3).toString
Select r).CopyToDataTable

Kindly Note: there would be several options to solve it as well. Let us know any open Questions

3 Likes

well done! <3

@bcorrea thanks for your words

1 Like