Remove duplicates in DT for specific columns and keep all column and values

Hi all,

need to remove duplicates in two columns (Customer Code,Jurisdiction Name) the attached sheet and also keep rest of all the column values.

if I use dt_Inputsheet.DefaultView.ToTable(True,“Customer Code”,“Jurisdiction Name”) syntax I am getting these two columns alone in the output dt . but i need rest of all the columns and corresponding values.


Sample.xlsx (6.8 MB)

Hi @prabhu_ponnusamy ,
try below query
dt.AsEnumerable().GroupBy(Function(i) i.Field(Of String)("ColumnName")).Select(Function(g) g.First).CopyToDataTable()
or
((From LineNo In dt.DefaultView.ToTable(True,"Customer Code","Jurisdiction Name").Select().ToList() Select (From row In dt.Select Where row("Customer Code").ToString=LineNo("Customer Code").ToString and row("Jurisdiction Name").ToString=LineNo("Jurisdiction Name").ToString Select row).ToList(0)).ToList()).CopyToDatatable()

Regards,
Arivu

2 Likes

Hi @prabhu_ponnusamy ,

dt_Inputsheet.DefaultView.ToTable(True,“Customer Code”,“Jurisdiction Name”)

This always results the number of coulumns considred for distinct.

You can do like, keep the above resulted data as reference table and original data in another table. Run it in a for each loop and when the reference data matches write the original data row to new datatable .

Hi @arivu96 ,

here you can consider only single column right? what to do for multi columns?

Hi @manjula_rajendran ,

given Linq query for two column also.

Regards,
Arivu

@prabhu_ponnusamy

here we do need some sharp definitions as removal can be understood as:

  • deduplication - but which one from the duplicates is to keep e.g. other col vlas are different
  • removing all duplicates

give a try on following:

keeping from duplicates the first along the other rows

(From d in dtData.AsEnumerable
Group d by k1=d("Customer Code").toString.Trim, k2=d("Jurisdiction Name").toString.Trim into grp=Group
let mbr = grp.First()
Select r=mbr).CopyToDataTable

remove duplicates and keep only the non duplicate ones

(From d in dtData.AsEnumerable
Group d by k1=d("Customer Code").toString.Trim, k2=d("Jurisdiction Name").toString.Trim into grp=Group
Where grp.Count = 1
Select r=grp.First()).CopyToDataTable
1 Like