Remove duplicates in DT for specific columns and keep all column and values

prabhu_ponnusamy · August 27, 2021, 6:43am

Hi all,

I need to remove duplicates based on two columns (Customer Code, Jurisdiction Name) in the attached sheet, while keeping all the other columns and their values.

If I use dt_Inputsheet.DefaultView.ToTable(True, "Customer Code", "Jurisdiction Name"), I get only these two columns in the output DataTable, but I need all the other columns and their corresponding values as well.

Sample.xlsx (6.8 MB)

arivu96 · August 27, 2021, 7:34am

Hi @prabhu_ponnusamy ,
Try the query below:
dt.AsEnumerable().GroupBy(Function(i) i.Field(Of String)("ColumnName")).Select(Function(g) g.First()).CopyToDataTable()
or
((From LineNo In dt.DefaultView.ToTable(True, "Customer Code", "Jurisdiction Name").Select().ToList() Select (From row In dt.Select() Where row("Customer Code").ToString = LineNo("Customer Code").ToString AndAlso row("Jurisdiction Name").ToString = LineNo("Jurisdiction Name").ToString Select row).First()).ToList()).CopyToDataTable()

Regards,
Arivu

manjula_rajendran · August 27, 2021, 7:38am

Hi @prabhu_ponnusamy ,

dt_Inputsheet.DefaultView.ToTable(True, "Customer Code", "Jurisdiction Name")

This always returns only the columns that were passed in for the distinct check.

You can keep the result above as a reference table and the original data in another table. Then run a For Each loop over the reference table and, when a reference row matches a row in the original data, write that original row to a new DataTable.
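The loop described above could be sketched roughly as follows. This is only an illustration under assumptions: the module name DedupByLoop and the helper KeepFirstPerKey are hypothetical, and it keeps the first matching original row for each distinct key combination. It compares every reference row against the original rows, so it may be slow on a large sheet.

```vbnet
Imports System
Imports System.Data

Module DedupByLoop
    ' Hypothetical helper: keeps the first row of "source" for each distinct
    ' combination of the key columns. All other columns are preserved.
    Function KeepFirstPerKey(source As DataTable, ParamArray keyColumns As String()) As DataTable
        ' Reference table: one row per distinct key combination (key columns only).
        Dim reference As DataTable = source.DefaultView.ToTable(True, keyColumns)
        ' Empty table with the same schema as the original data.
        Dim result As DataTable = source.Clone()

        For Each refRow As DataRow In reference.Rows
            For Each row As DataRow In source.Rows
                Dim isMatch As Boolean = True
                For Each col As String In keyColumns
                    If Not Object.Equals(row(col), refRow(col)) Then
                        isMatch = False
                        Exit For
                    End If
                Next
                If isMatch Then
                    result.ImportRow(row)
                    Exit For ' keep only the first matching original row
                End If
            Next
        Next
        Return result
    End Function

    Sub Main()
        ' Made-up sample data, not taken from the attached sheet.
        Dim dt As New DataTable()
        dt.Columns.Add("Customer Code", GetType(String))
        dt.Columns.Add("Jurisdiction Name", GetType(String))
        dt.Columns.Add("Amount", GetType(Integer))
        dt.Rows.Add("C1", "Texas", 10)
        dt.Rows.Add("C1", "Texas", 20)
        dt.Rows.Add("C2", "Texas", 30)

        Dim output As DataTable = KeepFirstPerKey(dt, "Customer Code", "Jurisdiction Name")
        Console.WriteLine(output.Rows.Count)
    End Sub
End Module
```

In UiPath the same idea can be built with For Each Row activities instead of code; the sketch only shows the matching logic.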

manjula_rajendran · August 27, 2021, 7:40am

Hi @arivu96 ,

Here you are considering only a single column, right? What should be done for multiple columns?

arivu96 · August 27, 2021, 7:41am

Hi @manjula_rajendran ,

I have also given a LINQ query for two columns.

Regards,
Arivu

ppr · August 27, 2021, 8:17am

@prabhu_ponnusamy

Here we need a sharper definition, as "removal" can be understood as either:

deduplication - but then which of the duplicates should be kept, e.g. when the other column values are different?
removing all duplicates

Give the following a try:

Keep the first row of each group of duplicates, along with the non-duplicated rows:

(From d in dtData.AsEnumerable
Group d by k1=d("Customer Code").toString.Trim, k2=d("Jurisdiction Name").toString.Trim into grp=Group
let mbr = grp.First()
Select r=mbr).CopyToDataTable

Remove all duplicated rows and keep only the rows that have no duplicates:

(From d in dtData.AsEnumerable
Group d by k1=d("Customer Code").toString.Trim, k2=d("Jurisdiction Name").toString.Trim into grp=Group
Where grp.Count = 1
Select r=grp.First()).CopyToDataTable
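To make the difference between the two queries concrete, here is a small self-contained sketch that runs both on made-up sample data. The module name, the helper names (BuildSample, KeepFirst, KeepOnlyUnique) and the Amount column are assumptions for illustration only. Because CopyToDataTable throws when the sequence contains no rows, the second helper falls back to an empty clone of the input.

```vbnet
Imports System
Imports System.Data
Imports System.Linq

Module DedupDemo
    ' Made-up sample data, not taken from the attached sheet.
    Function BuildSample() As DataTable
        Dim dt As New DataTable()
        dt.Columns.Add("Customer Code", GetType(String))
        dt.Columns.Add("Jurisdiction Name", GetType(String))
        dt.Columns.Add("Amount", GetType(Integer))
        dt.Rows.Add("C1", "Texas", 10)
        dt.Rows.Add("C1", "Texas", 20) ' same key columns as the row above
        dt.Rows.Add("C1", "Ohio", 30)
        dt.Rows.Add("C2", "Texas", 40)
        Return dt
    End Function

    ' Keeps the first row of every key group; all columns are preserved.
    Function KeepFirst(dtData As DataTable) As DataTable
        Return (From d In dtData.AsEnumerable()
                Group d By k1 = d("Customer Code").ToString().Trim(), k2 = d("Jurisdiction Name").ToString().Trim() Into grp = Group
                Select r = grp.First()).CopyToDataTable()
    End Function

    ' Keeps only rows whose key combination occurs exactly once.
    Function KeepOnlyUnique(dtData As DataTable) As DataTable
        Dim rows = (From d In dtData.AsEnumerable()
                    Group d By k1 = d("Customer Code").ToString().Trim(), k2 = d("Jurisdiction Name").ToString().Trim() Into grp = Group
                    Where grp.Count() = 1
                    Select r = grp.First()).ToList()
        ' CopyToDataTable throws on an empty sequence, so return an empty clone instead.
        Return If(rows.Count > 0, rows.CopyToDataTable(), dtData.Clone())
    End Function

    Sub Main()
        Console.WriteLine(KeepFirst(BuildSample()).Rows.Count)      ' prints 3
        Console.WriteLine(KeepOnlyUnique(BuildSample()).Rows.Count) ' prints 2
    End Sub
End Module
```

With this sample, the first variant keeps the (C1, Texas) row with Amount 10 and drops the one with Amount 20, while the second variant drops both (C1, Texas) rows.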
