Hi,
I’m trying to remove duplicate rows based on values in column ‘A’ from datatable, and from those rows that have same value in column ‘A’, I need to keep the one that contains value in column ‘B’. (if all duplicate rows are NULL in column ‘B’, then just keep one)
For example, my datatable looks like: EmployeeID Department
2 A
6 B
1 C
5
3 A
3 A
3
1
4 B
5
After the operation, it will become: EmployeeID Department
2 A
6 B
1 C
5
3 A
4 B
(Note: my datatable has more columns than this one)
I tried this query: myDT.AsEnumerable().GroupBy(Function(x) convert.ToString(x.Field(of object)(“EmployeeID”))).SelectMany(function(gp) gp.ToArray().Take(1)).CopyToDataTable
But it keeps the row that contains NULL in column ‘B’. (I’m not quite sure why. Is it because the rowindex of these rows are smaller than those I wanna keep?)
Thanks in advance, I really appreciate your suggestions.
Use this below linq query, @JinGe
(From p In TestDt.Select() Group p By ID=New With { Key.a =p.Item(“EmployeeID”),Key.b=p.Item(“Department”)} Into Group Select Group(0)).ToArray.CopyToDataTable()
Yes. I don’t need any value with “3, Null” because there is at least one row that indicates Employee 3 is in Department A. And I need “5, Null” because there is no row in the data table indicating that Employee 5 belongs to any department.
@JinGe
the gamechange is the requirement of the handling the department blanks. Maybe we can reformulate the requirements as following:
Output the for each employee all distinct Department associations
if there is a missing association for the employee:
output the missing departement association only if the member does not have any other department association
distinct/Duplicate removal handling should work even when there are additional columns along with department,employeeID
Following LINQ should do this and also the initial row order is set:
(From d In dtData.AsEnumerable
Group d By k=d("EmployeeID").toString.Trim Into grp=Group
Let gf = grp.Where(Function (x) Not String.IsNullOrWhiteSpace(x("Department").toString.Trim)).DefaultIfEmpty(grp.First).toList
From g In gf
Group g By k1=g("EmployeeID").toString.Trim, k2=g("Department").toString.Trim Into grp2=Group
Let gf2 = grp2.First
Order By dtData.Rows.IndexOf(gf2)
Select gf2).CopyToDataTable