Remove duplicate rows based on conditions

JinGe · February 4, 2021, 1:32am

Hi,
I’m trying to remove duplicate rows based on values in column ‘A’ from datatable, and from those rows that have same value in column ‘A’, I need to keep the one that contains value in column ‘B’. (if all duplicate rows are NULL in column ‘B’, then just keep one)

For example, my datatable looks like:
EmployeeID Department
2 A
6 B
1 C
5
3 A
3 A
3
1
4 B
5

After the operation, it will become:
EmployeeID Department
2 A
6 B
1 C
5
3 A
4 B

(Note: my datatable has more columns than this one)

I tried this query:
myDT.AsEnumerable().GroupBy(Function(x) convert.ToString(x.Field(of object)(“EmployeeID”))).SelectMany(function(gp) gp.ToArray().Take(1)).CopyToDataTable
But it keeps the row that contains NULL in column ‘B’. (I’m not quite sure why. Is it because the rowindex of these rows are smaller than those I wanna keep?)

Thanks in advance, I really appreciate your suggestions.

jack.chan · February 4, 2021, 3:02am

Heres an example sequence. hope it helps
(the below is the input and output of this sequence)

i have attached my sequence (, its called test.xaml) and input excel below
test.xaml (8.4 KB) example.xlsx (8.4 KB)

Sequence

Arguments
dt = datatable of input
listOfReadIDs = list to store all the unique ids that robot has already read

newdt = the output DT

flow

read input excel into dt variable
clone headers of dt into newDt
for each row in dt, check if list contains employeeID, if it does then skip row, if not add row to newDt and to the list
write newDt to result excel

Manish540 · February 4, 2021, 4:30am

Use this below linq query, @JinGe
(From p In TestDt.Select() Group p By ID=New With { Key.a =p.Item(“EmployeeID”),Key.b=p.Item(“Department”)} Into Group Select Group(0)).ToArray.CopyToDataTable()

Hope this may help you

JinGe · February 4, 2021, 5:29am

Thank you for your reply. I tried this and found that if the order of input data changed like below, it will output a row with ‘3, Null’.
1612416226(1) 1612416531

Manish540 · February 4, 2021, 5:33am

@JinGe So, you require only “5, Null” and other data???..
And you don’t need any value with “3, Null” and so on…

JinGe · February 4, 2021, 5:36am

Thank you. I tried but the query still filtered out the rows I need to keep.

JinGe · February 4, 2021, 5:41am

Yes. I don’t need any value with “3, Null” because there is at least one row that indicates Employee 3 is in Department A. And I need “5, Null” because there is no row in the data table indicating that Employee 5 belongs to any department.

jack.chan · February 4, 2021, 6:10am

use this sequence instead,
test2.xaml (12.5 KB)
heres the result (rows rearranged)

Manish540 · February 4, 2021, 7:33am

Check this below workflow, @JinGe
Uipath_RemoveDuplicates.xaml (11.6 KB)
Hope this may help you

ppr · February 4, 2021, 11:50am

@JinGe
the gamechange is the requirement of the handling the department blanks. Maybe we can reformulate the requirements as following:

Output the for each employee all distinct Department associations
- if there is a missing association for the employee:
  - output the missing departement association only if the member does not have any other department association
distinct/Duplicate removal handling should work even when there are additional columns along with department,employeeID

Following LINQ should do this and also the initial row order is set:

(From d In dtData.AsEnumerable
Group d By k=d("EmployeeID").toString.Trim Into grp=Group
Let gf = grp.Where(Function (x) Not String.IsNullOrWhiteSpace(x("Department").toString.Trim)).DefaultIfEmpty(grp.First).toList
From g In gf
Group g By k1=g("EmployeeID").toString.Trim, k2=g("Department").toString.Trim Into grp2=Group
Let gf2 = grp2.First
Order By dtData.Rows.IndexOf(gf2)
Select gf2).CopyToDataTable

grafik

JinGe · February 5, 2021, 1:49am

Magic! Thank you so much! Using linq query really improved the performance a lot since I got a quite large dataset.

JinGe · February 5, 2021, 2:03am

Thank you for your patience, your solution also works for me!

JinGe · February 5, 2021, 2:04am

Thank you for your patience, this one solved the order issue!

system · February 8, 2021, 2:05am

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Can't delete duplicate rows based on columns and keeping the one matching a condition in an other column Help datatable , studio , question	1	898	January 5, 2021
Remove duplicate rows from two datatables based on a specific columns Activities excel , linq	8	1838	August 31, 2022
Remove Duplicate rows based on single column and that too if cell starts with specific values Activities excel , activities , question	22	876	October 12, 2023
Delete duplicate row if one column contain certain special string Academy Feedback academic_alliance , question	5	1500	August 23, 2021
Remove duplicate row if one column is blank Academy Feedback academic_alliance , question	6	1276	December 13, 2021

Most Active Users - Yesterday
ashokkarale
sonaliaggarwal47
Jon_Smith
Lynn_Song
sharazkm32
V_Roboto_V
Stef_99
Gulshan_Orujova
Aki1111
Hieronymus_Bot
More details...

Remove duplicate rows based on conditions

Related topics