Remove duplicate rows

Hello all & @ppr :stuck_out_tongue:

I have an issue with features on excel file. I’m trying to remove the duplicate row in a data table. Sample of DT :

|First Name|Last name |ID|
|Brigite|Tabar|12345|
|Frederic|Alban|32415|
|Brigite|Tabar|67890|

I use this to remove duplicate row :

(From d in DT.AsEnumerable()
Group d By ld= d(“ID”),
into grp=Group
Select grp.First()).CopyToDatatable()

But with this the bot will remove the first line : |Brigite|Tabar|12345| and i want that the bot remove the first line and keep the second line in the DT : |Brigite|Tabar|67890| because the ID can be updated and i want to keep the most recent, so the last in the DT.

Thanks a lot !

Expected Output?
First Name|Last name |ID|
|Frederic|Alban|32415|
|Brigite|Tabar|67890|

Sorry i didn’t write the output. Yes you’re right !

Your case is different, it looks like you do the grouping on firtname, Last Name is my understanding correct?

How would your expected output look like in such a scenario?
First Name|Last name |ID|
|Brigite|Tabar|12345|
|Brigite|Tabar|55555|
|Frederic|Alban|32415|
|Brigite|Tabar|67890|

Oh yes sorry is not group by the id but with the first and last name…

I would like this result :

First Name|Last name |ID|
|Frederic|Alban|32415|
|Brigite|Tabar|67890|

I can have just 2 same first name + last name.

So if i have this DT :
|Brigite|Tabar|12345|
|Frederic|Alban|55555|
|Frederic|Alban|32415|
|Brigite|Tabar|67890|

I would like this result :
|Frederic|Alban|32415|
|Brigite|Tabar|67890|

Give a try on:

(From d in DT.AsEnumerable()
Group d By k1= d(“FirstName”).toString.Trim,k1= d(“LastName”).toString.Trim
into grp=Group
Select grp.Last()).CopyToDatatable()

take care about “ while copy and past and change it to the corret ones

Statement is doing:
making groups by FName, LName
taking the last group member

I tried to do that but the lines are mixed, don’t know why… And i want to keep the same order (just without the first duplicate row).

I know that because i have a column with the date of the day and the bot run every day.

I tried to add something like “ORDER by” but no success and I have another issue about this date which change his format when i delete the range in the excel file and write the new DT in the same excel file for all the line which contain a “OK” on the status column…

Create an empty datatable
loop backwards through your data, starting at your last entry and working your way “back up”.
For each iteration, check if the current person is already in the new datatable:

  • if not: Add that person, including the ID, to the new datatable
  • If already there: Continue to the next person

@BCdev
you can order like below:
(From d In dtData.AsEnumerable()
Group d By k1= d(0).toString.Trim,k2= d(1).toString.Trim
Into grp=Group
Let r = grp.Last
Order By Array.IndexOf(dtData.AsEnumerable.toArray,r)
Select r).CopyToDatatable()

find demo xaml here:
BCdev.xaml (7.1 KB)

Thanks ! I will test it !

Can you just tell me what “Order By Array.IndexOf(dtData.AsEnumerable.toArray,r)” means ?

Thx !

was a quick prototype shot, maybe later we can find something more beautifully

we do a ordering on grp.first
for this we let return the position from this row which it has in the origin datatable and use it for the ordering

Hello @ppr,

have you tried to use UiPath activities Sort Data Table

and Remove Duplicate Rows

Cheers,
Dino

@dfilipovic
sure I am aware of these activities. Had you matched their functionalities with the given requirments. e.g.

  • keeping only the first occurrence (RDR Activity) vs. i want that the bot remove the first line (Reqirement by BCDev)?
  • Duplicates is done over all Columns ((RDR Activity) vs. Group by for duplicate identification via 2 cols (Req)
  • Ordering is done ove a column (ODT Activity) vs. Ordering is given by origin position not in order e.g. Number / Lexical)

Maybe I have something overlooked, so feel free to step in

Kind of repeating the same thing, but if intent is to keep most recent duplicate, then sort the datatable by the column that has the most recent indicator (numeric sequence of an id or time created, etc) first, then remove duplicates.