Delete Duplicate Rows and Keep the Row with the Newest Date

I know I’m making this more complicated than necessary, but I cannot figure out the solution!

How do I iterate through a data table to delete duplicate names and keep only the newest record for each one? There are about 50 rows of data.

Example

John Taylor 07/14/2019
John Taylor 08/12/2019

Result

John Taylor 08/12/2019

The list isn’t huge, and the same name may appear more than twice.

The process scrapes data from a website, filters it to keep only the columns I need, and sorts the table so that duplicate names are grouped together.
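One way to do the iteration asked about here is a single pass that remembers the newest row seen for each name in a Dictionary, so the table does not even need to be sorted first. Below is a minimal VB.NET console sketch, assuming string columns named "Resource Name" and "Modified Date" (the names the poster gives later in the thread) holding MM/dd/yyyy text. The module name, the ParseDate helper, and KeepNewestPerName are hypothetical, and trimming the cell text is an assumption meant to tolerate stray leading or trailing spaces.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Globalization

Module KeepNewestLoop

    ' Hypothetical helper: parses the MM/dd/yyyy text used in the thread.
    ' Throws FormatException if a cell does not match that format exactly.
    Function ParseDate(value As Object) As DateTime
        Return DateTime.ParseExact(value.ToString().Trim(), "MM/dd/yyyy", CultureInfo.InvariantCulture)
    End Function

    ' Single pass over the rows: remember the newest row seen for each name,
    ' then copy only those rows into a new table with the same columns.
    Function KeepNewestPerName(dt As DataTable) As DataTable
        Dim newest As New Dictionary(Of String, DataRow)()
        For Each row As DataRow In dt.Rows
            Dim name As String = row("Resource Name").ToString().Trim()
            Dim kept As DataRow = Nothing
            ' First time we see this name, or this row is newer than the one kept so far.
            If Not newest.TryGetValue(name, kept) OrElse
               ParseDate(row("Modified Date")) > ParseDate(kept("Modified Date")) Then
                newest(name) = row
            End If
        Next

        Dim result As DataTable = dt.Clone()   ' same schema, no rows
        For Each row As DataRow In newest.Values
            result.ImportRow(row)
        Next
        Return result
    End Function

    Sub Main()
        Dim dt As New DataTable()
        dt.Columns.Add("Resource Name", GetType(String))
        dt.Columns.Add("Modified Date", GetType(String))
        dt.Rows.Add("John Taylor", "07/14/2019")
        dt.Rows.Add("John Taylor", "08/12/2019")

        For Each row As DataRow In KeepNewestPerName(dt).Rows
            Console.WriteLine(row("Resource Name").ToString() & " | " & row("Modified Date").ToString())
        Next
        ' Prints: John Taylor | 08/12/2019
    End Sub

End Module
```

In a UiPath workflow, the body of KeepNewestPerName would typically go into an Invoke Code activity with the table passed in and out as arguments. When two rows share the same newest date, this sketch keeps the first one it encounters.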

@James_Taylor
Sort the DataTable on date and then get the unique keys using the DefaultView property:
DataTable.DefaultView.ToTable(<Boolean: remove duplicates>, "<column name>")

In your case,
dtTable.DefaultView.ToTable(True, "Name")
where the parameters of ToTable are:
True => remove duplicates
"Name" => the column used to find duplicates.
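To see what the suggested call returns, here is a small VB.NET console sketch built from the question's example (the module name and the "Name"/"Date" column setup are illustrative). With distinct set to True, DataView.ToTable compares only the listed columns and returns only those columns, so ToTable(True, "Name") yields the unique names without their dates. That matches the description of getting unique keys, but on its own it does not return the newest row for each name.

```vb
Imports System
Imports System.Data

Module DefaultViewDistinct

    Sub Main()
        Dim dtTable As New DataTable()
        dtTable.Columns.Add("Name", GetType(String))
        dtTable.Columns.Add("Date", GetType(DateTime))
        dtTable.Rows.Add("John Taylor", New DateTime(2019, 7, 14))
        dtTable.Rows.Add("John Taylor", New DateTime(2019, 8, 12))

        ' Sort newest first through the default view, as the answer suggests.
        dtTable.DefaultView.Sort = "Date DESC"

        ' distinct = True with only "Name" listed: one row, one column.
        ' The Date column is not part of the result.
        Dim namesOnly As DataTable = dtTable.DefaultView.ToTable(True, "Name")
        Console.WriteLine("{0} row(s), {1} column(s)", namesOnly.Rows.Count, namesOnly.Columns.Count)
        ' Prints: 1 row(s), 1 column(s)

        ' Listing both columns keeps both rows, because distinct compares
        ' the whole (Name, Date) combination.
        Dim both As DataTable = dtTable.DefaultView.ToTable(True, "Name", "Date")
        Console.WriteLine("{0} row(s), {1} column(s)", both.Rows.Count, both.Columns.Count)
        ' Prints: 2 row(s), 2 column(s)
    End Sub

End Module
```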

Madhavi -
Thank you for your help! I actually used the Remove Duplicate Rows activity to remove duplicates. The incoming data is a bit messy.

I need to keep the row with the most recent date. The table has two columns:

Resource Name | Modified Date

John Taylor | 07/14/2019
John Taylor | 08/12/2019

In this example, I need to delete the John Taylor | 07/14/2019 row,

so the result would be:

John Taylor | 08/12/2019

There are about 50 rows that the robot needs to sort through.

Thanks

Were you able to do it? I have a similar problem.

Hi!
Did you find a solution?
I am having the same issue.

Hello there,

You can try the following, adjusting the column names and date format to match your table. It worked for me.

dtRaw.AsEnumerable.GroupBy(Function(r) r("Name").ToString).Select(Function(g) g.OrderBy(Function(r) DateTime.ParseExact(r("Date").ToString, "MM/dd/yyyy", System.Globalization.CultureInfo.InvariantCulture)).Last).CopyToDataTable()
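The query above can be tried outside UiPath with a small VB.NET console sketch. It keeps the same GroupBy / OrderBy / Last shape but uses the column names from the question ("Resource Name" and "Modified Date"), moves the ParseExact call into a hypothetical ParseDate helper, and adds a guard for an empty table, since CopyToDataTable throws an InvalidOperationException when the sequence contains no rows. The module and function names are illustrative.

```vb
Imports System
Imports System.Data
Imports System.Globalization
Imports System.Linq

Module KeepNewestLinq

    ' Hypothetical helper: parses MM/dd/yyyy text; throws FormatException otherwise.
    Function ParseDate(value As Object) As DateTime
        Return DateTime.ParseExact(value.ToString().Trim(), "MM/dd/yyyy", CultureInfo.InvariantCulture)
    End Function

    ' Group rows by name, order each group by date, and keep the last (newest) row.
    Function KeepNewest(dtRaw As DataTable) As DataTable
        ' CopyToDataTable throws on an empty sequence, so return an empty copy instead.
        If dtRaw.Rows.Count = 0 Then Return dtRaw.Clone()

        Return dtRaw.AsEnumerable() _
            .GroupBy(Function(r) r("Resource Name").ToString()) _
            .Select(Function(g) g.OrderBy(Function(r) ParseDate(r("Modified Date"))).Last()) _
            .CopyToDataTable()
    End Function

    Sub Main()
        Dim dt As New DataTable()
        dt.Columns.Add("Resource Name", GetType(String))
        dt.Columns.Add("Modified Date", GetType(String))
        dt.Rows.Add("John Taylor", "07/14/2019")
        dt.Rows.Add("John Taylor", "08/12/2019")

        For Each row As DataRow In KeepNewest(dt).Rows
            Console.WriteLine(row("Resource Name").ToString() & " | " & row("Modified Date").ToString())
        Next
        ' Prints: John Taylor | 08/12/2019
    End Sub

End Module
```

OrderByDescending(...).First() would work equally well in place of OrderBy(...).Last(). Either way, every date cell must match MM/dd/yyyy exactly: a value with a stray internal space, such as "08/ 12/2019", makes ParseExact throw a FormatException, so messy input needs cleaning before this step.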

Best,
Ozzy
